<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Python Excels</title>
	<atom:link href="http://pythonexcels.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://pythonexcels.com</link>
	<description>Doing cool stuff with Python</description>
	<lastBuildDate>Thu, 02 May 2013 06:33:14 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Concordance 2</title>
		<link>http://pythonexcels.com/concordance-2/</link>
		<comments>http://pythonexcels.com/concordance-2/#comments</comments>
		<pubDate>Thu, 02 May 2013 06:33:14 +0000</pubDate>
		<dc:creator>dan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://pythonexcels.com/?p=123</guid>
		<description><![CDATA[In the last post, I introduced the concordance and how it can be used to examine, understand, and identify problems in content. Just as a reminder, a concordance is a list of words used in a body of work, with their &#8230; <a href="http://pythonexcels.com/concordance-2/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>In the <a href="http://pythonexcels.com/building-a-concordance-with-python/">last post</a>, I introduced the concordance and how it can be used to examine, understand, and identify problems in content. Just as a reminder, a concordance is a list of words used in a body of work, with their immediate contexts [http://en.wikipedia.org/wiki/Concordance]. For example, I can use a concordance to understand the different usages of the word &#8220;reactor&#8221; in a Wikipedia article on the Fukushima Nuclear Disaster:</p>
<pre>&gt; <b>python concordance.py '.{30}(?i)reactor.{30}' fukushima.txt</b>
arch 2011 of the four damaged reactor buildings Date 11 March 2011
es six separate boiling water reactors originally designed by Gener
O). At the time of the quake, Reactor 4 had been de-fueled while 5
the earthquake, the remaining reactors 1-3 shut down automatically
olant water through a nuclear reactor for several days in order to
wn. As the pumps stopped, the reactors overheated due to the normal
 first few days after nuclear reactor shutdown (smaller amounts of
, only prompt flooding of the reactors with seawater could have coo
ause it would ruin the costly reactors permanently. Flooding with s
 the water boiled away in the reactors and the water levels in the
...</pre>
<p>The Python code that generates this information is short and simple. There are two command line arguments: the expression used to search the text and the name of the text file. The Python program reads the input text, then uses the findall method from the re library to generate a list of matches. A simple for loop iterates over the results and prints them. Here is the core of the program:</p>
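<pre>import re
import sys

# argv[1] is the search expression, argv[2] is the text file to scan
with open(sys.argv[2]) as fp:
    txt = fp.read()

# findall returns every non-overlapping match as a string
for matchstr in re.findall(sys.argv[1], txt):
    print(matchstr)</pre>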
<p>This script is short because most of the processing is happening in the re.findall function when evaluating the regular expression. Using this powerful capability, you can create a wide variety of input expressions to match the content in your text file. Here are just a few examples:</p>
<table>
<tr>
<th>Expression</th>
<th>Meaning</th>
</tr>
<tr>
<td nowrap>day</td>
<td>Match any string containing &#8220;day&#8221;</td>
</tr>
<tr>
<td nowrap>[Dd]ay</td>
<td>Match any string containing &#8220;day&#8221; or &#8220;Day&#8221;</td>
</tr>
<tr>
<td nowrap>(?i)day</td>
<td>Case insensitive match for any string containing &#8220;day&#8221; with any combination of upper- and lower-case characters</td>
</tr>
<tr>
<td nowrap>(?i)\bday\b</td>
<td>Case-insensitive match for the whole word &#8220;day&#8221; only, not &#8220;Sunday&#8221;, &#8220;days&#8221;, etc.</td>
</tr>
<tr>
<td nowrap>.{10}day</td>
<td>Match any string containing &#8220;day&#8221; and any 10 characters before it</td>
</tr>
<tr>
<td nowrap>.{10}day.{10}</td>
<td>Match any string containing &#8220;day&#8221; and any 10 characters before and after</td>
</tr>
<tr>
<td nowrap>.(?:the |a )day</td>
<td>Match &#8220;the day&#8221; or &#8220;a day&#8221;, but not &#8220;someday&#8221;</td>
</tr>
<tr>
<td nowrap>on \w+ day</td>
<td>Match &#8220;on __ day&#8221; where &#8220;__&#8221; is any single word</td>
</tr>
<tr>
<td nowrap>on.{,30} reactor</td>
<td>Match &#8220;on&#8221; followed by &#8220;reactor&#8221; with at most 30 characters (plus a space) in between</td>
</tr>
</table>
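<p>To see a few of these expressions in action, here is a minimal sketch that applies them with <code>re.findall</code>. The sample sentence is invented for illustration and is not taken from the Fukushima article.</p>

```python
import re

# Hypothetical sample text, made up for this illustration
sample = "On Sunday the crew worked all day. Someday a Day like that may come again."

# Plain substring match: finds "day" inside other words too
plain = re.findall(r"day", sample)

# Case-insensitive match: also picks up the capitalized "Day"
any_case = re.findall(r"(?i)day", sample)

# Whole word only: \b marks a word boundary on each side
whole_word = re.findall(r"(?i)\bday\b", sample)

# Exactly 10 characters of leading context before "day"
with_context = re.findall(r".{10}day", sample)

print(plain)
print(any_case)
print(whole_word)
print(with_context)
```

<p>Note that <code>.{10}day</code> silently skips any &#8220;day&#8221; that has fewer than ten characters in front of it, such as the one in &#8220;Sunday&#8221; near the start of the sample.</p>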
<p>At first glance, regular expressions might seem bewildering and incomprehensible. But creating a simple template with examples can help you towards the productive use of regular expressions. Another issue is running a Python program on the command line, which is not a common use model on a Windows computer in 2013. In the next post I&#8217;ll explore how to make this powerful script more user-friendly and accessible.</p>
]]></content:encoded>
			<wfw:commentRss>http://pythonexcels.com/concordance-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Building a Concordance with Python</title>
		<link>http://pythonexcels.com/building-a-concordance-with-python/</link>
		<comments>http://pythonexcels.com/building-a-concordance-with-python/#comments</comments>
		<pubDate>Tue, 09 Apr 2013 07:12:46 +0000</pubDate>
		<dc:creator>dan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://pythonexcels.com/?p=109</guid>
		<description><![CDATA[As a Python scripter who is now a Technical Writer, I&#8217;ve found an amazing number of uses of Python in my job. Python is wonderfully suited for all sorts of text processing and manipulation. I heartily recommend that any Technical &#8230; <a href="http://pythonexcels.com/building-a-concordance-with-python/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>As a Python scripter who is now a Technical Writer, I&#8217;ve found an amazing number of uses of Python in my job. Python is wonderfully suited for all sorts of text processing and manipulation. I heartily recommend that any Technical Writer become familiar with a scripting language like Python to simplify their document production flow, improve the quality of their documentation and increase productivity.</p>
<p>One way that scripted text processing can help your documentation is to identify inconsistencies and style problems in your writing. In my first assignment as a Technical Writer, I was handed a user guide and told to &#8220;clean it up.&#8221; Documents that are passed from writer to writer tend to gather cruft over time and inherit different writing styles from different writers. How do you get a quantitative understanding of the update task, other than reading the document and noting the problems one-by-one? To answer this question, I wrote a concordance generator.</p>
<p>A concordance is a list of words used in a body of work, with their immediate contexts<br />
[<a href="http://en.wikipedia.org/wiki/Concordance_(publishing)">http://en.wikipedia.org/wiki/Concordance_(publishing)</a>]. Using a concordance, you can view word usage in context and quickly spot inconsistencies. I&#8217;ve found that this technique is much, much more effective than using the word processor&#8217;s search feature to look through the document (although the latest version of Microsoft Word does contain improvements in this area).</p>
<p>As an example, I&#8217;ll use the text from a random Wikipedia article. I chose the page describing the Fukushima nuclear disaster at <a href="http://en.wikipedia.org/wiki/Fukushima_Daiichi_nuclear_disaster">http://en.wikipedia.org/wiki/Fukushima_Daiichi_nuclear_disaster</a> and saved the text into a file called fukushima.txt. Using my concordance tool, I first examined how the word &#8220;reactor&#8221; was used in the text:</p>
<pre>&gt; <b>python concordance.py '.{30}(?i)reactor.{30}' fukushima.txt</b>
arch 2011 of the four damaged reactor buildings Date 11 March 2011
es six separate boiling water reactors originally designed by Gener
O). At the time of the quake, Reactor 4 had been de-fueled while 5
the earthquake, the remaining reactors 1-3 shut down automatically
olant water through a nuclear reactor for several days in order to
wn. As the pumps stopped, the reactors overheated due to the normal
 first few days after nuclear reactor shutdown (smaller amounts of
, only prompt flooding of the reactors with seawater could have coo
ause it would ruin the costly reactors permanently. Flooding with s
 the water boiled away in the reactors and the water levels in the
...</pre>
<p>The concordance.py file contains the actual program script, which we&#8217;ll look at in a later post. The arguments to the program are the match string and a list of files to process. The string is a regular expression that is applied to the input text. If you&#8217;ve never seen a regular expression, it can be a bit intimidating. However, if you stick to a few basic patterns, you&#8217;d be amazed at what you can accomplish. The <code>".{30}"</code> element says &#8220;match any character (.) thirty times.&#8221; This provides context by matching any thirty characters before and any thirty characters after the substring of interest. The <code>"(?i)"</code> element tells the matcher to ignore case. The program reads the fukushima.txt file and prints out every match for the expression you provide.</p>
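<p>The context-window idea can be sketched as a small function. This is not the actual concordance.py; the function name <code>concordance</code>, its <code>width</code> parameter, and the sample text are assumptions made for illustration. The sketch uses <code>.{0,30}</code> rather than <code>.{30}</code> so that matches near the start or end of the text are not dropped. It also passes <code>re.IGNORECASE</code> instead of embedding <code>(?i)</code> in the middle of the pattern, because Python 3.11 and later accept an inline flag only at the very start of the expression.</p>

```python
import re

def concordance(term, text, width=30):
    """Return each match of `term` with up to `width` characters of context."""
    # re.IGNORECASE plays the role of the inline (?i) flag
    pattern = re.compile(r".{0,%d}%s.{0,%d}" % (width, term, width), re.IGNORECASE)
    return pattern.findall(text)

# Hypothetical sample text, invented for this sketch
text = "The reactor was shut down. Two other Reactors kept running."
results = concordance("reactor", text, width=10)
for line in results:
    print(line)
```

<p>Because the search term is dropped straight into the pattern, it can itself be a regular expression fragment, as long as it contains no capturing groups (with groups, <code>findall</code> would return the groups instead of the whole match).</p>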
<p>Here&#8217;s a more interesting example. I want to see how the terms &#8220;shutdown&#8221; and &#8220;shut down&#8221; are used in context. In a typical word processor, I would probably search for &#8220;shut&#8221; to make sure I catch both cases. Using concordance.py, I can modify my search expression to capture exactly what I want:</p>
<pre>&gt; <b>python concordance.py '.{30}(?i)shut.?down.{30}' fukushima.txt</b>
ed while 5 and 6 were in cold shutdown for planned maintenance.[8] I
e, the remaining reactors 1-3 shut down automatically and emergency g
from melting down after being shut down. As the pumps stopped, the re
ew days after nuclear reactor shutdown (smaller amounts of this heat
...</pre>
<p>The <code>"shut.?down"</code> expression searches for &#8220;shut&#8221;, followed optionally by any character, followed by &#8220;down&#8221;. This term matches the strings &#8220;shut down&#8221;, &#8220;shut-down&#8221;, and &#8220;shutdown&#8221;. If you&#8217;re a skeptic, you&#8217;re probably thinking &#8220;Meh, I could just search for &#8216;shut&#8217; and find all these cases anyway.&#8221;</p>
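<p>A quick way to convince yourself of what <code>shut.?down</code> does and does not match is to try it on a handful of phrases. The phrases below are invented for this sketch.</p>

```python
import re

# Hypothetical phrases, invented for this sketch
phrases = ["cold shutdown", "reactors shut down", "a shut-down plant", "shut it down"]

# .? allows zero or one character between "shut" and "down"
pattern = re.compile(r"shut.?down", re.IGNORECASE)
matched = [p for p in phrases if pattern.search(p)]
print(matched)
```

<p>The last phrase is skipped because more than one character separates the two words, which is exactly the kind of noise a plain search for &#8220;shut&#8221; would include.</p>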
<p>Here&#8217;s a trickier example. The Fukushima plant involved a number of different buildings or &#8220;units&#8221; which contained reactors. Your first thought might be to search for &#8220;unit&#8221;:</p>
<pre>&gt; <b>python concordance.py '.{30}(?i)unit.{30}' fukushima.txt</b>
...
tance of nuclear power in the United States was eroded sharply f
 Switzerland, Taiwan, and the United States. Much of the help an
 The multiple nuclear reactor units involved in the Fukushima Da
d exposed fuel pools at three units.[79] On 21 December 2011, th
bine and reactor buildings of units 1 and 3 of contaminated water by
...</pre>
<p>Unfortunately, &#8220;unit&#8221; also matches &#8220;United&#8221;. You could add a space after unit, but you would miss &#8220;units&#8221;. You really just wanted to see &#8220;unit&#8221; or &#8220;units&#8221; followed by a number. This expression should do the trick:</p>
<pre>&gt; <b>python concordance.py '.{30}(?i)units? \d.{30}' fukushima.txt</b>
...
ark I containment, as used in Units 1 to 5. Key: DW, dry well enclo
ectric Power Company (TEPCO). Unit 1 is a 439 MWe type (BWR3) reac
2 Kern County earthquake.[31] Units 2 and 3 are both 784 MWe type B
ed operating in July 1974 and Unit 3 in March 1976. The earthquake
...</pre>
<p>The <code>"s?"</code> term optionally matches the letter s. As a result, the expression matches both &#8220;unit&#8221; and &#8220;units&#8221;. The <code>"\d"</code> element means &#8220;match any digit 0 through 9.&#8221; The matcher can now find &#8220;unit 1&#8221;, &#8220;units 2 and 3&#8221;, and &#8220;Units 4,5, and 6&#8221;, but does not match &#8220;United States&#8221;. This is the behavior we want.</p>
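<p>The same kind of check works for the <code>units? \d</code> expression. The sample strings are made up for this sketch, loosely modeled on the output above.</p>

```python
import re

# "s?" makes the plural optional; "\d" requires a digit after the space
pattern = re.compile(r"units? \d", re.IGNORECASE)

# Hypothetical sample strings, invented for this sketch
samples = ["Unit 1 is a 439 MWe reactor", "Units 2 and 3", "the United States", "three units."]
hits = [s for s in samples if pattern.search(s)]
print(hits)
```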
<p>This is a lot to swallow for one post. In later posts, I&#8217;ll dive into what makes concordance.py tick, and how we can simplify the application to be useful for non-experts.</p>
<p>&#8212; Dan</p>
]]></content:encoded>
			<wfw:commentRss>http://pythonexcels.com/building-a-concordance-with-python/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Writing Quality Technical Information</title>
		<link>http://pythonexcels.com/writing-quality-technical-information/</link>
		<comments>http://pythonexcels.com/writing-quality-technical-information/#comments</comments>
		<pubDate>Sat, 30 Mar 2013 00:29:48 +0000</pubDate>
		<dc:creator>dan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://pythonexcels.com/?p=104</guid>
		<description><![CDATA[In the UC Santa Cruz Extension program for Technical Writing and Communication, we looked at several different resources for suggested writing styles and guidelines (the Stanford Writing in the Sciences course by Coursera was too brief to cover many different &#8230; <a href="http://pythonexcels.com/writing-quality-technical-information/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>In the UC Santa Cruz Extension program for Technical Writing and Communication, we looked at several different resources for suggested writing styles and guidelines (the Stanford Writing in the Sciences course by Coursera was too brief to cover many different sources). The book I found most useful was <a href="http://www.amazon.com/Developing-Quality-Technical-Information-Handbook/dp/0131477498">Developing Quality Technical Information: A Handbook for Writers and Editors</a>, by Gretchen Hargis, Michelle Carey, and others. The authors are members of the technical writing staff at IBM, and write about the methodologies they use to author technical documentation. <a href="http://everypageispageone.com/series/tyranny-of-the-terrible-troica-task-concept-and-reference-reconsidered/">Not everyone agrees with the IBM way for technical documentation</a>, especially <a href="http://en.wikipedia.org/wiki/Darwin_Information_Typing_Architecture">DITA</a>, but the guidelines form a great starting point.</p>
<p>Hopefully you can take some time to read the book and pick up some tips on writing techniques and style. I especially like the generous use of examples to illustrate the different points made in the descriptions of best practices. The introduction provides a framework for the rest of the book and boils down to these key concepts:</p>
<p><a href="http://pythonexcels.com/blog/wp-content/uploads/2013/03/Easy2.png"><img class="alignnone size-large wp-image-106" alt="Easy2" src="http://pythonexcels.com/blog/wp-content/uploads/2013/03/Easy2-1024x272.png" width="584" height="155" /></a></p>
<p>Easy to use, easy to understand, and easy to find: it sounds simple. Actually, much of the discussion of documentation best practices follows common-sense rules. Who could argue for unclear, inaccurate, hard-to-search documentation? On the other hand, I did find that the examples helped me to understand where problems can creep into my documentation.</p>
<p>Overall, I highly recommend this book for technical writers and others who write technical documentation.</p>
<p>&#8211; Dan</p>
]]></content:encoded>
			<wfw:commentRss>http://pythonexcels.com/writing-quality-technical-information/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Coursera Writing in the Sciences</title>
		<link>http://pythonexcels.com/coursera-writing-in-the-sciences/</link>
		<comments>http://pythonexcels.com/coursera-writing-in-the-sciences/#comments</comments>
		<pubDate>Sat, 23 Mar 2013 00:47:43 +0000</pubDate>
		<dc:creator>dan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://pythonexcels.com/?p=100</guid>
		<description><![CDATA[I guess I’m a glutton for punishment. I recently completed the Writing in the Sciences course, a free Stanford course offered through Coursera.org. Coursera is an organization co-founded by Professors Andrew Ng and Daphne Koller at Stanford, with a mission &#8230; <a href="http://pythonexcels.com/coursera-writing-in-the-sciences/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>I guess I’m a glutton for punishment. I recently completed the <a href="https://www.coursera.org/course/sciwrite">Writing in the Sciences</a> course, a free Stanford course offered through <a href="http://www.coursera.org">Coursera.org</a>. Coursera is an organization co-founded by Professors Andrew Ng and Daphne Koller at Stanford, with a mission to offer online, university-level courses for no charge. Coursera courses comprise a set of video lectures, computer-gradable quizzes and homework assignments in an 8 to 10 week format. At the end of the course, you receive a certificate of completion, but no Stanford credit. Coursera is a fantastic resource for anyone wishing to broaden their skills in any number of areas.</p>
<p>For a technical writer focusing on software and computer-related content, the focus on health and science writing in this course might not appear compelling. From the website for the course: “This course trains scientists to become more effective, efficient, and confident writers &#8230; Kristin Sainani (née Cobb) is a clinical assistant professor at Stanford University and also a health and science writer.” Well, I’m not a scientist, and I’m not involved with health and science writing, so why do I need this course?</p>
<p>As it turns out, the first four weeks of the class are broadly applicable to anyone who is authoring technical content for an audience. Professor Sainani&#8217;s lectures on editing were particularly superb; she is a brutal editor and encourages her students to really dive into the material, search for the essential meaning, and remove the cruft from their writing.</p>
<p>For technical writers of software or hardware products, the concerns Kristin raises in science writing strongly overlap with your own. Overuse of acronyms, wordiness, and lack of clarity are exactly the same writing issues we struggle with. The class finished in November 2012; keep watching the Coursera site for information on a future session.</p>
<p>&#8212; Dan</p>
<p><a href="http://pythonexcels.com/blog/wp-content/uploads/2013/03/writingsciences.png"><img class="alignnone size-medium wp-image-101" alt="writingsciences" src="http://pythonexcels.com/blog/wp-content/uploads/2013/03/writingsciences-219x300.png" width="219" height="300" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://pythonexcels.com/coursera-writing-in-the-sciences/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Certified</title>
		<link>http://pythonexcels.com/certified/</link>
		<comments>http://pythonexcels.com/certified/#comments</comments>
		<pubDate>Sat, 16 Mar 2013 00:09:14 +0000</pubDate>
		<dc:creator>dan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://pythonexcels.com/?p=91</guid>
		<description><![CDATA[When I first started Technical Writing, my boss asked me to take a grammar class at University of California Santa Cruz (UCSC) Extension. I enjoyed the class, and decided to pursue the full certificate. Ten classes and two years later, &#8230; <a href="http://pythonexcels.com/certified/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>When I first started Technical Writing, my boss asked me to take a grammar class at University of California Santa Cruz (UCSC) Extension. I enjoyed the class, and decided to pursue the full certificate. Ten classes and two years later, I completed the entire program and received my certificate.</p>
<p><a href="http://pythonexcels.com/blog/wp-content/uploads/2013/03/UCSC.png"><img class="alignnone size-medium wp-image-95" alt="UCSC" src="http://pythonexcels.com/blog/wp-content/uploads/2013/03/UCSC-300x231.png" width="300" height="231" /></a></p>
<p>You can read more about the certificate program at http://www.ucsc-extension.edu/programs/technical-writing. To receive a certificate, you must complete 7 required and 3 elective classes. I chose the following classes:</p>
<ul>
<li>Information Architecture</li>
<li>Grammar and Style for Technical Communicators</li>
<li>Technical Communication: An Introduction to the Profession</li>
<li>Technical Writers’ Workshop</li>
<li>Writing Successful Instructions, Procedures and Policies</li>
<li>Developing Technical Information from Plan to Completion</li>
<li>Minimalist Design for Documentation</li>
<li>Graphic Design Fundamentals</li>
<li>Content Management</li>
<li>DITA Authoring, Introduction</li>
<li>Final Project</li>
</ul>
<p>Was it worth it? Some classes were certainly more informative and interesting than others. There was some repetition of material, but the feedback from the instructors and other students really helped me to hone my writing. It was also helpful to gain exposure to newer topics in technical writing, such as DITA. Overall, I’m glad I invested the time; it was definitely worth the effort.</p>
<p>It helped that my company was paying the tuition of around $600 per class. Surprisingly, I’m one of the few people in our group at work to take advantage of this opportunity. If you’re considering the investment in a Technical Writing certificate, you might also want to check out a recent discussion on the LinkedIn Software User Assistance forum.</p>
<p><a href="http://www.linkedin.com/groups/Technical-Writing-Certificates-Are-they-1276817.S.145103688">http://www.linkedin.com/groups/Technical-Writing-Certificates-Are-they-1276817.S.145103688</a></p>
<p>Thanks —- Dan</p>
]]></content:encoded>
			<wfw:commentRss>http://pythonexcels.com/certified/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Change is Good</title>
		<link>http://pythonexcels.com/change-is-good/</link>
		<comments>http://pythonexcels.com/change-is-good/#comments</comments>
		<pubDate>Sat, 09 Mar 2013 00:57:48 +0000</pubDate>
		<dc:creator>dan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://pythonexcels.com/?p=89</guid>
		<description><![CDATA[To allow more time to help my wife at her restaurant, I&#8217;ve made a career change to a position with more flexible hours. Now I&#8217;m working as Technical Writer in a large electronic design software company, as part of a &#8230; <a href="http://pythonexcels.com/change-is-good/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>To allow more time to help my wife at her restaurant, I&#8217;ve made a career change to a position with more flexible hours. Now I&#8217;m working as a Technical Writer in a large electronic design software company, as part of a team that creates leading-edge documentation for the company&#8217;s flagship product. Being a huge consumer of technical writing, I know how difficult it is to create compelling, concise, accurate and readable documentation. On a daily basis, I&#8217;m googling for information on some programming issue, or referring to <a href="http://safaribooks.com">safaribooksonline.com</a> in search of a more elegant solution to a problem I&#8217;ve encountered. On the other hand, the more technical writing I consume, the more I question how the content is generated.</p>
<p>Until the recession, the field was filled with Technical Writers who were expert in crafting grammatically correct sentences, weaving them together into coherent paragraphs, organizing paragraphs into sections, gathering the sections to create chapters, and folding the chapters into a book. Yet, I wonder if most technical writers have ever done a sustained deep dive into a technical manual in order to master a technical subject or software tool. Yes, technical writing should be grammatically correct and use good sentence structure. But in my experience, the writing itself is much less important than the accuracy of the material, the completeness of the content, the organization of the material, and the quality of the examples. The writers of Developing Quality Technical Information: A Handbook for Writers and Editors certainly understood this when they advocated a minimalist style. Readers don’t want to wade through pages of text; they just want to get to the command, options, menu pick, process or technique that will get the job done for them.</p>
<p>Like lines of code in a program, less is always better if it can serve the same function. Leave the superfluous adverbs and adjectives to the Marketing department, and let’s focus on the core information our users need. Strive to make the writing accurate and precise, and eliminate the unnecessary cruft.</p>
<p>I have an audacious goal for you writers out there: make the accuracy of your content a key concern. Despite twenty-five years of inertia, it’s happening where I work, and I want to share the secrets with you. In the coming posts, I’ll share with you some simple tools and techniques to raise the quality of your documentation. By sharing this information with others in your organization, you can help them at the same time.</p>
<p>&#8212; Dan</p>
]]></content:encoded>
			<wfw:commentRss>http://pythonexcels.com/change-is-good/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Ninety Six Spreadsheets</title>
		<link>http://pythonexcels.com/ninety-six-spreadsheets/</link>
		<comments>http://pythonexcels.com/ninety-six-spreadsheets/#comments</comments>
		<pubDate>Sat, 22 Sep 2012 23:22:28 +0000</pubDate>
		<dc:creator>dan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://pythonexcels.com/blog/?p=80</guid>
		<description><![CDATA[Thank goodness for Python. My wife asked me this morning, &#8220;Honey, can you give me the history of raises for Steve Smithfield and Jeff Johnson&#8221;. I told her I&#8217;ll look into it, and thought how I might use Python to &#8230; <a href="http://pythonexcels.com/ninety-six-spreadsheets/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Thank goodness for Python. My wife asked me this morning, &#8220;Honey, can you give me the history of raises for Steve Smithfield and Jeff Johnson&#8221;. I told her I&#8217;ll look into it, and thought how I might use Python to tackle the problem.</p>
<p>I have the entire history of payroll captured across ninety six spreadsheets, one for each pay period.</p>
<p><img src="/blog/_images/20120922_payroll.png" alt="" /></p>
<p>To manually click through each spreadsheet, locate the pay rate for Steve and Jeff, write it down, and go to the next spreadsheet would take about 30 seconds per spreadsheet, or almost 50 minutes. I decided to invest 10 minutes in a Python script I could use over and over. The script opens every .xls file in the local directory and creates a list of employee names in the spreadsheet. If Steve or Jeff is found in the list, his pay rate is appended to a per-employee list; otherwise a 0 is recorded. After all the spreadsheets are read, the script prints out the results. The spreadsheets are named &#8220;Timesheet_20120101.xls&#8221; for January 1, 2012, &#8220;Timesheet_20120515.xls&#8221; for May 15, 2012, etc.</p>
<p>Here is the completed script:</p>
<script src="https://gist.github.com/3768201.js"></script><noscript><pre><code class="language-python python">#
# payroll_steve_jeff.py
# Report payrates for two employees across multiple spreadsheets
#
import win32com.client as win32
import glob
import os

xlfiles = sorted(glob.glob(&quot;*.xls&quot;))
print &quot;Reading %d files...&quot;%len(xlfiles)

steve = []
jeff = []
cwd = os.getcwd()
excel = win32.gencache.EnsureDispatch('Excel.Application')
for xlfile in xlfiles:
    wb = excel.Workbooks.Open(cwd+&quot;\\&quot;+xlfile)
    ws = wb.Sheets('PAYROLL')
    xldata = ws.UsedRange.Value
    names = [r[1] for r in xldata]
    if u'SMITHFIELD, STEVE' in names:
        indx = names.index(u'SMITHFIELD, STEVE')
        steve.append(xldata[indx][4])
    else:
        steve.append(0)

    if u'JOHNSON, JEFF' in names:
        indx = names.index(u'JOHNSON, JEFF')
        jeff.append(xldata[indx][4])
    else:
        jeff.append(0)
    wb.Close()


print &quot;File,Jeff,Steve&quot;
for i in range(len(xlfiles)):
    print &quot;%s,%0.2f,%0.2f&quot;%(xlfiles[i],jeff[i],steve[i])
excel.Application.Quit()

    
</code></pre></noscript>
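<p>The lookup step in the script is repeated once for Steve and once for Jeff. As a sketch of how it could be factored out, the helper below does the same search on plain tuples, so it runs without Excel. The function name <code>pay_rate</code> and the sample rows are hypothetical; the column positions (name in index 1, pay rate in index 4) are taken from the script above.</p>

```python
# Hypothetical rows standing in for ws.UsedRange.Value
xldata = (
    (1, "SMITHFIELD, STEVE", "Cook", 40.0, 15.50),
    (2, "JOHNSON, JEFF", "Server", 32.0, 12.25),
)

def pay_rate(rows, name, name_col=1, rate_col=4):
    """Return the pay rate for `name`, or 0 if the name is not on the sheet."""
    for row in rows:
        if row[name_col] == name:
            return row[rate_col]
    return 0

print(pay_rate(xldata, "SMITHFIELD, STEVE"))
print(pay_rate(xldata, "DOE, JANE"))
```

<p>With a helper like this, adding a third employee to the report would mean one more call rather than another copied <code>if</code>/<code>else</code> block.</p>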
]]></content:encoded>
			<wfw:commentRss>http://pythonexcels.com/ninety-six-spreadsheets/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A User Friendly Experience</title>
		<link>http://pythonexcels.com/a-user-friendly-experience/</link>
		<comments>http://pythonexcels.com/a-user-friendly-experience/#comments</comments>
		<pubDate>Sun, 07 Feb 2010 13:00:57 +0000</pubDate>
		<dc:creator>dan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://pythonexcels.com/blog/?p=76</guid>
		<description><![CDATA[Let’s be honest for a second, when was the last time you saw a Windows user running something from the command prompt? Well, I do it occasionally, but I can’t say I remember seeing a non-IT person using the command &#8230; <a href="http://pythonexcels.com/a-user-friendly-experience/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Let’s be honest for a second: when was the last time you saw a Windows user running something from the command prompt? Well, I do it occasionally, but I can’t say I remember seeing a non-IT person using the command prompt recently. So if you’re going to offer your users a Windows program, you’d better give them an icon to click and let them drag stuff onto it. And if something goes wrong, you’d better have a decent error message. This post will take the Pivot Table generation script developed in the <a href="http://pythonexcels.com/blog/extending-pivot-table-data/">Extending Pivot Table Data</a> post and turn it into a user friendly Windows program with better flexibility and improved user experience.</p>
<p>The scripts developed previously could be run at the command line or by double-clicking the icon for the script, like this:</p>
<p><img alt="Command Line" src="http://pythonexcels.com/blog/_images/20100207_commandexe1.png" /> <img alt="Script Icon" src="http://pythonexcels.com/blog/_images/20100207_erpicon.png" /></p>
<p>This works because the input file name, ABCDCatering.xls, is hard coded within the script. In the real world, your users have folders containing dozens of randomly named spreadsheets. If a user accidentally provides a corrupt spreadsheet, the program should keep cranking through the other files and let the user recover the damaged file later. The script developed in the last post needs some enhancements to make it more user friendly, including:</p>
<ul>
<li>Provide support for multiple randomly named input spreadsheets</li>
<li>Add some simple message boxes and drag-and-drop support</li>
<li>Improve the error checking and error recovery to give the user feedback when something goes wrong</li>
</ul>
<p>To keep things concise, this version of the script only allows the user to run the program by dragging and dropping files onto the program icon. Enhancing the script to also support command line operation is left as an exercise for the reader. Let’s work through each of the usability issues below:</p>
<p><strong>Multiple File Support</strong></p>
<p>As I mentioned, Windows XP/Vista/7 users typically don’t interact with the command prompt. Instead, programs are run by clicking on their icons, either from the desktop, a folder, or the Start menu. A user specifies spreadsheets or document files by opening them in the application or dragging them onto the program icon on the desktop or in the Explorer window. You can also add the file names after the program name at the command prompt if needed.</p>
<p>To process multiple files, the program needs to read its command line arguments, which are conveniently available in the <code>sys.argv</code> list. Note that the first element, <code>sys.argv[0]</code>, holds the script name, so the input files start at <code>sys.argv[1]</code>. The main block is modified to pass <code>sys.argv</code> to the <code>runexcel</code> function, which loops through each of the input files.</p>
<pre>import sys

def runexcel(args):
    # args[0] is the script name; the input files follow it
    for fname in args[1:]:
        pass  # Process spreadsheet files

if __name__ == "__main__":
    runexcel(sys.argv)</pre>
<p>The <code>for</code> loop wraps the <code>wb = excel.Workbooks.Open(fname)</code> call, the <code>wb.SaveAs()</code> call, and everything in between, so each workbook is processed within the loop. After the loop finishes, the script checks whether any errors occurred. If they did, a warning and a message box are issued.</p>
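<p>The loop-then-report pattern described above can be sketched without Excel at all. This is a minimal illustration rather than the code from the actual script: the names <code>process_all</code> and <code>fake_process</code> are hypothetical, and <code>fake_process</code> merely stands in for the real spreadsheet work.</p>

```python
def process_all(args, process_one):
    """Run process_one on every file named after the script name.

    Returns a list of (filename, error message) pairs for files that failed.
    """
    failures = []
    for fname in args[1:]:          # args[0] is the script name
        try:
            process_one(fname)
        except Exception as exc:    # keep going so one bad file does not stop the run
            failures.append((fname, str(exc)))
    return failures


def fake_process(fname):
    # Hypothetical stand-in for the Excel work: reject non-spreadsheet names
    if not fname.lower().endswith((".xls", ".xlsx")):
        raise ValueError("not a spreadsheet: %s" % fname)


failures = process_all(["script.py", "a.xls", "notes.doc", "b.xlsx"], fake_process)
for fname, message in failures:
    print("WARNING: %s failed (%s)" % (fname, message))
```

<p>Collecting the failures in a list, instead of exiting on the first one, is what lets the program finish the remaining files and report everything once at the end.</p>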
<p><strong>Primitive GUI Support</strong></p>
<p>Adding message boxes and providing basic drag-and-drop support adds a level of familiarity for Windows users. Python supports a large number of GUI frameworks; see <a href="http://wiki.python.org/moin/GuiProgramming">http://wiki.python.org/moin/GuiProgramming</a> for a comprehensive list. Building a complete graphical interface for this script is beyond the scope of this article, and isn’t really necessary anyway. Instead, you can add support for simple message boxes using the MessageBoxA function built into Windows. The basic pattern for calling a message box using this technique is to <code>import ctypes</code> and call <code>windll.user32.MessageBoxA</code>:</p>
<pre>from ctypes import *
windll.user32.MessageBoxA(None,"My Message Box","Program Name",0)</pre>
<p>This simple code produces a message box with the text “My Message Box”, an OK button, and “Program Name” as the top banner. When Python encounters <code>windll.user32.MessageBoxA()</code>, program execution pauses until the user clicks the OK button.</p>
<p><img alt="Message Box" src="http://pythonexcels.com/blog/_images/20100207_messagebox.png" /></p>
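<p>As a rough sketch, the message box call can be wrapped in a small helper that falls back to printing when no message box is wanted, which makes the rest of a script easier to test. The helper name <code>show_message</code> and its <code>use_gui</code> parameter are assumptions made for this illustration and are not part of the original script. It calls <code>MessageBoxW</code>, the wide-character counterpart of <code>MessageBoxA</code>, because that is the variant that accepts ordinary text strings under Python 3.</p>

```python
import sys


def show_message(text, title, use_gui=None):
    """Show text in a Windows message box, or print it when no GUI is wanted.

    Returns True if a message box was shown, False if the text was printed.
    """
    if use_gui is None:
        use_gui = sys.platform == "win32"   # only Windows provides windll
    if use_gui:
        import ctypes
        # The final 0 selects a plain box with a single OK button
        ctypes.windll.user32.MessageBoxW(None, text, title, 0)
        return True
    print("%s: %s" % (title, text))
    return False


shown = show_message("Finished", "Program Name", use_gui=False)
```

<p>Leaving <code>use_gui</code> unset lets the helper decide from the platform, so the same call displays a box on Windows and prints a line everywhere else.</p>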
<p><strong>Improve Error Checking</strong></p>
<p>Lots of problems can happen when reading user spreadsheet data. The user can forget to specify an input file. They could try to have the script read a Word document or other non-spreadsheet file type. The spreadsheet might be corrupted. You need to bulletproof your script and guard against potential issues, both known and unknown.</p>
<p>Previous versions of the script made limited use of the <code>try/except</code> pattern to catch errors.</p>
<pre>try:
    wb = excel.Workbooks.Open('ABCDCatering.xls')
except:
    print "Failed to open spreadsheet ABCDCatering.xls"
    sys.exit(1)</pre>
<p>erppivotdragdrop.py makes more liberal use of <code>try/except</code>, wrapping more of the program code in the <code>try</code> block. If an error occurs, it can be handled more cleanly with nice warning messages. The downside of using <code>try/except</code> is that you lose the traceback message telling you where the error occurred. To get this information back, use the <code>traceback</code> module and the <code>traceback.print_exc()</code> function. One usage is to call <code>traceback.print_exc()</code> in the <code>except</code> block like this:</p>
<pre>import traceback
try:
  a = 1/0
except:
  # Do error recovery
  traceback.print_exc()</pre>
<p>Now exceptions are caught, handled, and a more detailed traceback is still available.</p>
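<p>When the traceback should be kept for a summary at the end of the run instead of being printed immediately, <code>traceback.format_exc()</code> returns the same text as a string. The sketch below is only an illustration: the function <code>safe_divide</code> and the <code>errors</code> list are hypothetical names, not taken from the script.</p>

```python
import traceback


def safe_divide(a, b, log):
    """Return a / b, or None after recording the full traceback in log."""
    try:
        return a / b
    except Exception:
        # format_exc() returns the text that print_exc() would have printed
        log.append(traceback.format_exc())
        return None


errors = []
result = safe_divide(1, 0, errors)
print("%d error(s) recorded" % len(errors))
```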
<p><strong>Running the script</strong></p>
<p>Let’s test out the script. First, copy the script to the desktop and drag the ABCDCatering.xls spreadsheet onto the icon. Python starts running in the command window and begins processing the file you dragged. If everything ran successfully, you’ll see a series of messages and the “Finished” message box.</p>
<p><img alt="Finished" src="http://pythonexcels.com/blog/_images/20100207_noerror.png" /></p>
<p>If a problem occurred, a message is displayed in the command window. At the end of the run, the message box is displayed letting you know that something bad happened and that you should review the error messages.</p>
<p><img alt="Error Message" src="http://pythonexcels.com/blog/_images/20100207_haserror.png" /></p>
<p>The completed script is too long to reproduce here, but you can view it <a href="https://github.com/pythonexcels/examples/blob/master/erppivotdragdrop.py">on GitHub</a>.</p>
<p><strong>Prerequisites</strong></p>
<p>Python (refer to <a href="http://www.python.org/">http://www.python.org</a>)</p>
<p>Win32 Python module (refer to <a href="http://sourceforge.net/projects/pywin32">http://sourceforge.net/projects/pywin32</a>)</p>
<p>Microsoft Excel (refer to <a href="http://office.microsoft.com/excel">http://office.microsoft.com/excel</a>)</p>
<p><strong>Source Files and Scripts</strong></p>
<p>Source for the program erppivotdragdrop.py and spreadsheet file ABCDCatering.xls are available at <a href="http://github.com/pythonexcels/examples">http://github.com/pythonexcels/examples</a></p>
<p>Thanks — Dan</p>
]]></content:encoded>
			<wfw:commentRss>http://pythonexcels.com/a-user-friendly-experience/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Extending Pivot Table Data</title>
		<link>http://pythonexcels.com/extending-pivot-table-data/</link>
		<comments>http://pythonexcels.com/extending-pivot-table-data/#comments</comments>
		<pubDate>Thu, 03 Dec 2009 13:00:36 +0000</pubDate>
		<dc:creator>dan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://pythonexcels.com/blog/?p=72</guid>
		<description><![CDATA[As shown in the last post, Automating Pivot Tables with Python, Python and Excel can help you quickly clean up a spreadsheet, organize data and build useful reports in very few lines of code. Another useful data preparation technique is &#8230; <a href="http://pythonexcels.com/extending-pivot-table-data/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>As shown in the last post, <a href="http://pythonexcels.com/blog/automating-pivot-tables-with-python/">Automating Pivot Tables with Python</a>, Python and Excel can help you quickly clean up a spreadsheet, organize data and build useful reports in very few lines of code. Another useful data preparation technique is to build new columns of information based on the available data. For example, you could add an industry segment column to group company names by industry, or add an item type column to group sales items by category. While Excel does have some functions to help with adding new data fields, automation with Python eliminates the tedium of clicking column names and entering formulas.</p>
<p>Excel does provide a function for calculating new values within a pivot table. One example is extending a pivot table containing pricing and quantity data to compute an average selling price. For example, given the table below:</p>
<p><img src="http://pythonexcels.com/blog/_images/20091203_salesbyqtr.png" alt="Sales by Quarter" /></p>
<p>a new label called “ASP”, which is the Net Booking divided by the Quantity, can be added quickly and easily with Excel’s Calculated Field capability.</p>
<p><img src="http://pythonexcels.com/blog/_images/20091203_calcfield.png" alt="Insert Calculated Field" /></p>
<p>This feature is handy for adding labels on the fly that require a simple calculation.</p>
<p>In other cases, deriving the new field may not be so simple, yet needs to be performed each time the spreadsheet is updated. Python can programmatically add new data fields to the source table so that the data is ready for viewing whenever the pivot table is opened.</p>
<p>The script developed last time automated the data cleanup and pivot table generation tasks. Doing some further analysis based on the output spreadsheet, I created a chart of the Top 10 Customers for ABCD Catering:</p>
<p><img src="http://pythonexcels.com/blog/_images/20091203_top10chart.png" alt="Top 10 Customers Chart" /></p>
<p>Note that some of the company names are 15 characters or longer in length and occupy much of the chart space. It would be nice to have a shorter “nickname” for each company that could be used in the charts. One solution is to cut and paste the pivot table data, then modify the Company Name information by hand. Unfortunately, this would be very tedious. Another approach is to automate the process in the script and create a new column derived from a comprehensive reference table of company names and nicknames. The downside is that maintaining the list could be an issue as the business grows and the list of customers grows longer. A third method is to create an algorithm that uses the first word in the company name wherever possible, and uses a defined nickname for other special cases. “Sun Microsystems” becomes “Sun” and “Cisco Systems” becomes “Cisco”, while other company names such as “Hewlett-Packard” could be listed in a lookup with a nickname such as “HP”. The snippet below shows how this is done.</p>
<pre>logolookup = {'Applied Materials':'AMAT', 'Electronic Arts':'EA',
              'Hewlett-Packard':'HP', 'KLA-Tencor':'KLA'}
if ("Company Name" in newdata[0]):
    cindx = newdata[0].index("Company Name")
    newdata[0][cindx+1:cindx+1] = ["Logo Name"]
    for rcnt in range(1,len(newdata)):
        if newdata[rcnt][cindx] in logolookup:
            newdata[rcnt][cindx+1:cindx+1] = [logolookup[newdata[rcnt][cindx]]]
        else:
            newname = newdata[rcnt][cindx].split()[0]
            newdata[rcnt][cindx+1:cindx+1] = [newname]
            logolookup[newdata[rcnt][cindx]] = newname</pre>
<p>This code begins with a simple lookup for company names and can be easily extended as special case company names are added. Next, the column location of the “Company Name” field is identified and the new header “Logo Name” is inserted after “Company Name” in the list using the <code>list[index:index]</code> construct. The <code>for</code> loop iterates over each row in the table, checking whether the company name for that row exists in the <code>logolookup</code> dictionary, then inserting the abbreviated name. If not found, then the original company name is <code>split()</code> into words and the first word used as the new abbreviated name. Finally, the <code>logolookup</code> dictionary is updated with the new abbreviated name.</p>
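<p>The slice assignment may look unusual if you haven’t seen it before. The short, standalone example below uses made-up sample rows to show what it does: assigning a one-element list to the empty slice <code>[i:i]</code> inserts that element at position <code>i</code>, which has the same effect as calling <code>list.insert(i, value)</code>.</p>

```python
header = ["Company Name", "Net Booking"]
row = ["Sun Microsystems", 1200]

cindx = header.index("Company Name")
# Assigning to the empty slice [i:i] inserts at position i without replacing anything
header[cindx+1:cindx+1] = ["Logo Name"]
# The first word of the company name serves as the short name
row[cindx+1:cindx+1] = [row[cindx].split()[0]]

print(header)
print(row)
```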
<p>After running the program, the new column “Logo Name” has been inserted after “Company Name” and contains the shortened company names.</p>
<p><img src="http://pythonexcels.com/blog/_images/20091203_withlogo.png" alt="Company Name and Logo Name" /></p>
<p>The new “Logo Name” column can be used in the previous pivot table and chart, replacing the “Company Name” field and producing a cleaner chart with less area used for displaying company name information.</p>
<p><img src="http://pythonexcels.com/blog/_images/20091203_top10wlogo.png" alt="New Top 10 Customers Chart" /></p>
<p>Another use of this technique is to add a label for “Food Category” based on the type of food purchased. For example, the food items sold by ABCD Catering are: Caesar Salad, Cheese Pizza, Cheeseburger, Chocolate Sundae, Churro, Hamburger, Hot Dog, Pepperoni Pizza, Potato Chips and Soda. Let’s say that your manager wants to track the sales of different food categories, such as Burger, Dessert, HotDog, Drink, Pizza, Salad and Snack. Using the same technique outlined above, this code will add a column for Food Category with the appropriate entry for each food item:</p>
<pre>foodlookup = {'Caesar Salad':'Salad', 'Cheese Pizza':'Pizza',
              'Cheeseburger':'Burger', 'Chocolate Sundae':'Dessert',
              'Churro':'Snack', 'Hamburger':'Burger', 'Hot Dog':'HotDog',
              'Pepperoni Pizza':'Pizza', 'Potato Chips':'Snack',
              'Soda':'Drink'}
if ("Food Name" in newdata[0]):
    cindx = newdata[0].index("Food Name")
    newdata[0][cindx+1:cindx+1] = ["Food Category"]
    for rcnt in range(1,len(newdata)):
        if newdata[rcnt][cindx] in foodlookup:
            newdata[rcnt][cindx+1:cindx+1] = [foodlookup[newdata[rcnt][cindx]]]
        else:
            newdata[rcnt][cindx+1:cindx+1] = ['UNDEFINED']</pre>
<p>If a food item is not found in the lookup, the category is labeled UNDEFINED. This is an indication that there is a problem with the script and the lookup for food categories needs to be extended.</p>
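<p>Since the logo name and food category steps share the same shape, they could be folded into one helper. The following is a sketch under that assumption, not code from the script: <code>add_derived_column</code> and its parameters are hypothetical names, and “Nachos” is an invented item used only to trigger the fallback.</p>

```python
def add_derived_column(table, source_header, new_header, lookup, fallback):
    """Insert new_header after source_header, filling each row from lookup.

    table is a list of rows (lists) whose first row holds the headers.
    fallback(value) supplies the entry when value is not in lookup.
    Returns the source values that needed the fallback.
    """
    missing = []
    if source_header not in table[0]:
        return missing
    cindx = table[0].index(source_header)
    table[0].insert(cindx + 1, new_header)
    for row in table[1:]:
        value = row[cindx]
        if value in lookup:
            derived = lookup[value]
        else:
            derived = fallback(value)
            missing.append(value)
        row.insert(cindx + 1, derived)
    return missing


table = [["Food Name", "Quantity"],
         ["Churro", 10],
         ["Soda", 5],
         ["Nachos", 3]]
foodlookup = {"Churro": "Snack", "Soda": "Drink"}
missing = add_derived_column(table, "Food Name", "Food Category",
                             foodlookup, lambda name: "UNDEFINED")
print("Items without a category: %s" % ", ".join(missing))
```

<p>Returning the list of unmatched values makes it easy to warn about every item that fell through to UNDEFINED, so the lookup can be extended before the report is shared.</p>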
<p>The section of the script which creates the pivot tables can be easily extended to build a new table based on the newly created label “Food Category”:</p>
<pre># What food category had the highest unit sales in Q4?
ptname = addpivot(wb,src,
         title="Unit Sales by Food Category",
         filters=("Fiscal Quarter",),
         columns=(),
         rows=("Food Category",),
         sumvalue="Sum of Quantity",
         sortfield=("Food Category",win32c.xlDescending))
wb.Sheets("Unit Sales by Food Category").PivotTables(ptname).PivotFields("Fiscal Quarter").CurrentPage = "2009-Q4"</pre>
<p>Based on the output spreadsheet, the best selling food category in Q4 based on quantity is “Snack”, with sales of 13700 units.</p>
<p><img src="http://pythonexcels.com/blog/_images/20091203_foodcategory.png" alt="Sales by Food Category" /></p>
<p>Here is the completed erppivotextended.py script, also available on <a href="http://github.com/pythonexcels/examples">GitHub</a></p>
<script src="https://gist.github.com/3767934.js"></script><noscript><pre><code class="language-python python">#
# erppivotextended.py:
# Load raw ERP data, clean up header info,
# insert additional data fields and build 5 pivot tables
#
import win32com.client as win32
win32c = win32.constants
import sys
import itertools
tablecount = itertools.count(1)

def addpivot(wb,sourcedata,title,filters=(),columns=(),
             rows=(),sumvalue=(),sortfield=&quot;&quot;):
    &quot;&quot;&quot;Build a pivot table using the provided source location data
    and specified fields
    &quot;&quot;&quot;
    newsheet = wb.Sheets.Add()
    newsheet.Cells(1,1).Value = title
    newsheet.Cells(1,1).Font.Size = 16

    # Build the Pivot Table
    tname = &quot;PivotTable%d&quot;%tablecount.next()

    pc = wb.PivotCaches().Add(SourceType=win32c.xlDatabase,
                                 SourceData=sourcedata)
    pt = pc.CreatePivotTable(TableDestination=&quot;%s!R4C1&quot;%newsheet.Name,
                             TableName=tname,
                             DefaultVersion=win32c.xlPivotTableVersion10)
    wb.Sheets(newsheet.Name).Select()
    wb.Sheets(newsheet.Name).Cells(3,1).Select()
    for fieldlist,fieldc in ((filters,win32c.xlPageField),
                            (columns,win32c.xlColumnField),
                            (rows,win32c.xlRowField)):
        for i,val in enumerate(fieldlist):
            wb.ActiveSheet.PivotTables(tname).PivotFields(val).Orientation = fieldc
            wb.ActiveSheet.PivotTables(tname).PivotFields(val).Position = i+1

    wb.ActiveSheet.PivotTables(tname).AddDataField(
        wb.ActiveSheet.PivotTables(tname).PivotFields(sumvalue[7:]),
        sumvalue,
        win32c.xlSum)
    if len(sortfield) != 0:
        wb.ActiveSheet.PivotTables(tname).PivotFields(sortfield[0]).AutoSort(sortfield[1], sumvalue)
    newsheet.Name = title

    # Uncomment the next command to limit output file size, but make sure
    # to click Refresh Data on the PivotTable toolbar to update the table
    # newsheet.PivotTables(tname).SaveData = False

    return tname

def runexcel():
    &quot;&quot;&quot;Open the spreadsheet ABCDCatering.xls, clean it up,
    and add pivot tables
    &quot;&quot;&quot;
    excel = win32.gencache.EnsureDispatch('Excel.Application')
    excel.Visible = True
    try:
        wb = excel.Workbooks.Open('ABCDCatering.xls')
    except:
        print &quot;Failed to open spreadsheet ABCDCatering.xls&quot;
        sys.exit(1)
    ws = wb.Sheets('Sheet1')
    xldata = ws.UsedRange.Value
    newdata = []
    for row in xldata:
        if len(row) == 13 and row[-1] is not None:
            newdata.append(list(row))
    lasthdr = &quot;Col A&quot;
    for i,field in enumerate(newdata[0]):
        if field is None:
            newdata[0][i] = lasthdr + &quot; Name&quot;
        else:
            lasthdr = newdata[0][i]

    logolookup = {'Applied Materials':'AMAT', 'Electronic Arts':'EA',
                  'Hewlett-Packard':'HP', 'KLA-Tencor':'KLA'}
    if (&quot;Company Name&quot; in newdata[0]):
        cindx = newdata[0].index(&quot;Company Name&quot;)
        newdata[0][cindx+1:cindx+1] = [&quot;Logo Name&quot;]
        for rcnt in range(1,len(newdata)):
            if newdata[rcnt][cindx] in logolookup:
                newdata[rcnt][cindx+1:cindx+1] = [logolookup[newdata[rcnt][cindx]]]
            else:
                newname = newdata[rcnt][cindx].split()[0]
                newdata[rcnt][cindx+1:cindx+1] = [newname]
                logolookup[newdata[rcnt][cindx]] = newname
            
    foodlookup = {'Caesar Salad':'Salad', 'Cheese Pizza':'Pizza',
                  'Cheeseburger':'Burger', 'Chocolate Sundae':'Dessert',
                  'Churro':'Snack', 'Hamburger':'Burger', 'Hot Dog':'HotDog',
                  'Pepperoni Pizza':'Pizza', 'Potato Chips':'Snack',
                  'Soda':'Drink'}
    if (&quot;Food Name&quot; in newdata[0]):
        cindx = newdata[0].index(&quot;Food Name&quot;)
        newdata[0][cindx+1:cindx+1] = [&quot;Food Category&quot;]
        for rcnt in range(1,len(newdata)):
            if newdata[rcnt][cindx] in foodlookup:
                newdata[rcnt][cindx+1:cindx+1] = [foodlookup[newdata[rcnt][cindx]]]
            else:
                newdata[rcnt][cindx+1:cindx+1] = ['UNDEFINED']
            
    rowcnt = len(newdata)
    colcnt = len(newdata[0])
    wsnew = wb.Sheets.Add()
    wsnew.Range(wsnew.Cells(1,1),wsnew.Cells(rowcnt,colcnt)).Value = newdata
    wsnew.Columns.AutoFit()

    src = &quot;%s!R1C1:R%dC%d&quot;%(wsnew.Name,rowcnt,colcnt)

    # What were the total sales in each of the last four quarters?
    addpivot(wb,src,
             title=&quot;Sales by Quarter&quot;,
             filters=(),
             columns=(),
             rows=(&quot;Fiscal Quarter&quot;,),
             sumvalue=&quot;Sum of Net Booking&quot;,
             sortfield=())

    # What are the sales for each food item in each quarter?
    addpivot(wb,src,
             title=&quot;Sales by Food Item&quot;,
             filters=(),
             columns=(&quot;Food Name&quot;,),
             rows=(&quot;Fiscal Quarter&quot;,),
             sumvalue=&quot;Sum of Net Booking&quot;,
             sortfield=())

    # Who were the top 10 customers for ABCD Catering in 2009?
    addpivot(wb,src,
             title=&quot;Top 10 Customers&quot;,
             filters=(),
             columns=(),
             rows=(&quot;Company Name&quot;,),
             sumvalue=&quot;Sum of Net Booking&quot;,
             sortfield=(&quot;Company Name&quot;,win32c.xlDescending))

    # Who was the highest producing sales rep for the year?
    addpivot(wb,src,
             title=&quot;Top Sales Reps&quot;,
             filters=(),
             columns=(),
             rows=(&quot;Sales Rep Name&quot;,&quot;Company Name&quot;),
             sumvalue=&quot;Sum of Net Booking&quot;,
             sortfield=(&quot;Sales Rep Name&quot;,win32c.xlDescending))

    # What food item had the highest unit sales in Q4?
    ptname = addpivot(wb,src,
             title=&quot;Unit Sales by Food&quot;,
             filters=(&quot;Fiscal Quarter&quot;,),
             columns=(),
             rows=(&quot;Food Name&quot;,),
             sumvalue=&quot;Sum of Quantity&quot;,
             sortfield=(&quot;Food Name&quot;,win32c.xlDescending))
    wb.Sheets(&quot;Unit Sales by Food&quot;).PivotTables(ptname).PivotFields(&quot;Fiscal Quarter&quot;).CurrentPage = &quot;2009-Q4&quot;

    # What food category had the highest unit sales in Q4?
    ptname = addpivot(wb,src,
             title=&quot;Unit Sales by Food Category&quot;,
             filters=(&quot;Fiscal Quarter&quot;,),
             columns=(),
             rows=(&quot;Food Category&quot;,),
             sumvalue=&quot;Sum of Quantity&quot;,
             sortfield=(&quot;Food Category&quot;,win32c.xlDescending))
    wb.Sheets(&quot;Unit Sales by Food Category&quot;).PivotTables(ptname).PivotFields(&quot;Fiscal Quarter&quot;).CurrentPage = &quot;2009-Q4&quot;

    if int(float(excel.Version)) &gt;= 12:
        wb.SaveAs('newABCDCatering.xlsx',win32c.xlOpenXMLWorkbook)
    else:
        wb.SaveAs('newABCDCatering.xls')
    excel.Application.Quit()

if __name__ == &quot;__main__&quot;:
    runexcel()</code></pre></noscript>
<p><strong>Prerequisites</strong></p>
<p>Python (refer to <a href="http://www.python.org/">http://www.python.org</a>)</p>
<p>Win32 Python module (refer to <a href="http://sourceforge.net/projects/pywin32">http://sourceforge.net/projects/pywin32</a>)</p>
<p>Microsoft Excel (refer to <a href="http://office.microsoft.com/excel">http://office.microsoft.com/excel</a>)</p>
<p><strong>Source Files and Scripts</strong></p>
<p>Source for the program erppivotextended.py and spreadsheet file ABCDCatering.xls are available at <a href="http://github.com/pythonexcels/examples">http://github.com/pythonexcels/examples</a></p>
<p>Thanks — Dan</p>
]]></content:encoded>
			<wfw:commentRss>http://pythonexcels.com/extending-pivot-table-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Automating Pivot Tables with Python</title>
		<link>http://pythonexcels.com/automating-pivot-tables-with-python/</link>
		<comments>http://pythonexcels.com/automating-pivot-tables-with-python/#comments</comments>
		<pubDate>Mon, 23 Nov 2009 13:00:08 +0000</pubDate>
		<dc:creator>dan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://pythonexcels.com/blog/?p=68</guid>
		<description><![CDATA[In the last post I explained the basic concept behind Pivot Tables and provided some examples. Pivot tables are an easy-to-use tool to derive some basic business intelligence from your data. As discussed last time, there are occasions when you’ll need to &#8230; <a href="http://pythonexcels.com/automating-pivot-tables-with-python/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>In the <a href="http://www.pythonexcels.com/2009/11/introducing-pivot-tables">last post</a> I explained the basic concept behind Pivot Tables and provided some examples. Pivot tables are an easy-to-use tool to derive some basic business intelligence from your data. As discussed last time, there are occasions when you’ll need to do interactive data mining by changing column and row fields. But in my experience, it’s handy to have my favorite reports built automatically, with the reports ready to go as soon as I open the spreadsheet. In this post I’ll develop and explain the code to create a set of pivot tables automatically in a worksheet.</p>
<p>The goal of this exercise is to automate the generation of pivot tables from the last post, and save them to a new Excel file.</p>
<p><img src="http://pythonexcels.com/blog/_images/20091123_reports.png" alt="Pivot Tables" /></p>
<p>I started with the file <code>newABCDCatering.xls</code> from the previous post and recorded a macro to create this simple pivot table showing Net Bookings by Sales Rep and Food Name for the last four quarters.</p>
<p><img src="http://pythonexcels.com/blog/_images/20091123_setup.png" alt="Net Bookings" /></p>
<p>Captured in Excel 2007, the recorded macro looks like this:</p>
<script src="https://gist.github.com/3767886.js"></script><noscript><pre><code class="language- ">'
' Macro2 Macro
'

'
    Selection.CurrentRegion.Select
    Sheets.Add
    ActiveWorkbook.PivotCaches.Create(SourceType:=xlDatabase, SourceData:= _
        &quot;Sheet2!R1C1:R791C13&quot;, Version:=xlPivotTableVersion10).CreatePivotTable _
        TableDestination:=&quot;Sheet3!R3C1&quot;, TableName:=&quot;PivotTable1&quot;, DefaultVersion _
        :=xlPivotTableVersion10
    Sheets(&quot;Sheet3&quot;).Select
    Cells(3, 1).Select
    With ActiveSheet.PivotTables(&quot;PivotTable1&quot;).PivotFields(&quot;Fiscal Year&quot;)
        .Orientation = xlPageField
        .Position = 1
    End With
    With ActiveSheet.PivotTables(&quot;PivotTable1&quot;).PivotFields(&quot;Fiscal Quarter&quot;)
        .Orientation = xlColumnField
        .Position = 1
    End With
    With ActiveSheet.PivotTables(&quot;PivotTable1&quot;).PivotFields(&quot;Sales Rep Name&quot;)
        .Orientation = xlRowField
        .Position = 1
    End With
    With ActiveSheet.PivotTables(&quot;PivotTable1&quot;).PivotFields(&quot;Food Name&quot;)
        .Orientation = xlRowField
        .Position = 2
    End With
    ActiveSheet.PivotTables(&quot;PivotTable1&quot;).AddDataField ActiveSheet.PivotTables( _
        &quot;PivotTable1&quot;).PivotFields(&quot;Net Booking&quot;), &quot;Sum of Net Booking&quot;, xlSum
End Sub</code></pre></noscript>
<p>The post <a href="http://pythonexcels.com/blog/mapping-excel-vb-macros-to-python/">Mapping Excel VB Macros to Python</a> covered a technique for recording a Visual Basic macro and porting it to Python. Using that approach, you could simply turn on the macro recorder and generate all the required tables, producing a long script with lots of redundancy. A better approach is to build a general purpose function that can be used over and over to generate the pivot tables.</p>
<p>Looking at the macro, you see lines specifying the <code>Orientation</code> of the field name, such as <code>.Orientation = xlRowField</code> and <code>.Orientation = xlColumnField</code>. A pivot table has four basic areas for fields:</p>
<ul>
<li>Report Filter (<code>.Orientation = xlPageField</code>)</li>
<li>Column area (<code>.Orientation = xlColumnField</code>)</li>
<li>Row area (<code>.Orientation = xlRowField</code>)</li>
<li>Values area (<code>PivotTables().AddDataField()</code>)</li>
</ul>
<p>Each of these supports multiple fields (row fields for <code>Sales Rep Name</code> and <code>Food Name</code> were added in the example). The ordering of the fields changes the appearance of the table.</p>
<p>A general pattern should be apparent in this macro. First, the pivot table is created with the <code>ActiveWorkbook.PivotCaches.Create()</code> statement. Next, the columns and rows are configured with a series of <code>ActiveSheet.PivotTables("PivotTable1").PivotFields()</code> statements. Finally, the field used in the <code>Values</code> section of the table is configured using the <code>ActiveSheet.PivotTables("PivotTable1").AddDataField</code> statement. The general purpose function will need to contain all of these constructs. Note the parts that can’t be hard-coded in the general solution: the source of the data, <code>"Sheet2!R1C1:R791C13"</code>, and the destination for the table, <code>"Sheet3!R3C1"</code>, need to be determined from the characteristics of the source data.</p>
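<p>One way to avoid hard-coding the source range is to build the R1C1 reference from the dimensions of the data itself. The sketch below assumes the data is a list of equal-length rows starting at cell A1. The helper name <code>r1c1_range</code> and the sample values are made up for illustration.</p>

```python
def r1c1_range(sheet_name, data):
    """Return an R1C1 reference covering data, a list of equal-length rows."""
    rowcnt = len(data)
    colcnt = len(data[0])
    return "%s!R1C1:R%dC%d" % (sheet_name, rowcnt, colcnt)


sample = [["Fiscal Quarter", "Net Booking"],
          ["2009-Q3", 100],
          ["2009-Q4", 150]]
src = r1c1_range("Sheet2", sample)
print(src)
```

<p>With 791 rows and 13 columns on Sheet2, the same helper would produce the <code>"Sheet2!R1C1:R791C13"</code> string seen in the recorded macro.</p>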
<p>In Python, this pattern can be reduced to the following loop that covers fields for the Report Filter, Columns and Rows:</p>
<pre>def addpivot(wb,sourcedata,title,filters=(),columns=(),
             rows=(),sumvalue=(),sortfield=""):
    """Build a pivot table using the provided source location data
    and specified fields
    """
    ...
    for fieldlist,fieldc in ((filters,win32c.xlPageField),
                            (columns,win32c.xlColumnField),
                            (rows,win32c.xlRowField)):
        for i,val in enumerate(fieldlist):
            wb.ActiveSheet.PivotTables(tname).PivotFields(val).Orientation = fieldc
            wb.ActiveSheet.PivotTables(tname).PivotFields(val).Position = i+1
    ...</pre>
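<p>To see what the nested loop does without starting Excel, here is a dry-run sketch that records the assignments instead of making them. The function <code>plan_fields</code> is a hypothetical helper, and the orientation constants are represented by their names as plain strings.</p>

```python
def plan_fields(filters=(), columns=(), rows=()):
    """List (field, orientation name, position) in the order the loop visits them."""
    plan = []
    for fieldlist, orientation in ((filters, "xlPageField"),
                                   (columns, "xlColumnField"),
                                   (rows, "xlRowField")):
        for i, val in enumerate(fieldlist):
            # Positions are 1-based within each area of the pivot table
            plan.append((val, orientation, i + 1))
    return plan


plan = plan_fields(filters=("Fiscal Year",),
                   columns=("Fiscal Quarter",),
                   rows=("Sales Rep Name", "Food Name"))
for step in plan:
    print("%s -> %s, position %d" % step)
```

<p>The four resulting steps match the four <code>With</code> blocks in the recorded macro, including position 2 for <code>Food Name</code> in the row area.</p>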
<p>Processing the Values field is more or less copied from the Visual Basic. To keep things simple in this example, this code is limited to adding “Sum of” values only, and doesn’t handle other Summarize Value functions such as Count, Min, Max, etc.</p>
<pre>wb.ActiveSheet.PivotTables(tname).AddDataField(
    wb.ActiveSheet.PivotTables(tname).PivotFields(sumvalue[7:]),
    sumvalue,
    win32c.xlSum)</pre>
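<p>The <code>sumvalue[7:]</code> slice works because the label always starts with the seven characters of “Sum of ”. If other summary functions were needed, one possible approach (an assumption on my part, not something the script does) is to split the label at “ of ” and map the prefix to the name of the matching Excel constant. The names <code>SUMMARY_FUNCTIONS</code> and <code>parse_summary</code> are hypothetical.</p>

```python
# Maps the label prefix to the name of the matching Excel constant
SUMMARY_FUNCTIONS = {"Sum": "xlSum", "Count": "xlCount", "Average": "xlAverage",
                     "Min": "xlMin", "Max": "xlMax"}


def parse_summary(label):
    """Split a label such as 'Sum of Net Booking' into (field, constant name)."""
    prefix, sep, field = label.partition(" of ")
    if not sep or prefix not in SUMMARY_FUNCTIONS:
        raise ValueError("unsupported summary label: %r" % label)
    return field, SUMMARY_FUNCTIONS[prefix]


field, constant_name = parse_summary("Sum of Net Booking")
print("%s uses %s" % (field, constant_name))
```

<p>In a script that has <code>win32c</code> available, the constant itself could then be looked up with <code>getattr(win32c, constant_name)</code>.</p>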
<p>The actual values for <code>filters</code>, <code>columns</code> and <code>rows</code> in the function are defined in the call to the function. The complete function creates a new sheet within the workbook, then adds an empty pivot table to the sheet and builds the table using the field information provided. For example, to answer the question: <em>What were the total sales in each of the last four quarters?</em>, the pivot table is built with the following call to the <code>addpivot</code> function:</p>
<pre># What were the total sales in each of the last four quarters?
addpivot(wb,src,
         title="Sales by Quarter",
         filters=(),
         columns=(),
         rows=("Fiscal Quarter",),
         sumvalue="Sum of Net Booking",
         sortfield=())</pre>
<p>which defines a pivot table using the row header “Fiscal Quarter” and data value “Sum of Net Booking”. The title “Sales by Quarter” is used to name the sheet itself.</p>
<p>To make the output spreadsheet more understandable, the title parameter is passed into the function and used both as a title in each worksheet and as the tab name.</p>
<p><img src="http://pythonexcels.com/blog/_images/20091123_titletabsbq.png" alt="Title Tabs" /></p>
<p>The complete script is shown below. Caveats:</p>
<ul>
<li>This script has been modified to run on both Excel 2007 and Excel 2003 and has been tested on those versions.</li>
<li>Adding pivot tables increases the size of the output Excel file, which can be mitigated by disabling caching of pivot table data. Line 48 of the script contains the command <code>newsheet.PivotTables(tname).SaveData = False</code>, which has been commented out. Uncommenting this command will reduce the size of the output Excel file, but will require that the pivot table be refreshed before use by clicking on Refresh Data on the PivotTable toolbar.</li>
</ul>
<script src="https://gist.github.com/3767898.js"></script><noscript><pre><code class="language-python python">#
# erpdatapivot.py:
# Load raw ERP data, clean up header info and
# build 5 pivot tables
#
import win32com.client as win32
win32c = win32.constants
import sys
import itertools
tablecount = itertools.count(1)

def addpivot(wb,sourcedata,title,filters=(),columns=(),
             rows=(),sumvalue=(),sortfield=&quot;&quot;):
    &quot;&quot;&quot;Build a pivot table using the provided source location data
    and specified fields
    &quot;&quot;&quot;
    newsheet = wb.Sheets.Add()
    newsheet.Cells(1,1).Value = title
    newsheet.Cells(1,1).Font.Size = 16

    # Build the Pivot Table
    tname = &quot;PivotTable%d&quot;%tablecount.next()

    pc = wb.PivotCaches().Add(SourceType=win32c.xlDatabase,
                                 SourceData=sourcedata)
    pt = pc.CreatePivotTable(TableDestination=&quot;%s!R4C1&quot;%newsheet.Name,
                             TableName=tname,
                             DefaultVersion=win32c.xlPivotTableVersion10)
    wb.Sheets(newsheet.Name).Select()
    wb.Sheets(newsheet.Name).Cells(3,1).Select()
    for fieldlist,fieldc in ((filters,win32c.xlPageField),
                            (columns,win32c.xlColumnField),
                            (rows,win32c.xlRowField)):
        for i,val in enumerate(fieldlist):
            wb.ActiveSheet.PivotTables(tname).PivotFields(val).Orientation = fieldc
            wb.ActiveSheet.PivotTables(tname).PivotFields(val).Position = i+1

    wb.ActiveSheet.PivotTables(tname).AddDataField(
        wb.ActiveSheet.PivotTables(tname).PivotFields(sumvalue[7:]),
        sumvalue,
        win32c.xlSum)
    if len(sortfield) != 0:
        wb.ActiveSheet.PivotTables(tname).PivotFields(sortfield[0]).AutoSort(sortfield[1], sumvalue)
    newsheet.Name = title

    # Uncomment the next command to limit output file size, but make sure
    # to click Refresh Data on the PivotTable toolbar to update the table
    # newsheet.PivotTables(tname).SaveData = False

    return tname

def runexcel():
    excel = win32.gencache.EnsureDispatch('Excel.Application')
    #excel.Visible = True
    try:
        wb = excel.Workbooks.Open('ABCDCatering.xls')
    except Exception:
        print(&quot;Failed to open spreadsheet ABCDCatering.xls&quot;)
        sys.exit(1)
    ws = wb.Sheets('Sheet1')
    xldata = ws.UsedRange.Value
    newdata = []
    for row in xldata:
        if len(row) == 13 and row[-1] is not None:
            newdata.append(list(row))
    lasthdr = &quot;Col A&quot;
    for i,field in enumerate(newdata[0]):
        if field is None:
            newdata[0][i] = lasthdr + &quot; Name&quot;
        else:
            lasthdr = newdata[0][i]
    rowcnt = len(newdata)
    colcnt = len(newdata[0])
    wsnew = wb.Sheets.Add()
    wsnew.Range(wsnew.Cells(1,1),wsnew.Cells(rowcnt,colcnt)).Value = newdata
    wsnew.Columns.AutoFit()

    src = &quot;%s!R1C1:R%dC%d&quot;%(wsnew.Name,rowcnt,colcnt)

    # What were the total sales in each of the last four quarters?
    addpivot(wb,src,
             title=&quot;Sales by Quarter&quot;,
             filters=(),
             columns=(),
             rows=(&quot;Fiscal Quarter&quot;,),
             sumvalue=&quot;Sum of Net Booking&quot;,
             sortfield=())

    # What are the sales for each food item in each quarter?
    addpivot(wb,src,
             title=&quot;Sales by Food Item&quot;,
             filters=(),
             columns=(&quot;Food Name&quot;,),
             rows=(&quot;Fiscal Quarter&quot;,),
             sumvalue=&quot;Sum of Net Booking&quot;,
             sortfield=())

    # Who were the top 10 customers for ABCD Catering in 2009?
    addpivot(wb,src,
             title=&quot;Top 10 Customers&quot;,
             filters=(),
             columns=(),
             rows=(&quot;Company Name&quot;,),
             sumvalue=&quot;Sum of Net Booking&quot;,
             sortfield=(&quot;Company Name&quot;,win32c.xlDescending))

    # Who was the highest producing sales rep for the year?
    addpivot(wb,src,
             title=&quot;Top Sales Reps&quot;,
             filters=(),
             columns=(),
             rows=(&quot;Sales Rep Name&quot;,&quot;Company Name&quot;),
             sumvalue=&quot;Sum of Net Booking&quot;,
             sortfield=(&quot;Sales Rep Name&quot;,win32c.xlDescending))

    # What food item had the highest unit sales in Q4?
    ptname = addpivot(wb,src,
             title=&quot;Unit Sales by Food&quot;,
             filters=(&quot;Fiscal Quarter&quot;,),
             columns=(),
             rows=(&quot;Food Name&quot;,),
             sumvalue=&quot;Sum of Quantity&quot;,
             sortfield=(&quot;Food Name&quot;,win32c.xlDescending))
    wb.Sheets(&quot;Unit Sales by Food&quot;).PivotTables(ptname).PivotFields(&quot;Fiscal Quarter&quot;).CurrentPage = &quot;2009-Q4&quot;

    if int(float(excel.Version)) &gt;= 12:
        wb.SaveAs('newABCDCatering.xlsx',win32c.xlOpenXMLWorkbook)
    else:
        wb.SaveAs('newABCDCatering.xls')
    excel.Application.Quit()

if __name__ == &quot;__main__&quot;:
    runexcel()</code></pre></noscript>
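<p>The script supports both Excel 2003 and Excel 2007 by checking <code>excel.Version</code> just before saving. Excel reports its version as a string such as <code>'11.0'</code> (Excel 2003) or <code>'12.0'</code> (Excel 2007), which is why the script converts it with <code>int(float(...))</code>. As a rough sketch, the same decision can be written as a standalone function that runs without Excel. The function name <code>output_filename</code> and its <code>basename</code> parameter are assumptions made for illustration:</p>

```python
# Hypothetical helper mirroring the version check at the end of runexcel().
# It only chooses a file name; the real script also passes
# win32c.xlOpenXMLWorkbook to SaveAs when writing the .xlsx file.
def output_filename(version, basename='newABCDCatering'):
    """Pick the output file name from Excel's version string."""
    major = int(float(version))  # '12.0' becomes 12
    if major >= 12:
        return basename + '.xlsx'  # Excel 2007 and later
    return basename + '.xls'       # Excel 2003 and earlier


print(output_filename('11.0'))  # newABCDCatering.xls
print(output_filename('12.0'))  # newABCDCatering.xlsx
```

<p>Keeping the check in one small function makes it easy to test the file-format choice without launching Excel.</p>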
<p><strong>Prerequisites</strong></p>
<p>Python (refer to <a href="http://www.python.org/">http://www.python.org</a>)</p>
<p>Microsoft Excel (refer to <a href="http://office.microsoft.com/excel">http://office.microsoft.com/excel</a>)</p>
<p><strong>Source Files and Scripts</strong></p>
<p>Source for the program erpdatapivot.py and input spreadsheet file ABCDCatering.xls are available at <a href="http://github.com/pythonexcels/examples">http://github.com/pythonexcels/examples</a></p>
<p>Thanks — Dan</p>
]]></content:encoded>
			<wfw:commentRss>http://pythonexcels.com/automating-pivot-tables-with-python/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
