Automating Pivot Tables with Python

In the last post I explained the basic concept behind Pivot Tables and provided some examples. Pivot tables are an easy-to-use tool to derive some basic business intelligence from your data. As discussed last time, there are occasions when you’ll need to do interactive data mining by changing column and row fields. But in my experience, it’s handy to have my favorite reports built automatically, with the reports ready to go as soon as I open the spreadsheet. In this post I’ll develop and explain the code to create a set of pivot tables automatically in worksheet.

The goal of this exercise is to automate the generation of pivot tables from the last post, and save them to a new Excel file.

Pivot Tables

I started with the file newABCDCatering.xls from the previous post and record the macro to create this simple pivot table showing Net Bookings by Sales Rep and Food Name for the last four quarters.

Net Bookings

Captured in Excel 2007, the recorded macro looks like this:

The post Mapping Excel VB Macros to Python covered a technique for recording a Visual Basic macro and porting it to Python. Using that approach, you could simply turn on the macro recorder and generate all the required tables, producing a long script with lots of redundancy. A better approach is to build a general purpose function that can be used over and over to generate the pivot tables.

Looking at the macro, you see lines specifying the Orientation of the field name, such as .Orientation = xlRowField and .Orientation = xlColumnField. A pivot table has four basic areas for fields:

  • Report Filter (.Orientation = xlPageField)
  • Column area (.Orientation = xlColumnField)
  • Row area (.Orientation = xlRowField)
  • Values area (PivotTables().AddDataField())

Each of these supports multiple fields (column fields for Sales Rep Name and Food Name were added in the example). The ordering of the fields changes the appearance of the table.

A general pattern should be apparent in this macro. First, the pivot table is created with the ActiveWorkbook.PivotCaches.Create() statement. Next, the columns and rows are configured with a series of ActiveSheet.PivotTables("PivotTable1").PivotFields() statements. Finally, the field used in the Values section of the table is configured using the ActiveSheet.PivotTables("PivotTable1").AddDataField statement. The general purpose function will need to contain all of these constructs. Note the parts that can’t be hard-coded: the source of the data, "Sheet2!R1C1:R791C13", and destination for the table, "Sheet3!R3C1" need to be determined based on the characteristics of the source data and can’t be hard coded in the general solution.

In Python, this pattern can be reduced to the following loop that covers fields for the Report Filter, Columns and Rows:

def addpivot(wb,sourcedata,title,filters=(),columns=(),
             rows=(),sumvalue=(),sortfield=""):
    """Build a pivot table using the provided source location data
    and specified fields
    """
    ...
    for fieldlist,fieldc in ((filters,win32c.xlPageField),
                            (columns,win32c.xlColumnField),
                            (rows,win32c.xlRowField)):
        for i,val in enumerate(fieldlist):
            wb.ActiveSheet.PivotTables(tname).PivotFields(val).Orientation = fieldc
        wb.ActiveSheet.PivotTables(tname).PivotFields(val).Position = i+1
    ...

Processing the Values field is more or less copied from the Visual Basic. To keep things simple in this example, this code is limited to adding “Sum of” values only, and doesn’t handle other Summarize Value functions such as Count, Min, Max, etc.

wb.ActiveSheet.PivotTables(tname).AddDataField(
    wb.ActiveSheet.PivotTables(tname).PivotFields(sumvalue[7:]),
    sumvalue,
    win32c.xlSum)

The actual values for filters, columns and rows in the function are defined in the call to the function. The complete function creates a new sheet within the workbook, then adds an empty pivot table to the sheet and builds the table using the field information provided. For example, to answer the question: What were the total sales in each of the last four quarters?, the pivot table is built with the following call to the addpivot function:

# What were the total sales in each of the last four quarters?
addpivot(wb,src,
         title="Sales by Quarter",
         filters=(),
         columns=(),
         rows=("Fiscal Quarter",),
         sumvalue="Sum of Net Booking",
         sortfield=())

which defines a pivot table using the row header “Fiscal Quarter” and data value “Sum of Net Booking”. The title “Sales by Quarter” is used to name the sheet itself.

To make the output spreadsheet more understandable, the title parameter passed into the function and used as a title in each worksheet and as the tab name.

Title Tabs

The complete script is shown below. Caveats:

  • This script has been modified to run on both Excel 2007 and Excel 2003 and has been tested on those versions.
  • Adding pivot tables increases the size of the output Excel file, which can be mitigated by disabling caching of pivot table data. Line 48 of the script contains the command newsheet.PivotTables(tname).SaveData = False, which has been commented out. Uncommenting this command will reduce the size of the output Excel file, but will require that the pivot table be refreshed before use by clicking on Refresh Data on the PivotTable toolbar.

Prerequisites

Python (refer to http://www.python.org)

Microsoft Excel (refer to http://office.microsoft.com/excel)

Source Files and Scripts

Source for the program erpdatapivot.py and input spreadsheet file ABCDCatering.xls are available at http://github.com/pythonexcels/examples

Thanks — Dan