About dan

Just another Python hacker

Cleaning Up Corporate ERP Data

The previous posts have used Excel and Python to create and manipulate small spreadsheets. In reality, Python and Excel are especially well suited to tackling large data sets. This post will illustrate some techniques for cleaning up data downloaded from corporate ERP systems such as SAP and Oracle, and getting it ready for some serious data mining with Excel.

In this example, a fictional company called ABCD Catering has recorded sales and order history for 2009 in their corporate ERP system. ABCD Catering provides catering services to leading Silicon Valley companies, providing the best in hamburgers, hot dogs, churros, sodas and other comfort food. Your boss has asked you to examine this data and answer some questions and produce charts representing some of the data:

  • What were the total sales in each of the last four quarters?
  • What are the sales for each food item in each quarter?
  • Who were the top 10 customers for ABCD catering in Q1?
  • Who was the highest producing sales rep for the year?
  • What food item had the highest unit sales in Q4?

Generating this information typically involves running five separate reports in the system. Since your boss is looking for this same information at the end of each quarter, you want to simplify your life and your bosses by automating the report. Using Python and Excel, you can download a spreadsheet copy of the raw data, process it, generate the key figures and charts and save them to a spreadsheet.

Take a look at the data in ABCDCatering.xls:

ABCDCatering.xls

The spreadsheet contains some header information, then a large table of records for each order. Each record contains the fiscal year and quarter, food item, company name, order data, sales representative, booking and order quantity for each order. The data needs some work before you can use it in a pivot table. First, the data in rows 1 through 11 must be ignored, it’s meaningless for the pivot table. Also, some columns do not have a proper header and must be corrected before the data can be used. The good news is that after some minor massaging, this data will be ideally suited for processing with a pivot table in Excel. Close the spreadsheet and get ready to build the reports.

The program begins with the standard boilerplate: import the win32 module and start Excel. If you have questions on this, please refer to Basic Excel Driving with Python and Python Excel Mini Cookbook.

#
# erpdata.py: Load raw EPR data and clean up header info
#
import win32com.client as win32
import sys
excel = win32.gencache.EnsureDispatch('Excel.Application')
excel.Visible = True

Next, open the spreadsheet ABCDCatering.xls with some exception handling. The try/exceptclause attempts to open the file with the Workbooks.Open() method, and exits gracefully if the file is missing or some other problem occurred. Lastly, the variable ws is set to the spreadsheet containing the data.

try:
    wb = excel.Workbooks.Open('ABCDCatering.xls')
except:
    print "Failed to open spreadsheet ABCDCatering.xls"
    sys.exit(1)
ws = wb.Sheets('Sheet1')

An easy way to load the entire spreadsheet into Python is the UsedRange method. The following command:

xldata = ws.UsedRange.Value

grabs all the data in the Sheet1 worksheet and copies it into a tuple named xldata. Once inside Python, the data can be manipulated and placed back into the spreadsheet with minimal calls to the COM interface, resulting in faster, more efficient processing.

To delete rows, add columns and do other operations on the data, it must be converted to or copied to a list. The approach used here is to examine the data row by row, discarding the non essential header rows and copying everything else to a new list. The first step is to remove the rows that are not part of the column header row or record data. If you are using Python to generate the program interactively, you can investigate the data in the xldata tuple and display the data for the first record (xldata[0]) and header record (xldata[11]):

Data Records

The length of both rows is 13, though xldata[0] contains many elements with a value of None. The following code checks the length of the data and skips any rows shorter then 13 fields or rows that contain None in the last field. Note that this code assumes that the actual data in the table always contains complete records, true in this dataset but you should always understand the characteristics of the data you’re working on.

newdata = []
for row in xldata:
    if row[-1] is not None and len(row) == 13:
        newdata.append(row)

The newdata list now contains the header and data rows from the spreadsheet, but the header row is still not complete. All column headers must contain text in order to use this data in a pivot table. Unfortunately, the spreadsheet downloads produced by the ERP system have the column label over the numberical identifier for the item, while the text column header is blank. You can see that for the “Food” and “Company” data below.

Food and Company Columns

One approach that works for this data is to scan the header and insert a column header based on the contents of the previous column. For example, the label for column F could be “Company Name”, created by simply appending the text ” Name” to the column header “Company” from the prior column. Using this simple algorithm, the column header row can be filled out and the spreadsheet made ready for pivot table conversion. A more complex lookup could be used as well, but the simple algorithm described here will scale if new fields are added to the report.

for i,field in enumerate(newdata[0]):
  if field is None:
    newdata[0][i] = lasthdr + " Name"
  else:
    lasthdr = newdata[0][i]

Now the data is ready for insertion back into the spreadsheet. To enable comparison between the new data set and the original, create a new sheet in the workbook, write the data to the new sheet and autofit the columns.

wsnew = wb.Sheets.Add()
wsnew.Range(wsnew.Cells(1,1),wsnew.Cells(len(newdata),len(newdata[0]))).Value = newdata
wsnew.Columns.AutoFit()

The last step is to save the worksheet to a new file and quit Excel. The Excel version is checked in order to save the data in the correct spreadsheet format. Version 12 corresponds to Excel 2007, which uses the .xlsx file extension. You also have to specify the constant xlOpenXMLWorkbook to define the type of output Excel file. Earlier version of Excel use the .xlsextension, and because the input file was .xls format, no output format specifier is needed for users of older versions of Excel.

if int(float(excel.Version)) >= 12:
    wb.SaveAs('newABCDCatering.xlsx',win32.constants.xlOpenXMLWorkbook)
else:
    wb.SaveAs('newABCDCatering.xls')
excel.Application.Quit()

If the file newABCDCatering.xlsx or newABCDCatering.xls already exists in My Documents, you will see the following popup when you run the script.

Error Message

Click “Yes” to overwrite the spreadsheet file. To run the script cleanly, erase the file newABCDCatering.xlsx or newABCDCatering.xls and try the script again.

After running the script, open the file newABCDCatering.xlsx or newABCDCatering.xls and view the contents. Note that the extraneous header information has been removed and blank column header information has been inserted programmatically as described earlier.

New Spreadsheet

The new spreadsheet is ready for use in a pivot table, which will be covered in the next post. Here is the complete script, also available at github.

Prerequisites

Python (refer to http://www.python.org)

Win32 Python module (refer to http://sourceforge.net/projects/pywin32)

Microsoft Excel (refer to http://office.microsoft.com/excel)

Source Files and Scripts

Source for the program erpdata.py and spreadsheet file ABCDCatering.xls are available athttp://github.com/pythonexcels/examples

Thanks — Dan

Mapping Excel VB Macros to Python Revisited

The last post introduced a technique for recording a Visual Basic macro within Excel and migrating it to Python. This exercise will build on those techniques while leveraging Python for more of the work.

This example creates two tables from scratch – a simple multiplication table and a table of random numbers – and applies conditional formatting to the numbers using some of the new features in Excel 2007 (unfortunately this exercise won’t be compatible with older versions of Excel). Begin by starting the Python IDLE interface. Next, start Excel as you’ve done in the previous exercises. For this exercise, add a workbook using the Workbooks.Add() construct, and set the ws variable to point to the first worksheet in the workbook.

_images/20091019_startexcelidle.png

After typing these command in IDLE, you’ll see the Excel window that contains an empty spreadsheet. To build the multiplication table, use Python to populate the column and row headers. There are a number of ways to do this, for this exercise you’ll pass an list of column header and row header values to Excel. A row of data is defined by using the ws.Range().Valuestatement with a list or tuple on right hand side of the equals sign. Rather than explicitly defining the list as [1,2,3,4,5,6,7,8,9,10], you can use a functional programming statement containing a range() statement to populate the values: [i for i in range(1,11)]. The complete statement is ws.Range("B2:K2").Value = [i for i in range(1,11)]. Defining a single column of data is a bit trickier, you must define a list of single element lists or tuples. One way to do this is to use Python’s zip() function to transpose the flat list into a list of tuples. The complete statement is ws.Range("B2:B11").Value = zip([i for i in range(1,11)]). The statements for completing the column and row headers are shown below.

_images/20091019_headers.png

At this point the column and row headers will appear in the Excel spreadsheet.

_images/20091019_headersexcel1.png

To define the product values for each cell in the table, create a formula to multiply the column and row header for a single cell, then used Excel to autofill the remaining cells. Looking at the spreadsheet, the product for cell C3 is cell B3 multiplied by cell C2, or 2 times 2 which equals 4. In terms of Excel, the formula is =B3*C2. To use Excel’s autofill capability, you need to anchor the row and column in the formula by preceding it with the $ character. In other words, the formula you want to use is =$B3*C$2. Once that formula is entered, the expansion to fill the remaining cells is done in two steps. First, programmatically select the cell and drag it fill the row. Next, select the newly autofilled row and drag the new row down to fill in the remaining rows. Since this was demonstrated in the last post, please refer to that post if you need more information. The equivalent Python code to implement the autofill is shown below.

_images/20091019_autofill.png

The spreadsheet will now contain the complete multiplication table.

_images/20091019_autofillexcel.png

To help illustrate conditional formatting, create another table of random integers between 1 and 100. Excel’s RAND() function will generate a random number between 0 and 1, the formula we want is =INT(RAND()*100). The ws.Range().Formula construct can be used to fill a range with the same identical formula.

The Excel spreadsheet should now contain both the multiplication and random number tables.

Now that the data is ready, conditional formatting can be applied. Even though you invoked Excel from Python, you still can manipulate the spreadsheet using the Excel interface, and even record macros. You need to record a macro like you did in the last post in order to capture the VB commands, so click on Record Macro in the Developer tab, then click OK in the popup dialog.

Select all the cells in the range B2:K22. In the Home tab, select Conditional Formatting->Color Scales->Red-Yellow-Blue Color Scale.

The spreadsheet should now show a color background for each of the selected cells containing a value. Now select cell A1, then stop the macro by clicking Stop Recording in the Developer tab.

Your spreadsheet will now have conditional formatting applied and will look something like this, with cells containing numbers near 100 colored in Red, cells with a value of 50 in Yellow, and cells with a value of 1 in Blue, with a shade of these colors for values in between:

As an aside, you can update the random numbers in the lower table by hitting the F9 key to force a spreadsheet recalculation.

To continue, open the macro just created by selecting Macros from the Developer tab, select the name of the macro you just captured and click Edit. The macro should look something like this:

Though the macro contains some very long method names, plus some With statements, the porting will be very straightforward. Here are some guidelines to keep in mind while migrating this code to Python.

  • Selection is preceded by excel.

Remember that Selection is a method at the Excel Application level, you need to precede it with excel. in this example.

  • Range is preceded by ws.

Range is a method at the Worksheet level, which is defined earlier as ws. in this example.

  • Function calls require () in Python

Unlike VB, any function calls must by followed by () in Python.

  • With statements must be expanded

The three With blocks in this macro need to be expanded, which can be done with temporary variables or by copying the statement following the With keyword. For example, the first Withblock:

With Selection.FormatConditions(1).ColorScaleCriteria(1).FormatColor
    .Color = 13011546
    .TintAndShade = 0
End With

can be written in Python as

excel.Selection.FormatConditions(1).ColorScaleCriteria(1).FormatColor.Color = 13011546
excel.Selection.FormatConditions(1).ColorScaleCriteria(1).FormatColor.TintAndShade = 0

or by using a temporary variable as

x = excel.Selection.FormatConditions(1).ColorScaleCriteria(1).FormatColor
x.Color = 13011546
x.FormatColor.TintAndShade = 0

Temporary variables were created to make the script more concise. In particular, the statement [csc1,csc2,csc3] = [excel.Selection.FormatConditions(1).ColorScaleCriteria(n) for n in range(1,4)] was used to create three temporary variables for the three ColorScaleCriteria methods. The equivalent Python text representing the macro is shown here:

To save the spreadsheet and close Excel, use the SaveAs and Quit methods as shown below.

Here is the
complete conditionalformatting.py script
. The line excel.Visible = True has been commented out. Unless you are developing the script, you typically want Excel to run invisibly in the background.

Prerequisites

Python (refer to http://www.python.org)

Win32 Python module (refer to http://sourceforge.net/projects/pywin32)

Microsoft Excel (refer to http://office.microsoft.com/excel)

Source Files and Scripts

Source for the program conditionalformatting.py script is available at http://github.com/pythonexcels/examples

That’s all for now, thanks — Dan

Mapping Excel VB Macros to Python

A handy feature in Excel is the ability to quickly record a Visual Basic (VB) macro and save it. It’s also fairly simple to take a captured VB macro, tweak it slightly and use it in your Python scripts. I’ve used this capability dozens of times over the years to capture a sequence of operations that modify a spreadsheet and build a pivot table or chart, then integrate the macro into a Python script. It wasn’t always apparent to me how I could take the macro and use it within Python or other scripting languages beside Visual Basic, but I now have a pretty good grasp and want to share the technique with you.

This post illustrates how to capture a simple set of operations in a macro, examine the Visual Basic macro, port it to Python and run it. I’m using the MultiplicationTable.xlsx file as a starting point, it’s a simple 10×10 multiplication table that will be expanded and reformatted. The first step is to capture the macro in Excel using Record Macro. In Excel 2007 the Developer tab that contains the Record Macro button is turned off by default, you will need to enable it by selecting “Excel Options” from the ribbon menu, then select “Popular” in the left hand column and select the “Show Developer tab in the Ribbon” checkbox as shown here.

20091012_exceloptions

Starting with a simple spreadsheet containing a table of data, click on the “Developer” tab, then “Record Macro”.

20091012_recordmacro

If you’re using an older version of Excel, select Tools->Macro->Record New Macro from the menu as shown here.

20091012_recordmacroexcel2003

The goal is to expand the existing table to a 15×15 table, adjust the column width to make the table appear more square and save the new spreadsheet. Now that the macro is recording, the first step is to select the last row of data and expanding it by dragging it down an additional 5 rows. First, select the data:

20091012_selectrow

then dragged to create 5 new rows of data.

20091012_dragrow

Do the same select and drag operation for the last column of data to create 5 new columns.

20091012_selectcolumn

20091012_dragcolumn

Now you have a 15×15 multiplication table. To resize the columns, select the headers for columns B through P, click the right mouse and select “Column Width”.

20091012_columnwidth

Enter “4” as the new column width and click OK. The spreadsheet will now look like this:

20091012_resizecolumns

Now stop capturing the macro by clicking on Stop Recording

20091012_stoprecording

If you’re using an older version of Excel, select Tools->Macro->Stop Recording from the menu bar.

To view the macro, click on the View Macros button

20091012_viewmacros

For older versions of Excel, select Tools->Macro->Macros

Select the macro you just recorded (this should be Macro1, but if you were experimenting you may have other macros, so select the highest numbered macro) and click Edit.

20091012_editmacro

This will open your macro in the Microsoft Visual Basic GUI, and it should look something like this

Sub Macro1()
'
' Macro1 Macro
'

'
    Range("B11:K11").Select
    Selection.AutoFill Destination:=Range("B11:K16"), Type:=xlFillDefault
    Range("B11:K16").Select
    Range("K2:K16").Select
    Selection.AutoFill Destination:=Range("K2:P16"), Type:=xlFillDefault
    Range("K2:P16").Select
    Columns("B:P").Select
    Selection.ColumnWidth = 4
End Sub

Don’t worry if there are some extra or redundant lines in your macro, they can be removed as the script is ported. Now we’re ready to fire up Python and integrate this macro into a script.

Porting

To get started, start the Python Integrated Development Environment (IDLE), then open the spreadsheet with the 10×10 multiplication table by entering the following four commands (make sure the file “MultiplicationTable.xlsx” is in your My Documents folder.

20091012_introidle

Your screen should now look like this:

20091012_introfullscreen

These are boilerplate commands you’ll be using in each exercise to invoke and interface to Excel. The first two commands, import win32com.client as win32, and excel =win32.gencache.EnsureDispatch( 'Excel.Application'), import the win32 module and open the Excel process. The command wb = excel.Workbooks.Open('MultiplicationTable.xlsx') opens the worksheet. In general, you’ll need a excel.Workbooks.Open() or excel.Workbooks.Add() command to open an existing workbook or create a new workbook. The command excel.Visible = Truemakes Excel visible on the screen, rather than running as a hidden process in the background.

Looking at the Macro1 macro, the first command is Range("B11:K11").Select. The Rangevariable name is within the context of the Worksheet, so you need to create a container for operations on the worksheet. The command ws = wb.Worksheets('Sheet1') will do the trick.

20091012_wsworksheet

If you noticed, I made a typo when entering the command and typed Worksheet instead of Worksheets. Don’t panic if you make a mistake as I did, in most cases you can simply retype the correct command and continue on.

Once the variable pointing to the worksheet is defined, append the macro command to ws. and try it. Note that Select is a function and requires the open and close parenthesis pair in order to operate correctly. This pattern may be used for every Range().Select line in the macro.

20091012_wsb11k11select

If you bring the worksheet to the foreground, you’ll see that the range B11:K11 has been selected. The next task is to autofill the 5 rows below using the Selection.AutoFillDestination:=Range("B11:K16"),  Type:=xlFillDefault construct. Selection is a method at the Excel Application level, you need to precede it with excel. in this example. The arguments Destination:=Range("B11:K16"),  Type:=xlFillDefault must be provided to the function, either using the keyword arguments Destination and Type, or by using positional notation. To make your programs as robust as possible, you should include the keywords, but it’s not strictly required and I don’t use that pattern in this example.

The definition for the constant xlFillDefault is contained in win32.constants, you can access this value by specifying win32.constants.xlFillDefault. I’ve seen many examples where the developer replaces this with the actual value (0 in this case). My preference is to avoid replacing Excel variables with numbers in my scripts, I believe that including the variable names increases the clarity of the script. My preference is to use the fully specified name wherever possible, but if you have to replace the variable with the actual value, you can always use the Object Browser in the VB window to figure out the correct value (open the Object Browser by pressing F2, or by selecting View->Object Browser from the menu in the VB window.

20091012_vbobjectbrowser

Combining these translations, the full Python command is excel.Selection.AutoFill(Destination=ws.Range("B11:K16"), Type=win32.constants.xlFillDefault ), or excel.Selection.AutoFill( ws.Range("B11:K16"), win32.constants.xlFillDefault) as I’ve used in the example.

20091012_idlefillrow

Occasionally you’ll make a mistake when capturing a macro and record extraneous, unnecessary commands. The command Range("B11:K16").Select isn’t needed and can be ignored. The next two macro commands, Range("K2:K16").Select and Selection.AutoFillDestination:=Range("K2:P16"),  Type:=xlFillDefault, are translated in the same way as the Select and AutoFill commands discussed earlier.

The commands Range("K2:K16").Select and Selection.AutoFill Destination:=Range("K2:P16"), Type:=xlFillDefault are translated in the same fashion as the earlier Select and AutoFillcommands as shown below.

20091012_idlefillcolumn

The worksheet is now expanded to the full 15×15 table and looks like this:

20091012_worksheetfilled

The next section of the macro selects columns B through P and sets their width to 4. The statement Columns("B:P").Select is a property of the worksheet, so prefix it with the ws.identifier and add the parenthesis to make it a Python function call. In the next statement,Selection is a property of excel, so prefix it as such. The translated statements are shown below.

20091012_idlecolumnwidth

The Excel spreadsheet is now complete, the multiplication table has been expanded to 15×15 and the columns have been resized to 4. At this point, translation of the macro is complete, but the modified file has to be saved. To write the file and quit Excel, use the SaveAs and Quit methods as shown below.

20091012_idlesavequit

For your reference, here is the complete Python script.

#
# make15x15.py
# Expand an existing 10x10 multiplication table and resize columns
#
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Open('MultiplicationTable.xlsx')
excel.Visible = True
ws = wb.Worksheets('Sheet1')
ws.Range("B11:K11").Select()
excel.Selection.AutoFill(ws.Range("B11:K16"),win32.constants.xlFillDefault)
ws.Range("K2:K16").Select()
excel.Selection.AutoFill(ws.Range("K2:P16"),win32.constants.xlFillDefault)
ws.Columns("B:P").Select()
excel.Selection.ColumnWidth = 4
wb.SaveAs('NewMultiplicationTable.xlsx')
excel.Application.Quit()

If this is the first time you’ve ported an Excel macro from VB to Python, congratulations! Please note that in this example, things are kept simple and there is absolutely no error checking or exception handling used here. Normally you would need to provide at least a minimal level of error checking and exception handling in your script so that common errors (missing input file, can’t invoke Excel, etc) are caught and handled nicely. Also, this example was developed using Excel 2007, but you can run this code verbatim on earlier versions of Excel if you change the .xlsx file extension to .xls throughout the script.

Some Porting Guidelines

  • Prefix the Range().Select statements with the variable name pointing to the worksheet (ws in this example)
  • Append () to any functions
  • Prefix the Selection statements with the variable name for the Excel spreadsheet (excel in this example)
  • Prefix the Columns statements with the variable name for the worksheet(ws in this example)

Porting Reference Table for this example

Note that I didn’t capture the Workbooks.Open() or Workbooks.SaveAs lines in the VB script, it’s left as an exercise for the reader to research those commands.

VB Python
import win32com.client as win32
excel = win32.gencache.EnsureDispatch(‘Excel.Application’)
wb = excel.Workbooks.Open(‘MultiplicationTable.xlsx’)
wb = excel.Workbooks.Open(‘MultiplicationTable.xlsx’)
excel.Visible = True
ws = wb.Worksheets(‘Sheet1’)
Range(“B11:K11”).Select ws.Range(“B11:K11”).Select()
Range(“B11:K11”).Select ws.Range(“B11:K11”).Select()
Selection.AutoFill Destination:=Range(“B11:K16”), Type:=xlFillDefault excel.Selection.AutoFill(ws.Range(“B11:K16”),win32.constants.xlFillDefault)
Range(“K2:K16”).Select ws.Range(“K2:K16”).Select()
Selection.AutoFill Destination:=Range(“K2:P16”), Type:=xlFillDefault excel.Selection.AutoFill(ws.Range(“K2:P16”),win32.constants.xlFillDefault)
Range(“K2:P16”).Select ws.Columns(“B:P”).Select()
Columns(“B:P”).Select ws.Columns(“B:P”).Select()
Selection.ColumnWidth = 4 excel.Selection.ColumnWidth = 4
excel.Application.Quit()

Prerequisites

Python (refer to http://www.python.org)

Win32 Python module (refer to http://sourceforge.net/projects/pywin32)

Microsoft Excel (refer to http://office.microsoft.com/excel)

Source Files and Scripts

Source for the program make15x15.py and data text file are available at http://github.com/pythonexcels/examples

That’s all for now, thanks — Dan

Python Excel Mini Cookbook

To get you started, I’ve illustrated a number of common tasks you can do with Python and Excel. Each program below is a self contained example, just copy it, paste it and run it. A few things to note:

  • These examples were tested in Excel 2007, they should work fine in earlier versions as well after changing the extension of the file within the wb.SaveAs() statement from .xlsx to .xls
  • If you’re new to this, I recommend typing these examples by hand into IDLE, IPython or the Python interpreter, then watching the effect in Excel as you enter the commands. To make Excel visible add the line excel.Visible = True after the excel =win32.gencache.EnsureDispatch('Excel.Application') line in the script
  • These are simple examples with no error checking. Make sure the output files doesn’t exist before running the script. If the script crashes, it may leave a copy of Excel running in the background. Open the Windows Task Manager and kill the background Excel process to recover.
  • These examples contain no optimization. You typically wouldn’t use a for loop to iterate through data in individual cells, it’s provided here for illustration only.

Open Excel, Add a Workbook

The following script simply invokes Excel, adds a workbook and saves the empty workbook.

#
# Add a workbook and save (Excel 2007)
# For older versions of excel, use the .xls file extension
#
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Add()
wb.SaveAs('add_a_workbook.xlsx')
excel.Application.Quit()

Open an Existing Workbook

This script opens an existing workbook and displays it (note the statement excel.Visible =True). The file workbook1.xlsx must already exist in your “My Documents” directory. You can also open spreadsheet files by specifying the full path to the file as shown below. Using r'in the statement r'C:\myfiles\excel\workbook2.xlsx' automatically escapes the backslash characters and makes the file name a bit more concise.

#
# Open an existing workbook
#
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Open('workbook1.xlsx')
# Alternately, specify the full path to the workbook 
# wb = excel.Workbooks.Open(r'C:\myfiles\excel\workbook2.xlsx')
excel.Visible = True

Add a Worksheet

This script creates a new workbook with three sheets, adds a fourth worksheet and names it MyNewSheet.

#
# Add a workbook, add a worksheet,
# name it 'MyNewSheet' and save
#
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Add()
ws = wb.Worksheets.Add()
ws.Name = "MyNewSheet"
wb.SaveAs('add_a_worksheet.xlsx')
excel.Application.Quit()

Ranges and Offsets

This script illustrates different techniques for addressing cells by using the Cells() and Range()operators. Individual cells can be addressed using Cells(row,column), where row is the row number, column is the column number, both start from 1. Groups of cells can be addressed using Range(), where the argument in the parenthesis can be a single cell denoted by its textual name (eg "A2"), a group noted by a textual name with a colon (eg "A3:B4") or a group denoted with two Cells() identifiers (eg ws.Cells(1,1),ws.Cells(2,2)). The Offsetmethod provides a way to address a cell based on a reference to another cell.

#
# Using ranges and offsets
#
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Add()
ws = wb.Worksheets("Sheet1")
ws.Cells(1,1).Value = "Cell A1"
ws.Cells(1,1).Offset(2,4).Value = "Cell D2"
ws.Range("A2").Value = "Cell A2"
ws.Range("A3:B4").Value = "A3:B4"
ws.Range("A6:B7,A9:B10").Value = "A6:B7,A9:B10"
wb.SaveAs('ranges_and_offsets.xlsx')
excel.Application.Quit()

Autofill Cell Contents

This script uses Excel’s autofill capability to examine data in cells A1 and A2, then autofill the remaining column of cells through A10.

#
# Autofill cell contents
#
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Add()
ws = wb.Worksheets("Sheet1")
ws.Range("A1").Value = 1
ws.Range("A2").Value = 2
ws.Range("A1:A2").AutoFill(ws.Range("A1:A10"),win32.constants.xlFillDefault)
wb.SaveAs('autofill_cells.xlsx')
excel.Application.Quit()

Cell Color

This script illustrates adding an interior color to the cell using Interior.ColorIndex. Column A, rows 1 through 20 are filled with a number and assigned that ColorIndex.

#
# Add an interior color to cells
#
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Add()
ws = wb.Worksheets("Sheet1")
for i in range (1,21):
    ws.Cells(i,1).Value = i
    ws.Cells(i,1).Interior.ColorIndex = i
wb.SaveAs('cell_color.xlsx')
excel.Application.Quit()

Column Formatting

This script creates two columns of data, one narrow and one wide, then formats the column width with the ColumnWidth property. You can also use the Columns.AutoFit() function to autofit all columns in the spreadsheet.

#
# Set column widths
#
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Add()
ws = wb.Worksheets("Sheet1")
ws.Range("A1:A10").Value = "A"
ws.Range("B1:B10").Value = "This is a very long line of text"
ws.Columns(1).ColumnWidth = 1
ws.Range("B:B").ColumnWidth = 27
# Alternately, you can autofit all columns in the worksheet
# ws.Columns.AutoFit()
wb.SaveAs('column_widths.xlsx')
excel.Application.Quit()

Copying Data from Worksheet to Worksheet

This script uses the FillAcrossSheets() method to copy data from one location to all other worksheets in the workbook. Specifically, the data in the range A1:J10 is copied from Sheet1 to sheets Sheet2 and Sheet3.

#
# Copy data and formatting from a range of one worksheet
# to all other worksheets in a workbook
#
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Add()
ws = wb.Worksheets("Sheet1")
ws.Range("A1:J10").Formula = "=row()*column()"
wb.Worksheets.FillAcrossSheets(wb.Worksheets("Sheet1").Range("A1:J10"))
wb.SaveAs('copy_worksheet_to_worksheet.xlsx')
excel.Application.Quit()

Format Worksheet Cells

This script creates two columns of data, then formats the font type and font size used in the worksheet. Five different fonts and sizes are used, the numbers are formatted using a monetary format.

#
# Format cell font name and size, format numbers in monetary format
#
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Add()
ws = wb.Worksheets("Sheet1")

for i,font in enumerate(["Arial","Courier New","Garamond","Georgia","Verdana"]):
    ws.Range(ws.Cells(i+1,1),ws.Cells(i+1,2)).Value = [font,i+i]
    ws.Range(ws.Cells(i+1,1),ws.Cells(i+1,2)).Font.Name = font
    ws.Range(ws.Cells(i+1,1),ws.Cells(i+1,2)).Font.Size = 12+i

ws.Range("A1:A5").HorizontalAlignment = win32.constants.xlRight
ws.Range("B1:B5").NumberFormat = "$###,##0.00"
ws.Columns.AutoFit()
wb.SaveAs('format_cells.xlsx')
excel.Application.Quit()

Setting Row Height

This script illustrates row height. Similar to column height, row height can be set with the RowHeight method. You can also useAutoFit() to automatically adjust the row height based on cell contents.

#
# Set row heights and align text within the cell
#
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Add()
ws = wb.Worksheets("Sheet1")
ws.Range("A1:A2").Value = "1 line"
ws.Range("B1:B2").Value = "Two\nlines"
ws.Range("C1:C2").Value = "Three\nlines\nhere"
ws.Range("D1:D2").Value = "This\nis\nfour\nlines"
ws.Rows(1).RowHeight = 60
ws.Range("2:2").RowHeight = 120
ws.Rows(1).VerticalAlignment = win32.constants.xlCenter
ws.Range("2:2").VerticalAlignment = win32.constants.xlCenter

# Alternately, you can autofit all rows in the worksheet
# ws.Rows.AutoFit()

wb.SaveAs('row_height.xlsx')
excel.Application.Quit()

Prerequisites

Python (refer to http://www.python.org)

Win32 Python module (refer to http://sourceforge.net/projects/pywin32)

Microsoft Excel (refer to http://office.microsoft.com/excel)

Source Files and Scripts

Source for the program and data text file are available athttp://github.com/pythonexcels/examples

That’s all for now, thanks — Dan

Basic Excel Driving with Python

Now it’s getting interesting. Reading and writing spreadsheets with XLRD and XLWT is sufficient for many tasks, and you don’t even need a copy of Excel to do it. But to really open up your data and fully wring all the information possible from it, you’ll need Excel and its powerful set of functions, pivot tables and charting.

For starters, let’s do some simple operations using Python to invoke Excel, add a spreadsheet, insert some data, then save the results to a spreadsheet file. You can play along at home by following my lead and entering the program text exactly as I’ve described below. My exercises and screen shots are done with Excel 2007, but all the commands presented here also work fine for Excel 2003. A prerequisite for this exercise is Python, the Win32 module and a copy of Microsoft Excel.

Here is the complete script we’ll be entering using IDLE, the Python interactive development tool.

#
# driving.py
#
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
excel.Visible = True
wb = excel.Workbooks.Add()
ws = wb.Worksheets('Sheet1')
ws.Name = 'Built with Python'
ws.Cells(1,1).Value = 'Hello Excel'
print ws.Cells(1,1).Value
for i in range(1,5):
    ws.Cells(2,i).Value = i  # Don't do this
ws.Range(ws.Cells(3,1),ws.Cells(3,4)).Value = [5,6,7,8]
ws.Range("A4:D4").Value = [i for i in range(9,13)]
ws.Cells(5,4).Formula = '=SUM(A2:D4)'
ws.Cells(5,4).Font.Size = 16
ws.Cells(5,4).Font.Bold = True

What follows is a step-by-step guide to entering this script and monitoring the result.

  1. Open the Python IDLE interface from the Start menu

IDLE Startup

IDLE is the Python IDE built with the tkinter GUI toolkit, as quoted from the Python IDLE documentation, and gives you an interactive interface to enter, run and save Python programs. IDLE isn’t strictly necessary for this exercise, you could use any shell command window, or a tool such as IPython or the MS Windows command line interface.

  1. Import the win32 module

importwin32

If the import command was successful, you’ll see the “>>>” prompt returned. If there was a problem, such as not having the win32 module installed correctly, you’ll see Import Error: Nomodule named win32com.client. In that case, install the appropriate win32 module from the web site.

  1. Start Excel

The command win32.gencache.EnsureDispatch('Excel.Application') attaches to an Excel process that is already running, or starts Excel if it’s not. If you see the “>>>” prompt, Excel has been started or linked successfully. At this point you won’t see Excel, but if you check your task manager you can confirm that the process is running.

  1. Make Excel Visible

Setting the Visible flag with excel.Visible = True makes the Excel window appear. At this point, Excel does not contain any workbooks or worksheets, we’ll add those in the next step.

  1. Add a workbook, select the sheet “Sheet1” and rename it

idlewbws

Excel needs a workbook to serve as a container for the worksheets. A new workbook containing 3 sheets is added with command wb = excel.Workbooks.Add(). The command ws =wb.Worksheets('Sheet1') assigns ws to the sheet named Sheet1, and the command ws.Name ='Built with Python' changes the name of Sheet1 to “Built with Python”. Your screen should now look something like this:

excelblank

  1. Add some text into the first cell

idlehello

Now the setup is complete and you can add data to the spreadsheet. There are several options for addressing cells and blocks of data in Excel, I’ll cover a few of them here. You can address individual cells with the Cells(row,column).Value pattern, where row and column are integer values representing the row and column location for the cell. Note that row and column counts begin from one, not zero. Use .Value to add text, numbers and date information to the cell and use .Formula for entering an Excel formula into the cell location.

After typing these commands, you’ll see the “Hello Excel” text in your Excel worksheet, and see the text printed in the IDLE window as well. Of course, Python can set values in the spreadsheet as well as query data from the spreadsheet.

excelhello

  1. Populate the second row with data by using a for loop

idle_for_loop

In many cases you’ll have lists of data to insert into or extract from the worksheet. Wrapping the Cells(row,column).Value pattern with a loop seems like a natural approach, but in reality this maximizes the communication overhead between Python and Excel and results in very inefficient and slow code. It’s much better to transfer lists than individual elements whenever possible as shown in the next section. After this command, your Excel spreadsheet will look like this:

For loop results in excel

  1. Populate the third and fourth rows of data

Range insertion

A better approach to populating or extracting blocks of data is to use the Range().Value  pattern. With this construct you can efficiently transfer a one- or two-dimensional blocks of data. In the first example, cells (3,1) through (3,4) are assigned to the list [5,6,7,8]. The next line uses the Excel-style cell address “A4:D4” to assign the results of the operation [ifor i in range(9,13)]. In some cases, it may be more intuitive to use the Excel-style naming. The Excel sheet now looks like this:

Excel data insertion

  1. Assign a formula to sum the numbers just added

Formula insertion

You can insert Excel formulas into cells using the .Formula pattern. The formula is the same as if you were to enter it in Excel: =SUM(A2:D4). In this example, the sum of 12 numbers in rows 2,3 and 4 is generated. Your Excel sheet should now look like the screenshot below.

Excel Formula

  1. Change the formatting of the formula cell **

Formatting data

As a final exercise, the format of the formula cell is changed to point size 16 with a bold typeface. You can change any of dozens of attributes for the various cells in the worksheet through Python. Your spreadsheet should now look like this.

Formatting result in Excel

Hopefully you did this exercise interactively, typing the commands and monitoring the result in Excel. You can also cut to the chase and run this script to generate the result. When the script exits, you’ll be left with an open Excel spreadsheet just as shown in the last screenshot above.

Prerequisites

Source Files and Scripts

Source for the program and data text file are available at http://github.com/pythonexcels/examples/tree/master

References

Core Python Programming

Wesley Chun’s book has a chapter on Programming Microsoft Office with Win32 COM

http://groups.google.com/group/python-excel

Though this group mainly covers questions on the excellent XLRD, XLWT and XLUTILS modules, there is also some discussion on interfacing to Excel using Win32 COM

Stack Overflow

Stack Overflow is a great resource for getting questions answered on a variety of programming topics, including Python

Thanks everyone — Dan