How to read & write Excel using Python


Reading and writing Excel Files using Python

As a programmer or tester who work with python, we often need to get data from excel or we need to write data to excel in our code; there are many packages out there that help you with that exact task. However, the one we would recommend is openpyxl because it is simple, effective and it works with excel formulas too. This blog post contains information that one needs to know to work with openpyxl.

Installation

As usual we need to install the package before we use it in our code and the simplest way to install openpyxl is to use pip as shown below.

pip install openpyxl

Creating a workbook

A workbook is a openpyxl object that contains all the data in an excel file, you can create a new workbook from scratch (without any data) or you can create a workbook from an excel file that already exists.

from openpyxl import Workbook, load_workbook

#creating an empty workbook
new_workbook=Workbook()

#creating a workbook from file
wokbook_from_file=load_workbook(file_name)
# is the path and the name of the excel file as a string

If you create a empty workbook it will have one sheet with sheet name “Sheet”, you can access this sheet with active attribute of the workbook.

default_sheet=new_workbook.active

Creating sheets and sheet manipulation

To create a new sheet use the method create_sheet()

new_sheet=new_workbook.create_sheet(sheet_name)
# is the name of the sheet to be given to the given sheet

'''creating a sheet with  as sheet name that is the same as another sheet that already exists will create a sheet with  as sheet name, where the unique_number is sequentially generated number starting at 1'''

To copy the content of one sheet to a new sheet, ie to create a new sheet with content of an existing sheet use the copy_worksheet() method.

copy_sheet= new_workbook.copy_worksheet(new_sheet)

#to copy a sheet you must pass the source sheet object not just its name

The names of all the sheets in a workbook can be accessed with the attribute sheetnames

copy_sheet= new_workbook.copy_worksheet(new_sheet)

#to copy a sheet you must pass the source sheet object not just its name

The name of any give sheet can be accessed and changed using the title attribute of the sheet.

print(new_sheet.title)
new_sheet.title=new_title

Accessing sheet by sheet name

If you have a workbook that contains a sheet to which you only know the name of, the sheet can be accessed just like you access the value in a dict to which the key is known.

sheet= new_workbook[sheet_name]
# is the name of the sheet as a string.
# just like in a dict if a sheet with  as name is not present in the workbook a KeyError is #raised

Iterating through a workbook

If you wish to iterate through all the sheets in a workbook, you can do that by iterating the workbook as you would iterate a list.

for sheet in new_workbook:
	print(sheet.title)

Reading and writing to cell(s)

Now onto the important part, To access a cell and to write data to cell, use the slice operator as you would in a dict.

#assign value to a single cell
sheet['A1']=10
#notice the key to a cell is a string, that is the same as the name of the cell as it would appear in excel.

#get value from a single cell
print(sheet['A1'].value)
#notice you need to get the value attribute of a cell to get the data stored in that cell.

You can access value in many cells by using any one of the following methods which suits your need.

cell_range=sheet['A1':'D4']
#this will return all the cells as a tuple of rows in a tuple, to access the data from the cell inside this use.

for row in sheet['A1':'D4']:
	for cell in row:
		print(cell.value)
#this will print all the value from A1 to D4 in the order A1,A2,...A4,B1,...B4,C1,...C4,D1,...D4
#to access an entire column
row_a=sheet['A']
for cell in row_a:
	print(cell.value)

#similarly you can access an entire row
row_1=sheet[1]
for cell in row_1:
	print(cell.value)

#to access multiple columns
for col in sheet['A':'B']:
	for cell in col:
		print(cell.value)

#to access multiple rows
for row in sheet[1:5]:
	for cell in row:
		print(cell.value)

To access and set value to a cell use the meths cell().

#access single cell
cell_a1=sheet.cell(row_index,col_index)
#where row_index is int that indicates the nth row in which the cell is located
# and  is int that indicates the column in which the cell is located, 1 for A , 2 for B, …
#so for A1 cell row_index =1 and col_index=1
print(cell_a1.value)

#to set value to a cell
sheet.cell(row_index,col_index,value)
#where row_index is int that indicates the nth row in which the cell is located,
# col_index is int that indicates the column in which the cell is located, 1 for A , 2 for B, …
#and value is the value to be assigned to the cell

To iterate through rows or columns, iter_rows() and iter_cols() can be used respectively.

#iterate rows
for row in sheet.iter_rows(min_row=1,max_row=10,min_col=1,max_col=10):
	for cell in row:
		print(cell.value)
# min_ro, max_row, min_col and max_col are starting row, final row, starting column and final column #respectively; all the values are int and the parameters should be entered as keyword argument

#iterate columns
for col in sheet.iter_cols(min_row=1,max_row=10,min_col=1,max_col=10):
	for cell in col:
		print(cell.value)
# min_ro, max_row, min_col and max_col are starting row, final row, starting column and final column #respectively; all the values are int and the parameters should be entered as keyword argument
#the only difference between iter_rows() and iter_cols() is that iter_rows() returns a tuple of row tuple and iter_cols() returns a tuple of column tuple.

To add new row to a sheet use the append method.

sheet.append([‘col1’,’col2’])
sheet.append(1,2)

Inserting and deleting rows or columns

Not all the package allows for insertion and deletion but it is a time saving feature that you will not know the importance of until you use it. This package makes it easy to do just simple use insert_rows(), insert_cols(),delete_rows() and delete_cols().

sheet.insert_rows(index,number_of_rows)
#inserts number_of_rows rows in position index

sheet.delete_cols(index,number_of_cols)
#deletes number_of_cols from index position, ie deletes cols from index to #index+number_of_cols

Saving a file

All the action that were done to and with the data were done in memory to commit the same information to storage.(i.e) to write the data to a file, you must not forget to save it by using the save method of workbook.

new_workbook.save(file_name_and_extension)

#where the  is the name and extension, example “test.xlsx”
# the extension must be xlsx if you want to open the file MS Excel.

Interacting with Pandas

For those who do not know, pandas is a python package provides a very useful data structure called data frame. This post will not go in to details about pandas but only provide information on how to opnenpyxl and pandas interact.

To create a workbook from dataframe, you need to import dataframe_to_row method.

from openpyxl.utils.dataframe import dataframe_to_row 
new_workbook=Workbook()
defalut_sheet=new_workbook.active

for row in dataframe_to_row (,index=True,header=True):
	defalut_sheet.append(row)

	To convert a workbook to a dataframe, follow these steps.

import pandas as pd
data=[]
for row in sheet.values:
	data.append(row)
df=pd.DataFrame(data)

Read-only and write-only modes

When you need to work with a large data sets read only and write only modes will be very useful.

Read only mode allows you to read a large file without moving the entire data contained within it to memory. To read a file in read only mode you need to make the read_only flag True while reading a file.

wokbook_from_large_file=load_workbook(file_name,read_only =True)

#in red only mode the formulas will not be evaluated but appear as row strings.

Similarly when you want to dump a lot of data to a file use write only mode. To create a write only workbook make the write_only flag True. Unlike normal workbook a write only workbook will not have a default sheet and all the sheets should be added; data cannot be read from a write only workbook and write only workbooks can only be saved only once.

new_workbook=Workbook(write_only =True)

#data can only be written to a write only workbook using append() method.


Leave a Reply

Your email address will not be published. Required fields are marked *