
pass openpyxl data to pandas


A couple of things. First, your code will only ever give you one row, because you overwrite the values every time an if test passes. For example:

    if len(namelist) == 2:
        lastname = namelist[1]

This assigns a string to the variable lastname. You are not appending to a list, you are just assigning a string. Then, when you build your dataframe with df = pd.DataFrame({'personID':id,'lastName':lastname,... you're using this value, so the dataframe will only ever hold that one string. Make sense? If you must do this using openpyxl, try something like:

    lastname = []  # create an empty list once, before the loop
    if len(namelist) == 2:
        lastname.append(namelist[1])  # add the name to the list
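To make the list idea concrete, here is a minimal sketch of the whole pattern. The sample rows are made up and stand in for whatever you read from the sheet, and the column names other than personID and lastName are my own assumption:

```python
import pandas as pd

# Hypothetical sample data standing in for the (id, full name) rows of the sheet.
rows = [(1, "Ada Lovelace"), (2, "Grace Brewster Hopper"), (3, "Linus Torvalds")]

# Create the lists once, before the loop, then append inside it.
ids, firstnames, lastnames = [], [], []
for person_id, fullname in rows:
    namelist = fullname.split()
    ids.append(person_id)
    firstnames.append(namelist[0])
    lastnames.append(namelist[-1])

# Each list becomes one column, so the dataframe gets one row per name.
df = pd.DataFrame({"personID": ids, "firstName": firstnames, "lastName": lastnames})
print(df)
```

Because every list grows by one item per row, the dataframe ends up with as many rows as the input instead of a single value.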

However, I think your life will ultimately be much easier if you just figure out how to do this with pandas. It is in fact quite easy. Try something like this:

    import pandas as pd

    # read excel
    df = pd.read_excel('myInputFilename.xlsx')

    # write to excel
    df.to_excel('MyOutputFile.xlsx')


FWIW, openpyxl 2.4 makes it pretty easy to convert all or part of an Excel sheet to a pandas DataFrame: ws.values is an iterator over all the values in the sheet. It also has a new ws.iter_cols() method that lets you work directly with columns.

It's currently (April 2016) available as an alpha version and can be installed using pip install -U --pre openpyxl
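As a rough sketch of the ws.values route, something like the following should work. The workbook is built in memory with made-up rows so the example is self-contained, and it assumes the first row of the sheet holds the column headers:

```python
from openpyxl import Workbook
import pandas as pd

# Build a small in-memory workbook; the rows are made-up sample data.
wb = Workbook()
ws = wb.active
ws.append(["personID", "fullName"])
ws.append([1, "Ada Lovelace"])
ws.append([2, "Grace Hopper"])

# ws.values yields one tuple of plain cell values per row.
data = ws.values
columns = list(next(data))  # assumption: the first row is the header row
df = pd.DataFrame(list(data), columns=columns)
print(df)
```

With a real file you would get ws from a workbook opened with openpyxl.load_workbook instead of building one by hand.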

The code would then look a bit like this:

    sheet["B1"] = "firstName"
    sheet["C1"] = "middleName"
    sheet["D1"] = "lastName"

    for row in sheet.iter_rows(min_row=2, max_col=2):
        id_cell, name = row
        fullname = name.value.strip()
        namelist = fullname.split()
        firstname = namelist[0]
        lastname = namelist[-1]
        middlename = ""
        if len(namelist) >= 3:
            middlename = namelist[1]
        if len(namelist) == 4:
            lastname = " ".join(namelist[-2:])
        if middlename in ('Del', 'El', 'Van', 'Da'):
            lastname = " ".join([middlename, lastname])
            middlename = None
        name.value = firstname
        name.offset(column=1).value = middlename
        name.offset(column=2).value = lastname

    wb.save("output.xlsx")
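If you want to check the name-splitting rules without touching a workbook, one option is to pull them into a small function. This is only a sketch: split_name is a hypothetical helper name, the sample names are made up, and the rules are copied from the loop above.

```python
def split_name(fullname):
    """Hypothetical helper: apply the same splitting rules as the loop above."""
    namelist = fullname.strip().split()
    firstname = namelist[0]
    lastname = namelist[-1]
    middlename = ""
    if len(namelist) >= 3:
        middlename = namelist[1]
    if len(namelist) == 4:
        lastname = " ".join(namelist[-2:])
    # A particle such as 'Van' belongs to the last name, not the middle name.
    if middlename in ('Del', 'El', 'Van', 'Da'):
        lastname = " ".join([middlename, lastname])
        middlename = None
    return firstname, middlename, lastname


# Made-up sample names covering the two-, three- and four-part cases.
for sample in ["Ada Lovelace", "Grace Brewster Hopper",
               "Vincent Van Gogh", "Juan Del Rio Sanchez"]:
    print(sample, "->", split_name(sample))
```

Note that a single-word name ends up as both first and last name, since namelist[0] and namelist[-1] are then the same item.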