How to convert OpenDocument spreadsheets to a pandas DataFrame?
This is available natively in pandas 0.25. So long as you have odfpy installed (conda install odfpy OR pip install odfpy) you can do
pd.read_excel("the_document.ods", engine="odf")
You can read ODF (Open Document Format .ods
) documents in Python using the following modules:
Using ezodf, a simple ODS-to-DataFrame converter could look like this:
import pandas as pdimport ezodfdoc = ezodf.opendoc('some_odf_spreadsheet.ods')print("Spreadsheet contains %d sheet(s)." % len(doc.sheets))for sheet in doc.sheets: print("-"*40) print(" Sheet name : '%s'" % sheet.name) print("Size of Sheet : (rows=%d, cols=%d)" % (sheet.nrows(), sheet.ncols()) )# convert the first sheet to a pandas.DataFramesheet = doc.sheets[0]df_dict = {}for i, row in enumerate(sheet.rows()): # row is a list of cells # assume the header is on the first row if i == 0: # columns as lists in a dictionary df_dict = {cell.value:[] for cell in row} # create index for the column headers col_index = {j:cell.value for j, cell in enumerate(row)} continue for j, cell in enumerate(row): # use header instead of column index df_dict[col_index[j]].append(cell.value)# and convert to a DataFramedf = pd.DataFrame(df_dict)
P.S.
ODF spreadsheet (*.ods files) support has been requested on the
pandas
issue tracker: https://github.com/pydata/pandas/issues/2311, but it is still not implemented.ezodf
was used in the unfinished PR9070 to implement ODF support in pandas. That PR is now closed (read the PR for a technical discussion), but it is still available as an experimental feature in thispandas
fork.- there are also some brute force methods to read directly from the XML code (here)
Here is a quick and dirty hack which uses ezodf module:
import pandas as pdimport ezodfdef read_ods(filename, sheet_no=0, header=0): tab = ezodf.opendoc(filename=filename).sheets[sheet_no] return pd.DataFrame({col[header].value:[x.value for x in col[header+1:]] for col in tab.columns()})
Test:
In [92]: df = read_ods(filename='fn.ods')In [93]: dfOut[93]: a b c0 1.0 2.0 3.01 4.0 5.0 6.02 7.0 8.0 9.0
NOTES:
- all other useful parameters like
header
,skiprows
,index_col
,parse_cols
are NOT implemented in this function - feel free to update this question if you want to implement them ezodf
depends onlxml
make sure you have it installed