How to convert OpenDocument spreadsheets to a pandas DataFrame?

python pandas libreoffice dataframe opendocument

This is available natively in pandas 0.25. So long as you have odfpy installed (conda install odfpy OR pip install odfpy) you can do

pd.read_excel("the_document.ods", engine="odf")

python pandas libreoffice dataframe opendocument

You can read ODF (Open Document Format .ods) documents in Python using the following modules:

Using ezodf, a simple ODS-to-DataFrame converter could look like this:

import pandas as pdimport ezodfdoc = ezodf.opendoc('some_odf_spreadsheet.ods')print("Spreadsheet contains %d sheet(s)." % len(doc.sheets))for sheet in doc.sheets:    print("-"*40)    print("   Sheet name : '%s'" % sheet.name)    print("Size of Sheet : (rows=%d, cols=%d)" % (sheet.nrows(), sheet.ncols()) )# convert the first sheet to a pandas.DataFramesheet = doc.sheets[0]df_dict = {}for i, row in enumerate(sheet.rows()):    # row is a list of cells    # assume the header is on the first row    if i == 0:        # columns as lists in a dictionary        df_dict = {cell.value:[] for cell in row}        # create index for the column headers        col_index = {j:cell.value for j, cell in enumerate(row)}        continue    for j, cell in enumerate(row):        # use header instead of column index        df_dict[col_index[j]].append(cell.value)# and convert to a DataFramedf = pd.DataFrame(df_dict)

P.S.

ODF spreadsheet (*.ods files) support has been requested on the pandas issue tracker: https://github.com/pydata/pandas/issues/2311, but it is still not implemented.
ezodf was used in the unfinished PR9070 to implement ODF support in pandas. That PR is now closed (read the PR for a technical discussion), but it is still available as an experimental feature in this pandas fork.
there are also some brute force methods to read directly from the XML code (here)

python pandas libreoffice dataframe opendocument

Here is a quick and dirty hack which uses ezodf module:

import pandas as pdimport ezodfdef read_ods(filename, sheet_no=0, header=0):    tab = ezodf.opendoc(filename=filename).sheets[sheet_no]    return pd.DataFrame({col[header].value:[x.value for x in col[header+1:]]                         for col in tab.columns()})

Test:

In [92]: df = read_ods(filename='fn.ods')In [93]: dfOut[93]:     a    b    c0  1.0  2.0  3.01  4.0  5.0  6.02  7.0  8.0  9.0

NOTES:

all other useful parameters like header, skiprows, index_col, parse_cols are NOT implemented in this function - feel free to update this question if you want to implement them
ezodf depends on lxml make sure you have it installed

CodeHunter

How to convert OpenDocument spreadsheets to a pandas DataFrame?

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last