How to write to an existing excel file without overwriting data (using pandas)? How to write to an existing excel file without overwriting data (using pandas)? python python

How to write to an existing excel file without overwriting data (using pandas)?


Pandas docs says it uses openpyxl for xlsx files. Quick look through the code in ExcelWriter gives a clue that something like this might work out:

import pandasfrom openpyxl import load_workbookbook = load_workbook('Masterfile.xlsx')writer = pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl') writer.book = book## ExcelWriter for some reason uses writer.sheets to access the sheet.## If you leave it empty it will not know that sheet Main is already there## and will create a new sheet.writer.sheets = dict((ws.title, ws) for ws in book.worksheets)data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])writer.save()


UPDATE: Starting from Pandas 1.3.0 the following function will not work properly, because functions DataFrame.to_excel() and pd.ExcelWriter() have been changed - a new if_sheet_exists parameter has been introduced, which has invalidated the function below.

Here you can find an updated version of the append_df_to_excel(), which is working for Pandas 1.3.0+.


Here is a helper function:

import osfrom openpyxl import load_workbookdef append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,                       truncate_sheet=False,                        **to_excel_kwargs):    """    Append a DataFrame [df] to existing Excel file [filename]    into [sheet_name] Sheet.    If [filename] doesn't exist, then this function will create it.    @param filename: File path or existing ExcelWriter                     (Example: '/path/to/file.xlsx')    @param df: DataFrame to save to workbook    @param sheet_name: Name of sheet which will contain DataFrame.                       (default: 'Sheet1')    @param startrow: upper left cell row to dump data frame.                     Per default (startrow=None) calculate the last row                     in the existing DF and write to the next row...    @param truncate_sheet: truncate (remove and recreate) [sheet_name]                           before writing DataFrame to Excel file    @param to_excel_kwargs: arguments which will be passed to `DataFrame.to_excel()`                            [can be a dictionary]    @return: None    Usage examples:    >>> append_df_to_excel('d:/temp/test.xlsx', df)    >>> append_df_to_excel('d:/temp/test.xlsx', df, header=None, index=False)    >>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2',                           index=False)    >>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2',                            index=False, startrow=25)    (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)    """    # Excel file doesn't exist - saving and exiting    if not os.path.isfile(filename):        df.to_excel(            filename,            sheet_name=sheet_name,             startrow=startrow if startrow is not None else 0,             **to_excel_kwargs)        return        # ignore [engine] parameter if it was passed    if 'engine' in to_excel_kwargs:        to_excel_kwargs.pop('engine')    writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')    # try to open an existing workbook    writer.book = load_workbook(filename)        # get the last row in the existing Excel sheet    # if it was not specified explicitly    if startrow is None and sheet_name in writer.book.sheetnames:        startrow = writer.book[sheet_name].max_row    # truncate sheet    if truncate_sheet and sheet_name in writer.book.sheetnames:        # index of [sheet_name] sheet        idx = writer.book.sheetnames.index(sheet_name)        # remove [sheet_name]        writer.book.remove(writer.book.worksheets[idx])        # create an empty sheet [sheet_name] using old index        writer.book.create_sheet(sheet_name, idx)        # copy existing sheets    writer.sheets = {ws.title:ws for ws in writer.book.worksheets}    if startrow is None:        startrow = 0    # write out the new sheet    df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)    # save the workbook    writer.save()

Tested with the following versions:

  • Pandas 1.2.3
  • Openpyxl 3.0.5


With openpyxlversion 2.4.0 and pandasversion 0.19.2, the process @ski came up with gets a bit simpler:

import pandasfrom openpyxl import load_workbookwith pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl') as writer:    writer.book = load_workbook('Masterfile.xlsx')    data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])#That's it!