How to write to an existing excel file without overwriting data (using pandas)?
Pandas docs says it uses openpyxl for xlsx files. Quick look through the code in ExcelWriter
gives a clue that something like this might work out:
import pandasfrom openpyxl import load_workbookbook = load_workbook('Masterfile.xlsx')writer = pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl') writer.book = book## ExcelWriter for some reason uses writer.sheets to access the sheet.## If you leave it empty it will not know that sheet Main is already there## and will create a new sheet.writer.sheets = dict((ws.title, ws) for ws in book.worksheets)data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])writer.save()
UPDATE: Starting from Pandas 1.3.0 the following function will not work properly, because functions DataFrame.to_excel()
and pd.ExcelWriter()
have been changed - a new if_sheet_exists
parameter has been introduced, which has invalidated the function below.
Here you can find an updated version of the append_df_to_excel()
, which is working for Pandas 1.3.0+.
Here is a helper function:
import osfrom openpyxl import load_workbookdef append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None, truncate_sheet=False, **to_excel_kwargs): """ Append a DataFrame [df] to existing Excel file [filename] into [sheet_name] Sheet. If [filename] doesn't exist, then this function will create it. @param filename: File path or existing ExcelWriter (Example: '/path/to/file.xlsx') @param df: DataFrame to save to workbook @param sheet_name: Name of sheet which will contain DataFrame. (default: 'Sheet1') @param startrow: upper left cell row to dump data frame. Per default (startrow=None) calculate the last row in the existing DF and write to the next row... @param truncate_sheet: truncate (remove and recreate) [sheet_name] before writing DataFrame to Excel file @param to_excel_kwargs: arguments which will be passed to `DataFrame.to_excel()` [can be a dictionary] @return: None Usage examples: >>> append_df_to_excel('d:/temp/test.xlsx', df) >>> append_df_to_excel('d:/temp/test.xlsx', df, header=None, index=False) >>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2', index=False) >>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2', index=False, startrow=25) (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile) """ # Excel file doesn't exist - saving and exiting if not os.path.isfile(filename): df.to_excel( filename, sheet_name=sheet_name, startrow=startrow if startrow is not None else 0, **to_excel_kwargs) return # ignore [engine] parameter if it was passed if 'engine' in to_excel_kwargs: to_excel_kwargs.pop('engine') writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a') # try to open an existing workbook writer.book = load_workbook(filename) # get the last row in the existing Excel sheet # if it was not specified explicitly if startrow is None and sheet_name in writer.book.sheetnames: startrow = writer.book[sheet_name].max_row # truncate sheet if truncate_sheet and sheet_name in writer.book.sheetnames: # index of [sheet_name] sheet idx = writer.book.sheetnames.index(sheet_name) # remove [sheet_name] writer.book.remove(writer.book.worksheets[idx]) # create an empty sheet [sheet_name] using old index writer.book.create_sheet(sheet_name, idx) # copy existing sheets writer.sheets = {ws.title:ws for ws in writer.book.worksheets} if startrow is None: startrow = 0 # write out the new sheet df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs) # save the workbook writer.save()
Tested with the following versions:
- Pandas 1.2.3
- Openpyxl 3.0.5
With openpyxl
version 2.4.0
and pandas
version 0.19.2
, the process @ski came up with gets a bit simpler:
import pandasfrom openpyxl import load_workbookwith pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl') as writer: writer.book = load_workbook('Masterfile.xlsx') data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])#That's it!