Better way for merge (update\insert) pandas dataframes
Option 1: use indicator=True as part of merge:
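    import numpy as np

    # Outer merge on the key columns; _merge records where each row came from
    df_out = df_current_source.merge(df_new_source, on=['index1', 'index2'],
                                     how='outer', indicator=True)
    # Rows present in both frames take the new value (A_y);
    # otherwise keep whichever side is not NaN
    df_out['A'] = np.where(df_out['_merge'] == 'both',
                           df_out['A_y'],
                           df_out.A_x.add(df_out.A_y, fill_value=0)).astype(int)
    df_out[['A', 'index1', 'index2']]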
Output:
       A  index1  index2
    0  1       1       4
    1  2       2       5
    2  5       3       6
    3  4       2       7
    4  6       4       5
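For anyone who wants to run Option 1 end to end, here is a minimal sketch. The two sample frames are my own assumption (the original frames are not shown in this answer); I picked values that are consistent with the output above. It also uses a slightly simpler rule than the np.where/add combination: keep A_x only for rows that exist solely in the current frame, otherwise take A_y. The row order of an outer merge can differ between pandas versions, so the sketch sorts explicitly.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, assumed for illustration only
df_current_source = pd.DataFrame(
    {"index1": [1, 2, 3], "index2": [4, 5, 6], "A": [1, 2, 3]}
)
df_new_source = pd.DataFrame(
    {"index1": [3, 2, 4], "index2": [6, 7, 5], "A": [5, 4, 6]}
)

merged = df_current_source.merge(
    df_new_source, on=["index1", "index2"], how="outer", indicator=True
)

# _merge is 'left_only', 'right_only' or 'both' for every row.
# Keep the current value only when the key is missing from the new frame.
merged["A"] = np.where(
    merged["_merge"] == "left_only", merged["A_x"], merged["A_y"]
).astype(int)

result = (
    merged[["index1", "index2", "A"]]
    .sort_values(["index1", "index2"])
    .reset_index(drop=True)
)
print(result)
```

Printing merged['_merge'] before building A is a quick way to check which keys were updated and which were inserted.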
Option 2: use combine_first with set_index:
    df_new_source.set_index(['index1', 'index2'])\
                 .combine_first(df_current_source.set_index(['index1', 'index2']))\
                 .reset_index()\
                 .astype(int)
Output:
       index1  index2  A
    0       1       4  1
    1       2       5  2
    2       2       7  4
    3       3       6  5
    4       4       5  6
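Option 2 can be wrapped in a small helper. This is only a sketch: the function name upsert, the extra column B and the sample values are hypothetical and not part of the original question. It also shows one thing to be aware of: combine_first only fills nulls, so a NaN in the new frame does not overwrite an existing value in the current frame.

```python
import numpy as np
import pandas as pd


def upsert(current, new, keys):
    """Hypothetical helper: rows from `new` win, missing keys come from `current`."""
    return (
        new.set_index(keys)
        .combine_first(current.set_index(keys))
        .reset_index()
    )


# Hypothetical sample data, assumed for illustration only
current = pd.DataFrame(
    {"index1": [1, 2], "index2": [4, 5], "A": [1, 2], "B": [10, 20]}
)
new = pd.DataFrame(
    {"index1": [2, 3], "index2": [5, 6], "A": [7, 3], "B": [np.nan, 30]}
)

out = upsert(current, new, ["index1", "index2"])
print(out)
# Key (2, 5): A is updated to 7, but B keeps 20 because the new B is NaN
```

If a null in the new frame should overwrite the old value, combine_first is not the right tool and the indicator approach from Option 1 is easier to adapt.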
See the related question "join or merge with overwrite in pandas". You can use combine_first:
    combined_dataframe = df_new_source.set_index('A').combine_first(df_current_source.set_index('A'))
    combined_dataframe.reset_index()
Output:

       A  index1  index2
    0  1     1.0     4.0
    1  2     2.0     5.0
    2  3     2.0     7.0
    3  5     3.0     6.0
    4  6     4.0     5.0
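Note that index1 and index2 come back as floats here, most likely because combine_first aligns both frames on the union of the index and introduces NaN along the way. Assuming no missing values remain, you can cast back to int. A minimal sketch follows; the sample frames are my assumption, chosen to be consistent with the output above.

```python
import pandas as pd

# Hypothetical sample data, assumed for illustration only
df_current_source = pd.DataFrame(
    {"A": [1, 2, 3], "index1": [1, 2, 3], "index2": [4, 5, 6]}
)
df_new_source = pd.DataFrame(
    {"A": [5, 3, 6], "index1": [3, 2, 4], "index2": [6, 7, 5]}
)

combined_dataframe = (
    df_new_source.set_index("A")
    .combine_first(df_current_source.set_index("A"))
    .reset_index()
    .astype(int)  # safe only because no NaN is left after combining
    .sort_values("A")
    .reset_index(drop=True)
)
print(combined_dataframe)
```

Keep in mind that this variant matches rows on the value of A, not on index1 and index2, so it only behaves like an update/insert when A is what identifies a row.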