Better way for merge (update\insert) pandas dataframes


Option 1: use indicator=True as part of merge:

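import numpy as np

# Outer-merge on the key columns; '_merge' records which frame each row came from
df_out = df_current_source.merge(df_new_source,
                                 on=['index1', 'index2'],
                                 how='outer', indicator=True)

# Rows present in both frames take the new value (A_y);
# rows present in only one frame keep whichever value exists
df_out['A'] = np.where(df_out['_merge'] == 'both',
                       df_out['A_y'],
                       df_out.A_x.add(df_out.A_y, fill_value=0)).astype(int)

df_out[['A', 'index1', 'index2']]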

Output:

   A  index1  index2
0  1       1       4
1  2       2       5
2  5       3       6
3  4       2       7
4  6       4       5
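To make Option 1 reproducible end to end, here is a minimal sketch. The two input frames are an assumption: the answer does not show them, so they are chosen only to be consistent with the output above. Row order of an outer merge may differ between pandas versions, so sort explicitly if the order matters.

```python
import numpy as np
import pandas as pd

# Assumed sample data (not shown in the answer), picked to match the output above
df_current_source = pd.DataFrame({
    'index1': [1, 2, 3, 2],
    'index2': [4, 5, 6, 7],
    'A': [1, 2, 3, 4],
})
df_new_source = pd.DataFrame({
    'index1': [1, 2, 3, 4],
    'index2': [4, 5, 6, 5],
    'A': [1, 2, 5, 6],
})

# Outer merge keeps every key pair; overlapping 'A' columns become A_x / A_y
df_out = df_current_source.merge(df_new_source,
                                 on=['index1', 'index2'],
                                 how='outer', indicator=True)

# 'both'       -> update: take the new value (A_y)
# 'left_only'  -> keep the current value (A_x + 0)
# 'right_only' -> insert the new value (0 + A_y)
df_out['A'] = np.where(df_out['_merge'] == 'both',
                       df_out['A_y'],
                       df_out['A_x'].add(df_out['A_y'], fill_value=0)).astype(int)

result = df_out[['A', 'index1', 'index2']]
print(result)
```

The `_merge` column is what distinguishes an update from an insert here, so it can also be kept around for logging how many rows fell into each case.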

Option 2: use combine_first with set_index:

df_new_source.set_index(['index1', 'index2'])\
             .combine_first(df_current_source.set_index(['index1', 'index2']))\
             .reset_index()\
             .astype(int)

Output:

   index1  index2  A
0       1       4  1
1       2       5  2
2       2       7  4
3       3       6  5
4       4       5  6
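The same upsert can be tried in isolation with the sketch below. As before, the input frames are an assumption made for illustration, chosen so the result agrees with the output above. The name `upserted` is also just a placeholder.

```python
import pandas as pd

# Assumed sample data (not shown in the answer), picked to match the output above
df_current_source = pd.DataFrame({
    'index1': [1, 2, 3, 2],
    'index2': [4, 5, 6, 7],
    'A': [1, 2, 3, 4],
})
df_new_source = pd.DataFrame({
    'index1': [1, 2, 3, 4],
    'index2': [4, 5, 6, 5],
    'A': [1, 2, 5, 6],
})

keys = ['index1', 'index2']

# combine_first prefers values from the caller (the new data) and
# falls back to the argument (the current data) where the caller has none
upserted = (
    df_new_source.set_index(keys)
    .combine_first(df_current_source.set_index(keys))
    .reset_index()
    .astype(int)  # alignment can introduce floats, so cast back
)
print(upserted)
```

Calling `combine_first` on the new frame is what gives the new values priority; swapping the two frames would keep the current values instead.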


See the related question "join or merge with overwrite in pandas". You can use combine_first:

combined_dataframe = df_new_source.set_index('A').combine_first(df_current_source.set_index('A'))
combined_dataframe.reset_index()

Output:

   A  index1  index2
0  1     1.0     4.0
1  2     2.0     5.0
2  3     2.0     7.0
3  5     3.0     6.0
4  6     4.0     5.0