
JOIN two dataframes on common column in python


Use merge:

    print(pd.merge(df1, df2, left_on='id', right_on='id1', how='left').drop('id1', axis=1))
       id name  count  price  rating
    0   1    a     10  100.0     1.0
    1   2    b     20  200.0     2.0
    2   3    c     30  300.0     3.0
    3   4    d     40    NaN     NaN
    4   5    e     50  500.0     5.0
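The question's df1 and df2 are not shown, so the frames below are an assumption reconstructed from the printed output (df2 has no row for id 4). This is a minimal runnable sketch of the merge above; the validate='many_to_one' argument is an optional addition, not part of the original answer, that makes pandas raise a MergeError if id1 turns out to contain duplicates.

```python
import pandas as pd

# Assumed inputs, reconstructed from the printed output (the original frames are not shown)
df1 = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                    'name': ['a', 'b', 'c', 'd', 'e'],
                    'count': [10, 20, 30, 40, 50]})
df2 = pd.DataFrame({'id1': [1, 2, 3, 5],
                    'price': [100, 200, 300, 500],
                    'rating': [1, 2, 3, 5]})

# Left merge keeps every row of df1; ids missing from df2 get NaN
result = (pd.merge(df1, df2, left_on='id', right_on='id1', how='left',
                   validate='many_to_one')  # optional: fail fast if id1 is not unique
            .drop('id1', axis=1))           # the key column from df2 is redundant
print(result)
```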

Another solution is to simply rename the column:

    print(pd.merge(df1, df2.rename(columns={'id1': 'id'}), on='id', how='left'))
       id name  count  price  rating
    0   1    a     10  100.0     1.0
    1   2    b     20  200.0     2.0
    2   3    c     30  300.0     3.0
    3   4    d     40    NaN     NaN
    4   5    e     50  500.0     5.0
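As a quick check, the rename approach should give exactly the same frame as merging with left_on/right_on and dropping id1 afterwards. The df1/df2 contents here are assumed, reconstructed from the printed output.

```python
import pandas as pd

# Assumed inputs, reconstructed from the printed output
df1 = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                    'name': ['a', 'b', 'c', 'd', 'e'],
                    'count': [10, 20, 30, 40, 50]})
df2 = pd.DataFrame({'id1': [1, 2, 3, 5],
                    'price': [100, 200, 300, 500],
                    'rating': [1, 2, 3, 5]})

# Renaming the key lets both frames share one column name, so on= is enough
renamed = pd.merge(df1, df2.rename(columns={'id1': 'id'}), on='id', how='left')

# Equivalent: different key names, then drop the duplicate key column
explicit = pd.merge(df1, df2, left_on='id', right_on='id1', how='left').drop('id1', axis=1)

print(renamed.equals(explicit))  # True
```

Renaming avoids the extra drop because the merge never produces a second key column.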

If you need only the price column, the simplest option is map:

    df1['price'] = df1.id.map(df2.set_index('id1')['price'])
    print(df1)
       id name  count  price
    0   1    a     10  100.0
    1   2    b     20  200.0
    2   3    c     30  300.0
    3   4    d     40    NaN
    4   5    e     50  500.0
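Series.map looks each id up in the index of the Series it is given, which is why df2 is indexed by id1 first. It also accepts a plain dict, so the sketch below builds the same lookup both ways and checks they agree. The df1/df2 contents are assumed, reconstructed from the printed output.

```python
import pandas as pd

# Assumed inputs, reconstructed from the printed output
df1 = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                    'name': ['a', 'b', 'c', 'd', 'e'],
                    'count': [10, 20, 30, 40, 50]})
df2 = pd.DataFrame({'id1': [1, 2, 3, 5],
                    'price': [100, 200, 300, 500],
                    'rating': [1, 2, 3, 5]})

# Lookup Series: index = id1, values = price; ids not found map to NaN
df1['price'] = df1['id'].map(df2.set_index('id1')['price'])

# The same lookup expressed as a dict {id1: price}
from_dict = df1['id'].map(dict(zip(df2['id1'], df2['price'])))
pd.testing.assert_series_equal(from_dict, df1['price'], check_names=False)

print(df1)
```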

Two more solutions: drop the unneeded columns after the merge, or select only the needed columns from df2 before it:

    print(pd.merge(df1, df2, left_on='id', right_on='id1', how='left')
            .drop(['id1', 'rating'], axis=1))
       id name  count  price
    0   1    a     10  100.0
    1   2    b     20  200.0
    2   3    c     30  300.0
    3   4    d     40    NaN
    4   5    e     50  500.0

    print(pd.merge(df1, df2[['id1', 'price']], left_on='id', right_on='id1', how='left')
            .drop('id1', axis=1))
       id name  count  price
    0   1    a     10  100.0
    1   2    b     20  200.0
    2   3    c     30  300.0
    3   4    d     40    NaN
    4   5    e     50  500.0
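Both variants should produce the same frame; the second one never copies rating into the merged result, which may be preferable when df2 has many columns. The df1/df2 contents below are assumed, reconstructed from the printed output.

```python
import pandas as pd

# Assumed inputs, reconstructed from the printed output
df1 = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                    'name': ['a', 'b', 'c', 'd', 'e'],
                    'count': [10, 20, 30, 40, 50]})
df2 = pd.DataFrame({'id1': [1, 2, 3, 5],
                    'price': [100, 200, 300, 500],
                    'rating': [1, 2, 3, 5]})

# Variant 1: merge everything, then drop what is not needed
dropped = (pd.merge(df1, df2, left_on='id', right_on='id1', how='left')
             .drop(['id1', 'rating'], axis=1))

# Variant 2: keep only the key and price columns before merging
selected = (pd.merge(df1, df2[['id1', 'price']], left_on='id', right_on='id1', how='left')
              .drop('id1', axis=1))

print(dropped.equals(selected))  # True
```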


join matches rows on the index unless we specify a column with the on parameter. However, on applies only to the left (calling) dataframe; the right dataframe is always matched on its index.
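To see why the right dataframe's index matters, the sketch below compares join without set_index (rows are aligned on the default 0..n-1 index, i.e. by position) against join on a proper key. The data is assumed, reconstructed from the printed output; df1 plays the role of the answer's df.

```python
import pandas as pd

# Assumed inputs, reconstructed from the printed output
df1 = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                    'name': ['a', 'b', 'c', 'd', 'e'],
                    'count': [10, 20, 30, 40, 50]})
df2 = pd.DataFrame({'id1': [1, 2, 3, 5],
                    'price': [100, 200, 300, 500],
                    'rating': [1, 2, 3, 5]})

# Both frames keep their default RangeIndex, so rows pair up by position:
# id 4 wrongly receives the price for id1 5, and id 5 gets NaN
positional = df1.join(df2)
print(positional)

# With id1 as df2's index and on='id', rows are matched by key value
by_key = df1.join(df2.set_index('id1'), on='id')
print(by_key)
```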

Strategy:

  • set_index on df2 so that id1 becomes its index
  • use join with df as the left dataframe and id as the on parameter. Note that I could have called set_index('id') on df to avoid needing the on parameter; however, using on let me leave the column in the dataframe rather than having to call reset_index later.
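The alternative mentioned in the second bullet, indexing both frames and calling reset_index afterwards, can be sketched as follows. The data is assumed (reconstructed from the printed output, with df1 standing in for df), and the rename_axis call is an extra precaution so the restored column is reliably named id.

```python
import pandas as pd

# Assumed inputs, reconstructed from the printed output
df1 = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                    'name': ['a', 'b', 'c', 'd', 'e'],
                    'count': [10, 20, 30, 40, 50]})
df2 = pd.DataFrame({'id1': [1, 2, 3, 5],
                    'price': [100, 200, 300, 500],
                    'rating': [1, 2, 3, 5]})

# Index-on-index join: no on= needed, but id must be restored as a column afterwards
indexed = (df1.set_index('id')
              .join(df2.set_index('id1'))
              .rename_axis('id')   # make the index name explicit before reset_index
              .reset_index())

# The approach from the answer: keep id as a column and pass on='id'
with_on = df1.join(df2.set_index('id1'), on='id')

print(indexed.equals(with_on))  # True
```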

    df.join(df2.set_index('id1'), on='id')
       id name  count  price  rating
    0   1    a     10  100.0     1.0
    1   2    b     20  200.0     2.0
    2   3    c     30  300.0     3.0
    3   4    d     40    NaN     NaN
    4   5    e     50  500.0     5.0
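join with on= behaves like a left merge against the right frame's index, so it should match the merge-based answer. A minimal check, assuming the answer's df is the question's df1 and using data reconstructed from the printed output:

```python
import pandas as pd

# Assumed inputs, reconstructed from the printed output (df1 stands in for df)
df1 = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                    'name': ['a', 'b', 'c', 'd', 'e'],
                    'count': [10, 20, 30, 40, 50]})
df2 = pd.DataFrame({'id1': [1, 2, 3, 5],
                    'price': [100, 200, 300, 500],
                    'rating': [1, 2, 3, 5]})

# join: left column 'id' matched against df2's index (id1); how='left' is the default
joined = df1.join(df2.set_index('id1'), on='id')

# merge: same left join on columns, then drop the duplicate key
merged = pd.merge(df1, df2, left_on='id', right_on='id1', how='left').drop('id1', axis=1)

print(joined.equals(merged))  # True
```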

If you only want the price column from df2:

    df.join(df2.set_index('id1')[['price']], on='id')
       id name  count  price
    0   1    a     10  100.0
    1   2    b     20  200.0
    2   3    c     30  300.0
    3   4    d     40    NaN
    4   5    e     50  500.0
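join also accepts a named Series, so single brackets (a Series named price) work as well as the double-bracket one-column DataFrame. A short sketch comparing the two, with df1 standing in for df and data assumed from the printed output:

```python
import pandas as pd

# Assumed inputs, reconstructed from the printed output (df1 stands in for df)
df1 = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                    'name': ['a', 'b', 'c', 'd', 'e'],
                    'count': [10, 20, 30, 40, 50]})
df2 = pd.DataFrame({'id1': [1, 2, 3, 5],
                    'price': [100, 200, 300, 500],
                    'rating': [1, 2, 3, 5]})

# Series named 'price'; join requires the Series to have a name
only_price = df1.join(df2.set_index('id1')['price'], on='id')

# One-column DataFrame, as in the answer
as_frame = df1.join(df2.set_index('id1')[['price']], on='id')

print(only_price.equals(as_frame))  # True
```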