Merge two dataframes by index
Use merge, which is an inner join by default:
pd.merge(df1, df2, left_index=True, right_index=True)
Or join, which is a left join by default:
df1.join(df2)
Or concat, which is an outer join by default:
pd.concat([df1, df2], axis=1)
Samples:
import pandas as pd

df1 = pd.DataFrame({'a': range(6), 'b': [5, 3, 6, 9, 2, 4]}, index=list('abcdef'))
print(df1)
   a  b
a  0  5
b  1  3
c  2  6
d  3  9
e  4  2
f  5  4

df2 = pd.DataFrame({'c': range(4), 'd': [10, 20, 30, 40]}, index=list('abhi'))
print(df2)
   c   d
a  0  10
b  1  20
h  2  30
i  3  40
# Default inner join
df3 = pd.merge(df1, df2, left_index=True, right_index=True)
print(df3)
   a  b  c   d
a  0  5  0  10
b  1  3  1  20

# Default left join
df4 = df1.join(df2)
print(df4)
   a  b    c     d
a  0  5  0.0  10.0
b  1  3  1.0  20.0
c  2  6  NaN   NaN
d  3  9  NaN   NaN
e  4  2  NaN   NaN
f  5  4  NaN   NaN

# Default outer join
df5 = pd.concat([df1, df2], axis=1)
print(df5)
     a    b    c     d
a  0.0  5.0  0.0  10.0
b  1.0  3.0  1.0  20.0
c  2.0  6.0  NaN   NaN
d  3.0  9.0  NaN   NaN
e  4.0  2.0  NaN   NaN
f  5.0  4.0  NaN   NaN
h  NaN  NaN  2.0  30.0
i  NaN  NaN  3.0  40.0
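The defaults shown above are only defaults. As a minimal sketch, reusing the same sample frames, each function accepts a parameter that switches the join type: how for merge and join, and join for concat.

```python
import pandas as pd

df1 = pd.DataFrame({'a': range(6), 'b': [5, 3, 6, 9, 2, 4]}, index=list('abcdef'))
df2 = pd.DataFrame({'c': range(4), 'd': [10, 20, 30, 40]}, index=list('abhi'))

# merge is inner by default; how='outer' keeps index labels from both frames
outer_merge = pd.merge(df1, df2, left_index=True, right_index=True, how='outer')

# join is left by default; how='inner' keeps only the shared labels
inner_join = df1.join(df2, how='inner')

# concat is outer by default; join='inner' keeps only the shared labels
inner_concat = pd.concat([df1, df2], axis=1, join='inner')

print(outer_merge)
print(inner_join)
print(inner_concat)
```

With these sample frames, the outer merge should contain every label from either index, while the two inner variants should contain only the labels a and b.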
You can use pd.concat([df1, df2, ...], axis=1) to concatenate two or more DataFrames aligned on their indexes:
pd.concat([df1, df2, df3, ...], axis=1)
Or merge for concatenating by custom fields / indexes:
# join by _common_ columns: `col1`, `col3`
pd.merge(df1, df2, on=['col1', 'col3'])

# join by: `df1.col1 == df2.index`
pd.merge(df1, df2, left_on='col1', right_index=True)
or join for joining by index:
df1.join(df2)
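To make the left_on / right_index form concrete, here is a small sketch. The frames orders and labels and their columns qty and label are made up for illustration; only col1 comes from the snippet above.

```python
import pandas as pd

# Hypothetical left frame: `col1` holds keys that appear in the right frame's index
orders = pd.DataFrame({'col1': ['x', 'y', 'x', 'z'], 'qty': [1, 2, 3, 4]})

# Hypothetical right frame: the lookup key is the index, not a column
labels = pd.DataFrame({'label': ['ex', 'why']}, index=['x', 'y'])

# Match orders.col1 against labels.index (inner join by default)
merged = pd.merge(orders, labels, left_on='col1', right_index=True)
print(merged)
```

Because merge is an inner join by default, the row with col1 == 'z' has no match in labels.index and is dropped; passing how='left' would keep it with NaN in label.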
By default:
join is a column-wise left join
pd.merge is a column-wise inner join
pd.concat is a row-wise outer join
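"Row-wise" here refers to the default axis=0 of pd.concat. A minimal sketch, using two small sample frames with different columns, of what that default does when no axis is given:

```python
import pandas as pd

df1 = pd.DataFrame({'a': range(6), 'b': [5, 3, 6, 9, 2, 4]}, index=list('abcdef'))
df2 = pd.DataFrame({'c': range(4), 'd': [10, 20, 30, 40]}, index=list('abhi'))

# Default axis=0: rows of df2 are stacked below rows of df1.
# Default join='outer': the result has the union of the columns,
# and cells with no source value are filled with NaN.
stacked = pd.concat([df1, df2])
print(stacked)
print(stacked.shape)
```

The index labels are not deduplicated in this mode, so a and b each appear twice in the result. To combine the frames side by side instead, pass axis=1.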
pd.concat:
takes an iterable as its argument, so DataFrames cannot be passed to it directly (use [df, df2])
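aligns the DataFrames on the other axis; with the default outer join, labels missing from one DataFrame are filled with NaN rather than raising an error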
join and pd.merge:
can take DataFrame arguments directly
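The difference in calling conventions can be sketched as follows, with two tiny made-up frames. Passing a bare DataFrame to pd.concat raises a TypeError, while join and pd.merge accept the DataFrame itself.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2]})
df2 = pd.DataFrame({'b': [3, 4]})

# pd.concat expects an iterable of pandas objects, not a single DataFrame
try:
    pd.concat(df1)
    error = None
except TypeError as exc:
    error = exc
print(type(error).__name__)

# Wrapping the frames in a list works
stacked = pd.concat([df1, df2], axis=1)

# join and pd.merge take the other DataFrame as a plain argument
joined = df1.join(df2)
merged = pd.merge(df1, df2, left_index=True, right_index=True)
```

With these two frames, which share the same index, all three results should hold the same values.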