Merge two DataFrames based on multiple keys in pandas
To merge by multiple keys, you just need to pass the keys in a list to pd.merge:
>>> pd.merge(a, b, on=['A', 'B'])
   A  B  value1  value2
0  1  1      23    0.10
1  1  2      34    0.20
2  2  1    2342    0.13
3  2  2     333    0.33
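For anyone who wants to reproduce that output: the original a and b were not shown, so the frames below are an assumption, built so that the merge returns the same four rows (value1 is assumed to live in a and value2 in b).

```python
import pandas as pd

# Assumed inputs: the original a and b were not shown
a = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 1, 2],
                  'value1': [23, 34, 2342, 333]})
b = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 1, 2],
                  'value2': [0.10, 0.20, 0.13, 0.33]})

# Rows are matched only where both A and B agree
merged = pd.merge(a, b, on=['A', 'B'])
print(merged)
```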
In fact, the default for pd.merge is to use the intersection of the two DataFrames' column labels, so pd.merge(a, b) would work equally well in this case.
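A quick way to convince yourself of this is to compare the two calls directly. This is only a sketch with assumed frames, in which A and B are the only columns the two have in common:

```python
import pandas as pd

# Assumed example frames; 'A' and 'B' are the only shared columns
a = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 1, 2],
                  'value1': [23, 34, 2342, 333]})
b = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 1, 2],
                  'value2': [0.10, 0.20, 0.13, 0.33]})

explicit = pd.merge(a, b, on=['A', 'B'])
implicit = pd.merge(a, b)  # on=None: join on the shared columns A and B

# Compare the two results
print(explicit.equals(implicit))
```

Note that if the frames also share a non-key column, the default joins on that column as well, so passing on explicitly is the safer habit.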
According to the most recent pandas documentation, the on parameter accepts either a single label or a list of labels (column or index level names), and each one must be found in both DataFrames. Here is an MWE for its use:
import pandas as pd

a = pd.DataFrame({'A': ['0', '0', '1', '1'], 'B': ['0', '1', '0', '1'], 'v': [True, False, False, True]})
b = pd.DataFrame({'A': ['0', '0', '1', '1'], 'B': ['0', '1', '0', '1'], 'v': [False, True, True, True]})
# Both frames have a column 'v', so suffixes tell the two copies apart
result = pd.merge(a, b, on=['A', 'B'], how='inner', suffixes=['_and', '_or'])

>>> result
   A  B  v_and   v_or
0  0  0   True  False
1  0  1  False   True
2  1  0  False   True
3  1  1   True   True
on : label or list
    Column or index level names to join on. These must be found in both DataFrames. If on is None and not merging on indexes then this defaults to the intersection of the columns in both DataFrames.
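The "index level names" part of that description means a key does not have to be an ordinary column. Below is a minimal sketch with hypothetical frames (left, right, x and y are made-up names) where A is a named index level and B is a regular column. It assumes a reasonably recent pandas, since older releases did not accept index level names in on.

```python
import pandas as pd

# Hypothetical frames: 'A' is a named index level, 'B' is an ordinary column
left = pd.DataFrame({'B': [0, 1, 0, 1], 'x': [10, 20, 30, 40]},
                    index=pd.Index([0, 0, 1, 1], name='A'))
right = pd.DataFrame({'B': [0, 1, 0], 'y': [1.5, 2.5, 3.5]},
                     index=pd.Index([0, 0, 1], name='A'))

# 'A' is resolved as an index level and 'B' as a column in both frames
result = pd.merge(left, right, on=['A', 'B'])
print(result)
```

Only the (A, B) pairs present in both frames survive the default inner join, so the pair (1, 1) from left is dropped.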
Check out the latest pd.merge documentation for further details.