Union of two pandas DataFrames
Merge with the `indicator` argument, and remap the result:
    m = {'left_only': 'df1', 'right_only': 'df2', 'both': 'df1, df2'}
    result = df1.merge(df2, on=['A'], how='outer', indicator='B')
    result['B'] = result['B'].map(m)
    result

       A         B
    0  a  df1, df2
    1  b       df1
    2  c       df2
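To see what the remap acts on, here is a minimal self-contained sketch that stops before the `map` step. The sample frames and the name `raw` are assumptions made for illustration only.

```python
import pandas as pd

# Assumed sample data: two single-column frames sharing the key 'a'.
df1 = pd.DataFrame({'A': ['a', 'b']})
df2 = pd.DataFrame({'A': ['a', 'c']})

# indicator='B' adds a categorical column named B that records
# whether each key was found in the left frame, the right frame, or both.
raw = df1.merge(df2, on=['A'], how='outer', indicator='B')
print(raw['B'].tolist())  # ['both', 'left_only', 'right_only']
```

These three labels are exactly the keys of the dictionary `m`, which is why a plain `map` is enough to turn them into frame names.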
An outer join on tagged copies of the frames also solves this:
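    import pandas as pd

    df1 = pd.DataFrame({'A': ['a', 'b']})
    df2 = pd.DataFrame({'A': ['a', 'c']})
    # Tag each frame with its own name in a helper column.
    df1['col1'] = 'df1'
    df2['col2'] = 'df2'
    # Keys missing from one side get '' instead of NaN.
    df = pd.merge(df1, df2, on=['A'], how='outer').fillna('')
    df['B'] = df['col1'] + ',' + df['col2']
    # Drop the leading or trailing comma left by an empty side.
    df['B'] = df['B'].str.strip(',')
    df = df[['A', 'B']]
    df

       A        B
    0  a  df1,df2
    1  b      df1
    2  c      df2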
Alternatively, concatenate the two frames with a source label and group by `A`:
    df3 = pd.concat([df1.assign(source='df1'), df2.assign(source='df2')]) \
        .groupby('A') \
        .aggregate(list) \
        .reset_index()
The result will be:
       A      source
    0  a  [df1, df2]
    1  b       [df1]
    2  c       [df2]
`assign` adds a column named `source`, holding the value `df1` or `df2`, to each DataFrame. `groupby` collects rows with the same `A` value into a single group. `aggregate` specifies how to combine the other columns (here only `source`) within each group. I used the `list` aggregate function so that the `source` column becomes the list of values that share the same `A`.
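If a plain string is preferred over a list, one possible variation is to aggregate with a string join instead. This is a sketch under assumed sample data; the names `labelled` and `joined` are hypothetical.

```python
import pandas as pd

# Assumed sample data: two single-column frames sharing the key 'a'.
df1 = pd.DataFrame({'A': ['a', 'b']})
df2 = pd.DataFrame({'A': ['a', 'c']})

# Stack both frames, tagging every row with the frame it came from.
labelled = pd.concat([df1.assign(source='df1'), df2.assign(source='df2')])

# Passing ', '.join instead of list collapses each group into one string.
joined = (
    labelled.groupby('A')['source']
    .agg(', '.join)
    .reset_index()
)
print(joined['source'].tolist())  # ['df1, df2', 'df1', 'df2']
```

Rows keep their concatenation order within each group, so `df1` is listed before `df2` for keys present in both frames.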