Combine two pandas Data Frames (join on a common column) Combine two pandas Data Frames (join on a common column) python python

Combine two pandas Data Frames (join on a common column)


You can use merge to combine two dataframes into one:

import pandas as pdpd.merge(restaurant_ids_dataframe, restaurant_review_frame, on='business_id', how='outer')

where on specifies field name that exists in both dataframes to join on, and howdefines whether its inner/outer/left/right join, with outer using 'union of keys from both frames (SQL: full outer join).' Since you have 'star' column in both dataframes, this by default will create two columns star_x and star_y in the combined dataframe. As @DanAllan mentioned for the join method, you can modify the suffixes for merge by passing it as a kwarg. Default is suffixes=('_x', '_y'). if you wanted to do something like star_restaurant_id and star_restaurant_review, you can do:

 pd.merge(restaurant_ids_dataframe, restaurant_review_frame, on='business_id', how='outer', suffixes=('_restaurant_id', '_restaurant_review'))

The parameters are explained in detail in this link.


Joining fails if the DataFrames have some column names in common. The simplest way around it is to include an lsuffix or rsuffix keyword like so:

restaurant_review_frame.join(restaurant_ids_dataframe, on='business_id', how='left', lsuffix="_review")

This way, the columns have distinct names. The documentation addresses this very problem.

Or, you could get around this by simply deleting the offending columns before you join. If, for example, the stars in restaurant_ids_dataframe are redundant to the stars in restaurant_review_frame, you could del restaurant_ids_dataframe['stars'].


In case anyone needs to try and merge two dataframes together on the index (instead of another column), this also works!

T1 and T2 are dataframes that have the same indices

import pandas as pdT1 = pd.merge(T1, T2, on=T1.index, how='outer')

P.S. I had to use merge because append would fill NaNs in unnecessarily.