pandas three-way joining multiple dataframes on columns pandas three-way joining multiple dataframes on columns python python

pandas three-way joining multiple dataframes on columns


Assumed imports:

import pandas as pd

John Galt's answer is basically a reduce operation. If I have more than a handful of dataframes, I'd put them in a list like this (generated via list comprehensions or loops or whatnot):

dfs = [df0, df1, df2, dfN]

Assuming they have some common column, like name in your example, I'd do the following:

df_final = reduce(lambda left,right: pd.merge(left,right,on='name'), dfs)

That way, your code should work with whatever number of dataframes you want to merge.

Edit August 1, 2016: For those using Python 3: reduce has been moved into functools. So to use this function, you'll first need to import that module:

from functools import reduce


You could try this if you have 3 dataframes

# Merge multiple dataframesdf1 = pd.DataFrame(np.array([    ['a', 5, 9],    ['b', 4, 61],    ['c', 24, 9]]),    columns=['name', 'attr11', 'attr12'])df2 = pd.DataFrame(np.array([    ['a', 5, 19],    ['b', 14, 16],    ['c', 4, 9]]),    columns=['name', 'attr21', 'attr22'])df3 = pd.DataFrame(np.array([    ['a', 15, 49],    ['b', 4, 36],    ['c', 14, 9]]),    columns=['name', 'attr31', 'attr32'])pd.merge(pd.merge(df1,df2,on='name'),df3,on='name')

alternatively, as mentioned by cwharland

df1.merge(df2,on='name').merge(df3,on='name')


This is an ideal situation for the join method

The join method is built exactly for these types of situations. You can join any number of DataFrames together with it. The calling DataFrame joins with the index of the collection of passed DataFrames. To work with multiple DataFrames, you must put the joining columns in the index.

The code would look something like this:

filenames = ['fn1', 'fn2', 'fn3', 'fn4',....]dfs = [pd.read_csv(filename, index_col=index_col) for filename in filenames)]dfs[0].join(dfs[1:])

With @zero's data, you could do this:

df1 = pd.DataFrame(np.array([    ['a', 5, 9],    ['b', 4, 61],    ['c', 24, 9]]),    columns=['name', 'attr11', 'attr12'])df2 = pd.DataFrame(np.array([    ['a', 5, 19],    ['b', 14, 16],    ['c', 4, 9]]),    columns=['name', 'attr21', 'attr22'])df3 = pd.DataFrame(np.array([    ['a', 15, 49],    ['b', 4, 36],    ['c', 14, 9]]),    columns=['name', 'attr31', 'attr32'])dfs = [df1, df2, df3]dfs = [df.set_index('name') for df in dfs]dfs[0].join(dfs[1:])     attr11 attr12 attr21 attr22 attr31 attr32name                                          a         5      9      5     19     15     49b         4     61     14     16      4     36c        24      9      4      9     14      9