
Get the mean across multiple Pandas DataFrames


Assuming the two dataframes have the same columns, you could just concatenate them and compute your summary stats on the concatenated frames:

    import numpy as np
    import pandas as pd

    # some random data frames
    df1 = pd.DataFrame(dict(x=np.random.randn(100), y=np.random.randint(0, 5, 100)))
    df2 = pd.DataFrame(dict(x=np.random.randn(100), y=np.random.randint(0, 5, 100)))

    # concatenate them
    df_concat = pd.concat((df1, df2))

    print(df_concat.mean())
    # x   -0.163044
    # y    2.120000
    # dtype: float64

    print(df_concat.median())
    # x   -0.192037
    # y    2.000000
    # dtype: float64

Update

If you want to compute stats across each set of rows with the same index in the two datasets, you can use .groupby() to group the data by row index, then apply the mean, median etc.:

    by_row_index = df_concat.groupby(df_concat.index)
    df_means = by_row_index.mean()

    print(df_means.head())
    #           x    y
    # 0 -0.850794  1.5
    # 1  0.159038  1.5
    # 2  0.083278  1.0
    # 3 -0.540336  0.5
    # 4  0.390954  3.5

This method works even when your dataframes have unequal numbers of rows: if a particular row index is missing from one of the two dataframes, the mean/median is computed on the single existing row.
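To make the unequal-length case concrete, here is a minimal sketch. The frames `df_a` and `df_b` and their values are hypothetical toy data, not taken from the question.

```python
import pandas as pd

# Hypothetical toy frames: df_a has three rows, df_b only two.
df_a = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [10, 20, 30]})
df_b = pd.DataFrame({"x": [3.0, 4.0], "y": [30, 40]})

# Stack the frames; the original row labels (0, 1, 2 and 0, 1) are kept.
stacked = pd.concat([df_a, df_b])

# Group rows that share an index label and average each group.
row_means = stacked.groupby(stacked.index).mean()
print(row_means)
```

Rows 0 and 1 are averaged over both frames, while row 2 exists only in `df_a`, so its values pass through unchanged.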


My approach is similar to @ali_m's, but since you want one mean per row-column combination, I finish differently:

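    import numpy as np
    import pandas as pd

    df1 = pd.DataFrame(dict(x=np.random.randn(100), y=np.random.randint(0, 5, 100)))
    df2 = pd.DataFrame(dict(x=np.random.randn(100), y=np.random.randint(0, 5, 100)))

    df = pd.concat([df1, df2])
    # the concatenated frame has a single-level index, so group on level 0
    # (level=1 is only valid with a MultiIndex)
    foo = df.groupby(level=0).mean()
    foo.head()
    #           x    y
    # 0  0.841282  2.5
    # 1  0.716749  1.0
    # 2 -0.551903  2.5
    # 3  1.240736  1.5
    # 4  1.227109  2.0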
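As a sanity check on what "one mean per row-column combination" means, the sketch below compares the grouped result with a plain element-wise average. It assumes both frames share the same row index and columns. The seed and the frame size of 5 are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
df1 = pd.DataFrame({"x": rng.standard_normal(5), "y": rng.integers(0, 5, 5)})
df2 = pd.DataFrame({"x": rng.standard_normal(5), "y": rng.integers(0, 5, 5)})

# Mean per row label after stacking the two frames.
by_index = pd.concat([df1, df2]).groupby(level=0).mean()

# Direct element-wise average; valid only because the labels line up.
elementwise = (df1 + df2) / 2

matches = np.allclose(by_index.to_numpy(), elementwise.to_numpy())
print(matches)
```

For two aligned frames the arithmetic shortcut is simpler, but the grouped version extends to any number of frames and to frames with missing rows.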


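As per Niklas' comment, the solution to the question is panel.mean(axis=0). Note that pd.Panel was deprecated in pandas 0.20 and removed in 0.25, so this only runs on older versions.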

As a more complete example:

    import pandas as pd
    import numpy as np

    dfs = {}
    nrows = 4
    ncols = 3
    for i in range(4):
        dfs[i] = pd.DataFrame(np.arange(i, nrows*ncols+i).reshape(nrows, ncols),
                              columns=list('abc'))
        print('DF{i}:\n{df}\n'.format(i=i, df=dfs[i]))

    panel = pd.Panel(dfs)
    print('Mean of stacked DFs:\n{df}'.format(df=panel.mean(axis=0)))

This gives the following output:

    DF0:
       a   b   c
    0  0   1   2
    1  3   4   5
    2  6   7   8
    3  9  10  11

    DF1:
        a   b   c
    0   1   2   3
    1   4   5   6
    2   7   8   9
    3  10  11  12

    DF2:
        a   b   c
    0   2   3   4
    1   5   6   7
    2   8   9  10
    3  11  12  13

    DF3:
        a   b   c
    0   3   4   5
    1   6   7   8
    2   9  10  11
    3  12  13  14

    Mean of stacked DFs:
          a     b     c
    0   1.5   2.5   3.5
    1   4.5   5.5   6.5
    2   7.5   8.5   9.5
    3  10.5  11.5  12.5
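Since pd.Panel is not available in recent pandas releases, here is one possible way to get the same result without it. This is a sketch that swaps in pd.concat plus groupby, not the method from the answer above. The names `stacked` and `mean_df` are illustrative.

```python
import numpy as np
import pandas as pd

nrows, ncols = 4, 3
dfs = {
    i: pd.DataFrame(
        np.arange(i, nrows * ncols + i).reshape(nrows, ncols),
        columns=list("abc"),
    )
    for i in range(4)
}

# Passing a dict to pd.concat uses its keys as the outer index level,
# so level 0 identifies the source frame and level 1 the original row label.
stacked = pd.concat(dfs)

# Averaging over level 1 gives one mean per row/column position.
mean_df = stacked.groupby(level=1).mean()
print(mean_df)
```

With the same four input frames this should reproduce the "Mean of stacked DFs" table shown above.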