Get the mean across multiple Pandas DataFrames

python r numpy pandas

Assuming the two dataframes have the same columns, you could just concatenate them and compute your summary stats on the concatenated frames:

    import numpy as np
    import pandas as pd

    # some random data frames
    df1 = pd.DataFrame(dict(x=np.random.randn(100), y=np.random.randint(0, 5, 100)))
    df2 = pd.DataFrame(dict(x=np.random.randn(100), y=np.random.randint(0, 5, 100)))

    # concatenate them
    df_concat = pd.concat((df1, df2))

    print(df_concat.mean())
    # x   -0.163044
    # y    2.120000
    # dtype: float64

    print(df_concat.median())
    # x   -0.192037
    # y    2.000000
    # dtype: float64
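Because the random data above gives different numbers on every run, here is a small deterministic sketch of the same pooling idea. The frame contents and the names `pooled` and `stats` are made up for illustration; `agg` is used only as a compact way to get several statistics at once.

```python
import pandas as pd

# Two tiny frames with the same columns (values chosen by hand for illustration)
df1 = pd.DataFrame({'x': [1.0, 2.0, 3.0], 'y': [0, 2, 4]})
df2 = pd.DataFrame({'x': [4.0, 5.0, 6.0], 'y': [1, 3, 5]})

# Pool all rows into one frame; ignore_index gives the result a fresh 0..n-1 index
pooled = pd.concat([df1, df2], ignore_index=True)

# One row per statistic, one column per original column
stats = pooled.agg(['mean', 'median'])
print(stats)
```

Note that this pools every row of both frames into a single sample, so it answers "what is the mean of column x overall", not "what is the mean of row i across the frames".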

Update

If you want to compute stats across each set of rows with the same index in the two datasets, you can use .groupby() to group the data by row index, then apply the mean, median etc.:

    by_row_index = df_concat.groupby(df_concat.index)
    df_means = by_row_index.mean()

    print(df_means.head())
    #           x    y
    # 0 -0.850794  1.5
    # 1  0.159038  1.5
    # 2  0.083278  1.0
    # 3 -0.540336  0.5
    # 4  0.390954  3.5

This method will work even when your dataframes have unequal numbers of rows - if a particular row index is missing in one of the two dataframes, the mean/median will be computed on the single existing row.
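To make the unequal-length case concrete, here is a minimal sketch with hand-picked values (the names `df_long`, `df_short` and `row_means` are hypothetical). Row 2 exists only in the longer frame, so its "mean" is just that single value.

```python
import pandas as pd

df_long = pd.DataFrame({'x': [1.0, 2.0, 3.0]})   # row labels 0, 1, 2
df_short = pd.DataFrame({'x': [3.0, 6.0]})       # row labels 0, 1

# Keep the original row labels so matching rows share an index value
combined = pd.concat([df_long, df_short])

# Group rows that share a label and average within each group
row_means = combined.groupby(combined.index).mean()
print(row_means)
```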


My approach is similar to @ali_m's, but since you want one mean per row-column combination, I finish differently:

    import numpy as np
    import pandas as pd

    df1 = pd.DataFrame(dict(x=np.random.randn(100), y=np.random.randint(0, 5, 100)))
    df2 = pd.DataFrame(dict(x=np.random.randn(100), y=np.random.randint(0, 5, 100)))

    df = pd.concat([df1, df2])
    # the concatenated frame has a single-level index, so group on level 0
    foo = df.groupby(level=0).mean()
    foo.head()
    #           x    y
    # 0  0.841282  2.5
    # 1  0.716749  1.0
    # 2 -0.551903  2.5
    # 3  1.240736  1.5
    # 4  1.227109  2.0
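A related variant, sketched here with made-up data: if you pass `keys=` to `pd.concat`, the result gets a two-level index in which the outer level names the source frame and the original row labels sit at level 1. Grouping on `level=1` then yields one mean per row-column combination. The key names `'first'` and `'second'` are arbitrary labels chosen for this example.

```python
import pandas as pd

df1 = pd.DataFrame({'x': [1.0, 2.0], 'y': [0, 4]})
df2 = pd.DataFrame({'x': [3.0, 6.0], 'y': [2, 2]})

# keys= adds an outer index level; the original row labels become level 1
df = pd.concat([df1, df2], keys=['first', 'second'])

# Average the rows that share the same original row label
per_row = df.groupby(level=1).mean()
print(per_row)
```

Keeping the source label in the index can be handy if you later want to select one frame back out with `df.loc['first']`.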


As per Niklas' comment, the solution to the question is panel.mean(axis=0).

As a more complete example:

    import pandas as pd
    import numpy as np

    dfs = {}
    nrows = 4
    ncols = 3
    for i in range(4):
        dfs[i] = pd.DataFrame(np.arange(i, nrows*ncols+i).reshape(nrows, ncols),
                              columns=list('abc'))
        print('DF{i}:\n{df}\n'.format(i=i, df=dfs[i]))

    panel = pd.Panel(dfs)
    print('Mean of stacked DFs:\n{df}'.format(df=panel.mean(axis=0)))

This gives the following output:

    DF0:
       a   b   c
    0  0   1   2
    1  3   4   5
    2  6   7   8
    3  9  10  11

    DF1:
        a   b   c
    0   1   2   3
    1   4   5   6
    2   7   8   9
    3  10  11  12

    DF2:
        a   b   c
    0   2   3   4
    1   5   6   7
    2   8   9  10
    3  11  12  13

    DF3:
        a   b   c
    0   3   4   5
    1   6   7   8
    2   9  10  11
    3  12  13  14

    Mean of stacked DFs:
          a     b     c
    0   1.5   2.5   3.5
    1   4.5   5.5   6.5
    2   7.5   8.5   9.5
    3  10.5  11.5  12.5
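Note that `pd.Panel` has been removed from current pandas releases, so the Panel code above only runs on old versions. One possible substitute, offered as a sketch and not as the original answer's method, is to stack the frames into a 3-D NumPy array and average over the first axis. It assumes every frame has the same shape and the same row and column order; the names `frames`, `stacked` and `mean_df` are illustrative.

```python
import numpy as np
import pandas as pd

nrows, ncols = 4, 3
frames = [
    pd.DataFrame(np.arange(i, nrows * ncols + i).reshape(nrows, ncols),
                 columns=list('abc'))
    for i in range(4)
]

# Stack into an array of shape (n_frames, nrows, ncols).
# Assumption: all frames are aligned, since NumPy ignores index and column labels.
stacked = np.stack([f.to_numpy() for f in frames])

# Average across frames and restore the labels from the first frame
mean_df = pd.DataFrame(stacked.mean(axis=0),
                       index=frames[0].index,
                       columns=frames[0].columns)
print('Mean of stacked DFs:\n{df}'.format(df=mean_df))
```

If the frames might differ in length or label order, the `groupby` approaches are the safer choice, because they align on index labels instead of on position.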
