Averaging data from multiple data files in Python with pandas


Check it out:

    In [14]: glued = pd.concat([x, y], axis=1, keys=['x', 'y'])

    In [15]: glued
    Out[15]:
              x                             y
              A         B         C         A         B         C
    0 -0.264438 -1.026059 -0.619500  1.923135  0.135355 -0.285491
    1  0.927272  0.302904 -0.032399 -0.208940  0.642432 -0.764902
    2 -0.264273 -0.386314 -0.217601  1.477419 -1.659804 -0.431375
    3 -0.871858 -0.348382  1.100491 -1.191664  0.152576  0.935773

    In [16]: glued.swaplevel(0, 1, axis=1).sortlevel(axis=1)
    Out[16]:
              A                   B                   C
              x         y         x         y         x         y
    0 -0.264438  1.923135 -1.026059  0.135355 -0.619500 -0.285491
    1  0.927272 -0.208940  0.302904  0.642432 -0.032399 -0.764902
    2 -0.264273  1.477419 -0.386314 -1.659804 -0.217601 -0.431375
    3 -0.871858 -1.191664 -0.348382  0.152576  1.100491  0.935773

    In [17]: glued = glued.swaplevel(0, 1, axis=1).sortlevel(axis=1)

    In [18]: glued
    Out[18]:
              A                   B                   C
              x         y         x         y         x         y
    0 -0.264438  1.923135 -1.026059  0.135355 -0.619500 -0.285491
    1  0.927272 -0.208940  0.302904  0.642432 -0.032399 -0.764902
    2 -0.264273  1.477419 -0.386314 -1.659804 -0.217601 -0.431375
    3 -0.871858 -1.191664 -0.348382  0.152576  1.100491  0.935773

For the record, swapping the levels and sorting the columns was not necessary; I did it only to make the output easier to read.

Then you can do stuff like:

    In [19]: glued.groupby(level=0, axis=1).mean()
    Out[19]:
              A         B         C
    0  0.829349 -0.445352 -0.452496
    1  0.359166  0.472668 -0.398650
    2  0.606573 -1.023059 -0.324488
    3 -1.031761 -0.097903  1.018132
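The session above was recorded on an old pandas release. In recent releases `sortlevel` is no longer available and `groupby(..., axis=1)` is deprecated, so here is a minimal sketch of the same idea using `sort_index` and a transpose instead. The small `x` and `y` frames are made-up stand-ins, not the data from the session.

```python
import pandas as pd

# Illustrative stand-ins for the x and y frames (not the original data).
x = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0], "C": [5.0, 6.0]})
y = pd.DataFrame({"A": [3.0, 6.0], "B": [5.0, 8.0], "C": [7.0, 10.0]})

# Glue the frames side by side; the keys become the outer column level.
glued = pd.concat([x, y], axis=1, keys=["x", "y"])

# Optional, cosmetic only: put the A/B/C level on top and sort the columns.
glued = glued.swaplevel(0, 1, axis=1).sort_index(axis=1)

# Average across x and y: transpose so A/B/C become row labels,
# group on that level, take the mean, and transpose back.
averaged = glued.T.groupby(level=0).mean().T
print(averaged)
```

Grouping on the transposed frame avoids the `axis=1` argument altogether, at the cost of two transposes, which should be fine for frames of modest size.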


I figured out one way to do it.

pandas DataFrames can be added together with the DataFrame.add() function: http://pandas.sourceforge.net/generated/pandas.DataFrame.add.html

So I can add the DataFrames together then divide by the number of DataFrames, e.g.:

    avgDataFrame = DataFrameList[0]
    for i in range(1, len(DataFrameList)):
        avgDataFrame = avgDataFrame.add(DataFrameList[i])
    avgDataFrame = avgDataFrame / len(DataFrameList)
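The same add-then-divide approach can be written a little more compactly with `functools.reduce`. This is only a sketch: the `frames` list below is a hypothetical stand-in for `DataFrameList`, filled with made-up values.

```python
from functools import reduce

import pandas as pd

# Hypothetical stand-in for DataFrameList (made-up values).
frames = [
    pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0]}),
    pd.DataFrame({"A": [3.0, 6.0], "B": [5.0, 8.0]}),
    pd.DataFrame({"A": [5.0, 1.0], "B": [1.0, 3.0]}),
]

# Sum element-wise (aligned on index and column labels), then divide by the count.
total = reduce(lambda acc, df: acc.add(df), frames)
avg = total / len(frames)
print(avg)
```

Note that `add` aligns on labels, so a row or column that is missing from any one frame comes out as NaN in the result. This works best when every file has the same shape and labels.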


Have a look at the pandas.concat() function. When you read in your files, you can use concat to join the resulting DataFrames into one, then just use normal pandas averaging techniques to average them.

To use it, just pass it a list of the DataFrames you want joined together:

    >>> x
              A         B         C
    0 -0.264438 -1.026059 -0.619500
    1  0.927272  0.302904 -0.032399
    2 -0.264273 -0.386314 -0.217601
    3 -0.871858 -0.348382  1.100491
    >>> y
              A         B         C
    0  1.923135  0.135355 -0.285491
    1 -0.208940  0.642432 -0.764902
    2  1.477419 -1.659804 -0.431375
    3 -1.191664  0.152576  0.935773
    >>> pandas.concat([x, y])
              A         B         C
    0 -0.264438 -1.026059 -0.619500
    1  0.927272  0.302904 -0.032399
    2 -0.264273 -0.386314 -0.217601
    3 -0.871858 -0.348382  1.100491
    0  1.923135  0.135355 -0.285491
    1 -0.208940  0.642432 -0.764902
    2  1.477419 -1.659804 -0.431375
    3 -1.191664  0.152576  0.935773
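One possible way to finish the job after concatenating is to group the stacked rows by their repeated index label and take the mean. The sketch below assumes the files are CSVs with identical columns and row order; the in-memory `file_a` and `file_b` objects are hypothetical stand-ins for real file paths.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for the data files;
# real code would pass file paths to pd.read_csv instead.
file_a = io.StringIO("A,B,C\n1.0,2.0,3.0\n4.0,5.0,6.0\n")
file_b = io.StringIO("A,B,C\n3.0,4.0,5.0\n6.0,7.0,8.0\n")

frames = [pd.read_csv(f) for f in (file_a, file_b)]

# Stack the frames; the row labels 0, 1, ... repeat once per file.
stacked = pd.concat(frames)

# Rows sharing a label are averaged, giving the element-wise mean across files.
averaged = stacked.groupby(level=0).mean()
print(averaged)
```

This relies on each file producing the same default integer index, so row 0 of one file lines up with row 0 of the others.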