Calculate average of every x rows in a table and create new table

python-3.x


You can create an artificial grouping key using df.index//2 (or, as @DSM pointed out, np.arange(len(df))//2 so that it works for any index) and then use groupby:

df.groupby(np.arange(len(df))//2).mean()
Out[13]: 
      a     b     c     d
0   3.0  30.5  31.5  35.0
1   7.0  35.0  21.5  25.0
2  11.0  37.5  41.5  38.5
3  15.0  10.0  16.0  18.5
4  19.0  15.5  27.0  38.0


You can approach this problem using df.rolling() to create a rolling average and then grab every second element using iloc:

df = df.rolling(2).mean()
df = df.iloc[::2, :]

Note that the first observation will be missing (the rolling window is not yet complete at the top), so make sure your data is sorted the way you need it.
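Since rolling(2).mean() writes each window's average onto the window's last row, slicing from position 1 instead of 0 keeps only the complete windows and avoids the leading NaN. A small sketch (the example frame is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'a': [2, 4, 6, 8], 'b': [50, 11, 33, 37]})

# rolling(2).mean() places each pair's average on the pair's last row,
# so row 0 is NaN; slicing from 1 keeps only the complete windows.
out = df.rolling(2).mean().iloc[1::2]
print(out)
#      a     b
# 1  3.0  30.5
# 3  7.0  35.0
```

The result keeps the original (odd) row labels; chain a .reset_index(drop=True) if you want a fresh 0..n index.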


The NumPythonic way would be to extract the data as a NumPy array with df.values, reshape to a 3D array with 2 elements along axis=1 and the column count along axis=2, perform the average reduction along axis=1, and finally convert back to a DataFrame, like so -

pd.DataFrame(df.values.reshape(-1,2,df.shape[1]).mean(1))

As it turns out, you can use NumPy's very efficient tool np.einsum to do this average-reduction as a combination of sum-reduction and scaling-down, like so -

pd.DataFrame(np.einsum('ijk->ik',df.values.reshape(-1,2,df.shape[1]))/2.0)
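As a sanity check that the sum-then-scale einsum really matches the reshape-and-mean route, you can compare the two on a small array (the array here is illustrative):

```python
import numpy as np

# Illustrative data: 12 rows, 2 columns.
a = np.arange(24, dtype=float).reshape(12, 2)

pairs = a.reshape(-1, 2, a.shape[1])            # shape (6, 2, 2): consecutive row pairs
via_mean = pairs.mean(1)                        # average within each pair
via_einsum = np.einsum('ijk->ik', pairs) / 2.0  # sum over axis 1, then scale down

assert np.allclose(via_mean, via_einsum)
```

The einsum spec 'ijk->ik' simply drops index j, i.e. sums over axis 1; dividing by the pair size 2 turns that sum into the mean.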

Please note that the proposed approaches assume that the number of rows is divisible by 2.
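If the row count is not a multiple of 2, one option (an assumed policy, not part of the original answer) is to trim the trailing remainder before reshaping, so the leftover row is dropped:

```python
import numpy as np
import pandas as pd

# Illustrative 5-row frame; 5 is not divisible by 2.
df = pd.DataFrame(np.arange(10).reshape(5, 2), columns=['a', 'b'])

# Trim to the largest multiple of 2 so the reshape is valid; the last row is dropped.
n = (len(df) // 2) * 2
out = pd.DataFrame(df.values[:n].reshape(-1, 2, df.shape[1]).mean(1),
                   columns=df.columns)
print(out)
#      a    b
# 0  1.0  2.0
# 1  5.0  6.0
```

The groupby approach does not need this trimming: its last group simply contains fewer rows and is averaged as-is.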

Also, as noted by @DSM, to preserve the column names you need to add columns=df.columns when converting back to a DataFrame, i.e. -

pd.DataFrame(...,columns=df.columns)

Sample run -

>>> df
    0   1   2   3
0   2  50  25  26
1   4  11  38  44
2   6  33  16  25
3   8  37  27  25
4  10  28  48  32
5  12  47  35  45
6  14   8  16   7
7  16  12  16  30
8  18  22  39  29
9  20   9  15  47
>>> pd.DataFrame(df.values.reshape(-1,2,df.shape[1]).mean(1))
    0     1     2     3
0   3  30.5  31.5  35.0
1   7  35.0  21.5  25.0
2  11  37.5  41.5  38.5
3  15  10.0  16.0  18.5
4  19  15.5  27.0  38.0
>>> pd.DataFrame(np.einsum('ijk->ik',df.values.reshape(-1,2,df.shape[1]))/2.0)
    0     1     2     3
0   3  30.5  31.5  35.0
1   7  35.0  21.5  25.0
2  11  37.5  41.5  38.5
3  15  10.0  16.0  18.5
4  19  15.5  27.0  38.0

Runtime tests -

In this section, let's test all three approaches listed thus far for performance, including @ayhan's solution with groupby.

In [24]: A = np.random.randint(0,9,(200,50))

In [25]: df = pd.DataFrame(A)

In [26]: %timeit df.groupby(df.index//2).mean() # @ayhan's solution
1000 loops, best of 3: 1.61 ms per loop

In [27]: %timeit pd.DataFrame(df.values.reshape(-1,2,df.shape[1]).mean(1))
1000 loops, best of 3: 317 µs per loop

In [28]: %timeit pd.DataFrame(np.einsum('ijk->ik',df.values.reshape(-1,2,df.shape[1]))/2.0)
1000 loops, best of 3: 266 µs per loop