Python - rolling functions for GroupBy object

python pandas pandas-groupby rolling-computation rolling-sum

For the Googlers who come upon this old question:

Regarding @kekert's comment on @Garrett's answer to use the new

df.groupby('id')['x'].rolling(2).mean()

rather than the now-deprecated

df.groupby('id')['x'].apply(pd.rolling_mean, 2, min_periods=1)

curiously, the new .rolling().mean() approach returns a multi-indexed series, indexed first by the groupby column and then by the original index, whereas the old approach simply returned a series indexed by the original df index. That perhaps makes less sense, but it made it very convenient to add the series as a new column in the original dataframe.

So I think I've figured out a solution that uses the new rolling() method and still works the same:

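df.groupby('id')['x'].rolling(2, min_periods=1).mean().reset_index(0,drop=True)

which, with min_periods=1 as in the deprecated call (without it, the first row of each group is NaN), should give you the series

0    0.0
1    0.5
2    1.5
3    3.0
4    3.5
5    4.5

which you can add as a column:

df['x'] = df.groupby('id')['x'].rolling(2, min_periods=1).mean().reset_index(0,drop=True)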
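For anyone who wants to try this end to end, here is a minimal sketch. It assumes the six-row example frame from the question and passes min_periods=1 so that the first row of each group is not NaN; the column name x_rolling_mean is made up for illustration.

```python
import pandas as pd

# Example frame assumed from the question: two groups of three rows each.
df = pd.DataFrame({"id": ["a", "a", "a", "b", "b", "b"], "x": range(6)})

# groupby(...).rolling(...) returns a Series with a two-level index:
# level 0 is the group key ('id'), level 1 is the original row index.
rolled = df.groupby("id")["x"].rolling(2, min_periods=1).mean()

# Dropping level 0 leaves only the original row index, so assignment
# aligns each value with the row it came from.
flat = rolled.reset_index(level=0, drop=True)
df["x_rolling_mean"] = flat  # hypothetical column name

print(df)
```

Writing to a separate column rather than back into df['x'] keeps the original values around for comparison.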


cumulative sum

To answer the question directly, the cumsum method would produce the desired series:

In [17]: df
Out[17]:
  id  x
0  a  0
1  a  1
2  a  2
3  b  3
4  b  4
5  b  5

In [18]: df.groupby('id').x.cumsum()
Out[18]:
0     0
1     1
2     3
3     3
4     7
5    12
Name: x, dtype: int64
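As a self-contained sketch of the same idea, assuming the six-row frame shown in Out[17] (the column name x_cumsum is hypothetical, used only for illustration):

```python
import pandas as pd

# Rebuild the example frame from the session above.
df = pd.DataFrame({"id": ["a", "a", "a", "b", "b", "b"], "x": range(6)})

# cumsum restarts at each new group and keeps the original row index,
# so the result can be assigned straight back as a column.
df["x_cumsum"] = df.groupby("id")["x"].cumsum()

print(df)
```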

pandas rolling functions per group

More generally, any rolling function can be applied to each group as follows (using the new .rolling method as commented by @kekert). Note that the return type is a multi-indexed series, which differs from what the previous (deprecated) pd.rolling_* functions returned.

In [10]: df.groupby('id')['x'].rolling(2, min_periods=1).sum()
Out[10]:
id
a   0   0.00
    1   1.00
    2   3.00
b   3   3.00
    4   7.00
    5   9.00
Name: x, dtype: float64

To apply the per-group rolling function and receive the result in the original dataframe order, transform should be used instead:

In [16]: df.groupby('id')['x'].transform(lambda s: s.rolling(2, min_periods=1).sum())
Out[16]:
0    0
1    1
2    3
3    3
4    7
5    9
Name: x, dtype: int64
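The difference between the two forms is easier to see when the groups are interleaved. The sketch below uses a made-up frame (not the one from the question) in which rows of 'a' and 'b' alternate. The variable names are illustrative only.

```python
import pandas as pd

# Hypothetical frame with interleaved groups: a, b, a, b, a, b.
df = pd.DataFrame({"id": ["a", "b", "a", "b", "a", "b"], "x": range(6)})

# groupby().rolling() sorts the result by group and returns a
# two-level index (id, original row index).
by_group = df.groupby("id")["x"].rolling(2, min_periods=1).sum()

# transform() keeps the original row order and the original index.
in_row_order = df.groupby("id")["x"].transform(
    lambda s: s.rolling(2, min_periods=1).sum()
)

# Dropping the group level and sorting by the original index
# recovers the same values as transform().
realigned = by_group.reset_index(level=0, drop=True).sort_index()

print(by_group)
print(in_row_order)
```

Both routes compute the same numbers; which one is more convenient depends on whether the grouped layout or the original row order is wanted.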

deprecated approach

For reference, here's how the now deprecated pandas.rolling_mean behaved:

In [16]: df.groupby('id')['x'].apply(pd.rolling_mean, 2, min_periods=1)
Out[16]:
0    0.0
1    0.5
2    1.5
3    3.0
4    3.5
5    4.5


Here is another way that generalizes well and uses pandas' expanding method.

The same transform pattern also works for rolling calculations with fixed-size windows, such as for time series, by using rolling() in place of expanding().

# Import pandas library
import pandas as pd

# Prepare columns
x = range(0, 6)
id = ['a', 'a', 'a', 'b', 'b', 'b']

# Create dataframe from columns above
df = pd.DataFrame({'id':id, 'x':x})

# Calculate rolling sum with infinite window size (i.e. all rows in group) using "expanding"
df['rolling_sum'] = df.groupby('id')['x'].transform(lambda x: x.expanding().sum())

# Output as desired by original poster
print(df)
  id  x  rolling_sum
0  a  0            0
1  a  1            1
2  a  2            3
3  b  3            3
4  b  4            7
5  b  5           12
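To illustrate how the pattern carries over to a fixed window, here is a small sketch that puts an expanding sum and a two-row rolling sum side by side. The window size of 2 and the column names expanding_sum and rolling_sum_2 are assumptions chosen for the example.

```python
import pandas as pd

df = pd.DataFrame({"id": ["a", "a", "a", "b", "b", "b"], "x": range(6)})
grouped = df.groupby("id")["x"]

# Expanding window: every row sees all earlier rows of its group.
df["expanding_sum"] = grouped.transform(lambda s: s.expanding().sum())

# Fixed window of 2 rows: every row sees itself and at most one
# earlier row of its group (min_periods=1 avoids NaN at group starts).
df["rolling_sum_2"] = grouped.transform(
    lambda s: s.rolling(2, min_periods=1).sum()
)

print(df)
```

The two columns only differ on the last row of group 'b', where the expanding window still includes the value 3 and the two-row window no longer does.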
