Pandas: GroupBy Shift And Cumulative Sum
You could use transform() to feed the separate groups that are created at each level of groupby into the cumsum() and shift() methods.
    temp['transformed'] = \
        temp.groupby('ID')['X'].transform(lambda x: x.cumsum().shift())

      ID  X  transformed
    0  a  1          NaN
    1  a  1          1.0
    2  a  1          2.0
    3  b  1          NaN
    4  b  1          1.0
    5  b  1          2.0
    6  c  1          NaN
    7  c  1          1.0
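If you want to try this yourself, here is a minimal self-contained sketch. The sample frame is reconstructed from the output printed above, so treat the data as an assumption rather than the asker's original DataFrame.

```python
import pandas as pd

# Sample data rebuilt from the printed output (assumed, not the original frame).
temp = pd.DataFrame({"ID": list("aaabbbcc"), "X": [1] * 8})

# Within each ID group: running total of X, then shifted down by one row,
# so each row holds the sum of the *previous* rows in its group.
temp["transformed"] = temp.groupby("ID")["X"].transform(
    lambda x: x.cumsum().shift()
)

print(temp)
```

The first row of every group comes out as NaN because there is nothing before it to sum.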
For more info on transform(), please see here:
You need to use apply, since one function, cumsum, operates on the groupby object, while the other, shift, is applied to the whole df.
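    # group_keys=False keeps the original row index on the result,
    # which newer pandas versions need for the column assignment to align.
    temp['transformed'] = temp.groupby('ID', group_keys=False)['X'].apply(lambda x: x.cumsum().shift())
    temp
    Out[287]:
      ID  X  transformed
    0  a  1          NaN
    1  a  1          1.0
    2  a  1          2.0
    3  b  1          NaN
    4  b  1          1.0
    5  b  1          2.0
    6  c  1          NaN
    7  c  1          1.0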
While working on this problem, I noticed that as the DataFrame grows, using lambdas with transform gets very slow. Using the built-in DataFrameGroupBy methods (like cumsum and shift) instead of lambdas is much faster.
So here's my proposed solution: create a 'temp' column to hold the cumsum for each ID, then shift it in a separate groupby:
    df['temp'] = df.groupby("ID")['X'].cumsum()
    df['transformed'] = df.groupby("ID")['temp'].shift()
    df = df.drop(columns=["temp"])
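As a quick sanity check, something like the following should confirm that the built-in methods give the same result as the lambda version. The sample values and the names via_lambda and via_builtin are made up for illustration. It also shows one possible way to skip the helper column by grouping the intermediate Series on df["ID"] directly.

```python
import pandas as pd

# Hypothetical sample data with varying X values so the running sum is visible.
df = pd.DataFrame({"ID": list("aaabbbcc"), "X": [1, 2, 3, 4, 5, 6, 7, 8]})

# Lambda-based version: cumsum and shift run once per group in Python.
via_lambda = df.groupby("ID")["X"].transform(lambda x: x.cumsum().shift())

# Built-in version without a helper column: the per-group cumsum keeps the
# original index, so it can be regrouped by df["ID"] and shifted per group.
via_builtin = df.groupby("ID")["X"].cumsum().groupby(df["ID"]).shift()

# Series.equals treats NaNs in the same positions as equal.
print(via_lambda.equals(via_builtin))

df["transformed"] = via_builtin
print(df)
```

How large the speed difference is will depend on the number of groups and rows, so it is worth timing both on your own data.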