
Pandas groupby cumulative sum


This should do it; you need groupby() twice:

df.groupby(['name', 'day']).sum() \
  .groupby(level=0).cumsum().reset_index()

Explanation:

print(df)
   name        day   no
0  Jack     Monday   10
1  Jack    Tuesday   20
2  Jack    Tuesday   10
3  Jack  Wednesday   50
4  Jill     Monday   40
5  Jill  Wednesday  110

# sum per name/day
print( df.groupby(['name', 'day']).sum() )
                 no
name day
Jack Monday      10
     Tuesday     30
     Wednesday   50
Jill Monday      40
     Wednesday  110

# cumulative sum per name/day
print( df.groupby(['name', 'day']).sum() \
         .groupby(level=0).cumsum() )
                 no
name day
Jack Monday      10
     Tuesday     40
     Wednesday   90
Jill Monday      40
     Wednesday  150

The DataFrame resulting from the first sum is indexed by 'name' and by 'day'. You can see this by printing its index:

df.groupby(['name', 'day']).sum().index 

When computing the cumulative sum, you want to do so by 'name', corresponding to the first index (level 0).
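To make the "level 0" point concrete, here is a small sketch (the sample data is rebuilt from the printed frame above; the variable name `daily` is just for illustration). It checks that the index levels are named after the groupby keys, so grouping by `level=0` and by `level='name'` should give the same result:

```python
import pandas as pd

# Sample data rebuilt from the printed frame in the answer.
df = pd.DataFrame({
    "name": ["Jack", "Jack", "Jack", "Jack", "Jill", "Jill"],
    "day": ["Monday", "Tuesday", "Tuesday", "Wednesday", "Monday", "Wednesday"],
    "no": [10, 20, 10, 50, 40, 110],
})

# After the first groupby, the index has two levels named after the keys.
daily = df.groupby(["name", "day"]).sum()
level_names = list(daily.index.names)

# Grouping by position (0) or by level name ('name') selects the same level.
by_position = daily.groupby(level=0).cumsum()
by_label = daily.groupby(level="name").cumsum()

print(level_names)
print(by_position.equals(by_label))
```

Using the level name can be easier to read when the index has several levels.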

Finally, use reset_index() to turn the index levels back into ordinary columns, so that the name is repeated on every row.

df.groupby(['name', 'day']).sum().groupby(level=0).cumsum().reset_index()
   name        day   no
0  Jack     Monday   10
1  Jack    Tuesday   40
2  Jack  Wednesday   90
3  Jill     Monday   40
4  Jill  Wednesday  150
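For anyone who wants to run this end to end, a self-contained sketch follows. The DataFrame is reconstructed from the printed output above, and the intermediate names `daily` and `result` are only illustrative:

```python
import pandas as pd

# Sample data rebuilt from the printed frame in the answer.
df = pd.DataFrame({
    "name": ["Jack", "Jack", "Jack", "Jack", "Jill", "Jill"],
    "day": ["Monday", "Tuesday", "Tuesday", "Wednesday", "Monday", "Wednesday"],
    "no": [10, 20, 10, 50, 40, 110],
})

# Step 1: total per (name, day); the result has a two-level index.
daily = df.groupby(["name", "day"]).sum()

# Step 2: running total within each name (index level 0),
# then move the index levels back into ordinary columns.
result = daily.groupby(level=0).cumsum().reset_index()

print(result)
```

One thing to watch: groupby sorts the group keys by default, so the days come out in the right order here only because Monday, Tuesday and Wednesday happen to sort alphabetically. With other day names, passing sort=False to the first groupby (which keeps the order of first appearance) or using an ordered categorical for 'day' would likely be needed.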


Modification to @Dmitry's answer. This is simpler and works in pandas 0.19.0:

print(df)
   name        day   no
0  Jack     Monday   10
1  Jack    Tuesday   20
2  Jack    Tuesday   10
3  Jack  Wednesday   50
4  Jill     Monday   40
5  Jill  Wednesday  110

df['no_csum'] = df.groupby(['name'])['no'].cumsum()
print(df)
   name        day   no  no_csum
0  Jack     Monday   10       10
1  Jack    Tuesday   20       30
2  Jack    Tuesday   10       40
3  Jack  Wednesday   50       90
4  Jill     Monday   40       40
5  Jill  Wednesday  110      150
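Note that this version keeps one row per original row, so Jack's two Tuesday rows show 30 and then 40 rather than a single Tuesday total. If one row per name/day is wanted, a possible sketch is to keep only the last row of each (name, day) pair. This assumes the rows for the same name and day are already next to each other, as they are in the sample data; the name `per_day` is illustrative:

```python
import pandas as pd

# Sample data rebuilt from the printed frame in the answer.
df = pd.DataFrame({
    "name": ["Jack", "Jack", "Jack", "Jack", "Jill", "Jill"],
    "day": ["Monday", "Tuesday", "Tuesday", "Wednesday", "Monday", "Wednesday"],
    "no": [10, 20, 10, 50, 40, 110],
})

# Per-row running total within each name (same call as in the answer).
df["no_csum"] = df.groupby(["name"])["no"].cumsum()

# Assumption: rows of the same (name, day) are contiguous, so the last
# row of each pair carries the running total up to the end of that day.
per_day = (
    df.drop_duplicates(subset=["name", "day"], keep="last")
      [["name", "day", "no_csum"]]
      .reset_index(drop=True)
)

print(per_day)
```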


This works in pandas 0.16.2:

In[23]: print(df)
        name          day   no
0      Jack       Monday    10
1      Jack      Tuesday    20
2      Jack      Tuesday    10
3      Jack    Wednesday    50
4      Jill       Monday    40
5      Jill    Wednesday   110

In[24]: df['no_cumulative'] = df.groupby(['name'])['no'].apply(lambda x: x.cumsum())

In[25]: print(df)
        name          day   no  no_cumulative
0      Jack       Monday    10             10
1      Jack      Tuesday    20             30
2      Jack      Tuesday    10             40
3      Jack    Wednesday    50             90
4      Jill       Monday    40             40
5      Jill    Wednesday   110            150
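On newer pandas releases, the apply form may return a result whose index also contains the group key (this depends on the version and on the group_keys setting), which can get in the way of assigning it back as a column. A hedged alternative sketch uses transform, which returns values aligned to the original row index; the sample data is again rebuilt from the printed frame:

```python
import pandas as pd

# Sample data rebuilt from the printed frame in the answer.
df = pd.DataFrame({
    "name": ["Jack", "Jack", "Jack", "Jack", "Jill", "Jill"],
    "day": ["Monday", "Tuesday", "Tuesday", "Wednesday", "Monday", "Wednesday"],
    "no": [10, 20, 10, 50, 40, 110],
})

# transform applies the function per group and keeps the original index,
# so the result can be assigned straight back as a new column.
df["no_cumulative"] = df.groupby(["name"])["no"].transform(lambda x: x.cumsum())

print(df)
```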