Pandas groupby cumulative sum
This should do it; you need groupby() twice:
    df.groupby(['name', 'day']).sum() \
      .groupby(level=0).cumsum().reset_index()
Explanation:
    print(df)
       name        day   no
    0  Jack     Monday   10
    1  Jack    Tuesday   20
    2  Jack    Tuesday   10
    3  Jack  Wednesday   50
    4  Jill     Monday   40
    5  Jill  Wednesday  110

    # sum per name/day
    print(df.groupby(['name', 'day']).sum())
                     no
    name day
    Jack Monday      10
         Tuesday     30
         Wednesday   50
    Jill Monday      40
         Wednesday  110

    # cumulative sum per name/day
    print(df.groupby(['name', 'day']).sum() \
            .groupby(level=0).cumsum())
                     no
    name day
    Jack Monday      10
         Tuesday     40
         Wednesday   90
    Jill Monday      40
         Wednesday  150
The dataframe resulting from the first sum is indexed by 'name' and by 'day'. You can see this by printing:

    df.groupby(['name', 'day']).sum().index
When computing the cumulative sum, you want to do so by 'name', which corresponds to the first index level (level 0).
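To make the "level 0" point concrete, here is a small sketch that rebuilds the sample frame from the printed output above and checks that grouping by position (level=0) and by label (level='name') give the same running totals. The variable names daily, by_position and by_label are only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the printed frame in this answer.
df = pd.DataFrame({
    "name": ["Jack", "Jack", "Jack", "Jack", "Jill", "Jill"],
    "day": ["Monday", "Tuesday", "Tuesday", "Wednesday", "Monday", "Wednesday"],
    "no": [10, 20, 10, 50, 40, 110],
})

# The first sum produces a two-level index: 'name' is level 0, 'day' is level 1.
daily = df.groupby(["name", "day"]).sum()
print(daily.index.names)

# An index level can be referenced by position or by its name.
by_position = daily.groupby(level=0).cumsum()
by_label = daily.groupby(level="name").cumsum()

# Both spellings select the same level, so the results match.
print(by_position.equals(by_label))
```

Using the level name can be easier to read if the grouping columns are later reordered.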
Finally, use reset_index() to turn the index levels back into ordinary columns, so that the name is repeated on every row.
    df.groupby(['name', 'day']).sum().groupby(level=0).cumsum().reset_index()
       name        day   no
    0  Jack     Monday   10
    1  Jack    Tuesday   40
    2  Jack  Wednesday   90
    3  Jill     Monday   40
    4  Jill  Wednesday  150
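For anyone who wants to run this end to end, here is a self-contained sketch of the two-step approach. The frame is rebuilt by hand from the printed output above, and the intermediate names daily and running are only for illustration.

```python
import pandas as pd

# Sample data reconstructed from the printed frame in this answer.
df = pd.DataFrame({
    "name": ["Jack", "Jack", "Jack", "Jack", "Jill", "Jill"],
    "day": ["Monday", "Tuesday", "Tuesday", "Wednesday", "Monday", "Wednesday"],
    "no": [10, 20, 10, 50, 40, 110],
})

# Step 1: total per (name, day); duplicate days for a name are collapsed.
daily = df.groupby(["name", "day"]).sum()

# Step 2: running total within each name (index level 0),
# then move the index levels back into columns.
running = daily.groupby(level=0).cumsum().reset_index()

print(running)
```

One thing to watch: groupby() sorts the group keys by default, so the days come out in alphabetical order. That happens to match weekday order for Monday, Tuesday and Wednesday, but it would not for something like Friday. Passing sort=False to the first groupby() keeps the days in order of first appearance instead.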
Modification to @Dmitry's answer. This is simpler and works in pandas 0.19.0:
    print(df)
       name        day   no
    0  Jack     Monday   10
    1  Jack    Tuesday   20
    2  Jack    Tuesday   10
    3  Jack  Wednesday   50
    4  Jill     Monday   40
    5  Jill  Wednesday  110

    df['no_csum'] = df.groupby(['name'])['no'].cumsum()

    print(df)
       name        day   no  no_csum
    0  Jack     Monday   10       10
    1  Jack    Tuesday   20       30
    2  Jack    Tuesday   10       40
    3  Jack  Wednesday   50       90
    4  Jill     Monday   40       40
    5  Jill  Wednesday  110      150
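A runnable sketch of this one-step version, again with the frame rebuilt by hand from the printed output. Note that it keeps one row per original row (six here), so the two Tuesday rows for Jack each get their own running total, whereas the sum-then-cumsum approach collapses them into one row per name/day.

```python
import pandas as pd

# Sample data reconstructed from the printed frame in this answer.
df = pd.DataFrame({
    "name": ["Jack", "Jack", "Jack", "Jack", "Jill", "Jill"],
    "day": ["Monday", "Tuesday", "Tuesday", "Wednesday", "Monday", "Wednesday"],
    "no": [10, 20, 10, 50, 40, 110],
})

# Running total of 'no' within each name, in the existing row order.
# The result is aligned with df's index, so it can be assigned directly.
df["no_csum"] = df.groupby("name")["no"].cumsum()

print(df)
```

Because the cumulative sum follows the current row order, sort the frame first if the rows are not already in the order you want to accumulate over.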
This works in pandas 0.16.2:

    In [23]: print df
       name        day   no
    0  Jack     Monday   10
    1  Jack    Tuesday   20
    2  Jack    Tuesday   10
    3  Jack  Wednesday   50
    4  Jill     Monday   40
    5  Jill  Wednesday  110

    In [24]: df['no_cumulative'] = df.groupby(['name'])['no'].apply(lambda x: x.cumsum())

    In [25]: print df
       name        day   no  no_cumulative
    0  Jack     Monday   10             10
    1  Jack    Tuesday   20             30
    2  Jack    Tuesday   10             40
    3  Jack  Wednesday   50             90
    4  Jill     Monday   40             40
    5  Jill  Wednesday  110            150