Groupby sum and count on multiple columns in python
It can be done using pivot_table
this way:
>>> df1=pd.pivot_table(df, index=['country','month'],values=['revenue','profit','ebit'],aggfunc=np.sum)>>> df1 ebit profit revenuecountry month Canada 201411 5 10 15UK 201410 5 10 20USA 201409 5 12 19>>> df2=pd.pivot_table(df, index=['country','month'], values='ID',aggfunc=len).rename('count')>>> df2country month Canada 201411 1UK 201410 1USA 201409 2>>> pd.concat([df1,df2],axis=1) ebit profit revenue countcountry month Canada 201411 5 10 15 1UK 201410 5 10 20 1USA 201409 5 12 19 2
UPDATE
It can be done in one-line using pivot_table
and providing a dict of functions to apply to each column in the aggfunc
argument:
pd.pivot_table( df, index=['country','month'], aggfunc={'revenue': np.sum, 'profit': np.sum, 'ebit': np.sum, 'ID': len}).rename(columns={'ID': 'count'}) count ebit profit revenuecountry month Canada 201411 1 5 10 15UK 201410 1 5 10 20USA 201409 2 5 12 19
You can do the groupby, and then map the counts of each country to a new column.
g = df.groupby(['country', 'month'])['revenue', 'profit', 'ebit'].sum().reset_index()g['count'] = g['country'].map(df['country'].value_counts())gOut[3]: country month revenue profit ebit count0 Canada 201411 15 10 5 11 UK 201410 20 10 5 12 USA 201409 19 12 5 2
Edit
To get the counts per country and month, you can do another groupby, and then join the two DataFrames together.
g = df.groupby(['country', 'month'])['revenue', 'profit', 'ebit'].sum()j = df.groupby(['country', 'month']).size().to_frame('count')pd.merge(g, j, left_index=True, right_index=True).reset_index()Out[6]: country month revenue profit ebit count0 Canada 201411 15 10 5 11 UK 201410 20 10 5 12 UK 201411 10 5 2 13 USA 201409 19 12 5 2
I added another record for the UK with a different date - notice how there are now two UK entries in the merged DataFrame, with the appropriate counts.
The following solution seems the simplest.
Group by country and month:
grouped_df = df.groupby(['country', 'month'])
Apply sum to columns of interest (revenue, profit, ebit):
final = grouped_df[['revenue', 'profit', 'ebit']].agg('sum')
Assign the size of the grouped_df to a new column in 'final':
final['count'] = grouped_df.size()print(final)Out[256]: revenue profit ebit countcountry month Canada 201411 15 10 5 1UK 201410 20 10 5 1USA 201409 19 12 5 2
All done!