Groupby sum and count on multiple columns in python Groupby sum and count on multiple columns in python python-3.x python-3.x

Groupby sum and count on multiple columns in python


It can be done using pivot_table this way:

>>> df1=pd.pivot_table(df, index=['country','month'],values=['revenue','profit','ebit'],aggfunc=np.sum)>>> df1                 ebit  profit  revenuecountry month                        Canada  201411     5      10       15UK      201410     5      10       20USA     201409     5      12       19>>> df2=pd.pivot_table(df, index=['country','month'], values='ID',aggfunc=len).rename('count')>>> df2country  month Canada   201411    1UK       201410    1USA      201409    2>>> pd.concat([df1,df2],axis=1)                ebit  profit  revenue  countcountry month                               Canada  201411     5      10       15      1UK      201410     5      10       20      1USA     201409     5      12       19      2

UPDATE

It can be done in one-line using pivot_table and providing a dict of functions to apply to each column in the aggfunc argument:

pd.pivot_table(   df,   index=['country','month'],   aggfunc={'revenue': np.sum, 'profit': np.sum, 'ebit': np.sum, 'ID': len}).rename(columns={'ID': 'count'})                count  ebit  profit  revenuecountry month                               Canada  201411      1     5      10       15UK      201410      1     5      10       20USA     201409      2     5      12       19


You can do the groupby, and then map the counts of each country to a new column.

g = df.groupby(['country', 'month'])['revenue', 'profit', 'ebit'].sum().reset_index()g['count'] = g['country'].map(df['country'].value_counts())gOut[3]:    country  month   revenue  profit  ebit  count0   Canada   201411  15       10      5     11   UK       201410  20       10      5     12   USA      201409  19       12      5     2

Edit

To get the counts per country and month, you can do another groupby, and then join the two DataFrames together.

g = df.groupby(['country', 'month'])['revenue', 'profit', 'ebit'].sum()j = df.groupby(['country', 'month']).size().to_frame('count')pd.merge(g, j, left_index=True, right_index=True).reset_index()Out[6]:    country  month   revenue  profit  ebit  count0   Canada   201411  15       10      5     11   UK       201410  20       10      5     12   UK       201411  10       5       2     13   USA      201409  19       12      5     2

I added another record for the UK with a different date - notice how there are now two UK entries in the merged DataFrame, with the appropriate counts.


The following solution seems the simplest.

Group by country and month:

grouped_df = df.groupby(['country', 'month'])

Apply sum to columns of interest (revenue, profit, ebit):

final = grouped_df[['revenue', 'profit', 'ebit']].agg('sum')

Assign the size of the grouped_df to a new column in 'final':

final['count'] = grouped_df.size()print(final)Out[256]:                 revenue  profit  ebit  countcountry month                               Canada  201411       15      10     5      1UK      201410       20      10     5      1USA     201409       19      12     5      2

All done!