Python pandas groupby aggregate on multiple columns, then pivot

python pandas dataframe pivot data-cleaning

Edited for Pandas 0.22+ considering the deprecation of the use of dictionaries in a group by aggregation.

We set up a very similar dictionary where we use the keys of the dictionary to specify our functions and the dictionary itself to rename the columns.

rnm_cols = dict(size='Size', sum='Sum', mean='Mean', std='Std')df.set_index(['Category', 'Item']).stack().groupby('Category') \  .agg(rnm_cols.keys()).rename(columns=rnm_cols)            Size   Sum        Mean        StdCategory                                     Books          3    58   19.333333   2.081666Clothes        3   148   49.333333   4.041452Technology     6  1800  300.000000  70.710678

option 1
use agg ← link to docs

agg_funcs = dict(Size='size', Sum='sum', Mean='mean', Std='std')df.set_index(['Category', 'Item']).stack().groupby(level=0).agg(agg_funcs)                  Std   Sum        Mean  SizeCategory                                     Books        2.081666    58   19.333333     3Clothes      4.041452   148   49.333333     3Technology  70.710678  1800  300.000000     6

option 2
more for less
use describe ← link to docs

df.set_index(['Category', 'Item']).stack().groupby(level=0).describe().unstack()            count        mean        std    min    25%    50%    75%    maxCategory                                                                   Books         3.0   19.333333   2.081666   17.0   18.5   20.0   20.5   21.0Clothes       3.0   49.333333   4.041452   45.0   47.5   50.0   51.5   53.0Technology    6.0  300.000000  70.710678  200.0  262.5  300.0  337.5  400.0

python pandas dataframe pivot data-cleaning

df.groupby('Category').agg({'Item':'size','shop1':['sum','mean','std'],'shop2':['sum','mean','std'],'shop3':['sum','mean','std']})

Or if you want it across all shops then:

df1 = df.set_index(['Item','Category']).stack().reset_index().rename(columns={'level_2':'Shops',0:'costs'})df1.groupby('Category').agg({'Item':'size','costs':['sum','mean','std']})

python pandas dataframe pivot data-cleaning

If I understand correctly, you want to calculate aggregate metrics for all shops, not for each individually. To do that, you can first stack your dataframe and then group by Category:

stacked = df.set_index(['Item', 'Category']).stack().reset_index()stacked.columns = ['Item', 'Category', 'Shop', 'Price']stacked.groupby('Category').agg({'Price':['count','sum','mean','std']})

Which results in

           Price                                        count   sum        mean        stdCategory                                     Books          3    58   19.333333   2.081666Clothes        3   148   49.333333   4.041452Technology     6  1800  300.000000  70.710678

CodeHunter

Python pandas groupby aggregate on multiple columns, then pivot

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last