Multiple aggregations of the same column using pandas GroupBy.agg()

python pandas dataframe aggregate pandas-groupby

You can simply pass the functions as a list:

    In [20]: df.groupby("dummy").agg({"returns": [np.mean, np.sum]})
    Out[20]:
               mean       sum
    dummy
    1      0.036901  0.369012

or as a dictionary:

    In [21]: df.groupby('dummy').agg({'returns':
                                      {'Mean': np.mean, 'Sum': np.sum}})
    Out[21]:
            returns
               Mean       Sum
    dummy
    1      0.036901  0.369012
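For reference, here is a minimal, self-contained sketch of the list form. The sample data is hypothetical (the `df` used in this answer is not shown), and string aliases stand in for `np.mean` / `np.sum`. On the pandas versions I have seen, a dict-of-list spec returns two-level columns under the source column name; the nested-dict renaming form is, as far as I know, rejected by pandas 1.0 and later, so it is left out here.

```python
import pandas as pd

# Hypothetical sample data; the original answer's df is not shown.
df = pd.DataFrame({
    "dummy": [1, 1, 2, 2],
    "returns": [0.1, 0.3, -0.2, 0.4],
})

# List form: several aggregations applied to the same column.
# String aliases are used here in place of np.mean / np.sum.
result = df.groupby("dummy").agg({"returns": ["mean", "sum"]})

# With a dict-of-list spec the columns come back as a two-level MultiIndex.
print(result)
print(result.columns.tolist())
```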

TLDR; Pandas groupby.agg has a new, easier syntax for specifying (1) aggregations on multiple columns, and (2) multiple aggregations on a column. So, to do this for pandas >= 0.25, use

    df.groupby('dummy').agg(Mean=('returns', 'mean'), Sum=('returns', 'sum'))

               Mean       Sum
    dummy
    1      0.036901  0.369012

    df.groupby('dummy')['returns'].agg(Mean='mean', Sum='sum')

               Mean       Sum
    dummy
    1      0.036901  0.369012
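To try both spellings end to end, here is a small sketch. The data is made up to stand in for the question's `df`, so the numbers will not match the output shown above.

```python
import pandas as pd

# Hypothetical data standing in for the question's df.
df = pd.DataFrame({
    "dummy": [1, 1, 2, 2],
    "returns": [0.1, 0.3, -0.2, 0.4],
})

# DataFrame form: keyword = (column, aggfunc)
by_frame = df.groupby("dummy").agg(
    Mean=("returns", "mean"),
    Sum=("returns", "sum"),
)

# Series form: keyword = aggfunc
by_series = df.groupby("dummy")["returns"].agg(Mean="mean", Sum="sum")

print(by_frame)
# Both spellings should produce the same flat column names.
print(by_frame.columns.tolist(), by_series.columns.tolist())
```

Either form gives flat output columns, so no renaming or MultiIndex flattening is needed afterwards.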

Pandas >= 0.25: Named Aggregation

Pandas has changed the behavior of GroupBy.agg in favour of a more intuitive syntax for specifying named aggregations. See the 0.25 docs section on Enhancements as well as relevant GitHub issues GH18366 and GH26512.

From the documentation,

To support column-specific aggregation with control over the output column names, pandas accepts the special syntax in GroupBy.agg(), known as “named aggregation”, where
The keywords are the output column names
The values are tuples whose first element is the column to select and the second element is the aggregation to apply to that column. Pandas provides the pandas.NamedAgg namedtuple with the fields ['column', 'aggfunc'] to make it clearer what the arguments are. As usual, the aggregation can be a callable or a string alias.

You can now pass a tuple via keyword arguments. The tuples follow the format of (<colName>, <aggFunc>).

    import pandas as pd
    pd.__version__
    # '0.25.0.dev0+840.g989f912ee'

    # Setup
    df = pd.DataFrame({'kind': ['cat', 'dog', 'cat', 'dog'],
                       'height': [9.1, 6.0, 9.5, 34.0],
                       'weight': [7.9, 7.5, 9.9, 198.0]})

    df.groupby('kind').agg(
        max_height=('height', 'max'), min_weight=('weight', 'min'),)

          max_height  min_weight
    kind
    cat          9.5         7.9
    dog         34.0         7.5

Alternatively, you can use pd.NamedAgg (essentially a namedtuple) which makes things more explicit.

    df.groupby('kind').agg(
        max_height=pd.NamedAgg(column='height', aggfunc='max'),
        min_weight=pd.NamedAgg(column='weight', aggfunc='min')
    )

          max_height  min_weight
    kind
    cat          9.5         7.9
    dog         34.0         7.5
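The quoted documentation notes that the aggregation can be a callable as well as a string alias. Here is a minimal sketch of that, mixing a plain tuple with `pd.NamedAgg` in one call. The helper `value_range` and the output names are my own, purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({'kind': ['cat', 'dog', 'cat', 'dog'],
                   'height': [9.1, 6.0, 9.5, 34.0],
                   'weight': [7.9, 7.5, 9.9, 198.0]})


def value_range(s):
    """Hypothetical helper: spread between the largest and smallest value."""
    return s.max() - s.min()


out = df.groupby('kind').agg(
    # plain tuple with a callable as the aggregation
    height_range=('height', value_range),
    # NamedAgg with a string alias
    mean_weight=pd.NamedAgg(column='weight', aggfunc='mean'),
)
print(out)
```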

It is even simpler for Series, just pass the aggfunc to a keyword argument.

    df.groupby('kind')['height'].agg(max_height='max', min_height='min')

          max_height  min_height
    kind
    cat          9.5         9.1
    dog         34.0         6.0

Lastly, if your column names aren't valid python identifiers, use a dictionary with unpacking:

df.groupby('kind')['height'].agg(**{'max height': 'max', ...})
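A complete version of the unpacking trick might look like the sketch below. The second key, `'min height'`, is a hypothetical name chosen for illustration; it is not what the elided `...` above stands for.

```python
import pandas as pd

df = pd.DataFrame({'kind': ['cat', 'dog', 'cat', 'dog'],
                   'height': [9.1, 6.0, 9.5, 34.0]})

# Output names containing spaces cannot be written as keywords,
# so build a dict of {output name: aggfunc} and unpack it with **.
spec = {'max height': 'max', 'min height': 'min'}
out = df.groupby('kind')['height'].agg(**spec)
print(out)
```

Building the spec as a dict also makes it easy to generate the output names programmatically.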

Pandas < 0.25

In more recent versions of pandas leading up to 0.24, using a dictionary to specify column names for the aggregation output raises a FutureWarning:

    df.groupby('dummy').agg({'returns': {'Mean': 'mean', 'Sum': 'sum'}})
    # FutureWarning: using a dict with renaming is deprecated and will be removed
    # in a future version

Using a dictionary for renaming columns is deprecated in v0.20. On more recent versions of pandas, this can be specified more simply by passing a list of tuples. If specifying the functions this way, all functions for that column need to be specified as tuples of (name, function) pairs.

    df.groupby("dummy").agg({'returns': [('op1', 'sum'), ('op2', 'mean')]})

            returns
                op1       op2
    dummy
    1      0.328953  0.032895

Or,

    df.groupby("dummy")['returns'].agg([('op1', 'sum'), ('op2', 'mean')])

                op1       op2
    dummy
    1      0.328953  0.032895
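Here is a runnable sketch of the list-of-tuples approach, again on made-up data. It shows how the two forms differ: the Series form gives flat columns, while the DataFrame form nests the names under the source column. This is the behaviour I would expect on the pandas versions I am familiar with, so check it against your own install.

```python
import pandas as pd

# Hypothetical data standing in for the question's df.
df = pd.DataFrame({
    "dummy": [1, 1, 2, 2],
    "returns": [0.1, 0.3, -0.2, 0.4],
})

# Series form: flat columns named after the first element of each tuple.
flat = df.groupby("dummy")["returns"].agg([("op1", "sum"), ("op2", "mean")])

# DataFrame form: the same names, nested under the source column 'returns'.
nested = df.groupby("dummy").agg({"returns": [("op1", "sum"), ("op2", "mean")]})

print(flat.columns.tolist())
print(nested.columns.tolist())
```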

Would something like this work:

    In [7]: df.groupby('dummy').returns.agg({'func1' : lambda x: x.sum(), 'func2' : lambda x: x.prod()})
    Out[7]:
                  func2     func1
    dummy
    1     -4.263768e-16 -0.188565
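The dict-based renaming used here is the spelling that later pandas versions deprecated. The sketch below does the same thing with keyword arguments instead, assuming pandas >= 0.25 and hypothetical data. The names `func1` and `func2` are kept from the snippet above.

```python
import pandas as pd

# Hypothetical data standing in for the question's df.
df = pd.DataFrame({
    "dummy": [1, 1, 2, 2],
    "returns": [0.1, 0.3, -0.2, 0.4],
})

# Each keyword becomes an output column; each value is the callable to apply.
out = df.groupby("dummy")["returns"].agg(
    func1=lambda x: x.sum(),
    func2=lambda x: x.prod(),
)
print(out)
```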
