conditional sums for pandas aggregate

To complement unutbu's answer, here's an approach using apply on the groupby object.

    >>> df.groupby('A_id').apply(lambda x: pd.Series(dict(
        sum_up=(x.B == 'up').sum(),
        sum_down=(x.B == 'down').sum(),
        over_200_up=((x.B == 'up') & (x.C > 200)).sum()
    )))
          over_200_up  sum_down  sum_up
    A_id
    a1              0         0       1
    a2              0         1       0
    a3              1         0       2
    a4              0         0       0
    a5              0         0       0
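The snippet above assumes `df` already exists. A minimal self-contained sketch of the same idea follows, using the sample frame from elsewhere on this page. The helper name `summarize` is made up for illustration, and selecting `['B', 'C']` before `apply` is my own addition (it keeps the grouping column out of each group, which newer pandas versions warn about otherwise); neither is part of the original answer.

```python
import pandas as pd

# Sample data as used in the other answers on this page.
df = pd.DataFrame({
    "A_id": "a1 a2 a3 a3 a4 a5".split(),
    "B": "up down up up left right".split(),
    "C": [100, 102, 100, 250, 100, 102],
})


def summarize(group):
    """Hypothetical helper: conditional counts for one A_id group."""
    is_up = group["B"] == "up"
    return pd.Series({
        "sum_up": is_up.sum(),
        "sum_down": (group["B"] == "down").sum(),
        "over_200_up": (is_up & (group["C"] > 200)).sum(),
    })


# Assumption: pass only the columns the helper reads, so the grouping
# column itself is not part of each group handed to summarize().
result = df.groupby("A_id")[["B", "C"]].apply(summarize)
print(result)
```

Summing a boolean Series counts its `True` entries, which is what turns each condition into a per-group count.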

python r pandas data.table

There might be a better way; I'm pretty new to pandas, but this works:

    import pandas as pd
    import numpy as np

    df = pd.DataFrame({'A_id': 'a1 a2 a3 a3 a4 a5'.split(),
                       'B': 'up down up up left right'.split(),
                       'C': [100, 102, 100, 250, 100, 102]})
    df['D'] = (df['B'] == 'up') & (df['C'] > 200)
    grouped = df.groupby(['A_id'])

    def sum_up(grp):
        return np.sum(grp == 'up')

    def sum_down(grp):
        return np.sum(grp == 'down')

    def over_200_up(grp):
        return np.sum(grp)

    result = grouped.agg({'B': [sum_up, sum_down],
                          'D': [over_200_up]})
    result.columns = [col[1] for col in result.columns]
    print(result)

yields

          sum_up  sum_down  over_200_up
    A_id
    a1         1         0            0
    a2         0         1            0
    a3         2         0            1
    a4         0         0            0
    a5         0         0            0
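If the helper functions and the column flattening step feel heavy, one possible variation is pandas named aggregation (available since pandas 0.25). This is a different technique from the answer above, shown only as a sketch: the intermediate column names `is_up`, `is_down`, and `is_up_over_200` are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "A_id": "a1 a2 a3 a3 a4 a5".split(),
    "B": "up down up up left right".split(),
    "C": [100, 102, 100, 250, 100, 102],
})

# Precompute one boolean column per condition (hypothetical names).
flags = df.assign(
    is_up=df["B"].eq("up"),
    is_down=df["B"].eq("down"),
    is_up_over_200=df["B"].eq("up") & df["C"].gt(200),
)

# Named aggregation: output_column=(input_column, aggregation).
result = flags.groupby("A_id").agg(
    sum_up=("is_up", "sum"),
    sum_down=("is_down", "sum"),
    over_200_up=("is_up_over_200", "sum"),
)
print(result)
```

The output columns come out already named, so no renaming of a MultiIndex is needed afterwards.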


This is an old question, but I think a better way, one that avoids apply, is to build a new DataFrame of boolean columns before grouping and aggregating:

    df = df.set_index('A_id')
    outcome = {'sum_up': df.B.eq('up'),
               'sum_down': df.B.eq('down'),
               'over_200_up': df.B.eq('up') & df.C.gt(200)}
    outcome = pd.DataFrame(outcome).groupby(level=0).sum()
    outcome

          sum_up  sum_down  over_200_up
    A_id
    a1         1         0            0
    a2         0         1            0
    a3         2         0            1
    a4         0         0            0
    a5         0         0            0
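The same boolean-column idea can be wrapped in a small reusable function. The sketch below is only one way to do it: `conditional_counts` and its parameters are hypothetical names, and it groups by the key column directly instead of calling `set_index` first, so the input frame is left unchanged.

```python
import pandas as pd


def conditional_counts(df, key, conditions):
    """Hypothetical helper: per value of df[key], count rows matching each mask.

    `conditions` maps an output column name to a boolean Series
    that shares df's index.
    """
    flags = pd.DataFrame(conditions, index=df.index)
    # Grouping by the key Series aligns on the shared index.
    return flags.groupby(df[key]).sum()


df = pd.DataFrame({
    "A_id": "a1 a2 a3 a3 a4 a5".split(),
    "B": "up down up up left right".split(),
    "C": [100, 102, 100, 250, 100, 102],
})

counts = conditional_counts(df, "A_id", {
    "sum_up": df["B"].eq("up"),
    "sum_down": df["B"].eq("down"),
    "over_200_up": df["B"].eq("up") & df["C"].gt(200),
})
print(counts)
```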
