Pandas: conditional group-specific computations Pandas: conditional group-specific computations pandas pandas

Pandas: conditional group-specific computations


Here's some setup of some dummy data.

In [81]: df = pd.DataFrame({'Key': ['a','a','b','b','c','c'],                             'C1': [1,2,3,4,5,6],                              'C2': [7,8,9,10,11,12]})In [82]: df['F1'] = df.groupby('Key')['C1'].transform(np.sum)In [83]: dfOut[83]:    C1  C2 Key  F10   1   7   a   31   2   8   a   32   3   9   b   73   4  10   b   74   5  11   c  115   6  12   c  11

If you want to do a conditional GroupBy, you can just filter the dataframe as it's passed to .groubpy. For example, if you wanted the group sum of 'C1' if C2 is less than 8 or greater than 9.

In [87]: cond = (df['C2'] < 8) | (df['C2'] > 9)In [88]: df['F2'] = df[cond].groupby('Key')['C1'].transform(np.sum)In [89]: dfOut[89]:    C1  C2 Key  F1  F20   1   7   a   3   11   2   8   a   3 NaN2   3   9   b   7 NaN3   4  10   b   7   44   5  11   c  11  115   6  12   c  11  11

This works because the transform operation preserves the index, so it will still align with the original dataframe correctly.

If you want to sum the group totals for two columns, probably easiest to do something like this? Someone may have something more clever.

In [93]: gb = df.groupby('Key')In [94]: df['C1+C2'] = gb['C1'].transform(np.sum) + gb['C2'].transform(np.sum)

Edit:Here's one other way to get group totals for multiple columns. The syntax isn't really any cleaner, but may be more convenient for a large number of a columns.

df['C1_C2'] = gb[['C1','C2']].apply(lambda x: pd.DataFrame(x.sum().sum(), index=x.index, columns=['']))


I found another approach that uses apply() instead of transform(), but you need to join the result table with the input DataFrame and I just haven't figured out yet how to do it. Would appreciate help to finish the table joining part or any better alternatives.

df = pd.DataFrame({'Key': ['a','a','b','b','c','c'],                            'C1': [1,2,3,4,5,6],                            'C2': [7,8,9,10,11,12]})# Group g will be given as a DataFramedef group_feature_extractor(g):    feature_1 = (g['C1'] + g['C2']).sum()    even_C1_filter = g['C1'] % 2 == 0    feature_2 = g[even_C1_filter]['C2'].sum()    return pd.Series([feature_1, feature_2], index = ['F1', 'F2'])# Group oncegroup = df.groupby(['Key'])# Extract features from each groupgroup_features = group.apply(group_feature_extractor)## Join with the input data frame ...#