Pandas: conditional group-specific computations

python pandas

Here's some setup of some dummy data.

In [81]: df = pd.DataFrame({'Key': ['a','a','b','b','c','c'],                             'C1': [1,2,3,4,5,6],                              'C2': [7,8,9,10,11,12]})In [82]: df['F1'] = df.groupby('Key')['C1'].transform(np.sum)In [83]: dfOut[83]:    C1  C2 Key  F10   1   7   a   31   2   8   a   32   3   9   b   73   4  10   b   74   5  11   c  115   6  12   c  11

If you want to do a conditional GroupBy, you can just filter the dataframe as it's passed to .groubpy. For example, if you wanted the group sum of 'C1' if C2 is less than 8 or greater than 9.

In [87]: cond = (df['C2'] < 8) | (df['C2'] > 9)In [88]: df['F2'] = df[cond].groupby('Key')['C1'].transform(np.sum)In [89]: dfOut[89]:    C1  C2 Key  F1  F20   1   7   a   3   11   2   8   a   3 NaN2   3   9   b   7 NaN3   4  10   b   7   44   5  11   c  11  115   6  12   c  11  11

This works because the transform operation preserves the index, so it will still align with the original dataframe correctly.

If you want to sum the group totals for two columns, probably easiest to do something like this? Someone may have something more clever.

In [93]: gb = df.groupby('Key')In [94]: df['C1+C2'] = gb['C1'].transform(np.sum) + gb['C2'].transform(np.sum)

Edit:Here's one other way to get group totals for multiple columns. The syntax isn't really any cleaner, but may be more convenient for a large number of a columns.

df['C1_C2'] = gb[['C1','C2']].apply(lambda x: pd.DataFrame(x.sum().sum(), index=x.index, columns=['']))

python pandas

I found another approach that uses apply() instead of transform(), but you need to join the result table with the input DataFrame and I just haven't figured out yet how to do it. Would appreciate help to finish the table joining part or any better alternatives.

df = pd.DataFrame({'Key': ['a','a','b','b','c','c'],                            'C1': [1,2,3,4,5,6],                            'C2': [7,8,9,10,11,12]})# Group g will be given as a DataFramedef group_feature_extractor(g):    feature_1 = (g['C1'] + g['C2']).sum()    even_C1_filter = g['C1'] % 2 == 0    feature_2 = g[even_C1_filter]['C2'].sum()    return pd.Series([feature_1, feature_2], index = ['F1', 'F2'])# Group oncegroup = df.groupby(['Key'])# Extract features from each groupgroup_features = group.apply(group_feature_extractor)## Join with the input data frame ...#


          
            
                  
                      
                      
                      

                      
                      
                          Recent Posts
                          
                              
                                  
                                      
                                  
                                  
                                      
                                      
                                          How can I color dots in a xy scatterplot according to column value?
                                      
                                      
                                      
                                          How to update a claim in ASP.NET Identity?
                                      
                                      
                                      
                                          What does {0} mean when initializing an object?
                                      
                                      
                                      
                                          Accessing members of items in a JSONArray with Java
                                      
                                      
                                      
                                          How to log SQL statements in Spring Boot?
                                      
                                      
                                      
                                          Powershell Get-WebSite name parameter is ignored
                                      
                                      
                                      
                                          How to detect scroll to bottom of html element
                                      
                                      
                                      
                                          Java synchronized method
                                      
                                      
                                      
                                          How to test controllers with CodeIgniter?
                                      
                                      
                                      
                                          Detect Visual Composer
                                      
                                      
                                      
                                          Matplotlib: Specify format of floats for tick labels
                                      
                                      
                                      
                                          Rails join a list of strings with commas and "and" before the last