Pandas - Groupby with conditional formula


An easy way to group this is to use the sum of those two columns. If either of them is positive (and neither is negative), the sum will be at least 1. Also, groupby accepts an arbitrary array as long as its length matches the DataFrame's length, so you don't need to add a new column.

    import numpy as np

    family = np.where((df['SibSp'] + df['Parch']) >= 1, 'Has Family', 'No Family')
    df.groupby(family)['Survived'].mean()

    Out:
    Has Family    0.5
    No Family     1.0
    Name: Survived, dtype: float64
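A self-contained version of the snippet above, for trying it out. The three-row `df` is an assumption borrowed from the sample data in the last answer of this thread; the point it shows is that `family` is a plain NumPy array of labels (one per row), so `df` itself gains no new column.

```python
import numpy as np
import pandas as pd

# Sample data (assumption): the same three passengers used later in the thread
df = pd.DataFrame({
    "Survived": [0, 1, 1],
    "SibSp": [1, 1, 0],
    "Parch": [0, 0, 0],
})

# One label per row; this array is passed to groupby instead of a column name
family = np.where((df["SibSp"] + df["Parch"]) >= 1, "Has Family", "No Family")
assert len(family) == len(df)  # groupby requires the lengths to match

result = df.groupby(family)["Survived"].mean()
print(result)
```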


If the values in columns SibSp and Parch are never less than 0, a single condition is enough:

    m1 = (df['SibSp'] > 0) | (df['Parch'] > 0)
    out = df.groupby(np.where(m1, 'Has Family', 'No Family'))['Survived'].mean()
    print(out)

    Has Family    0.5
    No Family     1.0
    Name: Survived, dtype: float64
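The same single condition can also be written by checking both columns at once with `DataFrame.any(axis=1)` and mapping the boolean mask to readable labels. This is a minimal sketch of an equivalent alternative, not the answer's own code; the sample `df` is assumed.

```python
import pandas as pd

# Sample data (assumption)
df = pd.DataFrame({
    "Survived": [0, 1, 1],
    "SibSp": [1, 1, 0],
    "Parch": [0, 0, 0],
})

# True where at least one of SibSp / Parch is positive (same as m1 above)
has_family = (df[["SibSp", "Parch"]] > 0).any(axis=1)

# Turn True/False into labels so the result index is self-describing
labels = has_family.map({True: "Has Family", False: "No Family"})
out = df.groupby(labels)["Survived"].mean()
print(out)
```

Grouping by the raw boolean mask also works, but the result index would then be `False`/`True` instead of the two labels.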

If that cannot be guaranteed (for example, negative values may occur), use both conditions, with a fallback label for rows that match neither:

    m1 = (df['SibSp'] > 0) | (df['Parch'] > 0)
    m2 = (df['SibSp'] == 0) & (df['Parch'] == 0)
    a = np.where(m1, 'Has Family',
                 np.where(m2, 'No Family', 'Not'))
    out = df.groupby(a)['Survived'].mean()
    print(out)

    Has Family    0.5
    No Family     1.0
    Name: Survived, dtype: float64
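With more than two conditions, nested `np.where` calls get hard to read; `np.select` takes a list of conditions and a list of labels and assigns the first match per row, with `default` for the rest. This is a swapped-in alternative, shown as a sketch: the fourth row with `SibSp = -1` is a hypothetical value added only so the `'Not'` bucket is non-empty.

```python
import numpy as np
import pandas as pd

# Sample data (assumption); the last row is hypothetical, to exercise the fallback
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "SibSp": [1, 1, 0, -1],
    "Parch": [0, 0, 0, 0],
})

m1 = (df["SibSp"] > 0) | (df["Parch"] > 0)
m2 = (df["SibSp"] == 0) & (df["Parch"] == 0)

# Conditions are checked in order; rows matching none of them get the default
a = np.select(
    [m1.to_numpy(), m2.to_numpy()],
    ["Has Family", "No Family"],
    default="Not",
)

out = df.groupby(a)["Survived"].mean()
print(out)
```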


You could define your conditions in a list and use the function group_by_condition below to create a filtered list for each condition. Afterwards you can pick out the resulting lists by unpacking:

    df = [
        {"Survived": 0, "SibSp": 1, "Parch": 0},
        {"Survived": 1, "SibSp": 1, "Parch": 0},
        {"Survived": 1, "SibSp": 0, "Parch": 0},
    ]

    conditions = [
        lambda x: (x['SibSp'] > 0) or (x['Parch'] > 0),    # has family
        lambda x: (x['SibSp'] == 0) and (x['Parch'] == 0),  # no family
    ]

    def group_by_condition(l, conditions):
        # one filtered list per condition, in the same order as `conditions`
        return [[item for item in l if condition(item)] for condition in conditions]

    [has_family, no_family] = group_by_condition(df, conditions)
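A possible variation on the plain-Python approach: keying the conditions by label keeps each group next to its name, and the survival rate per group can then be computed with `statistics.mean`, which reproduces the pandas results above. The function name `group_by_labelled_condition` and the `rows` variable are hypothetical, introduced only for this sketch.

```python
from statistics import mean

# Sample data from the answer above (a list of dicts, not a DataFrame)
rows = [
    {"Survived": 0, "SibSp": 1, "Parch": 0},
    {"Survived": 1, "SibSp": 1, "Parch": 0},
    {"Survived": 1, "SibSp": 0, "Parch": 0},
]

# Hypothetical labelled variant: each label maps to its predicate
conditions = {
    "Has Family": lambda x: x["SibSp"] > 0 or x["Parch"] > 0,
    "No Family": lambda x: x["SibSp"] == 0 and x["Parch"] == 0,
}

def group_by_labelled_condition(rows, conditions):
    """Return {label: [rows matching that label's predicate]}."""
    return {label: [r for r in rows if cond(r)] for label, cond in conditions.items()}

groups = group_by_labelled_condition(rows, conditions)

# Skip empty groups: statistics.mean raises on an empty sequence
survival = {
    label: mean(r["Survived"] for r in items)
    for label, items in groups.items()
    if items
}
print(survival)
```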