Creating a column based on multiple conditions
Option 1
You can use nested np.where statements. For example:
df['area'] = np.where(df['x'] > df['rbound'], 'right', np.where(df['x'] < df['lbound'], 'left', 'somewhere else'))
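For anyone who wants to run this end to end, here is a minimal sketch. The sample values in the frame are an assumption chosen for illustration; only the column names come from the snippet above.

```python
import numpy as np
import pandas as pd

# Sample data (assumed for illustration)
df = pd.DataFrame({"lbound": [-1, 5, 0], "rbound": [1, 7, 1], "x": [0, 1, 2]})

# The outer np.where is evaluated first; the inner one supplies
# the value for rows where the outer condition is False.
df["area"] = np.where(
    df["x"] > df["rbound"], "right",
    np.where(df["x"] < df["lbound"], "left", "somewhere else"),
)
print(df)
```

Splitting the call over several lines does not change behaviour; it just makes the nesting easier to follow.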
Option 2
You can use the .loc accessor to assign values to the rows matching each condition. Note that you have to create the new column before assigning to it with .loc. We take this opportunity to set the default, which may be overwritten by the later assignments.
df['area'] = 'somewhere else'
df.loc[df['x'] > df['rbound'], 'area'] = 'right'
df.loc[df['x'] < df['lbound'], 'area'] = 'left'
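A small runnable sketch of the same idea, with made-up sample data. The fourth row is a deliberately degenerate case (lbound greater than rbound) added only to show that when two masks overlap, the last .loc assignment wins.

```python
import pandas as pd

# Sample data (assumed for illustration); the last row has lbound > rbound
df = pd.DataFrame(
    {"lbound": [-1, 5, 0, 3], "rbound": [1, 7, 1, 1], "x": [0, 1, 2, 2]}
)

df["area"] = "somewhere else"                      # default for every row
df.loc[df["x"] > df["rbound"], "area"] = "right"   # overwrite matching rows
df.loc[df["x"] < df["lbound"], "area"] = "left"    # later writes take priority
print(df)
```

A nested np.where returns the first matching branch instead, so for rows that satisfy both conditions the two approaches can differ. If lbound never exceeds rbound, this cannot happen.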
Explanation
These are both valid alternatives with comparable performance. The calculations are vectorised in both cases. My preference is for Option 2, as it seems more readable. If there is a large number of nested criteria, np.where may be more convenient.
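If you want to check the performance claim on your own machine, a rough timing sketch could look like the following. The helper names with_where and with_loc, the random data, and the frame size are all assumptions for illustration; timings vary by machine and library version, so none are shown here.

```python
import timeit

import numpy as np
import pandas as pd

# Random test data (assumed); rbound is always >= lbound
rng = np.random.default_rng(0)
n = 100_000
lbound = rng.normal(size=n)
df = pd.DataFrame(
    {"lbound": lbound, "rbound": lbound + rng.random(n), "x": rng.normal(size=n)}
)


def with_where(frame):
    # Option 1: nested np.where
    return np.where(
        frame["x"] > frame["rbound"], "right",
        np.where(frame["x"] < frame["lbound"], "left", "somewhere else"),
    )


def with_loc(frame):
    # Option 2: default column plus .loc assignments
    # (the copy keeps the input unchanged and is included in the timing)
    out = frame.copy()
    out["area"] = "somewhere else"
    out.loc[out["x"] > out["rbound"], "area"] = "right"
    out.loc[out["x"] < out["lbound"], "area"] = "left"
    return out["area"]


print("np.where:", timeit.timeit(lambda: with_where(df), number=5))
print(".loc:    ", timeit.timeit(lambda: with_loc(df), number=5))
```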
You can use np.select instead of nested np.where calls:
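cond = [
    df['x'].between(df['lbound'], df['rbound']),
    df['x'] < df['lbound'],
    df['x'] > df['rbound'],
]
output = ['middle', 'left', 'right']
# Use a string default: mixing string choices with default=np.nan
# can raise a TypeError on recent NumPy versions
df['area'] = np.select(cond, output, default='somewhere else')

   lbound  rbound  x    area
0      -1       1  0  middle
1       5       7  1    left
2       0       1  2   right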
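Here is a self-contained sketch of the np.select approach. The extra row with a missing x and the "unknown" label are assumptions added to show when the default value is actually used: Series.between is inclusive on both ends by default, so the three conditions cover every non-missing x.

```python
import numpy as np
import pandas as pd

# Sample data (assumed); the last row has a missing x
df = pd.DataFrame(
    {"lbound": [-1, 5, 0, 0], "rbound": [1, 7, 1, 1], "x": [0, 1, 2, np.nan]}
)

# Conditions are checked in order; the first one that is True wins
cond = [
    df["x"].between(df["lbound"], df["rbound"]),
    df["x"] < df["lbound"],
    df["x"] > df["rbound"],
]
output = ["middle", "left", "right"]

# Rows matching no condition (here, the NaN row) get the default
df["area"] = np.select(cond, output, default="unknown")
print(df)
```

Compared with nested np.where, the condition list stays flat no matter how many cases you add.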