
Creating a column based on multiple conditions


Option 1

You can use nested np.where calls. For example:

df['area'] = np.where(df['x'] > df['rbound'], 'right',
                      np.where(df['x'] < df['lbound'],
                               'left', 'somewhere else'))

Option 2

You can use the .loc accessor to assign values to the rows that match each condition. Note that you have to create the new column before using it. We take this opportunity to set the default value, which may be overwritten later.

df['area'] = 'somewhere else'
df.loc[df['x'] > df['rbound'], 'area'] = 'right'
df.loc[df['x'] < df['lbound'], 'area'] = 'left'

Explanation

These are both valid alternatives with comparable performance. The calculations are vectorised in both instances. My preference is for Option 2 as it seems more readable. If there are a large number of nested criteria, np.where may be more convenient.
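As a quick check that the two options give the same labels, here is a minimal self-contained sketch. The sample values and the column names area_where and area_loc are made up for illustration and are not part of the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen only to exercise all three labels.
df = pd.DataFrame({
    'lbound': [-1, 5, 0],
    'rbound': [1, 7, 1],
    'x': [0, 1, 2],
})

# Option 1: nested np.where, evaluated from the outer condition inwards.
df['area_where'] = np.where(
    df['x'] > df['rbound'], 'right',
    np.where(df['x'] < df['lbound'], 'left', 'somewhere else'),
)

# Option 2: set the default first, then overwrite the matching rows with .loc.
df['area_loc'] = 'somewhere else'
df.loc[df['x'] > df['rbound'], 'area_loc'] = 'right'
df.loc[df['x'] < df['lbound'], 'area_loc'] = 'left'

# Compare the two result columns element by element.
same = (df['area_where'] == df['area_loc']).all()
print(df)
print(same)
```

One difference to keep in mind: if a row could satisfy both conditions (for instance when lbound is greater than rbound), nested np.where gives priority to the outer condition, whereas with .loc the last assignment wins.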


You can use np.select instead of np.where:

cond = [df['x'].between(df['lbound'], df['rbound']),
        (df['x'] < df['lbound']),
        (df['x'] > df['rbound'])]
output = ['middle', 'left', 'right']
df['area'] = np.select(cond, output, default=np.nan)

   lbound  rbound  x    area
0      -1       1  0  middle
1       5       7  1    left
2       0       1  2   right
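For anyone who wants to run this end to end, a self-contained sketch of the np.select approach follows. It rebuilds the three-row frame shown above and, as an assumption on my part, swaps the default for the string 'unknown': as far as I know, depending on the NumPy version, mixing string choices with a float default such as np.nan either raises a TypeError or produces the string 'nan' rather than a real missing value.

```python
import numpy as np
import pandas as pd

# Same three rows as the example output above.
df = pd.DataFrame({
    'lbound': [-1, 5, 0],
    'rbound': [1, 7, 1],
    'x': [0, 1, 2],
})

# np.select takes the label of the first condition that is True for each row.
conditions = [
    df['x'].between(df['lbound'], df['rbound']),  # inclusive on both ends by default
    df['x'] < df['lbound'],
    df['x'] > df['rbound'],
]
labels = ['middle', 'left', 'right']

# 'unknown' is an illustrative default used when no condition matches.
df['area'] = np.select(conditions, labels, default='unknown')
print(df['area'].tolist())  # ['middle', 'left', 'right']
```

With these three conditions the default is only reached when a comparison cannot be True at all, for example when x is NaN.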