Pandas conditional creation of a series/dataframe column


If you only have two choices to select from:

df['color'] = np.where(df['Set']=='Z', 'green', 'red')

For example,

import pandas as pd
import numpy as np

df = pd.DataFrame({'Type': list('ABBC'), 'Set': list('ZZXY')})
df['color'] = np.where(df['Set'] == 'Z', 'green', 'red')
print(df)

yields

  Set Type  color
0   Z    A  green
1   Z    B  green
2   X    B    red
3   Y    C    red

If you have more than two conditions then use np.select. For example, if you want color to be

  • yellow when (df['Set'] == 'Z') & (df['Type'] == 'A')
  • otherwise blue when (df['Set'] == 'Z') & (df['Type'] == 'B')
  • otherwise purple when (df['Type'] == 'B')
  • otherwise black,

then use

df = pd.DataFrame({'Type': list('ABBC'), 'Set': list('ZZXY')})
conditions = [
    (df['Set'] == 'Z') & (df['Type'] == 'A'),
    (df['Set'] == 'Z') & (df['Type'] == 'B'),
    (df['Type'] == 'B')]
choices = ['yellow', 'blue', 'purple']
df['color'] = np.select(conditions, choices, default='black')
print(df)

which yields

  Set Type   color
0   Z    A  yellow
1   Z    B    blue
2   X    B  purple
3   Y    C   black
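The "otherwise" wording in the list above matters: np.select takes the choice for the first condition that is true in each row, so overlapping conditions should go from most specific to most general. A minimal sketch of that, using only the last two conditions (the names specific_first and general_first are made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Type': list('ABBC'), 'Set': list('ZZXY')})

# Row 1 (Set == 'Z', Type == 'B') satisfies both conditions below.
specific_first = [
    (df['Set'] == 'Z') & (df['Type'] == 'B'),
    (df['Type'] == 'B'),
]
# Same conditions, but with the more general one listed first.
general_first = specific_first[::-1]

colors_a = np.select(specific_first, ['blue', 'purple'], default='black')
colors_b = np.select(general_first, ['purple', 'blue'], default='black')

print(colors_a.tolist())  # ['black', 'blue', 'purple', 'black']
print(colors_b.tolist())  # ['black', 'purple', 'purple', 'black']
```

With the general condition first, the 'blue' choice is never reached, because every row it could match has already been claimed by 'purple'.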


A list comprehension is another way to create a column conditionally. If the columns hold object dtypes, as in your example, list comprehensions typically outperform most other methods.

Example list comprehension:

df['color'] = ['red' if x == 'Z' else 'green' for x in df['Set']]
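A list comprehension can also look at more than one column by zipping them together. The sketch below reproduces the yellow/blue/purple/black rules from the np.select example. pick_color is a hypothetical helper written for this illustration, not something pandas provides:

```python
import pandas as pd

df = pd.DataFrame({'Type': list('ABBC'), 'Set': list('ZZXY')})

def pick_color(set_value, type_value):
    # Hypothetical helper: checks the rules in order, first match wins.
    if set_value == 'Z' and type_value == 'A':
        return 'yellow'
    if set_value == 'Z' and type_value == 'B':
        return 'blue'
    if type_value == 'B':
        return 'purple'
    return 'black'

# zip pairs up the two columns row by row.
df['color'] = [pick_color(s, t) for s, t in zip(df['Set'], df['Type'])]
print(df['color'].tolist())  # ['yellow', 'blue', 'purple', 'black']
```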

%timeit tests:

import pandas as pd
import numpy as np

df = pd.DataFrame({'Type': list('ABBC'), 'Set': list('ZZXY')})
%timeit df['color'] = ['red' if x == 'Z' else 'green' for x in df['Set']]
%timeit df['color'] = np.where(df['Set'] == 'Z', 'green', 'red')
%timeit df['color'] = df.Set.map(lambda x: 'red' if x == 'Z' else 'green')

1000 loops, best of 3: 239 µs per loop
1000 loops, best of 3: 523 µs per loop
1000 loops, best of 3: 263 µs per loop
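The timings above come from a four-row frame, so it may be worth re-running the comparison on data closer to your own size before picking a method. Here is a rough sketch using the standard timeit module instead of the IPython magic. The frame size (n_repeats) and the function names are arbitrary choices for illustration, and all three functions use the 'green' for 'Z' mapping from the np.where example so their results can be checked against each other:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger frame: the four example rows repeated 2,500 times.
n_repeats = 2_500
df = pd.DataFrame({'Type': list('ABBC') * n_repeats,
                   'Set': list('ZZXY') * n_repeats})

def with_list_comp():
    return ['green' if x == 'Z' else 'red' for x in df['Set']]

def with_np_where():
    return np.where(df['Set'] == 'Z', 'green', 'red')

def with_map():
    return df['Set'].map(lambda x: 'green' if x == 'Z' else 'red')

# Check that all three produce the same labels before comparing speed.
assert with_list_comp() == with_np_where().tolist() == with_map().tolist()

calls = 10
for func in (with_list_comp, with_np_where, with_map):
    # Best of 3 runs, each run making `calls` calls.
    seconds = min(timeit.repeat(func, number=calls, repeat=3))
    print(f'{func.__name__}: {seconds / calls * 1e3:.3f} ms per call')
```

No expected numbers are shown because they depend on the machine, the pandas and NumPy versions, and the column dtype.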


Another way to achieve this is with Series.map:

df['color'] = df.Set.map(lambda x: 'red' if x == 'Z' else 'green')
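When the rule is a plain value-to-label lookup, map also accepts a dict in place of the lambda. Values missing from the dict come back as NaN, which fillna can turn into the fallback label. A small sketch, using the 'green' for 'Z' mapping from the np.where example (color_by_set is a name invented here):

```python
import pandas as pd

df = pd.DataFrame({'Type': list('ABBC'), 'Set': list('ZZXY')})

# One explicit label per value; unlisted values become NaN, then 'red'.
color_by_set = {'Z': 'green'}
df['color'] = df['Set'].map(color_by_set).fillna('red')
print(df['color'].tolist())  # ['green', 'green', 'red', 'red']
```

Adding more entries to the dict extends this to several labels without changing the rest of the line.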