Pandas conditional creation of a series/dataframe column
If you only have two choices to select from:
df['color'] = np.where(df['Set']=='Z', 'green', 'red')
For example,
import pandas as pd
import numpy as np

df = pd.DataFrame({'Type': list('ABBC'), 'Set': list('ZZXY')})
df['color'] = np.where(df['Set'] == 'Z', 'green', 'red')
print(df)
yields
  Set Type  color
0   Z    A  green
1   Z    B  green
2   X    B    red
3   Y    C    red
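One detail worth knowing about the np.where approach: it returns a plain NumPy array rather than a Series, so the column assignment is positional and does not depend on the DataFrame's index. A minimal sketch, assuming a hypothetical non-default index (10, 20, 30, 40) that is not part of the original example:

```python
import numpy as np
import pandas as pd

# Hypothetical non-default index, used only to illustrate positional assignment.
df = pd.DataFrame(
    {'Type': list('ABBC'), 'Set': list('ZZXY')},
    index=[10, 20, 30, 40],
)

# np.where evaluates the boolean mask element by element and returns an ndarray.
colors = np.where(df['Set'] == 'Z', 'green', 'red')
print(type(colors))

# Assigning an ndarray fills the column by position, so the index is irrelevant here.
df['color'] = colors
print(df)
```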
If you have more than two conditions then use np.select. For example, if you want color to be
- yellow when (df['Set'] == 'Z') & (df['Type'] == 'A')
- otherwise blue when (df['Set'] == 'Z') & (df['Type'] == 'B')
- otherwise purple when (df['Type'] == 'B')
- otherwise black,
then use
df = pd.DataFrame({'Type': list('ABBC'), 'Set': list('ZZXY')})
conditions = [
    (df['Set'] == 'Z') & (df['Type'] == 'A'),
    (df['Set'] == 'Z') & (df['Type'] == 'B'),
    (df['Type'] == 'B')]
choices = ['yellow', 'blue', 'purple']
df['color'] = np.select(conditions, choices, default='black')
print(df)
which yields
  Set Type   color
0   Z    A  yellow
1   Z    B    blue
2   X    B  purple
3   Y    C   black
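Note that np.select uses the first condition that matches for each row, which is what makes the "otherwise" chain work, so more specific conditions should come before more general ones. A small sketch of why the order matters; the variable names (is_b, is_z_and_b, specific_first, general_first) are only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Type': list('ABBC'), 'Set': list('ZZXY')})

is_b = df['Type'] == 'B'
is_z_and_b = (df['Set'] == 'Z') & (df['Type'] == 'B')

# Specific condition listed first: the row with Set 'Z' and Type 'B' gets 'blue'.
specific_first = np.select([is_z_and_b, is_b], ['blue', 'purple'], default='black')

# General condition listed first: it matches that row before the specific
# condition is consulted, so the same row gets 'purple' instead.
general_first = np.select([is_b, is_z_and_b], ['purple', 'blue'], default='black')

print(specific_first.tolist())
print(general_first.tolist())
```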
A list comprehension is another way to create a column conditionally. If you are working with object dtypes in columns, as in your example, list comprehensions typically outperform most other methods.
Example list comprehension:
df['color'] = ['green' if x == 'Z' else 'red' for x in df['Set']]
%timeit tests:
import pandas as pd
import numpy as np

df = pd.DataFrame({'Type': list('ABBC'), 'Set': list('ZZXY')})
%timeit df['color'] = ['green' if x == 'Z' else 'red' for x in df['Set']]
%timeit df['color'] = np.where(df['Set'] == 'Z', 'green', 'red')
%timeit df['color'] = df.Set.map(lambda x: 'green' if x == 'Z' else 'red')

1000 loops, best of 3: 239 µs per loop
1000 loops, best of 3: 523 µs per loop
1000 loops, best of 3: 263 µs per loop
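If you are not in IPython, %timeit is not available, but a rough equivalent can be put together with the standard library's timeit module. The sketch below is only an illustration: the helper names (with_listcomp, with_where, with_map) and the repeat count are assumptions, and the numbers it prints will depend on your machine, pandas version, and frame size. It also checks that the three approaches produce the same colors before timing them.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({'Type': list('ABBC'), 'Set': list('ZZXY')})

# Hypothetical helpers wrapping the three approaches compared above.
def with_listcomp(frame):
    return ['green' if x == 'Z' else 'red' for x in frame['Set']]

def with_where(frame):
    return np.where(frame['Set'] == 'Z', 'green', 'red')

def with_map(frame):
    return frame['Set'].map(lambda x: 'green' if x == 'Z' else 'red')

funcs = (with_listcomp, with_where, with_map)

# Sanity check: all three approaches should yield the same values.
results = [[str(v) for v in func(df)] for func in funcs]
assert results[0] == results[1] == results[2]

# Time each approach; 200 repetitions is an arbitrary choice for this sketch.
n = 200
for func in funcs:
    seconds = timeit.timeit(lambda: func(df), number=n)
    print(f"{func.__name__}: {seconds / n * 1e6:.1f} µs per call")
```

On a four-row frame the differences are dominated by fixed overhead, so it is worth rerunning this on data closer to your real size before drawing conclusions.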