vectorize conditional assignment in pandas dataframe


One simple method is to assign the default value first and then overwrite it with two loc calls, one per condition:

In [66]:
df = pd.DataFrame({'x':[0,-3,5,-1,1]})
df

Out[66]:
   x
0  0
1 -3
2  5
3 -1
4  1

In [69]:
df['y'] = 0
df.loc[df['x'] < -2, 'y'] = 1
df.loc[df['x'] > 2, 'y'] = -1
df

Out[69]:
   x  y
0  0  0
1 -3  1
2  5 -1
3 -1  0
4  1  0

If you wanted to use np.where then you could do it with a nested np.where:

In [77]:
df['y'] = np.where(df['x'] < -2 , 1, np.where(df['x'] > 2, -1, 0))
df

Out[77]:
   x  y
0  0  0
1 -3  1
2  5 -1
3 -1  0
4  1  0

Here the outer np.where tests the first condition: where x is less than -2, it returns 1. For the remaining rows, the inner np.where tests the other condition: where x is greater than 2, it returns -1, otherwise 0.
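Nesting gets harder to read as more conditions are added. One possible alternative is np.select, which takes a list of conditions and a matching list of values. The sketch below reuses the sample frame from above; the variable names conditions and choices are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [0, -3, 5, -1, 1]})

# Conditions are evaluated in order; for each row the first match wins.
conditions = [df['x'] < -2, df['x'] > 2]
choices = [1, -1]

# default=0 is used for rows where none of the conditions hold.
df['y'] = np.select(conditions, choices, default=0)
print(df)
```

This should give the same y column as the nested np.where version, and adding a third condition only means appending to both lists.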

timings

In [79]:
%timeit df['y'] = np.where(df['x'] < -2 , 1, np.where(df['x'] > 2, -1, 0))
1000 loops, best of 3: 1.79 ms per loop

In [81]:
%%timeit
df['y'] = 0
df.loc[df['x'] < -2, 'y'] = 1
df.loc[df['x'] > 2, 'y'] = -1

100 loops, best of 3: 3.27 ms per loop

So for this sample dataset the np.where method is roughly twice as fast (1.79 ms versus 3.27 ms per loop).
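The sample frame has only five rows, so the ratio may look different on larger data. A rough way to re-measure outside IPython is the standard-library timeit module. In the sketch below the frame big, its size, and the helper names with_where and with_loc are assumptions made for illustration; the actual numbers will depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger frame: 100,000 random integers in [-5, 5].
rng = np.random.default_rng(0)
big = pd.DataFrame({'x': rng.integers(-5, 6, size=100_000)})

def with_where(df):
    # Nested np.where: one pass per condition, a single column assignment.
    df['y'] = np.where(df['x'] < -2, 1, np.where(df['x'] > 2, -1, 0))

def with_loc(df):
    # Default value first, then one loc assignment per condition.
    df['y'] = 0
    df.loc[df['x'] < -2, 'y'] = 1
    df.loc[df['x'] > 2, 'y'] = -1

t_where = timeit.timeit(lambda: with_where(big), number=20)
t_loc = timeit.timeit(lambda: with_loc(big), number=20)
print(f"np.where: {t_where / 20 * 1e3:.3f} ms per call")
print(f"loc:      {t_loc / 20 * 1e3:.3f} ms per call")
```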


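This is a good use case for pd.cut, where you define ranges and assign a label to each range. Note that with right=False the bins are closed on the left, so a value of exactly 2 falls into the last bin and gets -1, unlike the x > 2 condition, and the resulting column is categorical rather than integer: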

df['y'] = pd.cut(df['x'], [-np.inf, -2, 2, np.inf], labels=[1, 0, -1], right=False)

Output

   x  y
0  0  0
1 -3  1
2  5 -1
3 -1  0
4  1  0
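The sample data has no values sitting exactly on a bin edge, so it is worth checking how pd.cut treats them. The sketch below uses a hypothetical frame edge containing -2 and 2, casts the categorical result to int, and compares it with the nested np.where from the first answer.

```python
import numpy as np
import pandas as pd

# Hypothetical test frame that includes both bin edges, -2 and 2.
edge = pd.DataFrame({'x': [-3, -2, 0, 2, 3]})

# right=False makes the bins [-inf, -2), [-2, 2) and [2, inf).
cut = pd.cut(edge['x'], [-np.inf, -2, 2, np.inf],
             labels=[1, 0, -1], right=False)

# pd.cut returns a categorical column; cast it to get plain integers.
edge['y_cut'] = cut.astype(int)

# Reference result using the strict conditions x < -2 and x > 2.
edge['y_where'] = np.where(edge['x'] < -2, 1,
                           np.where(edge['x'] > 2, -1, 0))
print(edge)
```

The two columns agree everywhere except at x == 2, where pd.cut assigns -1 and np.where assigns 0. If the data can contain that value, the np.where or loc approach matches the stated conditions more directly.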