IF ELSE using Numpy and Pandas
You almost had it, you needed to pass a raw string (prepend with r
) to contains
so it treats it as a regex:
In [115]:df['NewColumn'] = np.where(df.ICD.str.contains(r'313.2|414.2').astype(int), 1, np.where(((df.ICD.str.contains(r'313.2|414.2').astype(int))&(df['Date1']>df['Date2'])), 2, 0))dfOut[115]: PatientID Date1 Date2 ICD NewColumn0 1234 2010-12-14 2010-12-12 313.2,414.2,228.1 11 3213 2010-08-02 2012-09-05 232.1,221.0 0
You get 1 returned because it short circuits on the first condition because that is met, if you want to get 2 returned then you need to rearrange the order of evaluation:
In [122]:df['NewColumn'] = np.where( (df.ICD.str.contains(r'313.2|414.2').astype(int)) & ( df['Date1'] > df['Date2'] ), 2 , np.where( df.ICD.str.contains(r'313.2|414.2').astype(int), 1, 0 ) )dfOut[122]: PatientID Date1 Date2 ICD NewColumn0 1234 2010-12-14 2010-12-12 313.2,414.2,228.1 21 3213 2010-08-02 2012-09-05 232.1,221.0 0
It is much easier to use the pandas functionality itself. Using numpy to do something that pandas already does is a good way to get unexpected behaviour.
Assuming you want to check for a cell value containing 313.2 only (so 2313.25 returns False).
df['ICD'].astype(str) == '313.2'
returns a Series Object of True or False next to each index entry.
so
boolean =(df['ICD'].astype(str) == '313.2')| (df['ICD'].astype(str) == '414.2')if(boolean.any()): #do something return 1 boolean2 =((df['ICD'].astype(str) == '313.2')| (df['ICD'].astype(str) == '414.2'))&(df['Date1']>df['Date2'])if(boolean2.any()): return 2
etc
Pandas also has the function isin() which can simplify things further.
The docs are here: http://pandas.pydata.org/pandas-docs/stable/indexing.html
Also, you do not return two because of the order you evaluate the conditional statements.In any circumstance where condition 2 evaluates as true, condition 1 must evaluate to be true also. So as you test condition 1 too, it always returns 1 or passes.
In short, you need to test condition 2 first, as there is no circumstance where 1 can be false and 2 can be true.