Filtering pandas dataframe with multiple Boolean columns
In [82]: d
Out[82]:
             A   B      C      D
0     John Doe  45   True  False
1   Jane Smith  32  False  False
2  Alan Holmes  55  False   True
3   Eric Lamar  29   True   True
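To follow along, the sample frame can be rebuilt as below. This is only a sketch: the variable name d comes from the session above, but the constructor call is an assumption, since the original does not show how the frame was created.

```python
import pandas as pd

# Reconstruction of the sample data shown above (the exact constructor
# used by the author is not given, so this call is an assumption).
d = pd.DataFrame({
    "A": ["John Doe", "Jane Smith", "Alan Holmes", "Eric Lamar"],
    "B": [45, 32, 55, 29],
    "C": [True, False, False, True],
    "D": [False, False, True, True],
})

# C and D should be real boolean columns, not strings or objects,
# otherwise the | operator and .any() will not behave as shown.
print(d.dtypes)
```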
Solution 1:
In [83]: d.loc[d.C | d.D]
Out[83]:
             A   B      C      D
0     John Doe  45   True  False
2  Alan Holmes  55  False   True
3   Eric Lamar  29   True   True
Solution 2:
In [94]: d[d[['C','D']].any(axis=1)]
Out[94]:
             A   B      C      D
0     John Doe  45   True  False
2  Alan Holmes  55  False   True
3   Eric Lamar  29   True   True
Solution 3:
In [95]: d.query("C or D")
Out[95]:
             A   B      C      D
0     John Doe  45   True  False
2  Alan Holmes  55  False   True
3   Eric Lamar  29   True   True
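A quick way to convince yourself that the three solutions are interchangeable is to run them side by side. The sketch below rebuilds the sample frame (the constructor call is an assumption) and compares the results; the variable names by_operator, by_any and by_query are made up for illustration.

```python
import pandas as pd

# Sample data reconstructed from the session above (assumed constructor).
d = pd.DataFrame({
    "A": ["John Doe", "Jane Smith", "Alan Holmes", "Eric Lamar"],
    "B": [45, 32, 55, 29],
    "C": [True, False, False, True],
    "D": [False, False, True, True],
})

by_operator = d.loc[d.C | d.D]            # element-wise OR of two masks
by_any = d[d[["C", "D"]].any(axis=1)]     # row-wise any() over the columns
by_query = d.query("C or D")              # same condition as a query string

# All three keep exactly the rows where C or D is True.
print(by_operator.equals(by_any), by_any.equals(by_query))
print(by_operator.index.tolist())  # [0, 2, 3]
```

The any(axis=1) form is probably the easiest to extend when there are more than two flag columns, since only the column list changes.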
PS: If you change your solution to:
df[(df['C']==True) | (df['D']==True)]
it'll work too.
Pandas docs - boolean indexing
Why should we NOT use the "PEP-compliant"
df["col_name"] is True
instead of
df["col_name"] == True
?
In [11]: df = pd.DataFrame({"col":[True, True, True]})

In [12]: df
Out[12]:
    col
0  True
1  True
2  True

In [13]: df["col"] is True
Out[13]: False  # <----- oops, that's not exactly what we wanted
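The reason is that is tests object identity: it asks whether the whole Series object is the singleton True, which it never is, so you get one plain False instead of a mask. The == operator is overloaded by pandas and compares element by element. A small sketch (the sample values and the variable names identity and elementwise are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"col": [True, False, True]})

# `is` compares object identity: a Series is never the object True,
# so this is a single Python bool, not a per-row result.
identity = df["col"] is True
print(identity)  # False

# `==` is applied element-wise and returns a boolean Series.
elementwise = df["col"] == True
print(elementwise.tolist())  # [True, False, True]

# For a column that is already boolean, the comparison is optional:
# the column itself can serve as the mask.
print(df[df["col"]].index.tolist())  # [0, 2]
```

If a linter complains about == True, using the boolean column directly as the mask avoids the comparison altogether.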
Hooray! More options!
np.where
df[np.where(df.C | df.D, True, False)]

             A   B      C      D
0     John Doe  45   True  False
2  Alan Holmes  55  False   True
3   Eric Lamar  29   True   True
pd.Index.where
on df.index
df.loc[df.index.where(df.C | df.D).dropna()]

               A   B      C      D
0.0     John Doe  45   True  False
2.0  Alan Holmes  55  False   True
3.0   Eric Lamar  29   True   True
df.select_dtypes
df[df.select_dtypes([bool]).any(axis=1)]

             A   B      C      D
0     John Doe  45   True  False
2  Alan Holmes  55  False   True
3   Eric Lamar  29   True   True
Abusing np.select
df.iloc[np.select([df.C | df.D], [df.index])].drop_duplicates()

             A   B      C      D
0     John Doe  45   True  False
2  Alan Holmes  55  False   True
3   Eric Lamar  29   True   True
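It is worth seeing why this one really is abuse. np.select fills every non-matching position with its default of 0, so non-matching rows are mapped to position 0, and the result only looks right here because row 0 happens to satisfy the condition. The sketch below uses a made-up two-row frame in which row 0 does not match; the data and the names mask and positions are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical frame where the first row does NOT satisfy C | D.
df = pd.DataFrame({
    "A": ["x", "y"],
    "C": [False, True],
    "D": [False, False],
})

mask = df.C | df.D

# Non-matching rows fall back to np.select's default value, 0,
# which iloc then treats as "position 0".
positions = np.select([mask], [df.index])
print(positions.tolist())  # [0, 1]

abused = df.iloc[positions].drop_duplicates()
plain = df[mask]

print(abused.index.tolist())  # [0, 1]  (row 0 sneaks in)
print(plain.index.tolist())   # [1]
```

So treat it as a curiosity; the plain boolean mask does not have this problem.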