How to delete rows from a pandas DataFrame based on a conditional expression [duplicate]
To directly answer this question's original title, "How to delete rows from a pandas DataFrame based on a conditional expression" (which I understand is not necessarily the OP's problem, but could help other users who come across this question), one way to do this is to use the drop method:
df = df.drop(some labels)
df = df.drop(df[<some boolean condition>].index)
Example
To remove all rows where column 'score' is < 50:
df = df.drop(df[df.score < 50].index)
In-place version (as pointed out in comments)
df.drop(df[df.score < 50].index, inplace=True)
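For anyone who wants to try this end to end, here is a minimal sketch of the drop approach. The sample data and the 'name' column are made up for illustration; only the 'score' column comes from the example above.

```python
import pandas as pd

# Hypothetical sample data; only the 'score' column name comes from the answer.
df = pd.DataFrame({"name": ["a", "b", "c", "d"], "score": [10, 50, 75, 49]})

# Index labels of the rows that match the condition (score < 50).
to_drop = df[df.score < 50].index

# drop() returns a new DataFrame without those labels.
df = df.drop(to_drop)

print(df)
```

Note that the remaining rows keep their original index labels; call reset_index(drop=True) afterwards if you want a fresh 0..n-1 index.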
Multiple conditions
(see Boolean Indexing)
The operators are: | for or, & for and, and ~ for not. These must be grouped by using parentheses.
To remove all rows where column 'score' is < 50 and > 20:
df = df.drop(df[(df.score < 50) & (df.score > 20)].index)
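As a rough illustration of the three operators, the sketch below builds the same result three ways on made-up data (the values are assumptions chosen so the boundaries 20 and 50 are visible).

```python
import pandas as pd

# Hypothetical scores, including the boundary values 20 and 50.
df = pd.DataFrame({"score": [10, 20, 35, 49, 50, 80]})

# Parentheses are required: & binds more tightly than < and > in Python.
mask = (df.score < 50) & (df.score > 20)

# 1) Drop the rows whose labels match the mask.
dropped = df.drop(df[mask].index)

# 2) Keep the rows that do NOT match the mask.
kept_not = df[~mask]

# 3) Keep the rows matching the complementary condition, written with |.
kept_or = df[(df.score <= 20) | (df.score >= 50)]

print(dropped)
```

On this data all three frames contain the scores 10, 20, 50 and 80.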
You can assign the DataFrame to a filtered version of itself:
df = df[df.score > 50]
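One thing to watch when switching from drop to filtering: the filter states which rows to keep, so the condition has to be inverted. A small sketch on made-up data, showing that keeping score >= 50 matches dropping score < 50, while score > 50 also removes rows that are exactly 50:

```python
import pandas as pd

# Hypothetical scores, including one row that is exactly 50.
df = pd.DataFrame({"score": [10, 50, 75, 49]})

# Drop approach: remove rows where score < 50.
via_drop = df.drop(df[df.score < 50].index)

# Filter approach with the inverted condition: keep rows where score >= 50.
via_filter = df[df.score >= 50]

# Strict comparison: the row with score == 50 is removed as well.
strict = df[df.score > 50]

print(via_filter)
print(strict)
```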
This is faster than drop:
%%timeit
test = pd.DataFrame({'x': np.random.randn(int(1e6))})
test = test[test.x < 0]
# 54.5 ms ± 2.02 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
test = pd.DataFrame({'x': np.random.randn(int(1e6))})
test.drop(test[test.x > 0].index, inplace=True)
# 201 ms ± 17.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
test = pd.DataFrame({'x': np.random.randn(int(1e6))})
test = test.drop(test[test.x > 0].index)
# 194 ms ± 7.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
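The %%timeit cells above only run in IPython or Jupyter. If you want to repeat the comparison in a plain script, something along these lines should work with the standard timeit module. The helper names by_filter and by_drop, the smaller frame size and the repeat count are my own choices, and unlike the cells above this times only the filter or drop step, not the DataFrame construction. Your numbers will differ from those quoted.

```python
import timeit

import numpy as np
import pandas as pd


def by_filter(frame):
    # Keep rows with x < 0 via boolean indexing.
    return frame[frame.x < 0]


def by_drop(frame):
    # Remove rows with x > 0 by dropping their index labels.
    return frame.drop(frame[frame.x > 0].index)


# Smaller than the 1e6 rows used above so the script finishes quickly.
frame = pd.DataFrame({"x": np.random.randn(100_000)})

# Total time for 10 calls of each approach on the same data.
t_filter = timeit.timeit(lambda: by_filter(frame), number=10)
t_drop = timeit.timeit(lambda: by_drop(frame), number=10)

print(f"filter: {t_filter:.4f} s, drop: {t_drop:.4f} s")
```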