How to drop a list of rows from Pandas dataframe?
Use DataFrame.drop and pass it a Series of index labels:
In [65]: df
Out[65]:
       one  two
one      1    4
two      2    3
three    3    2
four     4    1

In [66]: df.drop(df.index[[1,3]])
Out[66]:
       one  two
one      1    4
three    3    2
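For anyone who wants to run the session above, here is a minimal self-contained sketch. The frame is rebuilt from the values shown in the output; the variable names labels_to_drop and result are only illustrative.

```python
import pandas as pd

# Rebuild the example frame from the session above.
df = pd.DataFrame(
    {"one": [1, 2, 3, 4], "two": [4, 3, 2, 1]},
    index=["one", "two", "three", "four"],
)

# df.index[[1, 3]] turns positions 1 and 3 into the labels "two" and "four".
labels_to_drop = df.index[[1, 3]]

# drop returns a new DataFrame; df itself is left unchanged.
result = df.drop(labels_to_drop)
print(result)
```

Note that drop works on labels, so the positions are translated to labels first via df.index.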
Note that it may be important to pass the inplace parameter when you want the drop to modify the DataFrame in place rather than return a new one.
df.drop(df.index[[1,3]], inplace=True)
Because the code in your original question does not capture a return value, this is the form to use. See http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.DataFrame.drop.html
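To make the difference concrete, here is a small sketch comparing the two styles. The helper make_df is hypothetical and only exists to build two identical copies of the example frame.

```python
import pandas as pd


def make_df():
    # Hypothetical helper: rebuilds the example frame used in this thread.
    return pd.DataFrame(
        {"one": [1, 2, 3, 4], "two": [4, 3, 2, 1]},
        index=["one", "two", "three", "four"],
    )


# Style 1: inplace=True modifies the frame and returns None.
a = make_df()
ret = a.drop(a.index[[1, 3]], inplace=True)

# Style 2: without inplace, drop returns a new frame that must be assigned.
b = make_df()
b = b.drop(b.index[[1, 3]])

print(ret)          # the in-place call has no return value
print(a.equals(b))  # both styles end with the same rows
```

Either style works; the common mistake is calling drop without inplace=True and also without assigning the result, which leaves the original frame untouched.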
If the DataFrame is huge, and the number of rows to drop is large as well, then a simple drop by index, df.drop(df.index[]), takes too much time.
In my case, I have a multi-indexed DataFrame of floats with 100M rows x 3 cols, and I need to remove 10k rows from it. The fastest method I found is, quite counterintuitively, to take the remaining rows.
Let indexes_to_drop be an array of positional indexes to drop ([1, 2, 4] in the question).
indexes_to_keep = set(range(df.shape[0])) - set(indexes_to_drop)
# sorted() preserves the original row order; set iteration order is not guaranteed
df_sliced = df.take(sorted(indexes_to_keep))
In my case this took 20.5s, while the simple df.drop took 5min 27s and consumed a lot of memory. The resulting DataFrame is the same.
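A variation on the same "keep the rest" idea, sketched here with a NumPy boolean mask instead of Python sets. This is not what I timed above, so treat it as an untested-at-scale assumption; the small frame and the names mask and expected are only for illustration.

```python
import numpy as np
import pandas as pd

# Small stand-in frame of floats (the real case was 100M rows x 3 cols).
df = pd.DataFrame(np.arange(18.0).reshape(6, 3), columns=["a", "b", "c"])
indexes_to_drop = [1, 2, 4]

# Mark every row as kept, then switch off the positions to drop.
mask = np.ones(len(df), dtype=bool)
mask[indexes_to_drop] = False

# flatnonzero returns the kept positions in ascending order.
df_sliced = df.take(np.flatnonzero(mask))

# Cross-check against the label-based drop on this small frame.
expected = df.drop(df.index[indexes_to_drop])
print(df_sliced.equals(expected))
```

One caveat: the two approaches only agree when the index labels are unique. With duplicate labels, drop removes every row carrying a dropped label, while take removes only the listed positions.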