How to drop rows of Pandas DataFrame whose value in a certain column is NaN
This question is already resolved, but...
...also consider the solution suggested by Wouter in his original comment. The ability to handle missing data, including dropna(), is built into pandas explicitly. Aside from potentially improved performance over doing it manually, these functions also come with a variety of options that may be useful.
    In [24]: df = pd.DataFrame(np.random.randn(10,3))

    In [25]: df.iloc[::2,0] = np.nan; df.iloc[::4,1] = np.nan; df.iloc[::3,2] = np.nan;

    In [26]: df
    Out[26]:
              0         1         2
    0       NaN       NaN       NaN
    1  2.677677 -1.466923 -0.750366
    2       NaN  0.798002 -0.906038
    3  0.672201  0.964789       NaN
    4       NaN       NaN  0.050742
    5 -1.250970  0.030561 -2.678622
    6       NaN  1.036043       NaN
    7  0.049896 -0.308003  0.823295
    8       NaN       NaN  0.637482
    9 -0.310130  0.078891       NaN

    In [27]: df.dropna()     # drop all rows that have any NaN values
    Out[27]:
              0         1         2
    1  2.677677 -1.466923 -0.750366
    5 -1.250970  0.030561 -2.678622
    7  0.049896 -0.308003  0.823295

    In [28]: df.dropna(how='all')     # drop only if ALL columns are NaN
    Out[28]:
              0         1         2
    1  2.677677 -1.466923 -0.750366
    2       NaN  0.798002 -0.906038
    3  0.672201  0.964789       NaN
    4       NaN       NaN  0.050742
    5 -1.250970  0.030561 -2.678622
    6       NaN  1.036043       NaN
    7  0.049896 -0.308003  0.823295
    8       NaN       NaN  0.637482
    9 -0.310130  0.078891       NaN

    In [29]: df.dropna(thresh=2)   # drop row if it does not have at least two values that are not NaN
    Out[29]:
              0         1         2
    1  2.677677 -1.466923 -0.750366
    2       NaN  0.798002 -0.906038
    3  0.672201  0.964789       NaN
    5 -1.250970  0.030561 -2.678622
    7  0.049896 -0.308003  0.823295
    9 -0.310130  0.078891       NaN

    In [30]: df.dropna(subset=[1])   # drop only if NaN in a specific column (as asked in the question)
    Out[30]:
              0         1         2
    1  2.677677 -1.466923 -0.750366
    2       NaN  0.798002 -0.906038
    3  0.672201  0.964789       NaN
    5 -1.250970  0.030561 -2.678622
    6       NaN  1.036043       NaN
    7  0.049896 -0.308003  0.823295
    9 -0.310130  0.078891       NaN
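If you want to reproduce the session above as a standalone script, here is a minimal sketch. It assumes a deterministic np.arange frame in place of np.random.randn so the result is repeatable; only the NaN pattern and the dropna() calls mirror the session, and the printed row labels should match the Out[...] tables above.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random 10x3 frame (float dtype so NaN fits),
# with the same NaN pattern applied via iloc as in the session above.
df = pd.DataFrame(np.arange(30, dtype=float).reshape(10, 3))
df.iloc[::2, 0] = np.nan   # rows 0, 2, 4, 6, 8 in column 0
df.iloc[::4, 1] = np.nan   # rows 0, 4, 8 in column 1
df.iloc[::3, 2] = np.nan   # rows 0, 3, 6, 9 in column 2

any_nan_dropped = df.dropna()           # drop rows containing any NaN
all_nan_dropped = df.dropna(how="all")  # drop rows where every value is NaN
at_least_two = df.dropna(thresh=2)      # keep rows with >= 2 non-NaN values
col1_checked = df.dropna(subset=[1])    # only consider NaN in column 1

print(any_nan_dropped.index.tolist())   # [1, 5, 7]
print(all_nan_dropped.index.tolist())   # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(at_least_two.index.tolist())      # [1, 2, 3, 5, 7, 9]
print(col1_checked.index.tolist())      # [1, 2, 3, 5, 6, 7, 9]
```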
There are also other options (see the docs at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html), including dropping columns instead of rows with axis=1.
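For the column-dropping case, a small sketch (the frame and its column names are made up for illustration): passing axis=1 applies the same how rules to columns rather than rows.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'b' has one NaN, 'c' is entirely NaN.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [1.0, np.nan, 3.0],
    "c": [np.nan, np.nan, np.nan],
})

# axis=1 (equivalently axis="columns") drops columns instead of rows.
no_nan_cols = df.dropna(axis=1)                 # drop any column containing a NaN
non_empty_cols = df.dropna(axis=1, how="all")   # drop only all-NaN columns

print(no_nan_cols.columns.tolist())     # ['a']
print(non_empty_cols.columns.tolist())  # ['a', 'b']
```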
Pretty handy!
I know this has already been answered, but for the sake of a purely pandas solution to this specific question (as opposed to the general description from Aman, which was wonderful), and in case anyone else happens upon this:
    import pandas as pd

    # keep only the rows where the 'EPS' column is not NaN
    df = df[pd.notnull(df['EPS'])]
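To see how the boolean-mask approach compares with dropna(subset=...), here is a sketch on hypothetical data (only the 'EPS' column name comes from the question; the other columns and values are invented). A NaN in another column, such as 'cash' in row 3, does not cause that row to be dropped, because only 'EPS' is checked.

```python
import numpy as np
import pandas as pd

# Hypothetical data: only the 'EPS' column name is taken from the question.
df = pd.DataFrame({
    "ticker": ["A", "B", "C", "D"],
    "EPS": [np.nan, 0.5, np.nan, 1.2],
    "cash": [np.nan, 12.0, 3.0, np.nan],
})

# Boolean mask: True where EPS is not NaN; rows where it is False are removed.
by_mask = df[pd.notnull(df["EPS"])]

# Two equivalent spellings of the same filter.
by_method = df[df["EPS"].notna()]
by_dropna = df.dropna(subset=["EPS"])

print(by_mask.index.tolist())  # [1, 3]
```

The mask form is handy when you want to combine the NaN check with other conditions using & or |, while dropna(subset=...) reads more directly when NaN removal is all you need.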