How to drop rows of Pandas DataFrame whose value in a certain column is NaN


Don't drop, just take the rows where EPS is not NA:

df = df[df['EPS'].notna()]
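To make the one-liner above concrete, here is a minimal sketch on made-up data. Only the `EPS` column name comes from the question; the `ticker` column and all values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the 'EPS' column name comes from the question.
df = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC", "DDD"],
    "EPS": [1.2, np.nan, 0.7, np.nan],
})

# Boolean mask: True where EPS holds a value, False where it is NaN.
mask = df["EPS"].notna()

# Keep only the rows where the mask is True.
filtered = df[mask]

# The surviving rows keep their original index labels (0 and 2 here);
# reset the index if a contiguous 0..n-1 index is wanted.
filtered_reset = filtered.reset_index(drop=True)

print(filtered)
print(filtered_reset)
```

Note that the filter returns a new object, so the result has to be assigned back (as in `df = df[...]`) if the original name should refer to the filtered frame.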


This question is already resolved, but...

...also consider the solution suggested by Wouter in his original comment. Handling missing data is built into pandas, and dropna() is part of that. Aside from potentially improved performance over doing it manually, these functions also come with a variety of options which may be useful.

In [24]: df = pd.DataFrame(np.random.randn(10,3))

In [25]: df.iloc[::2,0] = np.nan; df.iloc[::4,1] = np.nan; df.iloc[::3,2] = np.nan;

In [26]: df
Out[26]:
          0         1         2
0       NaN       NaN       NaN
1  2.677677 -1.466923 -0.750366
2       NaN  0.798002 -0.906038
3  0.672201  0.964789       NaN
4       NaN       NaN  0.050742
5 -1.250970  0.030561 -2.678622
6       NaN  1.036043       NaN
7  0.049896 -0.308003  0.823295
8       NaN       NaN  0.637482
9 -0.310130  0.078891       NaN

In [27]: df.dropna()     #drop all rows that have any NaN values
Out[27]:
          0         1         2
1  2.677677 -1.466923 -0.750366
5 -1.250970  0.030561 -2.678622
7  0.049896 -0.308003  0.823295

In [28]: df.dropna(how='all')     #drop only if ALL columns are NaN
Out[28]:
          0         1         2
1  2.677677 -1.466923 -0.750366
2       NaN  0.798002 -0.906038
3  0.672201  0.964789       NaN
4       NaN       NaN  0.050742
5 -1.250970  0.030561 -2.678622
6       NaN  1.036043       NaN
7  0.049896 -0.308003  0.823295
8       NaN       NaN  0.637482
9 -0.310130  0.078891       NaN

In [29]: df.dropna(thresh=2)   #Drop row if it does not have at least two values that are **not** NaN
Out[29]:
          0         1         2
1  2.677677 -1.466923 -0.750366
2       NaN  0.798002 -0.906038
3  0.672201  0.964789       NaN
5 -1.250970  0.030561 -2.678622
7  0.049896 -0.308003  0.823295
9 -0.310130  0.078891       NaN

In [30]: df.dropna(subset=[1])   #Drop only if NaN in specific column (as asked in the question)
Out[30]:
          0         1         2
1  2.677677 -1.466923 -0.750366
2       NaN  0.798002 -0.906038
3  0.672201  0.964789       NaN
5 -1.250970  0.030561 -2.678622
6       NaN  1.036043       NaN
7  0.049896 -0.308003  0.823295
9 -0.310130  0.078891       NaN
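The session above uses integer column labels. With named columns the same `subset` argument takes the column names, and it can be combined with `how`. A small sketch, assuming a made-up frame in which `EPS` is the column from the question and `PE` and `ticker` are invented for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'EPS' is from the question; 'PE' and 'ticker' are invented.
df = pd.DataFrame({
    "EPS": [1.2, np.nan, 0.7, np.nan],
    "PE": [15.0, 22.0, np.nan, np.nan],
    "ticker": ["AAA", "BBB", "CCC", "DDD"],
})

# Drop rows where EPS is NaN; NaN values in other columns are ignored.
eps_present = df.dropna(subset=["EPS"])

# Drop a row only when BOTH EPS and PE are NaN.
any_metric = df.dropna(subset=["EPS", "PE"], how="all")

print(eps_present)
print(any_metric)
```

Here `eps_present` keeps rows 0 and 2, while `any_metric` drops only row 3, the one row with no value in either metric column.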

There are also other options (see the docs at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html), including dropping columns instead of rows by passing axis=1.
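As a rough illustration of dropping columns rather than rows, the sketch below passes `axis="columns"` (equivalent to `axis=1`) to `dropna()`. The frame and its column names `a`, `b`, `c` are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'a' is complete, 'b' has one NaN, 'c' has two.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [np.nan, 5.0, 6.0],
    "c": [np.nan, np.nan, 9.0],
})

# Drop every column that contains at least one NaN.
complete_cols = df.dropna(axis="columns")

# Keep only columns that have at least two non-NaN values.
mostly_full = df.dropna(axis="columns", thresh=2)

print(list(complete_cols.columns))
print(list(mostly_full.columns))
```

The row count is unchanged in both results; only the set of columns shrinks.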

Pretty handy!


I know this has already been answered, but here is a purely pandas solution to this specific question, as opposed to the general description from Aman (which was wonderful), in case anyone else happens upon this:

import pandas as pd
df = df[pd.notnull(df['EPS'])]
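For anyone comparing the approaches in this thread, the following sketch checks that the function form, the method form, and `dropna(subset=...)` select the same rows. The sample data and the `ticker` column are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data; only the 'EPS' column name comes from the question.
df = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC"],
    "EPS": [1.2, np.nan, 0.7],
})

# Top-level function form.
via_function = df[pd.notnull(df["EPS"])]

# Series method form.
via_method = df[df["EPS"].notna()]

# dropna restricted to the one column.
via_dropna = df.dropna(subset=["EPS"])

# DataFrame.equals treats NaNs in the same position as equal and also
# compares index and column labels.
print(via_function.equals(via_method))
print(via_method.equals(via_dropna))
```

Which form to use is mostly a matter of taste: the mask forms compose easily with other boolean conditions, while `dropna()` reads more directly when the only goal is removing missing values.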