How to find which columns contain any NaN value in Pandas dataframe


UPDATE: using Pandas 0.22.0

Newer Pandas versions (0.21.0 and later) provide the methods DataFrame.isna() and DataFrame.notna():

In [71]: df
Out[71]:
     a    b  c
0  NaN  7.0  0
1  0.0  NaN  4
2  2.0  NaN  4
3  1.0  7.0  0
4  1.0  3.0  9
5  7.0  4.0  9
6  2.0  6.0  9
7  9.0  6.0  4
8  3.0  0.0  9
9  9.0  0.0  1

In [72]: df.isna().any()
Out[72]:
a     True
b     True
c    False
dtype: bool

As a list of columns:

In [74]: df.columns[df.isna().any()].tolist()
Out[74]: ['a', 'b']

To select those columns (the ones containing at least one NaN value):

In [73]: df.loc[:, df.isna().any()]
Out[73]:
     a    b
0  NaN  7.0
1  0.0  NaN
2  2.0  NaN
3  1.0  7.0
4  1.0  3.0
5  7.0  4.0
6  2.0  6.0
7  9.0  6.0
8  3.0  0.0
9  9.0  0.0
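
The sessions above do not show how df was built. As a minimal sketch, assuming the values are exactly those printed in the output, the frame can be rebuilt and the three steps run as a standalone script (the variable names has_nan, nan_columns and nan_subset are only illustrative):

```python
import numpy as np
import pandas as pd

# Rebuild the example frame; values are copied from the printed output above.
df = pd.DataFrame({
    "a": [np.nan, 0, 2, 1, 1, 7, 2, 9, 3, 9],
    "b": [7, np.nan, np.nan, 7, 3, 4, 6, 6, 0, 0],
    "c": [0, 4, 4, 0, 9, 9, 9, 4, 9, 1],
})

# Boolean Series indexed by column name: True where the column has any NaN.
has_nan = df.isna().any()

# Names of the columns that contain at least one NaN.
nan_columns = df.columns[has_nan].tolist()

# The same mask used to select only those columns.
nan_subset = df.loc[:, has_nan]

print(has_nan)
print(nan_columns)  # ['a', 'b']
print(nan_subset.shape)  # (10, 2)
```

Keeping the mask in a variable avoids recomputing df.isna().any() for each step.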

OLD answer:

Try to use isnull():

In [97]: df
Out[97]:
     a    b  c
0  NaN  7.0  0
1  0.0  NaN  4
2  2.0  NaN  4
3  1.0  7.0  0
4  1.0  3.0  9
5  7.0  4.0  9
6  2.0  6.0  9
7  9.0  6.0  4
8  3.0  0.0  9
9  9.0  0.0  1

In [98]: pd.isnull(df).sum() > 0
Out[98]:
a     True
b     True
c    False
dtype: bool

Or, as @root proposed, a clearer version:

In [5]: df.isnull().any()
Out[5]:
a     True
b     True
c    False
dtype: bool

In [7]: df.columns[df.isnull().any()].tolist()
Out[7]: ['a', 'b']

To select a subset (all columns containing at least one NaN value):

In [31]: df.loc[:, df.isnull().any()]
Out[31]:
     a    b
0  NaN  7.0
1  0.0  NaN
2  2.0  NaN
3  1.0  7.0
4  1.0  3.0
5  7.0  4.0
6  2.0  6.0
7  9.0  6.0
8  3.0  0.0
9  9.0  0.0
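
In Pandas versions that have both spellings, isnull() is an alias of isna(), so the old and new variants should give identical results. A small check, using a hypothetical three-row frame rather than the one above:

```python
import numpy as np
import pandas as pd

# Hypothetical small frame, used only to compare the two spellings.
df = pd.DataFrame({
    "a": [np.nan, 0.0, 2.0],
    "b": [7.0, np.nan, np.nan],
    "c": [0, 4, 4],
})

# The element-wise masks from isnull() and isna() are the same.
same_mask = df.isnull().equals(df.isna())

# "sum() > 0" and "any()" answer the same per-column question.
same_result = (pd.isnull(df).sum() > 0).equals(df.isna().any())

print(same_mask, same_result)  # True True
```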


You can use df.isnull().sum(). It lists every column together with the number of NaN values it contains.
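
A short sketch of what that looks like, assuming a made-up frame (the column names age, city and score are hypothetical). The second step, which keeps only the columns that have missing values, is an optional addition and not part of the suggestion above:

```python
import numpy as np
import pandas as pd

# Hypothetical data: "age" has two missing values, "city" one, "score" none.
df = pd.DataFrame({
    "age": [25, np.nan, 31, np.nan],
    "city": ["Oslo", "Lima", None, "Pune"],
    "score": [1.0, 2.0, 3.0, 4.0],
})

# One row per column, holding that column's NaN count.
nan_counts = df.isnull().sum()
print(nan_counts)  # age 2, city 1, score 0

# Optional: keep only columns with missing values, largest count first.
only_missing = nan_counts[nan_counts > 0].sort_values(ascending=False)
print(only_missing)
```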


I had a problem where I had too many columns to inspect visually on the screen, so here is a short list comprehension that filters and returns the offending columns:

nan_cols = [i for i in df.columns if df[i].isnull().any()]

if that's helpful to anyone

Adding to that, if you want to filter out columns having more NaN values than a threshold, say 85%, then use:

nan_cols85 = [i for i in df.columns if df[i].isnull().sum() > 0.85*len(df)]
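
The same threshold test can also be written without a loop, since the mean of a boolean mask is the fraction of True values. This is a sketch of an alternative, not the method from the answer above; the frame and the names threshold, nan_fraction and df_reduced are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "mostly_missing" is 90% NaN, "some_missing" is 20% NaN.
df = pd.DataFrame({
    "mostly_missing": [np.nan] * 9 + [1.0],
    "some_missing": [np.nan, np.nan] + [1.0] * 8,
    "complete": list(range(10)),
})

threshold = 0.85

# Fraction of NaN values per column (mean of the boolean mask).
nan_fraction = df.isnull().mean()

# Columns whose NaN fraction exceeds the threshold.
nan_cols85 = nan_fraction[nan_fraction > threshold].index.tolist()
print(nan_cols85)  # ['mostly_missing']

# Drop those columns to get the filtered frame.
df_reduced = df.drop(columns=nan_cols85)
print(list(df_reduced.columns))  # ['some_missing', 'complete']
```

Comparing fractions rather than counts means the threshold does not need to be multiplied by len(df).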