How to find which columns contain any NaN value in Pandas dataframe
UPDATE: using Pandas 0.22.0
Newer Pandas versions provide the methods DataFrame.isna() and DataFrame.notna():
In [71]: df
Out[71]:
     a    b  c
0  NaN  7.0  0
1  0.0  NaN  4
2  2.0  NaN  4
3  1.0  7.0  0
4  1.0  3.0  9
5  7.0  4.0  9
6  2.0  6.0  9
7  9.0  6.0  4
8  3.0  0.0  9
9  9.0  0.0  1

In [72]: df.isna().any()
Out[72]:
a     True
b     True
c    False
dtype: bool
As a list of columns:
In [74]: df.columns[df.isna().any()].tolist()
Out[74]: ['a', 'b']
To select those columns (containing at least one NaN value):
In [73]: df.loc[:, df.isna().any()]
Out[73]:
     a    b
0  NaN  7.0
1  0.0  NaN
2  2.0  NaN
3  1.0  7.0
4  1.0  3.0
5  7.0  4.0
6  2.0  6.0
7  9.0  6.0
8  3.0  0.0
9  9.0  0.0
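The session above can be reproduced as a small self-contained script. The DataFrame here is a hypothetical three-column example with the same pattern of missing data (NaN in 'a' and 'b', none in 'c'):

```python
import numpy as np
import pandas as pd

# Columns 'a' and 'b' contain at least one NaN; column 'c' does not
df = pd.DataFrame({
    "a": [np.nan, 0.0, 2.0],
    "b": [7.0, np.nan, np.nan],
    "c": [0, 4, 4],
})

# Boolean Series indexed by column name: True where the column has any NaN
has_nan = df.isna().any()

# Names of the affected columns as a plain Python list
nan_cols = df.columns[has_nan].tolist()

# Sub-frame restricted to the columns that contain NaN
df_nan = df.loc[:, has_nan]
```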
OLD answer:
Try using isnull():
In [97]: df
Out[97]:
     a    b  c
0  NaN  7.0  0
1  0.0  NaN  4
2  2.0  NaN  4
3  1.0  7.0  0
4  1.0  3.0  9
5  7.0  4.0  9
6  2.0  6.0  9
7  9.0  6.0  4
8  3.0  0.0  9
9  9.0  0.0  1

In [98]: pd.isnull(df).sum() > 0
Out[98]:
a     True
b     True
c    False
dtype: bool
or, as @root proposed, a clearer version:
In [5]: df.isnull().any()
Out[5]:
a     True
b     True
c    False
dtype: bool

In [7]: df.columns[df.isnull().any()].tolist()
Out[7]: ['a', 'b']
To select a subset, all columns containing at least one NaN value:
In [31]: df.loc[:, df.isnull().any()]
Out[31]:
     a    b
0  NaN  7.0
1  0.0  NaN
2  2.0  NaN
3  1.0  7.0
4  1.0  3.0
5  7.0  4.0
6  2.0  6.0
7  9.0  6.0
8  3.0  0.0
9  9.0  0.0
I had a problem with too many columns to inspect visually on the screen, so a short list comprehension that filters and returns the offending columns is:
nan_cols = [i for i in df.columns if df[i].isnull().any()]
if that's helpful to anyone
Adding to that: if you want to filter out columns having more NaN values than a threshold, say 85%, then use

nan_cols85 = [i for i in df.columns if df[i].isnull().sum() > 0.85*len(df)]
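A minimal runnable sketch of that threshold filter, using an illustrative two-column frame (the column names and the 85% cutoff here are hypothetical):

```python
import numpy as np
import pandas as pd

# 'mostly_nan' has 9 NaN out of 10 rows (90%, above the cutoff);
# 'few_nan' has 1 NaN out of 10 rows (10%, below the cutoff)
df = pd.DataFrame({
    "mostly_nan": [np.nan] * 9 + [1.0],
    "few_nan": [np.nan] + [2.0] * 9,
})

threshold = 0.85  # keep only columns whose NaN fraction exceeds this
nan_cols85 = [c for c in df.columns if df[c].isnull().sum() > threshold * len(df)]
```

Note that len(df) is the row count, so df[c].isnull().sum() > 0.85 * len(df) compares each column's NaN count against 85% of the rows.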