How to select rows with one or more nulls from a pandas DataFrame without listing columns explicitly?


[Updated for modern pandas, which has isnull as a method of DataFrames.]

You can use isnull and any to build a boolean Series and use that to index into your frame:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame([range(3), [0, np.nan, 0], [0, 0, np.nan], range(3), range(3)])
>>> df.isnull()
       0      1      2
0  False  False  False
1  False   True  False
2  False  False   True
3  False  False  False
4  False  False  False
>>> df.isnull().any(axis=1)
0    False
1     True
2     True
3    False
4    False
dtype: bool
>>> df[df.isnull().any(axis=1)]
   0   1   2
1  0 NaN   0
2  0   0 NaN
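Written as a script rather than a console session, the same selection might look like the sketch below. It assumes a reasonably recent pandas, where isna is available as an alias of isnull; the names mask, rows_with_nulls and rows_without_nulls are illustrative only.

```python
import numpy as np
import pandas as pd

# Same frame as in the session above: rows 1 and 2 each contain one NaN.
df = pd.DataFrame([range(3), [0, np.nan, 0], [0, 0, np.nan], range(3), range(3)])

# True for every row that has at least one null, checked across all columns.
mask = df.isna().any(axis=1)

rows_with_nulls = df[mask]       # rows that contain a null
rows_without_nulls = df[~mask]   # the complement: fully populated rows

print(rows_with_nulls.index.tolist())     # [1, 2]
print(rows_without_nulls.index.tolist())  # [0, 3, 4]
```

Keeping the mask in its own variable makes it easy to reuse for the complement, as done here with ~mask.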

[For older pandas:]

You could use the function isnull instead of the method:

In [56]: df = pd.DataFrame([range(3), [0, np.nan, 0], [0, 0, np.nan], range(3), range(3)])

In [57]: df
Out[57]:
   0   1   2
0  0   1   2
1  0 NaN   0
2  0   0 NaN
3  0   1   2
4  0   1   2

In [58]: pd.isnull(df)
Out[58]:
       0      1      2
0  False  False  False
1  False   True  False
2  False  False   True
3  False  False  False
4  False  False  False

In [59]: pd.isnull(df).any(axis=1)
Out[59]:
0    False
1     True
2     True
3    False
4    False

leading to the rather compact:

In [60]: df[pd.isnull(df).any(axis=1)]
Out[60]:
   0   1   2
1  0 NaN   0
2  0   0 NaN
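If you are unsure which spelling to use, a quick check like the following sketch suggests that the function and the method produce the same mask on a pandas version that has both. The variable names by_function, by_method, same_mask and selected are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([range(3), [0, np.nan, 0], [0, 0, np.nan], range(3), range(3)])

by_function = pd.isnull(df)  # top-level function
by_method = df.isnull()      # DataFrame method

# Both calls return a boolean DataFrame with the same shape as df.
same_mask = by_function.equals(by_method)

# Either mask can be reduced row-wise and used to index the frame.
selected = df[by_function.any(axis=1)]

print(same_mask)                # True
print(selected.index.tolist())  # [1, 2]
```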


def nans(df):
    return df[df.isnull().any(axis=1)]

Then, whenever you need it, you can type:

nans(your_dataframe)
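As a rough usage sketch, here is the helper applied to a frame with mixed column types. The people frame and its column names are hypothetical sample data, not part of the original answer; the point is only that None in a text column and NaN in a numeric column are both picked up.

```python
import numpy as np
import pandas as pd


def nans(df: pd.DataFrame) -> pd.DataFrame:
    """Return the rows of df that contain at least one null value."""
    return df[df.isnull().any(axis=1)]


# Hypothetical sample data: one missing name (None) and one missing age (NaN).
people = pd.DataFrame(
    {
        "name": ["Ann", None, "Cy", "Dee"],
        "age": [31.0, 45.0, np.nan, 28.0],
        "city": ["Oslo", "Lima", "Rome", "Kyiv"],
    }
)

incomplete = nans(people)
print(incomplete.index.tolist())  # [1, 2]
```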


If you want to select rows by the number of columns that contain null values, you can use this:

df[df.isnull().sum(axis=1) >= qty_of_nuls]

Here is an example.

Your dataframe:

>>> df = pd.DataFrame([range(4), [0, np.nan, 0, np.nan], [0, 0, np.nan, 0], range(4), [np.nan, 0, np.nan, np.nan]])
>>> df
     0    1    2    3
0  0.0  1.0  2.0  3.0
1  0.0  NaN  0.0  NaN
2  0.0  0.0  NaN  0.0
3  0.0  1.0  2.0  3.0
4  NaN  0.0  NaN  NaN

To select the rows that have two or more columns with a null value, run the following:

>>> qty_of_nuls = 2
>>> df[df.isnull().sum(axis=1) >= qty_of_nuls]
     0    1    2   3
1  0.0  NaN  0.0 NaN
4  NaN  0.0  NaN NaN
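The example above uses the default 0..4 index. The sketch below applies the same threshold to a frame whose rows carry string labels, using a plain boolean mask. The labels "a" to "e" and the names null_counts and selected are hypothetical, chosen only for illustration. Note that passing index labels to iloc would only work when the labels happen to coincide with row positions, which is one reason a mask may be the safer choice.

```python
import numpy as np
import pandas as pd

# Same values as the example above, but with string labels instead of 0..4.
df = pd.DataFrame(
    [
        range(4),
        [0, np.nan, 0, np.nan],
        [0, 0, np.nan, 0],
        range(4),
        [np.nan, 0, np.nan, np.nan],
    ],
    index=["a", "b", "c", "d", "e"],
)

qty_of_nuls = 2

# Number of nulls in each row.
null_counts = df.isnull().sum(axis=1)

# The comparison yields a boolean Series that shares df's index,
# so it selects the right rows whatever the labels are.
selected = df[null_counts >= qty_of_nuls]

print(null_counts.tolist())     # [0, 2, 1, 0, 3]
print(selected.index.tolist())  # ['b', 'e']
```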