
How to check whether a pandas DataFrame is empty?


You can use the attribute df.empty to check whether it's empty or not:

if df.empty:
    print('DataFrame is empty!')

Source: Pandas Documentation
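According to the pandas documentation, df.empty is True when any axis has length 0, and a DataFrame that contains only NaN is still not considered empty. The short sketch below illustrates both rules. The frame names are made up for illustration.

```python
import numpy as np
import pandas as pd

# DataFrame.empty is True when ANY axis has length 0
no_rows_no_cols = pd.DataFrame()                     # shape (0, 0)
no_rows_two_cols = pd.DataFrame(columns=['A', 'B'])  # shape (0, 2)

# A frame holding only NaN still has rows and columns, so it is NOT empty
only_nan = pd.DataFrame({'A': [np.nan, np.nan]})     # shape (2, 1)

for name, frame in [('no_rows_no_cols', no_rows_no_cols),
                    ('no_rows_two_cols', no_rows_two_cols),
                    ('only_nan', only_nan)]:
    print(f'{name}: shape={frame.shape}, empty={frame.empty}')

# Dropping the all-NaN rows first makes the frame report empty
print(only_nan.dropna().empty)  # True
```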


I use the len function. It's much faster than df.empty, and len(df.index) is even faster:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10000, 4), columns=list('ABCD'))

def empty(df):
    return df.empty

def lenz(df):
    return len(df) == 0

def lenzi(df):
    return len(df.index) == 0

'''
%timeit empty(df)
%timeit lenz(df)
%timeit lenzi(df)

10000 loops, best of 3: 13.9 µs per loop
100000 loops, best of 3: 2.34 µs per loop
1000000 loops, best of 3: 695 ns per loop

len on index seems to be faster
'''
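%timeit is an IPython magic. Outside IPython, the standard-library timeit module can run a similar comparison. Here is a minimal sketch that reuses the three functions above. Absolute numbers depend on hardware and on the pandas version, so no particular result is claimed.

```python
import functools
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10000, 4), columns=list('ABCD'))

def empty(df):
    return df.empty

def lenz(df):
    return len(df) == 0

def lenzi(df):
    return len(df.index) == 0

N_CALLS = 10_000  # calls per timing run

for func in (empty, lenz, lenzi):
    # Best of 5 runs, each timing N_CALLS calls of func(df)
    best = min(timeit.repeat(functools.partial(func, df),
                             number=N_CALLS, repeat=5))
    print(f'{func.__name__}: {best / N_CALLS * 1e9:.0f} ns per call')
```

All three functions agree on the answer; only their speed differs.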


To see if a dataframe is empty, I argue that one should test for the length of a dataframe's columns index:

if len(df.columns) == 0:
    pass  # the DataFrame has no columns at all

Reason:

According to the Pandas Reference API, there is a distinction between:

  • an empty dataframe with 0 rows and 0 columns
  • an empty dataframe with 0 rows but at least 1 column, whose columns fill with NaN once rows are added

Arguably, they are not the same. The other answers are imprecise in that df.empty, len(df), and len(df.index) make no distinction: in both cases the index length is 0 and empty is True.
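To make the distinction explicit, df.shape reports both axes at once. The sketch below classifies a DataFrame using a hypothetical helper, describe_emptiness, which is not part of pandas. It also covers a third case that df.empty lumps together with the other two: a frame with rows but no columns.

```python
import pandas as pd

def describe_emptiness(df: pd.DataFrame) -> str:
    """Hypothetical helper: report which axes of df have length 0."""
    n_rows, n_cols = df.shape
    if n_rows == 0 and n_cols == 0:
        return 'no rows, no columns'
    if n_rows == 0:
        return f'no rows, {n_cols} column(s)'
    if n_cols == 0:
        return f'{n_rows} row(s), no columns'
    return 'not empty'

print(describe_emptiness(pd.DataFrame()))                     # no rows, no columns
print(describe_emptiness(pd.DataFrame(columns=['AA', 'BB'])))  # no rows, 2 column(s)
print(describe_emptiness(pd.DataFrame(index=[0, 1])))         # 2 row(s), no columns
```

df.empty is True for all three of these frames.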

Examples

Example 1: An empty dataframe with 0 rows and 0 columns

In [1]: import pandas as pd
        df1 = pd.DataFrame()
        df1
Out[1]:
Empty DataFrame
Columns: []
Index: []

In [2]: len(df1.index)  # or len(df1)
Out[2]: 0

In [3]: df1.empty
Out[3]: True

Example 2: A dataframe which is emptied to 0 rows but still retains n columns

In [4]: df2 = pd.DataFrame({'AA' : [1, 2, 3], 'BB' : [11, 22, 33]})
        df2
Out[4]:
   AA  BB
0   1  11
1   2  22
2   3  33

In [5]: df2 = df2[df2['AA'] == 5]
        df2
Out[5]:
Empty DataFrame
Columns: [AA, BB]
Index: []

In [6]: len(df2.index)  # or len(df2)
Out[6]: 0

In [7]: df2.empty
Out[7]: True

In both previous examples the index length is 0 and empty is True. Now read the length of the columns index instead. For the first dataframe, df1, it returns 0 columns, confirming that df1 is indeed completely empty:

In [8]: len(df1.columns)
Out[8]: 0

In [9]: len(df2.columns)
Out[9]: 2

Critically, although the second dataframe, df2, contains no data, it is not completely empty: it still has 2 (empty) columns.

Why it matters

Let's add a new column to these dataframes to understand the implications:

# As expected, the previously empty df1 now has 1 column
In [10]: df1['CC'] = [111, 222, 333]
         df1
Out[10]:
    CC
0  111
1  222
2  333

In [11]: len(df1.columns)
Out[11]: 1

# Note the persisting columns AA and BB in df2, now filled with `NaN`
In [12]: df2['CC'] = [111, 222, 333]
         df2
Out[12]:
   AA  BB   CC
0 NaN NaN  111
1 NaN NaN  222
2 NaN NaN  333

In [13]: len(df2.columns)
Out[13]: 3

The original columns of df2 have clearly resurfaced, now filled with NaN. It is therefore prudent to check the length of the columns index, len(df.columns), to see whether a dataframe is completely empty.
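For readers not using IPython, the same behaviour can be reproduced as a plain script. This is a minimal sketch. The .copy() after filtering is an addition that avoids a possible SettingWithCopyWarning. The resurfacing itself happens because pandas builds a new row index from the assigned list when the frame has no rows, and fills the existing columns with NaN.

```python
import pandas as pd

df1 = pd.DataFrame()
df2 = pd.DataFrame({'AA': [1, 2, 3], 'BB': [11, 22, 33]})
# Keep no rows; .copy() avoids a possible SettingWithCopyWarning
df2 = df2[df2['AA'] == 5].copy()

# Both frames report empty, but only df1 has no columns
print(df1.empty, df2.empty)                # True True
print(len(df1.columns), len(df2.columns))  # 0 2

df1['CC'] = [111, 222, 333]
df2['CC'] = [111, 222, 333]

print(list(df1.columns))  # ['CC']
print(list(df2.columns))  # ['AA', 'BB', 'CC']
# The old columns come back, holding only NaN
print(df2[['AA', 'BB']].isna().all().all())  # True
```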

Practical solution

# New dataframe df
In [1]: df = pd.DataFrame({'AA' : [1, 2, 3], 'BB' : [11, 22, 33]})
        df
Out[1]:
   AA  BB
0   1  11
1   2  22
2   3  33

# This filter matches no rows, so the result is an empty df
In [2]: df = df[df['AA'] == 5]
        df
Out[2]:
Empty DataFrame
Columns: [AA, BB]
Index: []

# NOTE: the df is empty, BUT the columns persist
In [3]: len(df.columns)
Out[3]: 2

# And accordingly, the checks from the other answers on this page
In [4]: len(df.index)  # or len(df)
Out[4]: 0

In [5]: df.empty
Out[5]: True

# SOLUTION: conditionally check for remaining columns
In [6]: if len(df.columns) != 0:  # <--- here
            # Do something, e.g.
            # drop every column whose values are all `NaN`;
            # with 0 rows that is every column, so the df
            # becomes really empty
            df = df.dropna(how='all', axis=1)
        df
Out[6]:
Empty DataFrame
Columns: []
Index: []

# Testing shows it is indeed empty now
In [7]: len(df.columns)
Out[7]: 0

Adding a new data series now works as expected, without the resurfacing of empty columns (in fact, without any of the columns that would otherwise have been filled only with NaN):

In [8]: df['CC'] = [111, 222, 333]
        df
Out[8]:
    CC
0  111
1  222
2  333

In [9]: len(df.columns)
Out[9]: 1
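The practical solution above can be wrapped in a reusable function. The sketch below uses a hypothetical helper, reset_if_no_rows, which is not a pandas API. Unlike the interactive session, it adds a row check (len(df.index) == 0). Without that check, dropna(how='all', axis=1) would also drop genuinely all-NaN columns from a frame that does have rows.

```python
import numpy as np
import pandas as pd

def reset_if_no_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: if df has no rows but still has columns,
    drop those columns so later assignments cannot bring them back as
    all-NaN columns. Frames with rows are returned unchanged."""
    if len(df.index) == 0 and len(df.columns) != 0:
        # With 0 rows every column is (vacuously) all-NaN,
        # so how='all' along axis=1 drops all of them
        return df.dropna(how='all', axis=1)
    return df

df = pd.DataFrame({'AA': [1, 2, 3], 'BB': [11, 22, 33]})
df = reset_if_no_rows(df[df['AA'] == 5])
print(len(df.columns))  # 0

df['CC'] = [111, 222, 333]
print(list(df.columns))  # ['CC']

# A frame with rows keeps its all-NaN column
kept = reset_if_no_rows(pd.DataFrame({'A': [np.nan]}))
print(kept.shape)  # (1, 1)
```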