How to check whether a pandas DataFrame is empty?
You can use the attribute df.empty to check whether it's empty or not:
if df.empty:
    print('DataFrame is empty!')
Source: Pandas Documentation
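As a rough illustration of what df.empty reports, here is a minimal sketch. The three frames and their variable names are made up for this example. The last case follows the note in the pandas documentation that a frame holding only NaN values is not considered empty.

```python
import numpy as np
import pandas as pd

# Three illustrative frames (names are hypothetical, chosen for this sketch).
no_rows_no_cols = pd.DataFrame()
no_rows_with_cols = pd.DataFrame(columns=["A", "B"])
only_nan = pd.DataFrame({"A": [np.nan]})

results = {
    "no_rows_no_cols": no_rows_no_cols.empty,
    "no_rows_with_cols": no_rows_with_cols.empty,
    "only_nan": only_nan.empty,  # one row of NaN still counts as data
}

for name, is_empty in results.items():
    print(f"{name}: empty={is_empty}")
```

Only the frame with an actual row reports empty as False, even though that row contains nothing but NaN.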
I use the len function. It's much faster than empty. len(df.index) is even faster.
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10000, 4), columns=list('ABCD'))

def empty(df):
    return df.empty

def lenz(df):
    return len(df) == 0

def lenzi(df):
    return len(df.index) == 0

'''
%timeit empty(df)
%timeit lenz(df)
%timeit lenzi(df)

10000 loops, best of 3: 13.9 µs per loop
100000 loops, best of 3: 2.34 µs per loop
1000000 loops, best of 3: 695 ns per loop

len on index seems to be faster
'''
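The %timeit magic only works inside IPython. If you want to repeat the comparison in a plain script, something like the following sketch with the standard-library timeit module should do. The repeat counts are arbitrary choices, and the absolute numbers will depend on your machine and pandas version, so treat the ranking as something to verify rather than a guarantee.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10000, 4), columns=list("ABCD"))

# The three emptiness checks being compared.
checks = {
    "df.empty": lambda: df.empty,
    "len(df) == 0": lambda: len(df) == 0,
    "len(df.index) == 0": lambda: len(df.index) == 0,
}

timings = {}
for label, check in checks.items():
    # Best of 5 repeats of 10,000 calls, converted to microseconds per call.
    best = min(timeit.repeat(check, number=10_000, repeat=5))
    timings[label] = best / 10_000 * 1e6
    print(f"{label:<20} {timings[label]:.3f} us per call")
```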
To see if a dataframe is empty, I argue that one should test for the length of a dataframe's columns index:
if len(df.columns) == 0:
    1
Reason:
According to the Pandas Reference API, there is a distinction between:
- an empty dataframe with 0 rows and 0 columns
- an empty dataframe with rows containing NaN, hence at least 1 column
Arguably, they are not the same. The other answers are imprecise in that df.empty, len(df), or len(df.index) make no distinction: in both cases the index length is 0 and empty is True.
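To make the distinction concrete, here is a small sketch that classifies a frame by its shape. The helper name describe_emptiness and the labels it returns are hypothetical, invented for this illustration; only df.shape is actual pandas API.

```python
import pandas as pd


def describe_emptiness(df):
    """Hypothetical helper: classify a frame by its row and column counts."""
    n_rows, n_cols = df.shape
    if n_rows == 0 and n_cols == 0:
        return "no rows, no columns"
    if n_rows == 0:
        return "no rows, columns retained"
    if n_cols == 0:
        return "index only, no columns"
    return "has data"


print(describe_emptiness(pd.DataFrame()))
print(describe_emptiness(pd.DataFrame(columns=["AA", "BB"])))
print(describe_emptiness(pd.DataFrame({"AA": [1, 2, 3]})))
```

The first two frames both report df.empty as True, yet the helper tells them apart because it also looks at the column count.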
Examples
Example 1: An empty dataframe with 0 rows and 0 columns
In [1]: import pandas as pd
        df1 = pd.DataFrame()
        df1
Out[1]: Empty DataFrame
        Columns: []
        Index: []

In [2]: len(df1.index)  # or len(df1)
Out[2]: 0

In [3]: df1.empty
Out[3]: True
Example 2: A dataframe which is emptied to 0 rows but still retains n columns
In [4]: df2 = pd.DataFrame({'AA' : [1, 2, 3], 'BB' : [11, 22, 33]})
        df2
Out[4]:    AA  BB
        0   1  11
        1   2  22
        2   3  33

In [5]: df2 = df2[df2['AA'] == 5]
        df2
Out[5]: Empty DataFrame
        Columns: [AA, BB]
        Index: []

In [6]: len(df2.index)  # or len(df2)
Out[6]: 0

In [7]: df2.empty
Out[7]: True
Now, building on the previous examples, in which the index length is 0 and empty is True: reading the length of the columns index for the first dataframe, df1, returns 0 columns, which shows that it is indeed empty.
In [8]: len(df1.columns)
Out[8]: 0

In [9]: len(df2.columns)
Out[9]: 2
Critically, while the second dataframe df2 contains no data, it is not completely empty, because len(df2.columns) still reports the empty columns that persist.
Why it matters
Let's add a new column to these dataframes to understand the implications:
# As expected, the empty column displays 1 series
In [10]: df1['CC'] = [111, 222, 333]
         df1
Out[10]:     CC
         0  111
         1  222
         2  333

In [11]: len(df1.columns)
Out[11]: 1

# Note the persisting series with rows containing `NaN` values in df2
In [12]: df2['CC'] = [111, 222, 333]
         df2
Out[12]:     AA   BB   CC
         0  NaN  NaN  111
         1  NaN  NaN  222
         2  NaN  NaN  333

In [13]: len(df2.columns)
Out[13]: 3
It is evident that the original columns in df2 have re-surfaced. Therefore, it is prudent to instead read the length of the columns index (the pandas.core.frame.DataFrame.columns attribute) with len(df.columns) to see if a dataframe is empty.
Practical solution
# New dataframe df
In [1]: df = pd.DataFrame({'AA' : [1, 2, 3], 'BB' : [11, 22, 33]})
        df
Out[1]:    AA  BB
        0   1  11
        1   2  22
        2   3  33

# This data manipulation approach results in an empty df
# because of a subset of values that are not available (`NaN`)
In [2]: df = df[df['AA'] == 5]
        df
Out[2]: Empty DataFrame
        Columns: [AA, BB]
        Index: []

# NOTE: the df is empty, BUT the columns are persistent
In [3]: len(df.columns)
Out[3]: 2

# And accordingly, the other answers on this page
In [4]: len(df.index)  # or len(df)
Out[4]: 0

In [5]: df.empty
Out[5]: True
# SOLUTION: conditionally check for empty columns
In [6]: if len(df.columns) != 0:  # <--- here
            # Do something, e.g.
            # drop any columns containing rows with `NaN`
            # to make the df really empty
            df = df.dropna(how='all', axis=1)
        df
Out[6]: Empty DataFrame
        Columns: []
        Index: []

# Testing shows it is indeed empty now
In [7]: len(df.columns)
Out[7]: 0
Adding a new data series now works as expected, without the empty columns re-surfacing (more precisely, without any series whose rows contained only NaN):
In [8]: df['CC'] = [111, 222, 333]
        df
Out[8]:     CC
        0  111
        1  222
        2  333

In [9]: len(df.columns)
Out[9]: 1
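If you need this in more than one place, the steps above could be wrapped in a small function. This is only a sketch: the name drop_all_nan_columns is hypothetical, and the explicit .copy() is an assumption added here so that the later column assignment does not act on a slice of the original frame.

```python
import pandas as pd


def drop_all_nan_columns(df):
    """Hypothetical helper: drop columns that hold no non-NaN values."""
    if len(df.columns) != 0:
        # With 0 rows, every column counts as all-NaN and is dropped.
        df = df.dropna(how="all", axis=1)
    return df


df = pd.DataFrame({"AA": [1, 2, 3], "BB": [11, 22, 33]})
filtered = df[df["AA"] == 5]  # 0 rows, columns AA and BB retained

cleaned = drop_all_nan_columns(filtered).copy()
cleaned["CC"] = [111, 222, 333]

print(len(filtered.columns))
print(list(cleaned.columns))
```

A frame that still has data passes through unchanged, since none of its columns are entirely NaN.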