How do I get the row count of a Pandas DataFrame?
For a dataframe df
, one can use any of the following:
len(df.index)
df.shape[0]
df[df.columns[0]].count()
(slowest, but avoids counting NaN values in the first column)
Code to reproduce the plot:
import numpy as npimport pandas as pdimport perfplotperfplot.save( "out.png", setup=lambda n: pd.DataFrame(np.arange(n * 3).reshape(n, 3)), n_range=[2**k for k in range(25)], kernels=[ lambda df: len(df.index), lambda df: df.shape[0], lambda df: df[df.columns[0]].count(), ], labels=["len(df.index)", "df.shape[0]", "df[df.columns[0]].count()"], xlabel="Number of rows",)
Use len(df)
:-).
__len__()
is documented with "Returns length of index".
Timing info, set up the same way as in root's answer:
In [7]: timeit len(df.index)1000000 loops, best of 3: 248 ns per loopIn [8]: timeit len(df)1000000 loops, best of 3: 573 ns per loop
Due to one additional function call, it is of course correct to say that it is a bit slower than calling len(df.index)
directly. But this should not matter in most cases. I find len(df)
to be quite readable.