What is the difference between size and count in pandas?

size includes NaN values, count does not:

    In [46]:
    df = pd.DataFrame({'a':[0,0,1,2,2,2], 'b':[1,2,3,4,np.nan,4], 'c':np.random.randn(6)})
    df

    Out[46]:
       a   b         c
    0  0   1  1.067627
    1  0   2  0.554691
    2  1   3  0.458084
    3  2   4  0.426635
    4  2 NaN -2.238091
    5  2   4  1.256943

    In [48]:
    print(df.groupby(['a'])['b'].count())
    print(df.groupby(['a'])['b'].size())

    a
    0    2
    1    1
    2    2
    Name: b, dtype: int64
    a
    0    2
    1    1
    2    3
    dtype: int64

python pandas numpy nan difference


The other answers have pointed out the difference; however, it is not completely accurate to say "size counts NaNs while count does not". While size does indeed count NaNs, this is a consequence of the fact that size returns the size (or the length) of the object it is called on. Naturally, this also includes rows/values which are NaN.

So, to summarize, size returns the size of the Series/DataFrame¹,

    df = pd.DataFrame({'A': ['x', 'y', np.nan, 'z']})
    df

         A
    0    x
    1    y
    2  NaN
    3    z


    df.A.size
    # 4

...while count counts the non-NaN values:

    df.A.count()
    # 3

Notice that size is an attribute (gives the same result as len(df) or len(df.A)). count is a function.

¹ DataFrame.size is also an attribute and returns the number of elements in the DataFrame (rows x columns).
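The attribute/method distinction and the rows x columns point from the footnote can be checked with a short sketch. The two-column frame `demo` is a hypothetical example made up for illustration, not data from the answer:

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame, used only for illustration.
demo = pd.DataFrame({
    'A': ['x', 'y', np.nan, 'z'],
    'B': [1.0, np.nan, np.nan, 4.0],
})

series_size = demo['A'].size       # attribute: number of elements, NaN included
series_len = len(demo['A'])        # same value as .size for a Series
series_count = demo['A'].count()   # method: number of non-NaN values

frame_size = demo.size             # attribute: rows x columns
frame_len = len(demo)              # number of rows only
frame_count = demo.count()         # method: Series of per-column non-NaN counts

print(series_size, series_len, series_count)
print(frame_size, frame_len)
print(frame_count)
```

For a DataFrame with more than one column, `size` and `len` differ, so `len(df)` matches `df.size` only in the single-column case.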

Behaviour with `GroupBy` - Output Structure

Besides the basic difference, there's also the difference in the structure of the generated output when calling GroupBy.size() vs GroupBy.count().

    df = pd.DataFrame({
        'A': list('aaabbccc'),
        'B': ['x', 'x', np.nan, np.nan,
              np.nan, np.nan, 'x', 'x']
    })
    df

       A    B
    0  a    x
    1  a    x
    2  a  NaN
    3  b  NaN
    4  b  NaN
    5  c  NaN
    6  c    x
    7  c    x

Consider,

    df.groupby('A').size()

    A
    a    3
    b    2
    c    3
    dtype: int64

Versus,

    df.groupby('A').count()

       B
    A
    a  2
    b  0
    c  2

GroupBy.count returns a DataFrame when you call count on all columns, while GroupBy.size returns a Series.

The reason is that size is the same for all columns, so only a single result is returned. Meanwhile, count is computed for each column, as the results depend on how many NaNs each column has.
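A minimal sketch of the output-structure difference, reusing the frame from this section. The variable names (`sizes`, `counts`, `counts_b`) are chosen here for illustration only:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': list('aaabbccc'),
    'B': ['x', 'x', np.nan, np.nan, np.nan, np.nan, 'x', 'x'],
})

sizes = df.groupby('A').size()            # one number per group -> Series
counts = df.groupby('A').count()          # one number per group and column -> DataFrame
counts_b = df.groupby('A')['B'].count()   # selecting a single column first -> Series

print(type(sizes).__name__, type(counts).__name__, type(counts_b).__name__)
print(sizes)
print(counts)
```

Selecting a column before calling count is one way to get a Series back when only that column's non-NaN count is of interest.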

Behavior with `pivot_table`

Another example is how pivot_table treats this data. Suppose we would like to compute the cross-tabulation of columns A and B of

    df

       A  B
    0  0  1
    1  0  1
    2  1  2
    3  0  2
    4  0  0

    pd.crosstab(df.A, df.B)  # Result we expect, but with `pivot_table`.

    B  0  1  2
    A
    0  1  2  1
    1  0  0  1

With pivot_table, you can issue size:

    df.pivot_table(index='A', columns='B', aggfunc='size', fill_value=0)

    B  0  1  2
    A
    0  1  2  1
    1  0  0  1

But count does not work; an empty DataFrame is returned:

    df.pivot_table(index='A', columns='B', aggfunc='count')

    Empty DataFrame
    Columns: []
    Index: [0, 1]

I believe the reason for this is that 'count' must be done on the series that is passed to the values argument, and when nothing is passed, pandas decides to make no assumptions.
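If that explanation is right, giving 'count' a column to work on should reproduce the cross-tabulation. The sketch below tries this by adding a helper column `n`, a hypothetical name introduced here and not part of the original data, and passing it as `values`. Exact behavior may vary between pandas versions:

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 0, 1, 0, 0], 'B': [1, 1, 2, 2, 0]})

# Reference result from crosstab.
expected = pd.crosstab(df.A, df.B)

# 'size' needs no values column: it counts rows per (A, B) pair.
by_size = df.pivot_table(index='A', columns='B', aggfunc='size', fill_value=0)

# 'count' needs a column to count. 'n' is a hypothetical helper column
# of ones added only so there is something to pass as `values`.
by_count = df.assign(n=1).pivot_table(
    index='A', columns='B', values='n', aggfunc='count', fill_value=0
)

print(expected)
print(by_size)
print(by_count)
```

Because the helper column has no NaNs, counting it gives the same numbers as size here. With a real values column that contains NaNs, the two would differ.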


Just to add a little to @Edchum's answer: even if the data has no NA values, the result of count() is more verbose. Using the earlier example:

    grouped = df.groupby('a')
    grouped.count()

    Out[197]:
       b  c
    a
    0  2  2
    1  1  1
    2  2  3

    grouped.size()

    Out[198]:
    a
    0    2
    1    1
    2    3
    dtype: int64
