How to count the elements in a column and add the result as a new column?
pd.factorize and np.bincount
My favorite. factorize does not sort and has a time complexity of O(n). For big data sets, factorize should be preferred over np.unique, which sorts its input.
    i, u = df.id.factorize()
    df.assign(Count=np.bincount(i)[i])

       id  Count
    0   1      2
    1   1      2
    2   3      1
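To see why indexing the bincount result with the codes gives a per-row count, here is a minimal, self-contained sketch. The sample frame is an assumption inferred from the output shown above, and the variable names `codes`, `uniques` and `per_label` are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data inferred from the output shown above (an assumption).
df = pd.DataFrame({"id": [1, 1, 3]})

# codes[k] is the integer label of df["id"][k]; uniques holds the distinct
# ids in order of first appearance.
codes, uniques = pd.factorize(df["id"])

# np.bincount counts how often each label occurs: one entry per distinct id.
per_label = np.bincount(codes)

# Indexing the per-label counts with the codes copies each count back to
# every row that carries that id.
result = df.assign(Count=per_label[codes])
print(result)
```

Here `per_label` has one entry per distinct id, while `per_label[codes]` has one entry per row, which is what makes it usable as a new column.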
np.unique and np.bincount
    u, i = np.unique(df.id, return_inverse=True)
    df.assign(Count=np.bincount(i)[i])

       id  Count
    0   1      2
    1   1      2
    2   3      1
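The difference between the two approaches can be sketched as follows. The ids here are hypothetical, chosen so that first-appearance order differs from sorted order; the per-row counts come out the same either way, only the ordering of the unique values differs.

```python
import numpy as np
import pandas as pd

# Hypothetical ids: 3 appears before 1, so the two orderings differ.
ids = pd.Series([3, 1, 3])

# factorize keeps the unique values in order of first appearance.
codes_f, uniques_f = pd.factorize(ids)

# np.unique returns the unique values sorted.
uniques_u, codes_u = np.unique(ids.to_numpy(), return_inverse=True)

print(uniques_f.tolist(), codes_f.tolist())  # [3, 1] [0, 1, 0]
print(uniques_u.tolist(), codes_u.tolist())  # [1, 3] [1, 0, 1]

# Either set of codes yields the same per-row counts.
counts_f = np.bincount(codes_f)[codes_f]
counts_u = np.bincount(codes_u)[codes_u]
```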
Assign the new count column to the dataframe by grouping on id and then transforming that column with value_counts (or size).
    >>> df.assign(count=df.groupby('id')['id'].transform('value_counts'))
       id  count
    0   1      2
    1   1      2
    2   3      1
Use Series.map with Series.value_counts:
    df['count'] = df['id'].map(df['id'].value_counts())

    # alternative
    # from collections import Counter
    # df['count'] = df['id'].map(Counter(df['id']))
Detail:
    print(df['id'].value_counts())
    1    2
    3    1
    Name: id, dtype: int64
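As a rough illustration of how the mapping works, the sketch below treats the value_counts result as a lookup table and also runs the Counter alternative for comparison. The sample frame is an assumption inferred from the output above, and the column name `count_counter` is hypothetical.

```python
from collections import Counter

import pandas as pd

# Sample data inferred from the output above (an assumption).
df = pd.DataFrame({"id": [1, 1, 3]})

# value_counts returns a Series indexed by the distinct ids.
lookup = df["id"].value_counts()

# Series.map uses that Series as a lookup table: each id is replaced
# by the count stored under it.
df["count"] = df["id"].map(lookup)

# Counter is a dict subclass, so Series.map can use it the same way.
df["count_counter"] = df["id"].map(Counter(df["id"]))

print(df)
```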
Or use GroupBy.transform with GroupBy.size to return a Series with the same size as the original DataFrame:
    df['count'] = df.groupby('id')['id'].transform('size')
    print(df)
       id  count
    0   1      2
    1   1      2
    2   3      1
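A small sketch of what "same size as the original DataFrame" means in practice, contrasting a plain aggregation with transform. The sample data is inferred from the output above; the string index is a hypothetical addition, included only to show that transform keeps the original row labels.

```python
import pandas as pd

# Sample data inferred from the output above; the string index is hypothetical.
df = pd.DataFrame({"id": [1, 1, 3]}, index=["a", "b", "c"])

# Plain aggregation: one row per distinct id, indexed by the id values.
per_group = df.groupby("id").size()

# transform broadcasts each group's size back onto the rows of that group,
# so the result has one value per original row and the original index.
per_row = df.groupby("id")["id"].transform("size")

df["count"] = per_row
print(df)
```

Because `per_row` shares the frame's index, it can be assigned directly as a column, whereas `per_group` would first need to be mapped or merged back.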