Count unique values per groups with Pandas [duplicate]


You need nunique:

df = df.groupby('domain')['ID'].nunique()
print(df)
domain
'facebook.com'    1
'google.com'      1
'twitter.com'     2
'vk.com'          3
Name: ID, dtype: int64
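For anyone who wants to run this end to end, here is a minimal sketch. The sample rows are an assumption: they are hypothetical data chosen so that the counts match the output shown above, and the name `unique_ids` is introduced only for illustration.

```python
import pandas as pd

# Hypothetical sample data, chosen to reproduce the counts shown above.
df = pd.DataFrame({
    "ID": [123, 123, 123, 456, 456, 456, 456, 789, 789],
    "domain": ["'vk.com'", "'vk.com'", "'twitter.com'", "'vk.com'",
               "'facebook.com'", "'vk.com'", "'google.com'",
               "'twitter.com'", "'vk.com'"],
})

# Number of distinct IDs seen for each domain.
unique_ids = df.groupby("domain")["ID"].nunique()
print(unique_ids)
```

Assigning the result to a new name rather than back to `df` keeps the original frame available for the later steps.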

If you need to strip ' characters:

df = df.ID.groupby([df.domain.str.strip("'")]).nunique()
print(df)
domain
facebook.com    1
google.com      1
twitter.com     2
vk.com          3
Name: ID, dtype: int64

Or as Jon Clements commented:

df.groupby(df.domain.str.strip("'"))['ID'].nunique()
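A small sketch of the quote-stripping variant, assuming a hypothetical four-row frame. It relies on the fact that `groupby` accepts a Series as the grouping key, so the cleaned labels never have to be written back into `df`.

```python
import pandas as pd

# Hypothetical sample data with quoted domain strings.
df = pd.DataFrame({
    "ID": [123, 123, 456, 789],
    "domain": ["'vk.com'", "'twitter.com'", "'vk.com'", "'vk.com'"],
})

# Strip the surrounding single quotes, then group by the cleaned labels.
clean_domain = df["domain"].str.strip("'")
unique_ids = df.groupby(clean_domain)["ID"].nunique()
print(unique_ids)
```

The resulting index is still named `domain`, because the grouping Series keeps the name of the column it came from.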

You can retain the column name like this:

df = df.groupby(by='domain', as_index=False).agg({'ID': pd.Series.nunique})
print(df)
    domain  ID
0       fb   1
1      ggl   1
2  twitter   2
3       vk   3

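The difference is that selecting the column and calling nunique() returns a Series indexed by domain, whereas agg() with as_index=False returns a DataFrame that keeps domain as a regular column.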
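The two return types can be compared side by side. This is a sketch on hypothetical data. It passes the string `"nunique"` to `agg`, which is an alternative spelling of `pd.Series.nunique`, and it shows `reset_index()` as another way to turn the Series result into a DataFrame.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "ID": [1, 1, 2, 3],
    "domain": ["fb", "vk", "vk", "vk"],
})

# Column selection + nunique() gives a Series indexed by domain.
as_series = df.groupby("domain")["ID"].nunique()

# agg() with as_index=False gives a DataFrame with domain as a column.
as_frame = df.groupby("domain", as_index=False).agg({"ID": "nunique"})

# reset_index() converts the Series into the same two-column layout.
from_series = as_series.reset_index()

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(from_series)
```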


Generally, to count how many times each distinct value occurs in a single column, you can use Series.value_counts:

df.domain.value_counts()
#'vk.com'          5
#'twitter.com'     2
#'facebook.com'    1
#'google.com'      1
#Name: domain, dtype: int64

To see how many unique values a column contains, use Series.nunique:

df.domain.nunique()
# 4

To get all of these distinct values, you can use unique or drop_duplicates. The slight difference between the two is that unique returns a numpy.ndarray, while drop_duplicates returns a pandas.Series that keeps the original index labels:

df.domain.unique()
# array(["'vk.com'", "'twitter.com'", "'facebook.com'", "'google.com'"], dtype=object)

df.domain.drop_duplicates()
#0          'vk.com'
#2     'twitter.com'
#4    'facebook.com'
#6      'google.com'
#Name: domain, dtype: object
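The difference in return types can be checked directly. This sketch uses a hypothetical Series and sets `dtype=object` explicitly so that it mirrors the object column shown above. With other dtypes, such as extension string types, `unique` may return an extension array instead of a NumPy array.

```python
import numpy as np
import pandas as pd

# Hypothetical column; dtype=object mirrors the output shown above.
domains = pd.Series(
    ["vk.com", "vk.com", "twitter.com", "vk.com", "facebook.com"],
    name="domain",
    dtype=object,
)

distinct_array = domains.unique()            # values only, no index
distinct_series = domains.drop_duplicates()  # keeps first-occurrence index labels

print(isinstance(distinct_array, np.ndarray))
print(list(distinct_series.index))
```

The preserved index is useful when the first position of each distinct value matters; otherwise the plain array is usually enough.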

As for this specific problem, since you'd like to count distinct values with respect to another variable, besides the groupby method provided by other answers here, you can also simply drop duplicates first and then call value_counts():

import pandas as pd
df.drop_duplicates().domain.value_counts()
# 'vk.com'          3
# 'twitter.com'     2
# 'facebook.com'    1
# 'google.com'      1
# Name: domain, dtype: int64
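To check that this gives the same counts as the groupby approach, here is a sketch on hypothetical sample rows chosen to match the output above. It passes `subset=["ID", "domain"]` to `drop_duplicates`, which is an addition on my part: a bare `drop_duplicates()` only works this way when the frame has no other columns. Note also that `nunique` skips missing IDs by default, so the two approaches may differ if `ID` contains NaN.

```python
import pandas as pd

# Hypothetical sample data, chosen to reproduce the counts shown above.
df = pd.DataFrame({
    "ID": [123, 123, 123, 456, 456, 456, 456, 789, 789],
    "domain": ["'vk.com'", "'vk.com'", "'twitter.com'", "'vk.com'",
               "'facebook.com'", "'vk.com'", "'google.com'",
               "'twitter.com'", "'vk.com'"],
})

# Keep one row per (ID, domain) pair, then count rows per domain.
via_dedup = df.drop_duplicates(subset=["ID", "domain"])["domain"].value_counts()

# Reference result from the groupby approach.
via_groupby = df.groupby("domain")["ID"].nunique()

# The counts agree; only the ordering differs (value_counts sorts by count).
print(via_dedup.to_dict() == via_groupby.to_dict())
```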


df.domain.value_counts()

>>> df.domain.value_counts()
vk.com          5
twitter.com     2
google.com      1
facebook.com    1
Name: domain, dtype: int64