Group by two columns and count the occurrences of each combination in Pandas Group by two columns and count the occurrences of each combination in Pandas python python

Group by two columns and count the occurrences of each combination in Pandas


Maybe this is what you want?

>>> data = pd.DataFrame({'user_id' : ['a1', 'a1', 'a1', 'a2','a2','a2','a3','a3','a3'], 'product_id' : ['p1','p1','p2','p1','p1','p1','p2','p2','p3']})>>> count_series = data.groupby(['user_id', 'product_id']).size()>>> count_seriesuser_id  product_ida1       p1            2         p2            1a2       p1            3a3       p2            2         p3            1dtype: int64>>> new_df = count_series.to_frame(name = 'size').reset_index()>>> new_df  user_id product_id  size0      a1         p1     21      a1         p2     12      a2         p1     33      a3         p2     24      a3         p3     1>>> new_df['size']0    21    12    33    24    1Name: size, dtype: int64


In Pandas 1.1.0 you can use the method value_counts with DataFrames:

df.value_counts()

Output:

product_id  user_idp1          a2         3p2          a3         2p1          a1         2p3          a3         1p2          a1         1

If you need a DataFrame:

df.value_counts().to_frame('counts').reset_index()

Output:

  product_id user_id  counts0         p1      a2       31         p2      a3       22         p1      a1       23         p3      a3       14         p2      a1       1


Building a little on @Nehal's response if you want to create another column with the results combined:

you can combine the product_id and size columns:

new_df['combo'] = new_df['product_id'].map(str) + '(' + new_df['size'].map(str) +')'

...and then use groupby again to organize that column by 'user_id'

new_df['combo'].astype(str).groupby(new_df['userid']).agg(['size',', '.join])