Group by two columns and count the occurrences of each combination in Pandas
Maybe this is what you want?
>>> data = pd.DataFrame({'user_id' : ['a1', 'a1', 'a1', 'a2','a2','a2','a3','a3','a3'], 'product_id' : ['p1','p1','p2','p1','p1','p1','p2','p2','p3']})>>> count_series = data.groupby(['user_id', 'product_id']).size()>>> count_seriesuser_id product_ida1 p1 2 p2 1a2 p1 3a3 p2 2 p3 1dtype: int64>>> new_df = count_series.to_frame(name = 'size').reset_index()>>> new_df user_id product_id size0 a1 p1 21 a1 p2 12 a2 p1 33 a3 p2 24 a3 p3 1>>> new_df['size']0 21 12 33 24 1Name: size, dtype: int64
In Pandas 1.1.0 you can use the method value_counts
with DataFrames:
df.value_counts()
Output:
product_id user_idp1 a2 3p2 a3 2p1 a1 2p3 a3 1p2 a1 1
If you need a DataFrame:
df.value_counts().to_frame('counts').reset_index()
Output:
product_id user_id counts0 p1 a2 31 p2 a3 22 p1 a1 23 p3 a3 14 p2 a1 1
Building a little on @Nehal's response if you want to create another column with the results combined:
you can combine the product_id and size columns:
new_df['combo'] = new_df['product_id'].map(str) + '(' + new_df['size'].map(str) +')'
...and then use groupby again to organize that column by 'user_id'
new_df['combo'].astype(str).groupby(new_df['userid']).agg(['size',', '.join])