
pandas groupby sort within groups


You could also do it in one go: sort first, then use head to take the first 3 rows of each group.

In [34]: df.sort_values(['job','count'], ascending=False).groupby('job').head(3)
Out[34]:
   count     job source
4      7   sales      E
2      6   sales      C
1      4   sales      B
5      5  market      A
8      4  market      D
6      3  market      B
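A self-contained sketch of the sort-then-head approach. The question's original df is not shown here, so the frame below is hypothetical; its values were chosen so that the top three rows per job reproduce the output above.

```python
import pandas as pd

# Hypothetical data with the columns used above: count, job, source.
df = pd.DataFrame({
    "count": [2, 4, 6, 1, 7, 5, 3, 2, 4],
    "job": ["sales", "sales", "sales", "market", "sales",
            "market", "market", "market", "market"],
    "source": ["A", "B", "C", "C", "E", "A", "B", "E", "D"],
})

# Sort by job and count (both descending), then keep the first 3 rows of
# each job; head() keeps the rows in their sorted order.
top3 = df.sort_values(["job", "count"], ascending=False).groupby("job").head(3)
print(top3)
```

Passing ascending=[True, False] instead would list market before sales while still sorting count from largest to smallest within each job.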


What you want is actually another groupby, applied to the result of the first one: sort each group and take its first three elements.

Starting from the result of the first groupby:

In [60]: df_agg = df.groupby(['job','source']).agg({'count':'sum'})
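To make this intermediate step concrete, here is a minimal sketch on hypothetical data in which the pair (sales, A) appears twice. The first groupby collapses it into a single summed row, and the result is indexed by a (job, source) MultiIndex, which is what allows the next step to group by the level name 'job'.

```python
import pandas as pd

# Hypothetical raw rows; ("sales", "A") appears twice.
df = pd.DataFrame({
    "job": ["sales", "sales", "sales", "market", "market"],
    "source": ["A", "A", "B", "A", "B"],
    "count": [2, 3, 4, 5, 1],
})

# One row per (job, source) pair, with the counts summed.
df_agg = df.groupby(["job", "source"]).agg({"count": "sum"})
print(df_agg)
```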

We group by the first level of the index:

In [63]: g = df_agg['count'].groupby('job', group_keys=False)

Then we sort each group in descending order and take the first three elements:

In [64]: res = g.apply(lambda x: x.sort_values(ascending=False).head(3))

There is a shortcut that does exactly this, nlargest:

In [65]: g.nlargest(3)
Out[65]:
job     source
market  A         5
        D         4
        B         3
sales   E         7
        C         6
        B         4
dtype: int64
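A small sketch checking that the explicit sort-and-head version and nlargest select the same rows. The Series is hypothetical, built to have the same (job, source) shape as df_agg['count']; the values contain no ties, since with ties the two approaches are not guaranteed to pick the same rows.

```python
import pandas as pd

# Hypothetical aggregated counts indexed by (job, source), the same shape
# as df_agg['count'] above.
index = pd.MultiIndex.from_tuples(
    [("market", "A"), ("market", "B"), ("market", "C"), ("market", "D"),
     ("market", "E"), ("sales", "A"), ("sales", "B"), ("sales", "C"),
     ("sales", "E")],
    names=["job", "source"],
)
counts = pd.Series([5, 3, 1, 4, 2, 2, 4, 6, 7], index=index, name="count")

# group_keys=False stops pandas from prepending the 'job' key a second time.
g = counts.groupby("job", group_keys=False)

# Explicit version: sort each group descending, keep the first three.
via_apply = g.apply(lambda x: x.sort_values(ascending=False).head(3))

# Shortcut: nlargest makes the same per-group selection.
via_nlargest = g.nlargest(3)

print(via_nlargest)
```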

So in one go, this looks like:

df_agg['count'].groupby('job', group_keys=False).nlargest(3)
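If a flat table is more convenient than a Series with a MultiIndex, Series.reset_index can turn the index levels back into columns. The sketch below runs the whole pipeline on a hypothetical raw frame chosen so that the result matches Out[65].

```python
import pandas as pd

# Hypothetical raw data with the columns used in this answer.
df = pd.DataFrame({
    "job": ["sales", "sales", "sales", "market", "sales",
            "market", "market", "market", "market"],
    "source": ["A", "B", "C", "C", "E", "A", "B", "E", "D"],
    "count": [2, 4, 6, 1, 7, 5, 3, 2, 4],
})

# Aggregate per (job, source), take the 3 largest per job, then turn the
# MultiIndex back into ordinary columns.
df_agg = df.groupby(["job", "source"]).agg({"count": "sum"})
flat = (
    df_agg["count"]
    .groupby("job", group_keys=False)
    .nlargest(3)
    .reset_index(name="count")
)
print(flat)
```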


Here's another example, showing both how to take the top 3 of each group in sorted order and how to sort rows within groups:

In [43]: import pandas as pd

In [44]: df = pd.DataFrame({"name":["Foo", "Foo", "Baar", "Foo", "Baar", "Foo", "Baar", "Baar"], "count_1":[5,10,12,15,20,25,30,35], "count_2" :[100,150,100,25,250,300,400,500]})

In [45]: df
Out[45]:
   count_1  count_2  name
0        5      100   Foo
1       10      150   Foo
2       12      100  Baar
3       15       25   Foo
4       20      250  Baar
5       25      300   Foo
6       30      400  Baar
7       35      500  Baar

### Top 3 on sorted order:

In [46]: df.groupby(["name"])["count_1"].nlargest(3)
Out[46]:
name
Baar  7    35
      6    30
      4    20
Foo   5    25
      3    15
      1    10
dtype: int64

### Sorting within groups based on column "count_1":

In [48]: df.groupby(["name"]).apply(lambda x: x.sort_values(["count_1"], ascending = False)).reset_index(drop=True)
Out[48]:
   count_1  count_2  name
0       35      500  Baar
1       30      400  Baar
2       20      250  Baar
3       12      100  Baar
4       25      300   Foo
5       15       25   Foo
6       10      150   Foo
7        5      100   Foo
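Sorting within groups does not strictly need groupby.apply: sorting by the group key first and the value column second keeps each group's rows together and already ordered. This is a swapped-in alternative, not the method used in the session above; the sketch reuses the same df and should give the same row order as Out[48].

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Foo", "Foo", "Baar", "Foo", "Baar", "Foo", "Baar", "Baar"],
    "count_1": [5, 10, 12, 15, 20, 25, 30, 35],
    "count_2": [100, 150, 100, 25, 250, 300, 400, 500],
})

# Group key ascending (Baar before Foo), count_1 descending within each name.
sorted_within = (
    df.sort_values(["name", "count_1"], ascending=[True, False])
      .reset_index(drop=True)
)
print(sorted_within)
```

In recent pandas versions the columns keep the dict's insertion order, so name prints first rather than last as in the session above.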