Get the row(s) which have the max value in groups using groupby Get the row(s) which have the max value in groups using groupby python python

Get the row(s) which have the max value in groups using groupby


In [1]: dfOut[1]:    Sp  Mt Value  count0  MM1  S1     a      31  MM1  S1     n      22  MM1  S3    cb      53  MM2  S3    mk      84  MM2  S4    bg     105  MM2  S4   dgd      16  MM4  S2    rd      27  MM4  S2    cb      28  MM4  S2   uyi      7In [2]: df.groupby(['Mt'], sort=False)['count'].max()Out[2]:MtS1     3S3     8S4    10S2     7Name: count

To get the indices of the original DF you can do:

In [3]: idx = df.groupby(['Mt'])['count'].transform(max) == df['count']In [4]: df[idx]Out[4]:    Sp  Mt Value  count0  MM1  S1     a      33  MM2  S3    mk      84  MM2  S4    bg     108  MM4  S2   uyi      7

Note that if you have multiple max values per group, all will be returned.

Update

On a hail mary chance that this is what the OP is requesting:

In [5]: df['count_max'] = df.groupby(['Mt'])['count'].transform(max)In [6]: dfOut[6]:    Sp  Mt Value  count  count_max0  MM1  S1     a      3          31  MM1  S1     n      2          32  MM1  S3    cb      5          83  MM2  S3    mk      8          84  MM2  S4    bg     10         105  MM2  S4   dgd      1         106  MM4  S2    rd      2          77  MM4  S2    cb      2          78  MM4  S2   uyi      7          7


You can sort the dataFrame by count and then remove duplicates. I think it's easier:

df.sort_values('count', ascending=False).drop_duplicates(['Sp','Mt'])


Easy solution would be to apply : idxmax() function to get indices of rows with max values. This would filter out all the rows with max value in the group.

In [365]: import pandas as pdIn [366]: df = pd.DataFrame({'sp' : ['MM1', 'MM1', 'MM1', 'MM2', 'MM2', 'MM2', 'MM4', 'MM4','MM4'],'mt' : ['S1', 'S1', 'S3', 'S3', 'S4', 'S4', 'S2', 'S2', 'S2'],'val' : ['a', 'n', 'cb', 'mk', 'bg', 'dgb', 'rd', 'cb', 'uyi'],'count' : [3,2,5,8,10,1,2,2,7]})In [367]: df                                                                                                       Out[367]:    count  mt   sp  val0      3  S1  MM1    a1      2  S1  MM1    n2      5  S3  MM1   cb3      8  S3  MM2   mk4     10  S4  MM2   bg5      1  S4  MM2  dgb6      2  S2  MM4   rd7      2  S2  MM4   cb8      7  S2  MM4  uyi### Apply idxmax() and use .loc() on dataframe to filter the rows with max values:In [368]: df.loc[df.groupby(["sp", "mt"])["count"].idxmax()]                                                       Out[368]:    count  mt   sp  val0      3  S1  MM1    a2      5  S3  MM1   cb3      8  S3  MM2   mk4     10  S4  MM2   bg8      7  S2  MM4  uyi### Just to show what values are returned by .idxmax() above:In [369]: df.groupby(["sp", "mt"])["count"].idxmax().values                                                        Out[369]: array([0, 2, 3, 4, 8])


matomo