Finding max occurrence of a column's value, after group-by on another column


You can chain two groupby calls with size and idxmax. Because the intermediate Series has a MultiIndex, idxmax returns (id, city) tuples, so use apply to extract the city:

df = (df.groupby(['id', 'city'])
        .size()
        .groupby(level=0)
        .idxmax()
        .apply(lambda x: x[1])
        .reset_index(name='city'))
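To make the chained groupby easier to follow, here is a minimal sketch that splits it into named steps. The sample data and the variable names (counts, top_labels, result) are made up for illustration and are not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data; the ids and cities are invented for illustration.
df = pd.DataFrame({
    'id':   ['a', 'a', 'a', 'b', 'b', 'b'],
    'city': ['Pune', 'Pune', 'Delhi', 'Goa', 'Agra', 'Goa'],
})

# Step 1: count rows per (id, city) pair; the result has a two-level MultiIndex.
counts = df.groupby(['id', 'city']).size()

# Step 2: within each id (index level 0), find the label of the largest count.
# Each label is an (id, city) tuple because of the MultiIndex.
top_labels = counts.groupby(level=0).idxmax()

# Step 3: keep only the city part of each tuple and build a two-column DataFrame.
result = top_labels.apply(lambda label: label[1]).reset_index(name='city')
print(result)
```

Printing counts and top_labels along the way shows why the final apply is needed: idxmax yields whole index labels, not just the city.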

Other solutions:

s = df.groupby(['id','city']).size()
df = s.loc[s.groupby(level=0).idxmax()].reset_index().drop(0,axis=1)

Or:

df = df.groupby(['id'])['city'].apply(lambda x: x.value_counts().index[0]).reset_index()

print (df)
                     id        city
0  000.tushar@gmail.com   Bangalore
1      00078r@gmail.com  Vijayawada
2    0007ayan@gmail.com  Jamshedpur
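As a quick sanity check, the three variants can be run side by side on the same input. This is only a sketch on hypothetical data chosen to have no ties; when two cities share the top count for an id, the variants are not guaranteed to pick the same one.

```python
import pandas as pd

# Hypothetical sample data with a clear winner for every id (no ties).
df = pd.DataFrame({
    'id':   ['x', 'x', 'x', 'y', 'y', 'z'],
    'city': ['Pune', 'Delhi', 'Delhi', 'Goa', 'Goa', 'Agra'],
})

s = df.groupby(['id', 'city']).size()

# Variant 1: idxmax per id, then pull the city out of each (id, city) tuple.
r1 = s.groupby(level=0).idxmax().apply(lambda x: x[1]).reset_index(name='city')

# Variant 2: select the winning rows by label, then drop the count column (named 0).
r2 = s.loc[s.groupby(level=0).idxmax()].reset_index().drop(0, axis=1)

# Variant 3: value_counts sorts by frequency, so the first index entry is the top city.
r3 = df.groupby('id')['city'].apply(lambda x: x.value_counts().index[0]).reset_index()

# Compare the chosen cities as plain lists to sidestep dtype differences.
cities = [r['city'].tolist() for r in (r1, r2, r3)]
print(cities[0] == cities[1] == cities[2])
```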


The recommended approach is groupby('id').apply(your_custom_function), where your_custom_function counts the values in 'city' and returns the most frequent one (or, as you mentioned, several values when there is a tie). We don't even have to use .agg('city').

import pandas as pd

def get_top_city(g):
    return g['city'].value_counts().idxmax()

df = pd.DataFrame.from_records(
    [('000.tushar@gmail.com', 'Bangalore'), ('00078r@gmail.com',   'Mumbai'),
     ('0007ayan@gmail.com',   'Jamshedpur'), ('0007ayan@gmail.com', 'Jamshedpur'),
     ('000.tushar@gmail.com', 'Bangalore'), ('00078r@gmail.com',   'Mumbai'),
     ('00078r@gmail.com',     'Vijayawada'), ('00078r@gmail.com',   'Vijayawada'),
     ('00078r@gmail.com',     'Vijayawada')],
    columns=['id','city'],
    index=None
)

topdf = df.groupby('id').apply(get_top_city)

# id
# 000.tushar@gmail.com     Bangalore
# 00078r@gmail.com        Vijayawada
# 0007ayan@gmail.com      Jamshedpur

# or topdf.items()/iteritems() if you want as list of (id,city) tuples
# [('000.tushar@gmail.com', 'Bangalore'), ('00078r@gmail.com', 'Vijayawada'), ('0007ayan@gmail.com', 'Jamshedpur')]
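For the tie case mentioned above, one possible sketch is a custom function that returns every city sharing the highest count. The function name get_top_cities and the sample data are hypothetical, and the function is applied to the selected 'city' column rather than to the whole group, which is a small departure from the code above.

```python
import pandas as pd

# Hypothetical sample data: id 'b' has a tie between 'Goa' and 'Agra'.
df = pd.DataFrame({
    'id':   ['a', 'a', 'a', 'b', 'b'],
    'city': ['Pune', 'Pune', 'Delhi', 'Goa', 'Agra'],
})

def get_top_cities(cities):
    """Return a sorted list of all cities that share the highest count."""
    counts = cities.value_counts()
    return sorted(counts[counts == counts.max()].index)

# One list of top cities per id; the list has more than one entry when there is a tie.
top_cities = df.groupby('id')['city'].apply(get_top_cities)
print(top_cities)
```

Returning a list keeps one row per id; if one row per (id, city) pair is preferred, the resulting Series could be passed through explode().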