Pandas DataFrame.groupby() to dictionary with multiple columns for value
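None of the answers below show the input, but a sample DataFrame consistent with the outputs they print (column names taken from the snippets; the row order is an assumption) would be:

import pandas as pd

df = pd.DataFrame({
    'Column1': [0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5],
    'Column2': [23, 5, 2, 19, 56, 22, 2, 14, 59, 44, 1, 87],
    'Column3': [1, 2, 3, 5, 1, 2, 4, 5, 1, 1, 2, 3],
})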
Customize the function you use in apply so that it returns a list of lists for each group:
df.groupby('Column1')[['Column2', 'Column3']].apply(lambda g: g.values.tolist()).to_dict()
# {0: [[23, 1]],
#  1: [[5, 2], [2, 3], [19, 5]],
#  2: [[56, 1], [22, 2]],
#  3: [[2, 4], [14, 5]],
#  4: [[59, 1]],
#  5: [[44, 1], [1, 2], [87, 3]]}
If you need a list of tuples explicitly, use list(map(tuple, ...)) to convert:
df.groupby('Column1')[['Column2', 'Column3']].apply(lambda g: list(map(tuple, g.values.tolist()))).to_dict()
# {0: [(23, 1)],
#  1: [(5, 2), (2, 3), (19, 5)],
#  2: [(56, 1), (22, 2)],
#  3: [(2, 4), (14, 5)],
#  4: [(59, 1)],
#  5: [(44, 1), (1, 2), (87, 3)]}
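If you would rather avoid apply entirely, an equivalent sketch (not from the original answer) iterates over the groupby object directly, which yields (key, subframe) pairs:

{k: list(map(tuple, g[['Column2', 'Column3']].to_numpy().tolist()))
 for k, g in df.groupby('Column1')}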
One way is to create a new tup column and then build the dictionary from it:
df['tup'] = list(zip(df['Column2'], df['Column3']))
df.groupby('Column1')['tup'].apply(list).to_dict()
# {0: [(23, 1)],
#  1: [(5, 2), (2, 3), (19, 5)],
#  2: [(56, 1), (22, 2)],
#  3: [(2, 4), (14, 5)],
#  4: [(59, 1)],
#  5: [(44, 1), (1, 2), (87, 3)]}
@Psidom's solution is more efficient, but if performance isn't an issue, use whichever approach makes more sense to you:
df = pd.concat([df] * 10000)

def jp(df):
    df['tup'] = list(zip(df['Column2'], df['Column3']))
    return df.groupby('Column1')['tup'].apply(list).to_dict()

def psi(df):
    return df.groupby('Column1')[['Column2', 'Column3']].apply(lambda g: list(map(tuple, g.values.tolist()))).to_dict()

%timeit jp(df)   # 110 ms
%timeit psi(df)  # 80 ms
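Note that %timeit is an IPython magic. Outside IPython, a rough plain-Python equivalent (the repetition count here is arbitrary) uses the standard-library timeit module; since jp mutates df by adding the tup column, it gets a copy:

import timeit

print(timeit.timeit(lambda: jp(df.copy()), number=10))
print(timeit.timeit(lambda: psi(df), number=10))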
I'd rather use a defaultdict:
from collections import defaultdict

d = defaultdict(list)
for row in df.values.tolist():
    d[row[0]].append(tuple(row[1:]))

dict(d)
# {0: [(23, 1)],
#  1: [(5, 2), (2, 3), (19, 5)],
#  2: [(56, 1), (22, 2)],
#  3: [(2, 4), (14, 5)],
#  4: [(59, 1)],
#  5: [(44, 1), (1, 2), (87, 3)]}
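A variant of the same idea, worth knowing because df.values upcasts a mixed-dtype frame to object: itertuples(index=False) walks the rows column-wise without building the intermediate list of lists. This is a sketch, not part of the original answer:

from collections import defaultdict

d = defaultdict(list)
for c1, c2, c3 in df[['Column1', 'Column2', 'Column3']].itertuples(index=False):
    d[c1].append((c2, c3))
dict(d)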