How to group dataframe rows into list in pandas groupby


You can do this using groupby to group on the column of interest and then apply list to every group:

    In [1]: df = pd.DataFrame({'a': ['A','A','B','B','B','C'], 'b': [1,2,5,5,4,6]})
            df
    Out[1]:
       a  b
    0  A  1
    1  A  2
    2  B  5
    3  B  5
    4  B  4
    5  C  6

    In [2]: df.groupby('a')['b'].apply(list)
    Out[2]:
    a
    A       [1, 2]
    B    [5, 5, 4]
    C          [6]
    Name: b, dtype: object

    In [3]: df1 = df.groupby('a')['b'].apply(list).reset_index(name='new')
            df1
    Out[3]:
       a        new
    0  A     [1, 2]
    1  B  [5, 5, 4]
    2  C        [6]
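If more than one column needs to be collected, the same idea seems to carry over to agg(list). A minimal sketch, assuming a hypothetical extra column 'c' that is not part of the example above:

```python
import pandas as pd

# Same frame as in the answer, plus a hypothetical column 'c' for illustration.
df = pd.DataFrame({
    'a': ['A', 'A', 'B', 'B', 'B', 'C'],
    'b': [1, 2, 5, 5, 4, 6],
    'c': ['u', 'v', 'w', 'x', 'y', 'z'],
})

# agg(list) applies list to every non-key column within each group.
grouped = df.groupby('a').agg(list).reset_index()
print(grouped)
```

Each non-key column ends up holding one list per group, in the original row order.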


If performance is important, drop down to the NumPy level:

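    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'a': np.random.randint(0, 60, 600), 'b': [1, 2, 5, 5, 4, 6]*100})

    def f(df):
        # Sort by key so equal keys are contiguous, then unpack the key row and the value row
        keys, values = df.sort_values('a').values.T
        # index holds the first position of each unique key in the sorted array
        ukeys, index = np.unique(keys, True)
        # Cut the values at every group boundary
        arrays = np.split(values, index[1:])
        df2 = pd.DataFrame({'a': ukeys, 'b': [list(a) for a in arrays]})
        return df2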

Tests:

    In [301]: %timeit f(df)
    1000 loops, best of 3: 1.64 ms per loop

    In [302]: %timeit df.groupby('a')['b'].apply(list)
    100 loops, best of 3: 5.26 ms per loop
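Before trusting the timing, it may be worth checking that the NumPy route returns the same lists as groupby. One caveat: sort_values uses quicksort by default, which is not guaranteed to be stable, so the order inside each list could differ. The sketch below is a variant under that assumption; the name group_to_lists and the switch to a stable mergesort are illustrative choices, not part of the answer above.

```python
import numpy as np
import pandas as pd

def group_to_lists(df):
    # Hypothetical variant of f(): a stable sort keeps the original
    # row order inside each group.
    ordered = df.sort_values('a', kind='mergesort')
    keys = ordered['a'].to_numpy()
    values = ordered['b'].to_numpy()
    # The first position of each unique key marks where a new group starts.
    ukeys, index = np.unique(keys, return_index=True)
    arrays = np.split(values, index[1:])
    return pd.DataFrame({'a': ukeys, 'b': [list(a) for a in arrays]})

rng = np.random.default_rng(0)
df = pd.DataFrame({'a': rng.integers(0, 60, 600), 'b': [1, 2, 5, 5, 4, 6] * 100})

fast = group_to_lists(df)
slow = df.groupby('a')['b'].apply(list).reset_index()

# Compare keys and per-group lists element by element.
same = (fast['a'].tolist() == slow['a'].tolist()
        and fast['b'].tolist() == slow['b'].tolist())
print(same)
```

If the order within each list does not matter for your use case, the default sort is probably fine.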


A handy way to achieve this would be:

df.groupby('a').agg({'b':lambda x: list(x)})

Look into writing Custom Aggregations: https://www.kaggle.com/akshaysehgal/how-to-group-by-aggregate-using-py
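As a rough illustration of a custom aggregation, named aggregation lets list sit next to other functions in one agg call. This is only a sketch; the helper sorted_unique and the output column names are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'a': ['A', 'A', 'B', 'B', 'B', 'C'], 'b': [1, 2, 5, 5, 4, 6]})

def sorted_unique(values):
    # Hypothetical custom aggregation: distinct values of the group, sorted.
    return sorted(set(values))

# Each keyword is an output column: (input column, aggregation function).
result = df.groupby('a').agg(
    b_list=('b', list),
    b_unique=('b', sorted_unique),
    b_count=('b', 'count'),
).reset_index()
print(result)
```

Named aggregation requires pandas 0.25 or later; on older versions the dict form shown above is the closer fit.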