Pandas - group by column and transform the data to numpy array


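First, the missing rows have to be added so that every group has the same number of rows. The first solution does this with unstack and stack, using a counter Series created by cumcount.

The second solution uses reindex with a MultiIndex.

In both cases, the last step is a groupby with a lambda function that converts each group to a numpy array with values and then to a list: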

g = df.groupby('group').cumcount()

L = (df.set_index(['group', g])
       .unstack(fill_value=0)
       .stack()
       .groupby(level=0)
       .apply(lambda x: x.values.tolist())
       .tolist())
print(L)

[[[1, 4], [2, 5], [3, 6], [4, 7]],
 [[1, 4], [2, 5], [3, 6], [0, 0]],
 [[1, 4], [0, 0], [0, 0], [0, 0]]]
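To make the padding step concrete, here is a minimal sketch on a hypothetical frame. The group labels 'a', 'b', 'c' and the sample values are assumptions chosen to match the printed output; the original df is not shown. The extra sort_index call is a defensive addition, not part of the solution itself.

```python
import pandas as pd

# Hypothetical sample data: groups of 4, 3 and 1 rows (an assumption,
# picked so that the result matches the output shown in the answer).
df = pd.DataFrame({
    "group":  ["a", "a", "a", "a", "b", "b", "b", "c"],
    "data_1": [1, 2, 3, 4, 1, 2, 3, 1],
    "data_2": [4, 5, 6, 7, 4, 5, 6, 4],
})

# Position of each row inside its own group: 0, 1, 2, ...
g = df.groupby("group").cumcount()

# Wide form: one row per group, one column per (data column, position).
# Positions that a group does not have are filled with 0.
wide = df.set_index(["group", g]).unstack(fill_value=0)

# Back to long form: every group now has the same number of rows.
# sort_index keeps the (group, position) order explicit.
padded = wide.stack().sort_index()

print(g.tolist())
print(padded)
```

After this step each group has one row per position, so the per-group lists all have the same length.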

Another solution:

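import pandas as pd

g = df.groupby('group').cumcount()

# All (group, position) combinations, including the missing ones
mux = pd.MultiIndex.from_product([df['group'].unique(), g.unique()])

# Columns are selected with a list: current pandas versions reject
# the tuple form ['data_1','data_2'] on a groupby object
L = (df.set_index(['group', g])
       .reindex(mux, fill_value=0)
       .groupby(level=0)[['data_1', 'data_2']]
       .apply(lambda x: x.values.tolist())
       .tolist())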
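As a rough end-to-end check, the reindex approach can be run on a hypothetical frame and the nested list passed to np.array. The sample data and group labels below are assumptions chosen to reproduce the output shown for the first solution; the original df is not given.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (assumption): groups of 4, 3 and 1 rows.
df = pd.DataFrame({
    "group":  ["a", "a", "a", "a", "b", "b", "b", "c"],
    "data_1": [1, 2, 3, 4, 1, 2, 3, 1],
    "data_2": [4, 5, 6, 7, 4, 5, 6, 4],
})

# Position of each row inside its group.
g = df.groupby("group").cumcount()

# Full grid of (group, position) pairs; missing pairs become rows of 0.
mux = pd.MultiIndex.from_product([df["group"].unique(), g.unique()])

L = (df.set_index(["group", g])
       .reindex(mux, fill_value=0)
       .groupby(level=0)[["data_1", "data_2"]]
       .apply(lambda x: x.values.tolist())
       .tolist())

# All groups now have equal length, so the nested list is rectangular
# and converts to a 3-D array: (groups, rows per group, data columns).
arr = np.array(L)
print(arr.shape)
```

Because every group is padded to the same length, np.array yields a regular 3-D array instead of an object array of ragged lists.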