Pandas - group by column and transform the data to numpy array
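First, the missing values need to be added so that every group has the same number of rows. The first solution does this with unstack and stack, using a counter Series created by cumcount. The second solution uses reindex with a MultiIndex. In both cases, the last step applies a lambda function per group with groupby, converts each group to a numpy array with values, and finally converts to lists: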
    g = df.groupby('group').cumcount()
    L = (df.set_index(['group', g])
           .unstack(fill_value=0)
           .stack()
           .groupby(level=0)
           .apply(lambda x: x.values.tolist())
           .tolist())
    print(L)

    [[[1, 4], [2, 5], [3, 6], [4, 7]], [[1, 4], [2, 5], [3, 6], [0, 0]], [[1, 4], [0, 0], [0, 0], [0, 0]]]
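For readers who want to run the first solution end to end, here is a minimal sketch. The input DataFrame is not shown in this answer, so the df below is a hypothetical one reconstructed from the printed output; the column names data_1 and data_2 are taken from the second solution. The intermediate names wide, padded and arr are only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical input, reconstructed from the printed output (an assumption).
df = pd.DataFrame({
    'group':  [1, 1, 1, 1, 2, 2, 2, 3],
    'data_1': [1, 2, 3, 4, 1, 2, 3, 1],
    'data_2': [4, 5, 6, 7, 4, 5, 6, 4],
})

# Position of each row within its group: 0, 1, 2, ...
g = df.groupby('group').cumcount()

# Wide table: one row per group, one column per (data column, position).
# Positions that shorter groups lack are filled with 0.
wide = df.set_index(['group', g]).unstack(fill_value=0)

# Back to long form; every group now has the same number of rows.
padded = wide.stack()

# One nested list per group.
L = (padded.groupby(level=0)
           .apply(lambda x: x.values.tolist())
           .tolist())

# Because all groups have equal length, the nested list is rectangular
# and converts to a 3-D array: (groups, rows per group, data columns).
arr = np.array(L)
print(arr.shape)
```

Since every group is padded to the length of the longest one, np.array(L) gives a regular 3-D array rather than an object array of ragged lists.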
Another solution:
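    g = df.groupby('group').cumcount()
    mux = pd.MultiIndex.from_product([df['group'].unique(), g.unique()])
    # Select the columns with a list; selecting with a tuple
    # (['data_1','data_2']) raises an error in current pandas versions.
    L = (df.set_index(['group', g])
           .reindex(mux, fill_value=0)
           .groupby(level=0)[['data_1', 'data_2']]
           .apply(lambda x: x.values.tolist())
           .tolist())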
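The padding step in the reindex solution can be inspected on its own. The sketch below again uses a hypothetical df reconstructed from the printed output of the first solution (an assumption, since the original input is not shown), and the names padded and L2 are illustrative only. It shows that the MultiIndex built by from_product contains every (group, position) pair, so reindex adds the missing rows filled with 0.

```python
import pandas as pd

# Hypothetical input, reconstructed from the printed output (an assumption).
df = pd.DataFrame({
    'group':  [1, 1, 1, 1, 2, 2, 2, 3],
    'data_1': [1, 2, 3, 4, 1, 2, 3, 1],
    'data_2': [4, 5, 6, 7, 4, 5, 6, 4],
})

g = df.groupby('group').cumcount()

# Every combination of group label and within-group position.
mux = pd.MultiIndex.from_product([df['group'].unique(), g.unique()])

# Rows that exist are kept; rows missing from shorter groups are added as 0.
padded = df.set_index(['group', g]).reindex(mux, fill_value=0)
print(len(df), '->', len(padded))

L2 = (padded.groupby(level=0)[['data_1', 'data_2']]
            .apply(lambda x: x.values.tolist())
            .tolist())
print(L2)
```

Here the 8 original rows grow to 3 groups times 4 positions, and the resulting nested list should match the one produced by the unstack and stack solution.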