Filter a data-frame and add a new column according to the given condition
Today's edition of Over Engineered with Numpy
Though admittedly very little obvious Numpy
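    import numpy as np
    import pandas as pd

    # replace('None') with no value forward-fills in older pandas, so each
    # 'None' row takes the street name from the row above it
    i, rows = pd.factorize([*zip(df.ID, df.col1.replace('None'))])
    k, cols = pd.factorize(df.groupby(i).cumcount())

    dleft = pd.DataFrame(dict(zip(['ID', 'col1'], zip(*rows))))
    drigt = pd.DataFrame(index=dleft.index, columns=np.arange(len(cols)) + 2).add_prefix('col')

    # i is the output row, k the output column for every input row
    drigt.values[i, k] = df.col2.values

    dleft.join(drigt)

       ID        col1        col2        col3
    0   1  Abc street  2017-07-27  2017-08-17
    1   1  Def street  2018-07-15  2018-08-13
    2   2  fbg street  2018-01-07  2018-08-12
    3   2  trf street  2019-01-15         NaN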
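To make the indexing trick above easier to follow, here is a minimal sketch on toy data. The names `keys`, `vals`, `codes`, `pos` and `out` are made up for illustration and do not come from the answer; the sketch only shows how factorize codes plus a per-group counter can serve as (row, column) positions in a NumPy array.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one grouping key and one value per input row.
keys = np.array(["x", "x", "y", "x"])
vals = ["a", "b", "c", "d"]

# codes[n] is the output row for input row n; uniques holds one label per output row.
codes, uniques = pd.factorize(keys)

# Running counter inside each group, i.e. the output column for each input row.
pos = pd.Series(keys).groupby(codes).cumcount().to_numpy()

# Start with an all-NaN object array and scatter the values into place.
out = np.full((len(uniques), pos.max() + 1), np.nan, dtype=object)
out[codes, pos] = vals

print(pd.DataFrame(out, index=uniques))
```

Cells that never receive a value stay NaN, which is where the NaN in the last row of the answer's output comes from.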
I am using cumcount with merge
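    # rows that carry a street name, and the 'None' rows that follow them
    df1 = df.loc[df.col1.ne('None'), :].copy()
    df2 = df.loc[df.col1.eq('None'), :].copy()
    # number the rows within each ID so the two halves can be paired up
    df1['Key'] = df1.groupby('ID').cumcount()
    df2['Key'] = df2.groupby('ID').cumcount()
    # drop(columns=...) replaces the positional axis argument, which newer pandas rejects
    df1.merge(df2.drop(columns='col1'), on=['ID', 'Key'], how='left')

    Out[816]:
       ID       col1      col2_x  Key      col2_y
    0   1  Abcstreet  2017-07-27    0  2017-08-17
    1   1  Defstreet  2018-07-15    1  2018-08-13
    2   2  fbgstreet  2018-01-07    0  2018-08-12
    3   2  trfstreet  2019-01-15    1         NaN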
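For anyone who wants to run the cumcount and merge idea end to end, here is a self-contained sketch. The input frame is an assumption reconstructed from the outputs shown in these answers (the question's data is not reproduced here), and the names `named`, `unnamed` and `result` are illustrative only. The final rename to col2 and col3 is an extra step to match the layout of the other answers.

```python
import pandas as pd

# Assumed input, reconstructed from the outputs shown in the answers.
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2, 2],
    "col1": ["Abc street", "None", "Def street", "None",
             "fbg street", "None", "trf street"],
    "col2": ["2017-07-27", "2017-08-17", "2018-07-15", "2018-08-13",
             "2018-01-07", "2018-08-12", "2019-01-15"],
})

# Split into rows with a street name and placeholder rows.
named = df.loc[df["col1"].ne("None")].copy()
unnamed = df.loc[df["col1"].eq("None")].copy()

# Number the rows within each ID; the n-th named row pairs with the n-th 'None' row.
named["Key"] = named.groupby("ID").cumcount()
unnamed["Key"] = unnamed.groupby("ID").cumcount()

merged = named.merge(unnamed.drop(columns="col1"), on=["ID", "Key"], how="left")

# Tidy up: remove the helper key and give the date columns plain names.
result = (merged.drop(columns="Key")
                .rename(columns={"col2_x": "col2", "col2_y": "col3"}))
print(result)
```

Note that this pairs rows purely by their order within each ID. If a named row in the middle of an ID had no 'None' row after it, a later 'None' row would be attached to it, so the approach relies on the data being shaped like the sample.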
Using ffill + pivot_table. This assumes that None follows the proper value, which it appears to from your data.
    u = df.assign(col1=df.col1.replace('None'))
    g = ['ID', 'col1']
    idx = u.groupby(g).cumcount()

    (u.assign(idx=idx)
      .pivot_table(index=g, columns='idx', values='col2', aggfunc='first')
      .reset_index())

    idx  ID        col1           0           1
    0     1  Abc street  2017-07-27  2017-08-17
    1     1  Def street  2018-07-15  2018-08-13
    2     2  fbg street  2018-01-07  2018-08-12
    3     2  trf street  2019-01-15         NaN
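The forward fill in this answer happens implicitly through replace('None') with no replacement value, which as far as I know is deprecated in recent pandas releases. Below is a sketch of the same idea with the fill written out explicitly using mask and ffill. This is a swapped-in spelling, not the answer's own code; the input frame is again an assumption reconstructed from the outputs, and `filled` and `wide` are illustrative names.

```python
import pandas as pd

# Assumed input, reconstructed from the outputs shown in the answers.
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2, 2],
    "col1": ["Abc street", "None", "Def street", "None",
             "fbg street", "None", "trf street"],
    "col2": ["2017-07-27", "2017-08-17", "2018-07-15", "2018-08-13",
             "2018-01-07", "2018-08-12", "2019-01-15"],
})

# Turn the 'None' strings into real missing values, then forward-fill so each
# placeholder row inherits the street name from the row above it.
filled = df["col1"].mask(df["col1"].eq("None")).ffill()
u = df.assign(col1=filled)

g = ["ID", "col1"]
# Position of each row inside its (ID, col1) group: 0 for the first date, 1 for the second.
u = u.assign(idx=u.groupby(g).cumcount())

wide = (u.pivot_table(index=g, columns="idx", values="col2", aggfunc="first")
         .reset_index())
# Optional: rename the positional columns 0 and 1 to match the other answers.
wide.columns = ["ID", "col1", "col2", "col3"]
print(wide)
```

As with the original, this only gives sensible results when every 'None' row directly follows the row whose street name it belongs to.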