Merging a varying number of rows and columns by multiple conditions in Python
EDIT v2 with additional columns
This version backfills only the columns listed in my_cols, so the values in the additional columns (cumsum, country, others) are not impacted.
    import pandas as pd

    pd.set_option('display.max_columns', None)

    c = ['connector', 'type', 'q_text', 'a_text', 'var1', 'var2', 'cumsum', 'country', 'others']
    d = [[1111, 1, 'aa', None, 'xx', 'ps', 0, 'US', 'other values'],
         [9999, 2, None, 'tt', 'jjjj', 'pppp', 0, 'UK', 'no values'],
         [1111, 2, None, 'uu', None, 'oo', 1, 'US', 'some values'],
         [9999, 1, 'bb', None, 'yy', 'Rt', 1, 'UK', 'more values'],
         [9999, 1, 'cc', None, 'zz', 'tR', 2, 'UK', 'less values']]

    df = pd.DataFrame(d, columns=c)
    print(df)

    # Copy var1/var2 of the type-2 rows into new columns; all other rows get NaN.
    df.loc[df['type'] == 2, 'var1.1'] = df['var1']
    df.loc[df['type'] == 2, 'var2.1'] = df['var2']

    # Within each connector, put type-1 rows ahead of type-2 rows and backfill
    # only the listed columns, so cumsum, country and others stay untouched.
    my_cols = ['q_text', 'a_text', 'var1', 'var2', 'var1.1', 'var2.1']
    df[my_cols] = (df.sort_values(['connector', 'type'])
                     .groupby('connector')[my_cols]
                     .transform(lambda x: x.bfill()))

    # Drop the rows that still have no q_text (the type-2 rows) and renumber the index.
    df.dropna(subset=['q_text'], inplace=True)
    df.reset_index(drop=True, inplace=True)
    print(df)
Original DataFrame:
       connector  type q_text a_text  var1  var2  cumsum country        others
    0       1111     1     aa   None    xx    ps       0      US  other values
    1       9999     2   None     tt  jjjj  pppp       0      UK     no values
    2       1111     2   None     uu  None    oo       1      US   some values
    3       9999     1     bb   None    yy    Rt       1      UK   more values
    4       9999     1     cc   None    zz    tR       2      UK   less values
Updated DataFrame:
       connector  type q_text a_text var1 var2  cumsum country        others var1.1 var2.1
    0       1111     1     aa     uu   xx   ps       0      US  other values   None     oo
    1       9999     1     bb     tt   yy   Rt       1      UK   more values   jjjj   pppp
    2       9999     1     cc     tt   zz   tR       2      UK   less values   jjjj   pppp
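For comparison, here is a rough sketch of the same idea written as an explicit merge instead of a backfill. This is not the approach used above, just an alternative under two assumptions: each connector has exactly one type 2 row (more than one would duplicate the question rows), and var1/var2 of the type 1 rows never need filling from the answer row. The names questions, answers and merged are made up for the illustration, and the column order of the result differs from the output above.

```python
import pandas as pd

c = ['connector', 'type', 'q_text', 'a_text', 'var1', 'var2', 'cumsum', 'country', 'others']
d = [[1111, 1, 'aa', None, 'xx', 'ps', 0, 'US', 'other values'],
     [9999, 2, None, 'tt', 'jjjj', 'pppp', 0, 'UK', 'no values'],
     [1111, 2, None, 'uu', None, 'oo', 1, 'US', 'some values'],
     [9999, 1, 'bb', None, 'yy', 'Rt', 1, 'UK', 'more values'],
     [9999, 1, 'cc', None, 'zz', 'tR', 2, 'UK', 'less values']]
df = pd.DataFrame(d, columns=c)

# Question rows (type 1): keep every column except the empty a_text.
questions = df[df['type'] == 1].drop(columns=['a_text'])

# Answer rows (type 2): keep only the columns that should travel to the
# question rows, renaming var1/var2 so they do not collide.
# Assumption: one type-2 row per connector.
answers = (df.loc[df['type'] == 2, ['connector', 'a_text', 'var1', 'var2']]
             .rename(columns={'var1': 'var1.1', 'var2': 'var2.1'}))

# A left merge keeps every question row and attaches its connector's answer.
merged = questions.merge(answers, on='connector', how='left')
print(merged)
```

Because only the answer columns are brought over, cumsum, country and others come straight from the type 1 rows here as well.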