Merging a varying number of rows and columns by multiple conditions in Python


EDIT v2 with additional columns

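This version leaves the values in the additional columns (cumsum, country, others) untouched. Only q_text, a_text, var1, var2 and the new var1.1 and var2.1 columns are back-filled within each connector group.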

import pandas as pd

pd.set_option('display.max_columns', None)

c = ['connector', 'type', 'q_text', 'a_text', 'var1', 'var2', 'cumsum', 'country', 'others']
d = [[1111, 1, 'aa',  None, 'xx',   'ps',   0, 'US', 'other values'],
     [9999, 2, None,  'tt', 'jjjj', 'pppp', 0, 'UK', 'no values'],
     [1111, 2, None,  'uu', None,   'oo',   1, 'US', 'some values'],
     [9999, 1, 'bb',  None, 'yy',   'Rt',   1, 'UK', 'more values'],
     [9999, 1, 'cc',  None, 'zz',   'tR',   2, 'UK', 'less values']]

df = pd.DataFrame(d, columns=c)
print(df)

# Copy var1/var2 of the type-2 rows into new columns; all other rows get NaN there.
df.loc[df['type'] == 2, 'var1.1'] = df['var1']
df.loc[df['type'] == 2, 'var2.1'] = df['var2']

# Within each connector, put type-1 rows before type-2 rows and back-fill,
# so the type-1 rows pick up the missing values from the type-2 row.
my_cols = ['q_text', 'a_text', 'var1', 'var2', 'var1.1', 'var2.1']
df[my_cols] = (df.sort_values(['connector', 'type'])
                 .groupby('connector')[my_cols]
                 .transform(lambda x: x.bfill()))

# Drop the rows that have no q_text (the type-2 rows) and renumber the index.
df.dropna(subset=['q_text'], inplace=True)
df.reset_index(drop=True, inplace=True)
print(df)

Original DataFrame:

   connector  type q_text a_text  var1  var2  cumsum country        others
0       1111     1     aa   None    xx    ps       0      US  other values
1       9999     2   None     tt  jjjj  pppp       0      UK     no values
2       1111     2   None     uu  None    oo       1      US   some values
3       9999     1     bb   None    yy    Rt       1      UK   more values
4       9999     1     cc   None    zz    tR       2      UK   less values

Updated DataFrame:

   connector  type q_text a_text var1 var2  cumsum country        others var1.1 var2.1
0       1111     1     aa     uu   xx   ps       0      US  other values   None     oo
1       9999     1     bb     tt   yy   Rt       1      UK   more values   jjjj   pppp
2       9999     1     cc     tt   zz   tR       2      UK   less values   jjjj   pppp
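
To check that the additional columns really come through unchanged, one option is to wrap the same steps in a function and compare those columns against the matching rows of the input. This is only a minimal sketch: `merge_answers`, `extra_cols` and `untouched` are hypothetical names that are not part of the answer above, and the function skips the final `reset_index` so that the surviving rows keep their original index labels for the comparison.

```python
import pandas as pd

cols = ['connector', 'type', 'q_text', 'a_text', 'var1', 'var2',
        'cumsum', 'country', 'others']
rows = [[1111, 1, 'aa', None, 'xx', 'ps', 0, 'US', 'other values'],
        [9999, 2, None, 'tt', 'jjjj', 'pppp', 0, 'UK', 'no values'],
        [1111, 2, None, 'uu', None, 'oo', 1, 'US', 'some values'],
        [9999, 1, 'bb', None, 'yy', 'Rt', 1, 'UK', 'more values'],
        [9999, 1, 'cc', None, 'zz', 'tR', 2, 'UK', 'less values']]
original = pd.DataFrame(rows, columns=cols)


def merge_answers(df):
    """Hypothetical helper: the same steps as above, applied to a copy."""
    out = df.copy()
    is_answer = out['type'] == 2
    # Copy var1/var2 of the type-2 rows into new columns.
    out.loc[is_answer, 'var1.1'] = out['var1']
    out.loc[is_answer, 'var2.1'] = out['var2']
    fill_cols = ['q_text', 'a_text', 'var1', 'var2', 'var1.1', 'var2.1']
    # Back-fill within each connector, type-1 rows first.
    out[fill_cols] = (out.sort_values(['connector', 'type'])
                         .groupby('connector')[fill_cols]
                         .transform(lambda s: s.bfill()))
    # No reset_index here: the original index labels are kept on purpose.
    return out.dropna(subset=['q_text'])


merged = merge_answers(original)

# Compare the extra columns with the same rows of the untouched input.
extra_cols = ['cumsum', 'country', 'others']
untouched = merged[extra_cols].equals(original.loc[merged.index, extra_cols])
print(untouched)
```

Working on a copy also leaves `original` unmodified, which makes a before/after comparison like this possible; the answer's in-place version overwrites `df` instead.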