Concatenate column values in Pandas DataFrame with "NaN" values


I don't think your problem is trivial. However, here is a workaround using NumPy's vectorize:

In [49]: def concat(*args):
    ...:     strs = [str(arg) for arg in args if not pd.isnull(arg)]
    ...:     return ','.join(strs) if strs else np.nan
    ...: np_concat = np.vectorize(concat)
    ...:

In [50]: np_concat(df['col2'], df['col3'])
Out[50]:
array(['p1,A', 'p2,B', 'p1,C', 'D', 'p2,E', 'F'],
      dtype='|S64')

In [51]: df['concatenated'] = np_concat(df['col2'], df['col3'])

In [52]: df
Out[52]:
  col1 col2 col3 concatenated
0    1   p1    A         p1,A
1    1   p2    B         p2,B
2    2   p1    C         p1,C
3    2  NaN    D            D
4    3   p2    E         p2,E
5    3  NaN    F            F

[6 rows x 4 columns]
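For anyone who wants to run this outside an IPython session, here is a self-contained sketch of the same idea. The sample frame is reconstructed from the printed output (an assumption, since the original construction is not shown), and `otypes=[object]` is my addition: as far as I can tell, without it a row where every value is missing may come back as the string 'nan' rather than a real NaN. Also note that `np.vectorize` is a convenience wrapper around a Python-level loop, so this is not faster than a plain loop.

```python
import numpy as np
import pandas as pd

# Sample frame reconstructed from the printed output above (an assumption).
df = pd.DataFrame({
    'col1': [1, 1, 2, 2, 3, 3],
    'col2': ['p1', 'p2', 'p1', np.nan, 'p2', np.nan],
    'col3': ['A', 'B', 'C', 'D', 'E', 'F'],
})

def concat(*args):
    # Keep only the non-missing values, then join them with commas.
    strs = [str(arg) for arg in args if not pd.isnull(arg)]
    return ','.join(strs) if strs else np.nan

# otypes=[object] (not in the original answer) keeps the results as Python
# objects, so a row with no values at all stays NaN.
np_concat = np.vectorize(concat, otypes=[object])

df['concatenated'] = np_concat(df['col2'], df['col3'])

# Hypothetical extra check: a "row" in which both inputs are missing.
all_missing = np_concat(np.array([np.nan], dtype=object),
                        np.array([np.nan], dtype=object))
print(df)
```

The function takes any number of columns, so `np_concat(df['col2'], df['col3'], df['col1'])` should work the same way.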


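You could first replace NaNs with empty strings, either for the whole DataFrame or only for the column(s) you need. Note that rows where col2 was missing then keep a leading comma (",D" rather than "D").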

In [6]: df = df.fillna('')

In [7]: df['concatenated'] = df['col2'] + ',' + df['col3']

In [8]: df
Out[8]:
  col1 col2 col3 concatenated
0    1   p1    A         p1,A
1    1   p2    B         p2,B
2    2   p1    C         p1,C
3    2         D           ,D
4    3   p2    E         p2,E
5    3         F           ,F
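If the stray leading comma is unwanted, one possible variation (not part of the answer above) is to fill only a temporary copy of the two columns and strip separators from the ends of the result with `.str.strip(',')`. This is a minimal sketch on a sample frame reconstructed from the printed output, which is an assumption on my part:

```python
import numpy as np
import pandas as pd

# Sample frame reconstructed from the printed output above (an assumption).
df = pd.DataFrame({
    'col1': [1, 1, 2, 2, 3, 3],
    'col2': ['p1', 'p2', 'p1', np.nan, 'p2', np.nan],
    'col3': ['A', 'B', 'C', 'D', 'E', 'F'],
})

# Fill a temporary copy so the NaNs in the original columns are preserved.
filled = df[['col2', 'col3']].fillna('')

# Join with a comma, then remove commas left at either end by empty values.
df['concatenated'] = (filled['col2'] + ',' + filled['col3']).str.strip(',')
print(df)
```

This only tidies the ends of the string. With three or more columns, an empty value in the middle would still leave a doubled comma, and a value that itself starts or ends with a comma would be trimmed too.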


We can use stack which will drop the NaN, then use groupby.agg and ','.join the strings:

df['concatenated'] = df[['col2', 'col3']].stack().groupby(level=0).agg(','.join)
  col1 col2 col3 concatenated
0    1   p1    A         p1,A
1    1   p2    B         p2,B
2    2   p1    C         p1,C
3    2  NaN    D            D
4    3   p2    E         p2,E
5    3  NaN    F            F
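A runnable sketch of the stack approach follows, again on a sample frame reconstructed from the printed output (an assumption). The explicit `.dropna()` is my addition: as far as I know, newer pandas versions change whether `stack` drops missing values by default, so dropping them explicitly should keep the result the same either way.

```python
import numpy as np
import pandas as pd

# Sample frame reconstructed from the printed output above (an assumption).
df = pd.DataFrame({
    'col1': [1, 1, 2, 2, 3, 3],
    'col2': ['p1', 'p2', 'p1', np.nan, 'p2', np.nan],
    'col3': ['A', 'B', 'C', 'D', 'E', 'F'],
})

# stack() turns the two columns into one long Series indexed by (row, column);
# dropna() removes missing entries regardless of stack's default behaviour.
stacked = df[['col2', 'col3']].stack().dropna()

# Group by the original row label (index level 0) and join what is left.
df['concatenated'] = stacked.groupby(level=0).agg(','.join)
print(df)
```

Because the assignment aligns on the index, a row in which both columns are missing would simply have no group and end up as NaN in the new column.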