Concatenate column values in Pandas DataFrame with "NaN" values
I don't think your problem is trivial. However, here is a workaround using NumPy's `np.vectorize`:
```
In [49]: def concat(*args):
    ...:     strs = [str(arg) for arg in args if not pd.isnull(arg)]
    ...:     return ','.join(strs) if strs else np.nan
    ...: np_concat = np.vectorize(concat)
    ...:

In [50]: np_concat(df['col2'], df['col3'])
Out[50]: array(['p1,A', 'p2,B', 'p1,C', 'D', 'p2,E', 'F'], dtype='|S64')

In [51]: df['concatenated'] = np_concat(df['col2'], df['col3'])

In [52]: df
Out[52]:
   col1 col2 col3 concatenated
0     1   p1    A         p1,A
1     1   p2    B         p2,B
2     2   p1    C         p1,C
3     2  NaN    D            D
4     3   p2    E         p2,E
5     3  NaN    F            F

[6 rows x 4 columns]
```
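A self-contained version of this answer, as a sketch: the sample `df` is reconstructed from the printed output (an assumption about the question's data), and it passes `otypes=[object]` to `np.vectorize`, which the session above omits. Without `otypes`, `np.vectorize` infers the output dtype from the first row, so a first row whose values are all NaN would produce a float result and the later string rows would then fail to convert.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the printed output (assumption: the
# question's DataFrame has exactly these columns and values).
df = pd.DataFrame({
    "col1": [1, 1, 2, 2, 3, 3],
    "col2": ["p1", "p2", "p1", np.nan, "p2", np.nan],
    "col3": ["A", "B", "C", "D", "E", "F"],
})


def concat(*args):
    """Join the non-null arguments with commas; return NaN if all are null."""
    strs = [str(arg) for arg in args if not pd.isnull(arg)]
    return ",".join(strs) if strs else np.nan


# otypes=[object] stops np.vectorize from guessing the output dtype from the
# first call (a float NaN there would otherwise break the string results).
np_concat = np.vectorize(concat, otypes=[object])

df["concatenated"] = np_concat(df["col2"], df["col3"])
print(df)
```

Note that `np.vectorize` is provided for convenience rather than speed: it is essentially a Python-level loop, so it will not outperform `df.apply(..., axis=1)`.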
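You could first replace NaNs with empty strings, either for the whole DataFrame or only for the column(s) you need. Rows that had a missing value will then keep a stray separator (e.g. `,D` instead of `D`).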
```
In [6]: df = df.fillna('')

In [7]: df['concatenated'] = df['col2'] + ',' + df['col3']

In [8]: df
Out[8]:
   col1 col2 col3 concatenated
0     1   p1    A         p1,A
1     1   p2    B         p2,B
2     2   p1    C         p1,C
3     2         D           ,D
4     3   p2    E         p2,E
5     3         F           ,F
```
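A runnable sketch of the `fillna('')` approach, again with sample data reconstructed from the output (an assumption). It fills only the two joined columns so the rest of `df` keeps its NaNs, and it adds one step that is not part of the original answer: `str.strip(',')` to trim the stray separator left by a missing value.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the printed output (assumption).
df = pd.DataFrame({
    "col1": [1, 1, 2, 2, 3, 3],
    "col2": ["p1", "p2", "p1", np.nan, "p2", np.nan],
    "col3": ["A", "B", "C", "D", "E", "F"],
})

# Fill NaNs only in the columns being joined; df itself is left unchanged.
parts = df[["col2", "col3"]].fillna("")

# Plain string concatenation: rows with a missing value get a stray comma (",D").
joined = parts["col2"] + "," + parts["col3"]

# Extra step (not in the answer): trim leading/trailing commas.
df["concatenated"] = joined.str.strip(",")
print(df)
```

`str.strip` only trims commas at the ends, so with three or more columns a missing middle value would still leave `,,` inside the string, and a row with no values at all becomes `''` rather than NaN.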
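We can use `stack`, which drops the `NaN`s, then use `groupby.agg` with `','.join` to join the strings (this holds for the classic `stack`; the newer implementation enabled with `future_stack=True` in pandas 2.1+ keeps NaNs, so append `.dropna()` after `stack()` there):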
```python
df['concatenated'] = df[['col2', 'col3']].stack().groupby(level=0).agg(','.join)
```

```
   col1 col2 col3 concatenated
0     1   p1    A         p1,A
1     1   p2    B         p2,B
2     2   p1    C         p1,C
3     2  NaN    D            D
4     3   p2    E         p2,E
5     3  NaN    F            F
```
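A self-contained sketch of the `stack` approach, using the same reconstructed sample data (an assumption). It calls `.dropna()` explicitly so the result is the same whether or not the installed pandas version's `stack` drops NaNs on its own.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the printed output (assumption).
df = pd.DataFrame({
    "col1": [1, 1, 2, 2, 3, 3],
    "col2": ["p1", "p2", "p1", np.nan, "p2", np.nan],
    "col3": ["A", "B", "C", "D", "E", "F"],
})

# stack() turns the two columns into one Series indexed by (row, column);
# .dropna() removes missing values in case this stack() version keeps them.
stacked = df[["col2", "col3"]].stack().dropna()

# Group on the original row label (index level 0) and join each row's strings.
df["concatenated"] = stacked.groupby(level=0).agg(",".join)
print(df)
```

A row whose joined values are all missing disappears from the stacked Series, so after index alignment it ends up as NaN in `concatenated`, the same as with the `np.vectorize` answer.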