How to row-wise concatenate several columns containing strings?
The key to operating on columns (Series) of strings en masse is the Series.str accessor.
I can think of two .str methods to do what you want.
str.cat()
The first is str.cat(). You have to start from a Series, but you can pass a list of Series (unfortunately you can't pass a DataFrame) to concatenate with it, along with an optional separator. Using your example:
column_names = df.columns[1:]  # skipping the first, numeric, column
series_list = [df[c] for c in column_names]  # column_names already excludes the numeric column
# concatenate:
df['result'] = series_list[0].str.cat(series_list[1:], sep=' ')
Or, in one line:
df['result'] = df[df.columns[1]].str.cat([df[c] for c in df.columns[2:]], sep=' ')
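For anyone who wants to try this end to end, here is a minimal, self-contained sketch. It assumes a frame shaped like the one in the question: a numeric column n followed by the string columns t0 to t3. The names text_cols, first and rest are only illustrative.

```python
import pandas as pd

# Sample frame (assumed shape): one numeric column, then string columns.
df = pd.DataFrame({
    "n": [92, 916, 363],
    "t0": ["a", "b", "c"],
    "t1": ["d", "e", "f"],
    "t2": ["g", "h", "i"],
    "t3": ["i", "j", "k"],
})

text_cols = df.columns[1:]                  # skip the numeric column
first, rest = text_cols[0], text_cols[1:]   # str.cat starts from one Series

# Concatenate the remaining columns onto the first, row by row.
df["result"] = df[first].str.cat([df[c] for c in rest], sep=" ")

print(df["result"].tolist())  # ['a d g i', 'b e h j', 'c f i k']
```

Starting from the first string column and passing the rest as a list keeps every column in the result exactly once.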
str.join()
The second is the .str.join() method, which works like the standard Python method str.join(), but for which you need a column (Series) of iterables, for example a column of tuples. We can get one by applying tuple row-wise to a sub-dataframe of the columns you're interested in:
tuple_series = df[column_names].apply(tuple, axis=1)
df['result'] = tuple_series.str.join(' ')
Or, in one line:
df['result'] = df[df.columns[1:]].apply(tuple, axis=1).str.join(' ')
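A runnable sketch of the tuple approach, again assuming a frame with a numeric column n followed by the string columns t0 to t3 (the variable names are just for illustration):

```python
import pandas as pd

# Sample frame (assumed shape): one numeric column, then string columns.
df = pd.DataFrame({
    "n": [92, 916, 363],
    "t0": ["a", "b", "c"],
    "t1": ["d", "e", "f"],
    "t2": ["g", "h", "i"],
    "t3": ["i", "j", "k"],
})

text_cols = df.columns[1:]  # skip the numeric column

# Build one tuple per row, e.g. ('a', 'd', 'g', 'i') for the first row.
tuple_series = df[text_cols].apply(tuple, axis=1)

# Join the strings inside each tuple with a single space.
df["result"] = tuple_series.str.join(" ")

print(tuple_series.iloc[0])
print(df["result"].tolist())
```

The intermediate tuple_series is worth inspecting if the result looks wrong: each element should be a tuple of strings, one per selected column.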
BTW, don't try the above with list instead of tuple. As of pandas-0.20.1, if the function passed into the DataFrame.apply() method returns a list and the returned list has the same number of entries as the columns of the original (sub)dataframe, DataFrame.apply() returns a DataFrame instead of a Series.
Here is a slightly different solution:
In [57]: df['result'] = df.filter(regex=r'^t').apply(lambda x: x.add(' ')).sum(axis=1).str.strip()

In [58]: df
Out[58]:
     n t0 t1 t2 t3   result
0   92  a  d  g  i  a d g i
1  916  b  e  h  j  b e h j
2  363  c  f  i  k  c f i k