Concatenate strings from several rows using Pandas groupby
You can groupby the 'name' and 'month' columns, then call transform, which returns data aligned to the original df, and apply a lambda that joins the text entries:
In [119]:
df['text'] = df[['name','text','month']].groupby(['name','month'])['text'].transform(lambda x: ','.join(x))
df[['name','text','month']].drop_duplicates()
Out[119]:
    name         text  month
0  name1       hej,du     11
2  name1        aj,oj     12
4  name2     fin,katt     11
6  name2  mycket,lite     12
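I subset the original df by passing a list of the columns of interest, df[['name','text','month']], and then call drop_duplicates. Because transform writes the joined string to every row of its group, the rows within each group become identical, and drop_duplicates keeps just one of them.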
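A self-contained sketch of the transform approach. The question's DataFrame is not shown here, so the sample data below is an assumption reconstructed from the outputs (two rows per name/month pair, in order); `result` is just an illustrative name.

```python
import pandas as pd

# Assumed sample data, reconstructed from the outputs shown in this answer.
df = pd.DataFrame({
    'name':  ['name1', 'name1', 'name1', 'name1', 'name2', 'name2', 'name2', 'name2'],
    'text':  ['hej', 'du', 'aj', 'oj', 'fin', 'katt', 'mycket', 'lite'],
    'month': [11, 11, 12, 12, 11, 11, 12, 12],
})

# transform returns a Series aligned to df's original index, so every row
# receives the joined string of its (name, month) group.
df['text'] = df[['name', 'text', 'month']].groupby(['name', 'month'])['text'].transform(lambda x: ','.join(x))

# Rows within a group are now identical; keep the first row of each.
result = df[['name', 'text', 'month']].drop_duplicates()
print(result)
```

Note that the surviving rows keep their original index labels (0, 2, 4, 6), which is the main visible difference from the apply/reset_index version.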
EDIT: actually, I can just call apply and then reset_index:
In [124]:
df.groupby(['name','month'])['text'].apply(lambda x: ','.join(x)).reset_index()
Out[124]:
    name  month         text
0  name1     11       hej,du
1  name1     12        aj,oj
2  name2     11     fin,katt
3  name2     12  mycket,lite
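A runnable sketch of the apply/reset_index version, again on sample data that is assumed (reconstructed from the outputs above), to show what each step returns.

```python
import pandas as pd

# Assumed sample data, reconstructed from the outputs shown in this answer.
df = pd.DataFrame({
    'name':  ['name1', 'name1', 'name1', 'name1', 'name2', 'name2', 'name2', 'name2'],
    'text':  ['hej', 'du', 'aj', 'oj', 'fin', 'katt', 'mycket', 'lite'],
    'month': [11, 11, 12, 12, 11, 11, 12, 12],
})

# apply calls the lambda once per (name, month) group and returns a Series
# named 'text', indexed by a MultiIndex of the group keys.
joined = df.groupby(['name', 'month'])['text'].apply(lambda x: ','.join(x))

# reset_index turns the group keys back into ordinary columns.
result = joined.reset_index()
print(result)
```

Unlike the transform version, this produces one row per group directly, so no drop_duplicates step is needed.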
Update: the lambda is unnecessary here, because the bound method ','.join can be passed directly:
In [38]:
df.groupby(['name','month'])['text'].apply(','.join).reset_index()
Out[38]:
    name  month         text
0  name1     11       hej,du
1  name1     12        aj,oj
2  name2     11     fin,katt
3  name2     12  mycket,lite
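A small check, on assumed sample data reconstructed from the outputs above, that passing `','.join` directly behaves like the lambda. It works because `str.join` accepts any iterable of strings, and iterating a Series yields its values.

```python
import pandas as pd

# Assumed sample data, reconstructed from the outputs shown in this answer.
df = pd.DataFrame({
    'name':  ['name1', 'name1', 'name1', 'name1', 'name2', 'name2', 'name2', 'name2'],
    'text':  ['hej', 'du', 'aj', 'oj', 'fin', 'katt', 'mycket', 'lite'],
    'month': [11, 11, 12, 12, 11, 11, 12, 12],
})

grouped = df.groupby(['name', 'month'])['text']

# ','.join receives each group's Series and iterates over its values.
direct = grouped.apply(','.join).reset_index()
via_lambda = grouped.apply(lambda x: ','.join(x)).reset_index()

print(direct)
```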
We can groupby the 'name' and 'month' columns and then call the agg() method on the resulting GroupBy object.
agg() can compute several aggregations per group in a single call, for example a different function for each column.
df.groupby(['name', 'month'], as_index=False).agg({'text': ' '.join})
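A sketch of the agg() call above plus a named-aggregation variant (available in pandas 0.25 and later) that computes more than one statistic per group in one call. The sample data is assumed (reconstructed from the outputs earlier), and the output names `joined` and `n_words` are illustrative choices, not anything from the answer.

```python
import pandas as pd

# Assumed sample data, reconstructed from the outputs shown earlier.
df = pd.DataFrame({
    'name':  ['name1', 'name1', 'name1', 'name1', 'name2', 'name2', 'name2', 'name2'],
    'text':  ['hej', 'du', 'aj', 'oj', 'fin', 'katt', 'mycket', 'lite'],
    'month': [11, 11, 12, 12, 11, 11, 12, 12],
})

# Dict form from the answer; note the separator here is a space, not a comma.
# as_index=False keeps 'name' and 'month' as regular columns.
spaced = df.groupby(['name', 'month'], as_index=False).agg({'text': ' '.join})

# Named aggregation: several statistics per group in a single agg() call.
summary = (
    df.groupby(['name', 'month'])
      .agg(joined=('text', ','.join), n_words=('text', 'count'))
      .reset_index()
)
print(spaced)
print(summary)
```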
The answer by EdChum gives you a lot of flexibility, but if you just want to collect the strings into a column of list objects, you can also do:
output_series = df.groupby(['name','month'])['text'].apply(list)
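A sketch of the list-based approach on assumed sample data (reconstructed from the outputs earlier). It also shows, as one possible follow-up, turning the lists back into comma-joined strings with `Series.str.join`; `output_df` and `as_strings` are illustrative names.

```python
import pandas as pd

# Assumed sample data, reconstructed from the outputs shown earlier.
df = pd.DataFrame({
    'name':  ['name1', 'name1', 'name1', 'name1', 'name2', 'name2', 'name2', 'name2'],
    'text':  ['hej', 'du', 'aj', 'oj', 'fin', 'katt', 'mycket', 'lite'],
    'month': [11, 11, 12, 12, 11, 11, 12, 12],
})

# One Python list per (name, month) group, indexed by a MultiIndex.
output_series = df.groupby(['name', 'month'])['text'].apply(list)

# Optional: flatten the group keys into columns.
output_df = output_series.reset_index()

# If a joined string is needed later, .str.join works on list elements.
as_strings = output_series.str.join(',')
print(output_df)
```

Keeping lists defers the choice of separator and keeps the individual entries accessible.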