Pandas DataFrame stack multiple column values into single column
You can melt your dataframe:
>>> keys = [c for c in df if c.startswith('key.')]
>>> pd.melt(df, id_vars='topic', value_vars=keys, value_name='key')
   topic variable  key
0      8    key.0  abc
1      9    key.0  xab
2      8    key.1  def
3      9    key.1  xcd
4      8    key.2  ghi
5      9    key.2  xef
It also gives you the source of the key.
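For a self-contained run, here is a minimal sketch. The sample frame is an assumption, rebuilt from the output printed above, since the original df is not shown in this answer.

```python
import pandas as pd

# Assumed sample data, reconstructed from the printed output above.
df = pd.DataFrame({
    'topic': [8, 9],
    'key.0': ['abc', 'xab'],
    'key.1': ['def', 'xcd'],
    'key.2': ['ghi', 'xef'],
})

# Collect every column whose name starts with 'key.'.
keys = [c for c in df if c.startswith('key.')]

# One row per (topic, original column) pair; the values land in 'key'.
melted = pd.melt(df, id_vars='topic', value_vars=keys, value_name='key')

# The 'variable' column records which original column each value came from.
print(melted)
```

If you do not need the source column, dropping 'variable' afterwards leaves just topic and key.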
From v0.20, melt is also available as a method of the pd.DataFrame class:
>>> df.melt('topic', value_name='key').drop(columns='variable')
   topic  key
0      8  abc
1      9  xab
2      8  def
3      9  xcd
4      8  ghi
5      9  xef
OK, since another question has been marked as a duplicate of this one, I will answer here.
Using wide_to_long:
pd.wide_to_long(df, ['key'], i='topic', j='age', sep='.').reset_index().drop(columns='age')
Out[123]:
   topic  key
0      8  abc
1      9  xab
2      8  def
3      9  xcd
4      8  ghi
5      9  xef
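To see what wide_to_long produces before the suffix column is dropped, here is a minimal sketch. The sample frame is an assumption reconstructed from the outputs shown in this thread. sep='.' is passed because the columns are named key.0, key.1 and so on, and 'age' is just the name chosen for the column that holds the suffix.

```python
import pandas as pd

# Assumed sample data, reconstructed from the outputs in this thread.
df = pd.DataFrame({
    'topic': [8, 9],
    'key.0': ['abc', 'xab'],
    'key.1': ['def', 'xcd'],
    'key.2': ['ghi', 'xef'],
})

# 'key' is the stub; whatever follows 'key.' goes into the 'age' column.
# The values of 'topic' must uniquely identify each row.
long_df = pd.wide_to_long(df, ['key'], i='topic', j='age', sep='.').reset_index()
print(long_df)

# Drop the suffix column once it is no longer needed.
result = long_df.drop(columns='age')
print(result)
```

Keeping 'age' around is the wide_to_long counterpart of melt's 'variable' column: it tells you which key.N column each value came from.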
After trying various ways, I find the following more or less intuitive, provided stack's magic is understood:
# keep topic as index, stack other columns 'against' it
stacked = df.set_index('topic').stack()
# set the name of the new series created
df = stacked.reset_index(name='key')
# drop the 'source' level (key.*)
df.drop('level_1', axis=1, inplace=True)
The resulting dataframe is as required:
   topic  key
0      8  abc
1      8  def
2      8  ghi
3      9  xab
4      9  xcd
5      9  xef
You may want to print intermediary results to understand the process in full. If you don't mind having more columns than needed, the key steps are set_index('topic'), stack() and reset_index(name='key').
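If it helps, here is a rough sketch that prints each intermediate step. The sample frame is an assumption reconstructed from the outputs in this thread, and the variable names (indexed, stacked, result, tidy) are only illustrative.

```python
import pandas as pd

# Assumed sample data, reconstructed from the outputs in this thread.
df = pd.DataFrame({
    'topic': [8, 9],
    'key.0': ['abc', 'xab'],
    'key.1': ['def', 'xcd'],
    'key.2': ['ghi', 'xef'],
})

# Step 1: topic becomes the index, so only the key.* columns remain as columns.
indexed = df.set_index('topic')
print(indexed)

# Step 2: stack() pivots the columns into an inner index level,
# giving a Series indexed by (topic, original column name).
stacked = indexed.stack()
print(stacked)

# Step 3: reset_index turns both index levels back into columns and
# names the values 'key'; the unnamed inner level shows up as 'level_1'.
result = stacked.reset_index(name='key')
print(result)

# Optional: drop the source column if you only want topic and key.
tidy = result.drop(columns='level_1')
print(tidy)
```

Note that this route groups the rows by topic, whereas melt groups them by the original column.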