Split a Pandas column of lists into multiple columns
You can use the DataFrame constructor with lists created by to_list:
import pandas as pd

d1 = {'teams': [['SF', 'NYG'], ['SF', 'NYG'], ['SF', 'NYG'], ['SF', 'NYG'],
                ['SF', 'NYG'], ['SF', 'NYG'], ['SF', 'NYG']]}
df2 = pd.DataFrame(d1)
print(df2)
       teams
0  [SF, NYG]
1  [SF, NYG]
2  [SF, NYG]
3  [SF, NYG]
4  [SF, NYG]
5  [SF, NYG]
6  [SF, NYG]

df2[['team1', 'team2']] = pd.DataFrame(df2.teams.tolist(), index=df2.index)
print(df2)
       teams team1 team2
0  [SF, NYG]    SF   NYG
1  [SF, NYG]    SF   NYG
2  [SF, NYG]    SF   NYG
3  [SF, NYG]    SF   NYG
4  [SF, NYG]    SF   NYG
5  [SF, NYG]    SF   NYG
6  [SF, NYG]    SF   NYG
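The index= df2.index argument above matters when the original frame does not have the default 0..n-1 index, because pandas aligns on index labels when columns are joined or assigned. A minimal sketch of the difference, using a made-up two-row frame with labels 10 and 20 (the data and the names aligned, unaligned, good and bad are illustrative only):

```python
import pandas as pd

# Hypothetical frame with a non-default index.
df = pd.DataFrame({'teams': [['SF', 'NYG'], ['DAL', 'PHI']]}, index=[10, 20])

# Reusing df.index keeps the split columns on the same row labels.
aligned = pd.DataFrame(df['teams'].tolist(), index=df.index,
                       columns=['team1', 'team2'])

# Without index=, the new frame gets the default labels 0 and 1.
unaligned = pd.DataFrame(df['teams'].tolist(), columns=['team1', 'team2'])

good = df.join(aligned)    # rows 10 and 20 receive their team values
bad = df.join(unaligned)   # labels do not match, so team1/team2 are missing

print(good)
print(bad)
```

If the frame already has the default RangeIndex, as df2 does here, both forms give the same result.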
And for a new DataFrame:
df3 = pd.DataFrame(df2['teams'].to_list(), columns=['team1', 'team2'])
print(df3)
  team1 team2
0    SF   NYG
1    SF   NYG
2    SF   NYG
3    SF   NYG
4    SF   NYG
5    SF   NYG
6    SF   NYG
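The example above assumes every list has exactly two items. As a rough sketch of what happens otherwise: when some lists are shorter, the constructor pads the missing positions with missing values, and columns= has to name as many columns as the longest list. The ragged Series below is invented for illustration:

```python
import pandas as pd

# Hypothetical data: the second list has only one item.
ragged = pd.Series([['SF', 'NYG'], ['DAL'], ['GB', 'CHI']], name='teams')

# Two column names, matching the longest list.
out = pd.DataFrame(ragged.to_list(), columns=['team1', 'team2'],
                   index=ragged.index)

print(out)
# Row 1 has 'DAL' in team1 and a missing value in team2.
```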
A solution with apply(pd.Series) is very slow:
#7k rows
df2 = pd.concat([df2]*1000).reset_index(drop=True)

In [121]: %timeit df2['teams'].apply(pd.Series)
1.79 s ± 52.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [122]: %timeit pd.DataFrame(df2['teams'].to_list(), columns=['team1','team2'])
1.63 ms ± 54.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
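The %timeit lines above need IPython. To repeat the comparison in a plain Python script, something like the following sketch should work with the standard timeit module. The row count (700) and number of runs (3) are arbitrary choices to keep it quick, and the actual timings will depend on the machine and pandas version:

```python
import timeit

import pandas as pd

# Smaller made-up frame than the 7k rows above, so the slow path finishes fast.
df = pd.DataFrame({'teams': [['SF', 'NYG']] * 700})


def with_apply():
    # Builds one Series per row, which is where the time goes.
    return df['teams'].apply(pd.Series)


def with_constructor():
    # Hands the whole list of lists to the DataFrame constructor at once.
    return pd.DataFrame(df['teams'].to_list(), index=df.index)


t_apply = timeit.timeit(with_apply, number=3)
t_ctor = timeit.timeit(with_constructor, number=3)
print(f"apply(pd.Series): {t_apply:.4f} s for 3 runs")
print(f"constructor:      {t_ctor:.4f} s for 3 runs")

# Both approaches should produce the same values.
same_values = (with_apply().to_numpy() == with_constructor().to_numpy()).all()
print(same_values)
```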
Much simpler solution:
pd.DataFrame(df2["teams"].to_list(), columns=['team1', 'team2'])
Yields,
  team1 team2
-------------
0    SF   NYG
1    SF   NYG
2    SF   NYG
3    SF   NYG
4    SF   NYG
5    SF   NYG
6    SF   NYG
7    SF   NYG
If you wanted to split a column of delimited strings rather than lists, you could similarly do:
pd.DataFrame(df["teams"].str.split('<delim>', expand=True).values, columns=['team1', 'team2'])
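For a concrete version of the string case, here is a small sketch that assumes '-' as the delimiter and uses invented data. It renames the columns on the result of str.split directly instead of going through .values, which keeps the original index (wrapping .values in a new DataFrame gives a fresh 0..n-1 index):

```python
import pandas as pd

# Hypothetical data: delimited strings rather than lists.
df = pd.DataFrame({'teams': ['SF-NYG', 'DAL-PHI']}, index=['a', 'b'])

# expand=True returns a DataFrame with one column per piece.
parts = df['teams'].str.split('-', expand=True)
parts.columns = ['team1', 'team2']

print(parts)
```

Which form to use depends on whether the row labels need to be preserved for a later join or assignment.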