Pandas: expand rows from list data in a column
DataFrame.explode
Since pandas >= 0.25.0 we have the explode method, which expands a list into one row per element and repeats the rest of the columns:
df.explode('column1').reset_index(drop=True)
Output

  column1  column2
0       a        1
1       b        1
2       c        1
3       d        2
4       e        2
5       f        2
6       g        3
7       h        3
8       i        3
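A self-contained sketch of the above, assuming the sample frame used elsewhere in this thread (one list-valued column, one scalar column):

```python
import pandas as pd

# sample frame from the question: a list-valued column and a scalar column
df = pd.DataFrame({'column1': [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']],
                   'column2': [1, 2, 3]})

# explode repeats column2 for every element of column1;
# reset_index(drop=True) replaces the duplicated 0,1,2 index with 0..8
out = df.explode('column1').reset_index(drop=True)
print(out)
```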
Since pandas >= 1.1.0 we have the ignore_index argument, so we don't have to chain with reset_index:
df.explode('column1', ignore_index=True)
Output

  column1  column2
0       a        1
1       b        1
2       c        1
3       d        2
4       e        2
5       f        2
6       g        3
7       h        3
8       i        3
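A quick sanity check that the two spellings are equivalent (requires pandas >= 1.1.0 for ignore_index; df is assumed to be the sample frame from the question):

```python
import pandas as pd

df = pd.DataFrame({'column1': [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']],
                   'column2': [1, 2, 3]})

# ignore_index=True numbers the result 0..n-1 directly,
# so the reset_index(drop=True) chain is no longer needed
a = df.explode('column1', ignore_index=True)
b = df.explode('column1').reset_index(drop=True)
print(a.equals(b))
```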
You can create a DataFrame with the constructor and stack:
df2 = (pd.DataFrame(df.column1.tolist(), index=df.column2)
         .stack()
         .reset_index(level=1, drop=True)
         .reset_index(name='column1')[['column1','column2']])
print (df2)
  column1  column2
0       a        1
1       b        1
2       c        1
3       d        2
4       e        2
5       f        2
6       g        3
7       h        3
8       i        3
Because the subset [['column1','column2']] already fixes the column order and drops the extra index column, you can also omit the first reset_index:
df2 = (pd.DataFrame(df.column1.tolist(), index=df.column2)
         .stack()
         .reset_index(name='column1')[['column1','column2']])
print (df2)
  column1  column2
0       a        1
1       b        1
2       c        1
3       d        2
4       e        2
5       f        2
6       g        3
7       h        3
8       i        3
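The constructor + stack idea as a runnable sketch, again assuming the sample df from the question:

```python
import pandas as pd

df = pd.DataFrame({'column1': [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']],
                   'column2': [1, 2, 3]})

# build a wide frame (one column per list position) indexed by column2,
# then stack it back into a long Series and recover both columns
df2 = (pd.DataFrame(df.column1.tolist(), index=df.column2)
         .stack()
         .reset_index(level=1, drop=True)
         .reset_index(name='column1')[['column1', 'column2']])
print(df2)
```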
Another solution: use DataFrame.from_records to create a DataFrame from the first column, then build a Series with stack and join it to the original DataFrame:
df = pd.DataFrame({'column1': [['a','b','c'],['d','e','f'],['g','h','i']],
                   'column2':[1,2,3]})

a = (pd.DataFrame.from_records(df.column1.tolist())
       .stack()
       .reset_index(level=1, drop=True)
       .rename('column1'))
print (a)
0    a
0    b
0    c
1    d
1    e
1    f
2    g
2    h
2    i
Name: column1, dtype: object

print (df.drop('column1', axis=1)
         .join(a)
         .reset_index(drop=True)[['column1','column2']])
  column1  column2
0       a        1
1       b        1
2       c        1
3       d        2
4       e        2
5       f        2
6       g        3
7       h        3
8       i        3
Another solution is to use the result_type='expand' argument of DataFrame.apply, available since pandas 0.23. Answering @splinter's question, this method can be generalized -- see below:
import pandas as pd
from numpy import arange

df = pd.DataFrame(
    {'column1': [['a','b','c'],['d','e','f'],['g','h','i']],
     'column2': [1,2,3]})

pd.melt(
    df.join(
        df.apply(lambda row: row['column1'], axis=1, result_type='expand')
    ),
    value_vars=arange(df['column1'].shape[0]),
    value_name='column1',
    var_name='column2')[['column1','column2']]

# can be generalized
df = pd.DataFrame(
    {'column1': [['a','b','c'],['d','e','f'],['g','h','i']],
     'column2': [1,2,3],
     'column3': [[1,2],[2,3],[3,4]],
     'column4': [42,23,321],
     'column5': ['a','b','c']})

(pd.melt(
    df.join(
        df.apply(lambda row: row['column1'], axis=1, result_type='expand')
    ),
    value_vars=arange(df['column1'].shape[0]),
    value_name='column1',
    id_vars=df.columns[1:])
 .drop(columns=['variable'])[list(df.columns[:1]) + list(df.columns[1:])]
 .sort_values(by=['column1']))
UPDATE (for Jwely's comment): if you have lists with varying lengths, you can do:
df = pd.DataFrame(
    {'column1': [['a','b','c'],['d','f'],['g','h','i']],
     'column2': [1,2,3]})

longest = max(df['column1'].apply(lambda x: len(x)))

pd.melt(
    df.join(
        df.apply(lambda row: row['column1'] if len(row['column1']) >= longest
                 else row['column1'] + [None] * (longest - len(row['column1'])),
                 axis=1, result_type='expand')
    ),
    value_vars=arange(df['column1'].shape[0]),
    value_name='column1',
    var_name='column2'
).query("column1 == column1")[['column1','column2']]
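Worth noting: on pandas >= 0.25.0 the padding-and-filtering dance above is unnecessary, since explode handles lists of different lengths directly. A minimal sketch with the same ragged sample:

```python
import pandas as pd

# same ragged sample: the middle list has only two elements
df = pd.DataFrame({'column1': [['a', 'b', 'c'], ['d', 'f'], ['g', 'h', 'i']],
                   'column2': [1, 2, 3]})

# explode copes with ragged lists out of the box,
# so no None-padding or NaN-filtering query is required
out = df.explode('column1').reset_index(drop=True)
print(out)
```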