Flatten DataFrame nested list/array with extra index keys (for time series)


Use a dictionary comprehension with pop to extract the original column, and concat to build a MultiIndex from the dictionary keys:

df = pd.concat({k: pd.DataFrame(array) for k, array in mydf.pop('colArray').items()})
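To see what this step produces, here is a minimal sketch on a hypothetical two-row frame (the data is made up for illustration). It assumes each cell of colArray holds a list of dicts, as in the question:

```python
import pandas as pd

# Hypothetical two-row frame; 'colArray' holds a list of dicts per row.
mydf = pd.DataFrame({
    'id': ['foo', 'bar'],
    'colArray': [
        [{'date': 's', 'data2': 0.1}, {'date': 'd', 'data2': 0.8}],
        [{'date': 'd', 'data2': 0.1}],
    ],
})

# pop() removes 'colArray' from mydf and returns it as a Series;
# items() then yields (index label, list of dicts) pairs.
nested = mydf.pop('colArray')
df = pd.concat({k: pd.DataFrame(array) for k, array in nested.items()})

# Level 0 of the new index is the original row label,
# level 1 is the position of the record inside its list.
print(df.index.tolist())
print(mydf.columns.tolist())  # 'colArray' is no longer in mydf
```

The outer index level is what later lets each flattened record find its parent row again.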

An alternative is to pass a list and use the keys parameter:

df = pd.concat([pd.DataFrame(array) for array in mydf.pop('colArray')], keys=mydf.index)
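A small sketch comparing the two variants, assuming a hypothetical frame with string index labels (r1, r2) so the keys are easy to spot. The column is left in place here instead of popped, only so both variants can run on the same data:

```python
import pandas as pd

# Hypothetical frame with non-default index labels to make the keys visible.
mydf = pd.DataFrame(
    {
        'id': ['foo', 'bar'],
        'colArray': [[{'data2': 0.1}, {'data2': 0.8}], [{'data2': 0.3}]],
    },
    index=['r1', 'r2'],
)

nested = mydf['colArray']  # not popped here, so both variants see the same data

# Variant 1: dict keys become the outer index level.
by_dict = pd.concat({k: pd.DataFrame(a) for k, a in nested.items()})

# Variant 2: a plain list, with the outer level supplied through keys=.
by_keys = pd.concat([pd.DataFrame(a) for a in nested], keys=nested.index)

print(by_dict.index.tolist())
print(by_keys.index.tolist())
```

Both variants should give the same two-level index, so the choice between them is mostly a matter of taste.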

Then remove the second index level, so the result can be joined with the original DataFrame on its index:

df = df.reset_index(level=1, drop=True).join(mydf).reset_index(drop=True)
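The join step can be sketched in isolation. The names parent and children below are hypothetical stand-ins for mydf (after the pop) and the concatenated frame:

```python
import pandas as pd

# Hypothetical parent rows, i.e. the original frame without 'colArray'.
parent = pd.DataFrame({'id': ['foo', 'bar'], 'colA': ['a1', 'a2']})

# Hypothetical flattened records with a (parent label, position) MultiIndex.
children = pd.DataFrame(
    {'data2': [0.1, 0.8, 0.3]},
    index=pd.MultiIndex.from_tuples([(0, 0), (0, 1), (1, 0)]),
)

# Drop the inner level so the index holds only the parent label: 0, 0, 1.
flat = children.reset_index(level=1, drop=True)

# join() aligns on the index, so each parent row is repeated once per record;
# the final reset_index replaces the repeated labels with a fresh 0..n-1 range.
out = flat.join(parent).reset_index(drop=True)
print(out)
```

Because the join is index-based, it relies on the parent frame keeping the same index labels that were used as keys in the concat step.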

Sample:

import pandas as pd

mydf = pd.DataFrame({
    'id': ['foo', 'bar', 'fooz', 'barz'],
    'colA': ['a1', 'a2', 'a3', 'a4'],
    'colB': ['b1', 'b2', 'b3', 'b4'],
    'colArray': [
        [{'date': 's', 'data1': 't', 'data2': 0.1}, {'date': 'd', 'data1': 'r', 'data2': 0.8}],
        [{'date': 'd', 'data1': 'y', 'data2': 0.1}],
        [{'date': 'g', 'data1': 'u', 'data2': 0.1}],
        [{'date': 'h', 'data1': 'i', 'data2': 0.1}],
    ],
})
print(mydf)
     id colA colB                                           colArray
0   foo   a1   b1  [{'date': 's', 'data1': 't', 'data2': 0.1}, {'...
1   bar   a2   b2        [{'date': 'd', 'data1': 'y', 'data2': 0.1}]
2  fooz   a3   b3        [{'date': 'g', 'data1': 'u', 'data2': 0.1}]
3  barz   a4   b4        [{'date': 'h', 'data1': 'i', 'data2': 0.1}]

df = pd.concat({k: pd.DataFrame(array) for k, array in mydf.pop('colArray').items()})
print(df)
    data1  data2 date
0 0     t    0.1    s
  1     r    0.8    d
1 0     y    0.1    d
2 0     u    0.1    g
3 0     i    0.1    h

df = df.reset_index(level=1, drop=True).join(mydf).reset_index(drop=True)
print(df)
  data1  data2 date    id colA colB
0     t    0.1    s   foo   a1   b1
1     r    0.8    d   foo   a1   b1
2     y    0.1    d   bar   a2   b2
3     u    0.1    g  fooz   a3   b3
4     i    0.1    h  barz   a4   b4
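For comparison, a different technique (not the approach used above) is pd.json_normalize with record_path and meta, which flattens a list of record dicts in one call. This is only a sketch on hypothetical data; it assumes pandas 1.0 or later, where json_normalize is available at the top level:

```python
import pandas as pd

# Hypothetical frame shaped like the sample: one list of dicts per row.
mydf = pd.DataFrame({
    'id': ['foo', 'bar'],
    'colA': ['a1', 'a2'],
    'colArray': [
        [{'date': 's', 'data2': 0.1}, {'date': 'd', 'data2': 0.8}],
        [{'date': 'd', 'data2': 0.1}],
    ],
})

# Turn each row into a plain dict, then let json_normalize expand the
# nested list (record_path) and repeat the parent fields (meta).
records = mydf.to_dict('records')
flat = pd.json_normalize(records, record_path='colArray', meta=['id', 'colA'])
print(flat)
```

Note that the meta columns may come back with object dtype, so numeric parent fields might need converting afterwards.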