Flatten DataFrame nested list/array with extra index keys (for time series)
Use a dictionary comprehension with pop to extract the original column, then concat to build a MultiIndex:
df = pd.concat({k: pd.DataFrame(array) for k, array in mydf.pop('colArray').items()})
An alternative is to use the keys parameter:
df = pd.concat([pd.DataFrame(array) for array in mydf.pop('colArray')], keys=mydf.index)
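As a quick check that the dictionary form and the keys form give the same result, here is a minimal sketch on a hypothetical two-row frame. The data and the names col, by_dict and by_keys are made up for illustration, and the column is read without pop so both variants can run on the same input.

```python
import pandas as pd

# Hypothetical two-row frame; 'colArray' holds a list of dicts per row.
mydf = pd.DataFrame({
    'id': ['foo', 'bar'],
    'colArray': [
        [{'date': 's', 'data2': 0.1}, {'date': 'd', 'data2': 0.8}],
        [{'date': 'y', 'data2': 0.3}],
    ],
})

col = mydf['colArray']  # no pop here, so mydf keeps the column

# Variant 1: the dict keys (original row labels) become the outer index level.
by_dict = pd.concat({k: pd.DataFrame(array) for k, array in col.items()})

# Variant 2: the keys parameter supplies the outer index level.
by_keys = pd.concat([pd.DataFrame(array) for array in col], keys=mydf.index)

# Both frames carry (original row label, position in list) as their index.
print(by_dict.index.tolist())
print(by_keys.index.tolist())
```

Either form should work when the row labels are unique; the keys form simply avoids building an intermediate dictionary.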
Then remove the second level so the result can be joined with the original DataFrame:
df = df.reset_index(level=1, drop=True).join(mydf).reset_index(drop=True)
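To see why dropping the second level makes the join possible, the sketch below prints the intermediate index. It uses a hypothetical two-row frame, and the names flat and result are illustrative only. After the drop, the index holds just the original row labels, repeated once per list element, so join can align them with the rows of mydf.

```python
import pandas as pd

# Hypothetical two-row frame; 'colArray' holds a list of dicts per row.
mydf = pd.DataFrame({
    'id': ['foo', 'bar'],
    'colArray': [
        [{'date': 's', 'data2': 0.1}, {'date': 'd', 'data2': 0.8}],
        [{'date': 'y', 'data2': 0.3}],
    ],
})

# pop removes 'colArray' from mydf and returns it as a Series.
df = pd.concat({k: pd.DataFrame(array) for k, array in mydf.pop('colArray').items()})

# Dropping level 1 leaves only the original row labels (with repeats).
flat = df.reset_index(level=1, drop=True)
print(flat.index.tolist())

# join aligns on the index, so each original row is repeated per list element;
# the final reset_index replaces the repeated labels with a fresh 0..n-1 index.
result = flat.join(mydf).reset_index(drop=True)
print(result)
```

Note that the final reset_index(drop=True) discards the original row labels, so keep them in a column first if they are needed later.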
Sample:
import pandas as pd

mydf = pd.DataFrame({'id': ['foo', 'bar', 'fooz', 'barz'],
                     'colA': ['a1', 'a2', 'a3', 'a4'],
                     'colB': ['b1', 'b2', 'b3', 'b4'],
                     'colArray': [[{'date': 's', 'data1': 't', 'data2': 0.1},
                                   {'date': 'd', 'data1': 'r', 'data2': 0.8}],
                                  [{'date': 'd', 'data1': 'y', 'data2': 0.1}],
                                  [{'date': 'g', 'data1': 'u', 'data2': 0.1}],
                                  [{'date': 'h', 'data1': 'i', 'data2': 0.1}]]})
print (mydf)
     id colA colB                                           colArray
0   foo   a1   b1  [{'date': 's', 'data1': 't', 'data2': 0.1}, {'...
1   bar   a2   b2        [{'date': 'd', 'data1': 'y', 'data2': 0.1}]
2  fooz   a3   b3        [{'date': 'g', 'data1': 'u', 'data2': 0.1}]
3  barz   a4   b4        [{'date': 'h', 'data1': 'i', 'data2': 0.1}]
df = pd.concat({k: pd.DataFrame(array) for k, array in mydf.pop('colArray').items()})
print (df)
    data1  data2 date
0 0     t    0.1    s
  1     r    0.8    d
1 0     y    0.1    d
2 0     u    0.1    g
3 0     i    0.1    h

df = df.reset_index(level=1, drop=True).join(mydf).reset_index(drop=True)
print (df)
  data1  data2 date    id colA colB
0     t    0.1    s   foo   a1   b1
1     r    0.8    d   foo   a1   b1
2     y    0.1    d   bar   a2   b2
3     u    0.1    g  fooz   a3   b3
4     i    0.1    h  barz   a4   b4
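Because pop removes the column from mydf in place, it can be convenient to wrap the steps in a function that works on a copy. The helper below is a minimal sketch, not part of pandas: the name flatten_list_column is hypothetical, and it assumes the frame has unique row labels and that every cell of the column is a list of dicts.

```python
import pandas as pd


def flatten_list_column(frame, column):
    """Expand a column of lists of dicts into rows, repeating the other columns."""
    rest = frame.copy()        # work on a copy so pop does not alter the caller's frame
    arrays = rest.pop(column)  # Series of lists, indexed by the original row labels
    # Outer index level = original row label, inner level = position in the list.
    expanded = pd.concat({k: pd.DataFrame(a) for k, a in arrays.items()})
    # Drop the inner level, align with the remaining columns, then renumber rows.
    return expanded.reset_index(level=1, drop=True).join(rest).reset_index(drop=True)


mydf = pd.DataFrame({
    'id': ['foo', 'bar', 'fooz', 'barz'],
    'colA': ['a1', 'a2', 'a3', 'a4'],
    'colB': ['b1', 'b2', 'b3', 'b4'],
    'colArray': [
        [{'date': 's', 'data1': 't', 'data2': 0.1},
         {'date': 'd', 'data1': 'r', 'data2': 0.8}],
        [{'date': 'd', 'data1': 'y', 'data2': 0.1}],
        [{'date': 'g', 'data1': 'u', 'data2': 0.1}],
        [{'date': 'h', 'data1': 'i', 'data2': 0.1}],
    ],
})

out = flatten_list_column(mydf, 'colArray')
print(out)
print('colArray' in mydf.columns)  # True: the input frame is left unchanged
```

The order of the expanded columns may differ between pandas versions, so select columns by name rather than by position.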