How to generate n-level hierarchical JSON from pandas DataFrame?



You can use `itertuples` to build a nested dict and then dump it to JSON. First convert the date timestamps to strings, because `json.dumps` cannot use pandas `Timestamp` objects as object keys:

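```python
# df3 is the DataFrame from the question
df4 = df3.stack(level=[0, 1, 2]).reset_index()
# turn the Timestamp dates into 'YYYY-MM-DD' strings so they can be JSON keys
df4['Date'] = df4['Date'].dt.strftime('%Y-%m-%d')
df4 = df4.set_index(['Date', 'Job Role', 'Department', 'Team']) \
    .sort_index()
```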
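To see why the date conversion matters, here is a small sketch with a made-up one-entry dict: `json.dumps` only accepts `str`, `int`, `float`, `bool` or `None` as keys, so a `Timestamp` key raises `TypeError`. The `default=` hook does not help, since it is only consulted for values, not keys.

```python
import json

import pandas as pd

# Hypothetical single date, standing in for one value of df4['Date']
ts = pd.Timestamp('2017-12-31')

try:
    # default=str is NOT applied to keys, so this still fails
    json.dumps({ts: 1.0}, default=str)
except TypeError as exc:
    print('TypeError:', exc)

# Converting the key to a string first (what .dt.strftime does per row) works
key = ts.strftime('%Y-%m-%d')
encoded = json.dumps({key: 1.0})
print(encoded)
```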

Create an auto-vivifying nested dict (any missing key creates a new level on access):

```python
import collections


def nested_dict():
    # every missing key returns another nested_dict, to any depth
    return collections.defaultdict(nested_dict)


result = nested_dict()
```
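A quick sketch of what the recursive `defaultdict` buys you, using the same key names as the output below but made-up values: deep assignments work without creating the intermediate levels first, and because `defaultdict` is a `dict` subclass, `json` serializes it like a plain dict.

```python
import collections
import json


def nested_dict():
    # Each missing key creates another nested_dict
    return collections.defaultdict(nested_dict)


tree = nested_dict()
# No KeyError: intermediate levels are created on first access (values are made up)
tree['2017-12-31']['Junior']['Electronics']['A']['sales'] = 1.5
tree['2017-12-31']['Senior']['Household']['B']['sales'] = -0.5

print(json.dumps(tree))
```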

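Then populate it with `itertuples`. Each row's `Index` holds the four index labels, and the unnamed value column (named `0` by `reset_index`) appears as `_1`, because `itertuples` renames fields that are not valid identifiers to positional names: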

```python
for row in df4.itertuples():
    # row.Index is the (Date, Job Role, Department, Team) tuple
    result[row.Index[0]][row.Index[1]][row.Index[2]][row.Index[3]]['sales'] = row._1
```
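The loop above hard-codes four index levels. Since the question asks for n levels, here is a sketch of one way to generalize it; `series_to_nested` is a hypothetical helper (not a pandas API) that walks however many levels the index has, and the three-level data is invented for illustration. With the answer's data it would presumably be called as `series_to_nested(df4[0], leaf_key='sales')`, assuming the value column is still named `0`.

```python
import json

import pandas as pd


def series_to_nested(series, leaf_key=None):
    """Hypothetical helper: nest a Series by all of its index levels.

    If leaf_key is given, each value is wrapped as {leaf_key: value},
    mirroring the ['sales'] level in the loop above.
    """
    result = {}
    for keys, value in series.items():
        if not isinstance(keys, tuple):
            keys = (keys,)  # a single-level index yields scalar labels
        node = result
        for key in keys[:-1]:
            node = node.setdefault(key, {})  # create intermediate levels on demand
        node[keys[-1]] = value if leaf_key is None else {leaf_key: value}
    return result


# Hypothetical three-level example data
index = pd.MultiIndex.from_tuples(
    [
        ('2017-12-31', 'Junior', 'A'),
        ('2017-12-31', 'Junior', 'B'),
        ('2017-12-31', 'Senior', 'A'),
    ],
    names=['Date', 'Job Role', 'Team'],
)
sales = pd.Series([1.0, 2.0, 3.0], index=index)

nested = series_to_nested(sales, leaf_key='sales')
print(json.dumps(nested))
```

Using plain dicts with `setdefault` avoids the recursive `defaultdict`, so the result is an ordinary nested `dict`.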

Then use the `json` module to dump it:

```python
import json

json.dumps(result)
```

'{"2017-12-31": {"Junior": {"Electronics": {"A": {"sales": -0.3947134370101142}, "B": {"sales": -0.9873530754403204}, "C": {"sales": -1.1182598058984508}}, "Household": {"A": {"sales": -1.1211850078098677}, "B": {"sales": 2.0330914483907847}, "C": {"sales": 3.94762379718749}}}, "Senior": {"Electronics": {"A": {"sales": 1.4528493451404196}, "B": {"sales": -2.3277322345261005}, "C": {"sales": -2.8040263791743922}}, "Household": {"A": {"sales": 3.0972591929279663}, "B": {"sales": 9.884565742502392}, "C": {"sales": 2.9359830722457576}}}}, "2018-01-31": {"Junior": {"Electronics": {"A": {"sales": -1.3580300149125217}, "B": {"sales": 1.414665000013205}, "C": {"sales": -1.432795129108244}}, "Household": {"A": {"sales": 2.7783259569115346}, "B": {"sales": 2.717700275321333}, "C": {"sales": 1.4358377416259644}}}, "Senior": {"Electronics": {"A": {"sales": 2.8981726774941485}, "B": {"sales": 12.022897003654117}, "C": {"sales": 0.01776855733076088}}, "Household": {"A": {"sales": -3.342163776613092}, "B": {"sales": -5.283208386572307}, "C": {"sales": 2.942580121975619}}}}}'


I ran into this and was confused by the complexity of the OP's setup. Here is a minimal example and solution (based on the answer provided by @Maarten Fabré).

```python
import collections

import pandas as pd

# build the initial DataFrame
x = ['a', 'a']
y = ['b', 'c']
z = [['d'], ['e', 'f']]
df = pd.DataFrame(list(zip(x, y, z)), columns=['x', 'y', 'z'])
#    x  y       z
# 0  a  b     [d]
# 1  a  c  [e, f]
```

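Set the index to the `x` and `y` columns. Passing a list of columns to `set_index` already produces a MultiIndex; the `reindex` call simply rebuilds the same index from the `(x, y)` tuples, so it can be skipped: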

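```python
# set_index with a list of columns already creates a MultiIndex on (x, y)
df = df.set_index(['x', 'y'])
# optional: rebuild the same MultiIndex explicitly from the raw tuples
df = df.reindex(pd.MultiIndex.from_tuples(list(zip(x, y))))
#           z
# a b     [d]
#   c  [e, f]
```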
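A quick check of the index step, reusing the toy data above: `set_index` with two column names returns a MultiIndex directly, and the `reindex` version holds the same labels and values.

```python
import pandas as pd

x = ['a', 'a']
y = ['b', 'c']
z = [['d'], ['e', 'f']]
df = pd.DataFrame(list(zip(x, y, z)), columns=['x', 'y', 'z'])

indexed = df.set_index(['x', 'y'])
# A list of columns in set_index already yields a MultiIndex of (x, y) tuples
print(type(indexed.index).__name__)
print(list(indexed.index))

rebuilt = indexed.reindex(pd.MultiIndex.from_tuples(list(zip(x, y))))
# Same row order and values either way
print(rebuilt['z'].tolist() == indexed['z'].tolist())
```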

Then initialize a nested dictionary and fill it in item by item:

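```python
# defaultdict(dict) is enough here because there are only two index levels
nested_dict = collections.defaultdict(dict)
# Series.items() replaces iteritems(), which was removed in pandas 2.0
for keys, value in df['z'].items():
    # keys is the (x, y) tuple from the MultiIndex
    nested_dict[keys[0]][keys[1]] = value
# defaultdict(dict, {'a': {'b': ['d'], 'c': ['e', 'f']}})
```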
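For the two-level case there is also a loop-free alternative using `groupby`; this is a swapped-in technique, not the method from the answer above, and as written it only handles exactly two levels. Each group keeps the full MultiIndex, so `droplevel(0)` strips the outer label before `to_dict()` builds the inner mapping.

```python
import pandas as pd

x = ['a', 'a']
y = ['b', 'c']
z = [['d'], ['e', 'f']]
s = pd.DataFrame(list(zip(x, y, z)), columns=['x', 'y', 'z']).set_index(['x', 'y'])['z']

# Group on the outer level, drop it, and let to_dict() build the inner dict
nested = {outer: group.droplevel(0).to_dict() for outer, group in s.groupby(level=0)}
print(nested)
```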

At this point you can dump it to JSON with `json.dumps(nested_dict)` (after `import json`).
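A minimal sketch of that final step, rebuilding by hand the same two-level `defaultdict` the loop produces: it serializes like a plain dict, and `indent=` is optional pretty-printing.

```python
import collections
import json

# Same two-level structure the loop above builds
nested_dict = collections.defaultdict(dict)
nested_dict['a']['b'] = ['d']
nested_dict['a']['c'] = ['e', 'f']

compact = json.dumps(nested_dict)
print(compact)
print(json.dumps(nested_dict, indent=2))
```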