Construct pandas DataFrame from items in nested dictionary
A pandas MultiIndex consists of a list of tuples. So the most natural approach would be to reshape your input dict so that its keys are tuples corresponding to the multi-index values you require. Then you can just construct your dataframe using pd.DataFrame.from_dict
, using the option orient='index'
:
user_dict = {12: {'Category 1': {'att_1': 1, 'att_2': 'whatever'}, 'Category 2': {'att_1': 23, 'att_2': 'another'}}, 15: {'Category 1': {'att_1': 10, 'att_2': 'foo'}, 'Category 2': {'att_1': 30, 'att_2': 'bar'}}}pd.DataFrame.from_dict({(i,j): user_dict[i][j] for i in user_dict.keys() for j in user_dict[i].keys()}, orient='index') att_1 att_212 Category 1 1 whatever Category 2 23 another15 Category 1 10 foo Category 2 30 bar
An alternative approach would be to build your dataframe up by concatenating the component dataframes:
user_ids = []frames = []for user_id, d in user_dict.iteritems(): user_ids.append(user_id) frames.append(pd.DataFrame.from_dict(d, orient='index'))pd.concat(frames, keys=user_ids) att_1 att_212 Category 1 1 whatever Category 2 23 another15 Category 1 10 foo Category 2 30 bar
pd.concat
accepts a dictionary. With this in mind, it is possible to improve upon the currently accepted answer in terms of simplicity and performance by use a dictionary comprehension to build a dictionary mapping keys to sub-frames.
pd.concat({k: pd.DataFrame(v).T for k, v in user_dict.items()}, axis=0)
Or,
pd.concat({ k: pd.DataFrame.from_dict(v, 'index') for k, v in user_dict.items() }, axis=0)
att_1 att_212 Category 1 1 whatever Category 2 23 another15 Category 1 10 foo Category 2 30 bar
So I used to use a for loop for iterating through the dictionary as well, but one thing I've found that works much faster is to convert to a panel and then to a dataframe. Say you have a dictionary d
import pandas as pdd{'RAY Index': {datetime.date(2014, 11, 3): {'PX_LAST': 1199.46,'PX_OPEN': 1200.14},datetime.date(2014, 11, 4): {'PX_LAST': 1195.323, 'PX_OPEN': 1197.69},datetime.date(2014, 11, 5): {'PX_LAST': 1200.936, 'PX_OPEN': 1195.32},datetime.date(2014, 11, 6): {'PX_LAST': 1206.061, 'PX_OPEN': 1200.62}},'SPX Index': {datetime.date(2014, 11, 3): {'PX_LAST': 2017.81,'PX_OPEN': 2018.21},datetime.date(2014, 11, 4): {'PX_LAST': 2012.1, 'PX_OPEN': 2015.81},datetime.date(2014, 11, 5): {'PX_LAST': 2023.57, 'PX_OPEN': 2015.29},datetime.date(2014, 11, 6): {'PX_LAST': 2031.21, 'PX_OPEN': 2023.33}}}
The command
pd.Panel(d)<class 'pandas.core.panel.Panel'>Dimensions: 2 (items) x 2 (major_axis) x 4 (minor_axis)Items axis: RAY Index to SPX IndexMajor_axis axis: PX_LAST to PX_OPENMinor_axis axis: 2014-11-03 to 2014-11-06
where pd.Panel(d)[item] yields a dataframe
pd.Panel(d)['SPX Index']2014-11-03 2014-11-04 2014-11-05 2014-11-06PX_LAST 2017.81 2012.10 2023.57 2031.21PX_OPEN 2018.21 2015.81 2015.29 2023.33
You can then hit the command to_frame() to turn it into a dataframe. I use reset_index as well to turn the major and minor axis into columns rather than have them as indices.
pd.Panel(d).to_frame().reset_index()major minor RAY Index SPX IndexPX_LAST 2014-11-03 1199.460 2017.81PX_LAST 2014-11-04 1195.323 2012.10PX_LAST 2014-11-05 1200.936 2023.57PX_LAST 2014-11-06 1206.061 2031.21PX_OPEN 2014-11-03 1200.140 2018.21PX_OPEN 2014-11-04 1197.690 2015.81PX_OPEN 2014-11-05 1195.320 2015.29PX_OPEN 2014-11-06 1200.620 2023.33
Finally, if you don't like the way the frame looks you can use the transpose function of panel to change the appearance before calling to_frame() see documentation here http://pandas.pydata.org/pandas-docs/dev/generated/pandas.Panel.transpose.html
Just as an example
pd.Panel(d).transpose(2,0,1).to_frame().reset_index()major minor 2014-11-03 2014-11-04 2014-11-05 2014-11-06RAY Index PX_LAST 1199.46 1195.323 1200.936 1206.061RAY Index PX_OPEN 1200.14 1197.690 1195.320 1200.620SPX Index PX_LAST 2017.81 2012.100 2023.570 2031.210SPX Index PX_OPEN 2018.21 2015.810 2015.290 2023.330
Hope this helps.