
Flattening dictionary with pd.json_normalize


Setup

Your data is structured inconveniently. I want to focus on:

  1. Getting the lists in 'IDs' into a list of dictionaries, which would be far more convenient.
  2. Getting rid of the useless keys in the parent dictionary. All we care about are the values.

Your data:

{1: {'Name': 'Thrilling Tales of Dragon Slayers',
     'IDs': {'StoreID': ['123445452543'],
             'BookID': ['543533254353'],
             'SalesID': ['543267765345']}},
 2: {'Name': 'boring Tales of Dragon Slayers',
     'IDs': {'StoreID': ['111111', '1121111'],
             'BookID': ['543533254353', '4324232342'],
             'SalesID': ['543267765345', '4353543']}}}

What I want it to look like:

[{'Name': 'Thrilling Tales of Dragon Slayers',
  'IDs': [{'StoreID': '123445452543',
           'BookID': '543533254353',
           'SalesID': '543267765345'}]},
 {'Name': 'boring Tales of Dragon Slayers',
  'IDs': [{'StoreID': '111111',
           'BookID': '543533254353',
           'SalesID': '543267765345'},
          {'StoreID': '1121111',
           'BookID': '4324232342',
           'SalesID': '4353543'}]}]

Restructure Data

Reasonable Way

Simple loop, don't mess around. This gets us the structure shown above:

new = []
for v in data.values():
    temp = {**v}           # This is intended to keep all the other data that might be there
    ids = temp.pop('IDs')  # I have to focus on this to create the records
    temp['IDs'] = [dict(zip(ids, x)) for x in zip(*ids.values())]
    new.append(temp)

Cute one-liner

new = [{**v, 'IDs': [dict(zip(v['IDs'], x)) for x in zip(*v['IDs'].values())]} for v in data.values()]
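As a sanity check (a sketch using the sample data from above), the loop and the one-liner produce the same structure:

```python
data = {1: {'Name': 'Thrilling Tales of Dragon Slayers',
            'IDs': {'StoreID': ['123445452543'],
                    'BookID': ['543533254353'],
                    'SalesID': ['543267765345']}},
        2: {'Name': 'boring Tales of Dragon Slayers',
            'IDs': {'StoreID': ['111111', '1121111'],
                    'BookID': ['543533254353', '4324232342'],
                    'SalesID': ['543267765345', '4353543']}}}

# Loop version
looped = []
for v in data.values():
    temp = {**v}
    ids = temp.pop('IDs')
    temp['IDs'] = [dict(zip(ids, x)) for x in zip(*ids.values())]
    looped.append(temp)

# One-liner version
one_liner = [{**v, 'IDs': [dict(zip(v['IDs'], x)) for x in zip(*v['IDs'].values())]}
             for v in data.values()]

assert looped == one_liner
```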

Create DataFrame with pd.json_normalize

In this call to json_normalize we need to specify the path to the records, i.e. the list of id dictionaries found at the 'IDs' key. json_normalize will create one row in the dataframe for every item in that list. This is done with the record_path parameter: we pass a list of keys that describes the path (if the records were in a deeper structure) or a single string (if the key is at the top layer, which for us, it is).

record_path = 'IDs'
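To illustrate the "deeper structure" case (a hypothetical variant, not the original data), suppose the records lived under an extra 'Meta' key; record_path would then be a list of keys:

```python
import pandas as pd

# Hypothetical nested variant: id records live under 'Meta' -> 'IDs'
nested = [
    {'Name': 'Thrilling Tales of Dragon Slayers',
     'Meta': {'IDs': [{'StoreID': '123445452543',
                       'BookID': '543533254353',
                       'SalesID': '543267765345'}]}},
]

# record_path walks the keys in order: item['Meta']['IDs']
df = pd.json_normalize(nested, record_path=['Meta', 'IDs'], meta='Name')
```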

Then we want to tell json_normalize which keys are metadata for the records. If there is more than one record, as we have, the metadata is repeated for each record.

meta = 'Name'
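A minimal sketch of that repetition, using one entry from the sample data: every record in the 'IDs' list gets its own row, and each row carries the same 'Name' value.

```python
import pandas as pd

rows = [{'Name': 'boring Tales of Dragon Slayers',
         'IDs': [{'StoreID': '111111'}, {'StoreID': '1121111'}]}]

# 'Name' is repeated once per record pulled out of 'IDs'
df = pd.json_normalize(rows, record_path='IDs', meta='Name')
```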

So the final solution looks like this:

pd.json_normalize(new, record_path='IDs', meta='Name')

        StoreID        BookID       SalesID                               Name
0  123445452543  543533254353  543267765345  Thrilling Tales of Dragon Slayers
1        111111  543533254353  543267765345     boring Tales of Dragon Slayers
2       1121111    4324232342       4353543     boring Tales of Dragon Slayers

However

If we are restructuring anyway, might as well make it so we can just pass it to the dataframe constructor.

pd.DataFrame([
    {'Name': r['Name'], **dict(zip(r['IDs'], x))}
    for r in data.values() for x in zip(*r['IDs'].values())
])

                                Name       StoreID        BookID       SalesID
0  Thrilling Tales of Dragon Slayers  123445452543  543533254353  543267765345
1     boring Tales of Dragon Slayers        111111  543533254353  543267765345
2     boring Tales of Dragon Slayers       1121111    4324232342       4353543

Bonus Content

While we are at it: the data is ambiguous as to whether each id type has the same number of ids. Suppose it did not.

data = {1: {'Name': "Thrilling Tales of Dragon Slayers",
            'IDs': {"StoreID": ['123445452543'],
                    "BookID": ['543533254353'],
                    "SalesID": ['543267765345']}},
        2: {'Name': "boring Tales of Dragon Slayers",
            'IDs': {"StoreID": ['111111', '1121111'],
                    "BookID": ['543533254353', '4324232342'],
                    "SalesID": ['543267765345', '4353543', 'extra id']}}}

Then we can use zip_longest from itertools, which pads the shorter lists with None:

from itertools import zip_longest

pd.DataFrame([
    {'Name': r['Name'], **dict(zip(r['IDs'], x))}
    for r in data.values() for x in zip_longest(*r['IDs'].values())
])

                                Name       StoreID        BookID       SalesID
0  Thrilling Tales of Dragon Slayers  123445452543  543533254353  543267765345
1     boring Tales of Dragon Slayers        111111  543533254353  543267765345
2     boring Tales of Dragon Slayers       1121111    4324232342       4353543
3     boring Tales of Dragon Slayers          None          None      extra id


  1. Use pandas.DataFrame.from_dict to read data
  2. Convert the values in the 'IDs' column to separate columns
    • .pop removes the old column from df
    • pd.DataFrame(df.pop('IDs').values.tolist()) converts each dict key to a separate column
    • .join the new columns back to df
  3. Use pd.Series.explode on each list column, applied via .apply.
  • Depending on the data, sometimes the solution is to reshape the data, as shown by piRSquared
import pandas as pd

# test data
data = \
{1: {'IDs': {'BookID': ['543533254353'],
             'SalesID': ['543267765345'],
             'StoreID': ['123445452543']},
     'Name': 'Thrilling Tales of Dragon Slayers'},
 2: {'IDs': {'BookID': ['543533254353', '4324232342'],
             'SalesID': ['543267765345', '4353543'],
             'StoreID': ['111111', '1121111']},
     'Name': 'boring Tales of Dragon Slayers'}}

# load the data using from_dict
df = pd.DataFrame.from_dict(data, orient='index').reset_index(drop=True)

# convert IDs to separate columns
df = df.join(pd.DataFrame(df.pop('IDs').values.tolist()))

# explode the list in each column
df = df.apply(pd.Series.explode).reset_index(drop=True)

# display(df)
                                Name        BookID       SalesID       StoreID
0  Thrilling Tales of Dragon Slayers  543533254353  543267765345  123445452543
1     boring Tales of Dragon Slayers  543533254353  543267765345        111111
2     boring Tales of Dragon Slayers    4324232342       4353543       1121111
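As a side note (assuming pandas 1.3 or newer), DataFrame.explode accepts a list of columns directly, which avoids the .apply call when the lists in each row have equal lengths. A minimal sketch on a one-row frame:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['boring Tales of Dragon Slayers'],
    'BookID': [['543533254353', '4324232342']],
    'SalesID': [['543267765345', '4353543']],
})

# pandas >= 1.3: explode several list columns in one call,
# pairing elements positionally within each row
out = df.explode(['BookID', 'SalesID'], ignore_index=True)
```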