Flatten double nested JSON


Use pandas.io.json.json_normalize (exposed as pandas.json_normalize in pandas 1.0 and later).

    json_normalize(data, record_path=['teams', 'members'], meta=[['teams', 'teamname']])

output:

                           email  firstname  lastname        mobile  orgname         phone  teams.teamname
    0      john.doe@wildlife.net       John       Doe                   Anon  916-555-1234               1
    1      jane.doe@wildlife.net       Jane       Doe  916-555-7890     Anon  916-555-4321               1
    2  mickey.moose@wildlife.net     Mickey     Moose  916-555-1111  Moosers  916-555-0000               2
    3   minny.moose@wildlife.net      Minny     Moose                Moosers  916-555-2222               2


Explanation

    from pandas.io.json import json_normalize
    import pandas as pd

I've only learned how to use the json_normalize function recently so my explanation might not be right.

Start with what I'm calling 'Layer 0'

    json_normalize(data)

output:

                                                   teams
    0  [{'teamname': '1', 'members': [{'firstname': '...

There is 1 column and 1 row. Everything is inside the 'teams' column.

Look into what I'm calling 'Layer 1' by using record_path=

    json_normalize(data, record_path='teams')

output:

                                                 members  teamname
    0  [{'firstname': 'John', 'lastname': 'Doe', 'org...         1
    1  [{'firstname': 'Mickey', 'lastname': 'Moose', ...         2

In Layer 1 we have flattened 'teamname', but there is more inside 'members'.
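To make Layers 0 and 1 concrete, here is a minimal sketch. It assumes a trimmed-down stand-in for the question's data (only two fields per member) and uses the top-level pd.json_normalize name from pandas 1.0+; the variable names are mine.

```python
import pandas as pd

# Trimmed stand-in for the question's data: two fields per member only.
data = {
    "teams": [
        {
            "teamname": "1",
            "members": [
                {"firstname": "John", "lastname": "Doe"},
                {"firstname": "Jane", "lastname": "Doe"},
            ],
        },
        {
            "teamname": "2",
            "members": [
                {"firstname": "Mickey", "lastname": "Moose"},
                {"firstname": "Minny", "lastname": "Moose"},
            ],
        },
    ]
}

layer0 = pd.json_normalize(data)                       # one row, one 'teams' column
layer1 = pd.json_normalize(data, record_path="teams")  # one row per team

print(layer0.shape)  # (1, 1)
print(layer1.shape)  # (2, 2)

# Each 'members' cell is still a plain Python list of dicts.
first_members = layer1.loc[0, "members"]
print(type(first_members).__name__, len(first_members))  # list 2
```

So record_path='teams' only unpacks one level; the member records stay nested until the path goes deeper.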

Look into Layer 2 with record_path=. The notation is unintuitive at first. I now remember it by ['layer','deeperlayer'] where the result is layer.deeperlayer.

    json_normalize(data, record_path=['teams', 'members'])

output:

                           email  firstname  lastname        mobile  orgname         phone
    0      john.doe@wildlife.net       John       Doe                   Anon  916-555-1234
    1      jane.doe@wildlife.net       Jane       Doe  916-555-7890     Anon  916-555-4321
    2  mickey.moose@wildlife.net     Mickey     Moose  916-555-1111  Moosers  916-555-0000
    3   minny.moose@wildlife.net      Minny     Moose                Moosers  916-555-2222

Excuse my output, I don't know how to make tables in a response.
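One way to read record_path=['teams','members'] is as a nested loop: walk 'teams', then each team's 'members'. The sketch below checks that reading against a hand-written loop. It again assumes a trimmed stand-in for the question's data and the pd.json_normalize name from pandas 1.0+.

```python
import pandas as pd

# Trimmed stand-in for the question's data: two fields per member only.
data = {
    "teams": [
        {"teamname": "1", "members": [
            {"firstname": "John", "lastname": "Doe"},
            {"firstname": "Jane", "lastname": "Doe"},
        ]},
        {"teamname": "2", "members": [
            {"firstname": "Mickey", "lastname": "Moose"},
            {"firstname": "Minny", "lastname": "Moose"},
        ]},
    ]
}

flat = pd.json_normalize(data, record_path=["teams", "members"])

# The same records collected by hand: outer loop over teams, inner over members.
by_hand = [member for team in data["teams"] for member in team["members"]]

same_rows = flat.to_dict("records") == by_hand
print(len(flat), same_rows)
```

Note that the team name is lost in both versions, which is what meta= is for.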

Finally we add in Layer 1 columns using meta=

    json_normalize(data, record_path=['teams', 'members'], meta=[['teams', 'teamname']])

output:

                           email  firstname  lastname        mobile  orgname         phone  teams.teamname
    0      john.doe@wildlife.net       John       Doe                   Anon  916-555-1234               1
    1      jane.doe@wildlife.net       Jane       Doe  916-555-7890     Anon  916-555-4321               1
    2  mickey.moose@wildlife.net     Mickey     Moose  916-555-1111  Moosers  916-555-0000               2
    3   minny.moose@wildlife.net      Minny     Moose                Moosers  916-555-2222               2

Notice how we needed a list of lists, meta=[[...]], to reference a field in Layer 1. If there was a column we wanted from Layer 0 as well as one from Layer 1, we could do this:

json_normalize(data,record_path=['layer1','layer2'],meta=['layer0',['layer0','layer1']])
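The line above is schematic, so here is a runnable sketch of the same idea. The top-level "league" key is hypothetical (the question's data has no Layer 0 field), and the member records are trimmed to two fields. A plain string in meta= picks a Layer 0 key; a nested list walks into Layer 1.

```python
import pandas as pd

data = {
    "league": "Wildlife",  # hypothetical Layer 0 field, not in the question's data
    "teams": [
        {"teamname": "1", "members": [
            {"firstname": "John", "lastname": "Doe"},
            {"firstname": "Jane", "lastname": "Doe"},
        ]},
        {"teamname": "2", "members": [
            {"firstname": "Mickey", "lastname": "Moose"},
            {"firstname": "Minny", "lastname": "Moose"},
        ]},
    ],
}

flat = pd.json_normalize(
    data,
    record_path=["teams", "members"],        # rows come from Layer 2
    meta=["league", ["teams", "teamname"]],  # Layer 0 key, then a Layer 1 key
)

print(flat)
```

Each member row should carry both the league and its team name, with the Layer 1 column named by its dotted path, teams.teamname.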

The result of json_normalize is a pandas DataFrame.
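Because the result is an ordinary DataFrame, the usual DataFrame methods apply to it. As a small sketch (trimmed data again, and renaming the column is my own choice, not something the question asked for), the dotted meta column can be given a shorter name:

```python
import pandas as pd

# Trimmed stand-in for the question's data: two fields per member only.
data = {
    "teams": [
        {"teamname": "1", "members": [
            {"firstname": "John", "lastname": "Doe"},
            {"firstname": "Jane", "lastname": "Doe"},
        ]},
        {"teamname": "2", "members": [
            {"firstname": "Mickey", "lastname": "Moose"},
            {"firstname": "Minny", "lastname": "Moose"},
        ]},
    ]
}

flat = pd.json_normalize(
    data,
    record_path=["teams", "members"],
    meta=[["teams", "teamname"]],
)

# The meta column is named after its path, joined with '.'; rename it if preferred.
flat = flat.rename(columns={"teams.teamname": "teamname"})

print(isinstance(flat, pd.DataFrame))  # True
print(flat[["firstname", "teamname"]])
```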


This is one way to do it. Should give you some ideas.

    df = pd.concat(
        [
            pd.concat([pd.Series(m) for m in t['members']], axis=1) for t in data['teams']
        ], keys=[t['teamname'] for t in data['teams']]
    )

output:

                                         0                         1
    1 email          john.doe@wildlife.net     jane.doe@wildlife.net
      firstname                       John                      Jane
      lastname                         Doe                       Doe
      mobile                                            916-555-7890
      orgname                         Anon                      Anon
      phone                   916-555-1234              916-555-4321
    2 email      mickey.moose@wildlife.net  minny.moose@wildlife.net
      firstname                     Mickey                     Minny
      lastname                       Moose                     Moose
      mobile                  916-555-1111
      orgname                      Moosers                   Moosers
      phone                   916-555-0000              916-555-2222
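The part doing the work in that expression is keys=, which labels each team's block with an outer index level. A minimal sketch with two hand-built blocks (member fields trimmed, variable names are mine):

```python
import pandas as pd

# Inner concat: one column per member, attributes as rows.
team1 = pd.concat(
    [
        pd.Series({"firstname": "John", "lastname": "Doe"}),
        pd.Series({"firstname": "Jane", "lastname": "Doe"}),
    ],
    axis=1,
)
team2 = pd.concat(
    [
        pd.Series({"firstname": "Mickey", "lastname": "Moose"}),
        pd.Series({"firstname": "Minny", "lastname": "Moose"}),
    ],
    axis=1,
)

# Outer concat: keys= adds the team name as the first index level.
df = pd.concat([team1, team2], keys=["1", "2"])

print(df.index.nlevels)  # 2
print(df.loc["2"])       # only team 2: attributes as rows, members as columns
```

Selecting with df.loc["2"] drops the outer level again, which is a quick way to check that the labels landed where expected.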

To get a nice table with team name and members as rows, all attributes in columns:

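    # Assigning to df.index.levels[0].name raises an error in recent pandas versions,
    # so the outer level is named with set_names instead.
    df.index = df.index.set_names('teamname', level=0)
    df.columns.name = 'member'
    df.T.stack(0).swaplevel(0, 1).sort_index()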
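The chained call is easier to follow one step at a time. This is a sketch on trimmed data (two fields per member); the intermediate names wide, stacked and result are mine, and the outer level is named with MultiIndex.set_names.

```python
import pandas as pd

# Trimmed stand-in for the question's data: two fields per member only.
data = {
    "teams": [
        {"teamname": "1", "members": [
            {"firstname": "John", "lastname": "Doe"},
            {"firstname": "Jane", "lastname": "Doe"},
        ]},
        {"teamname": "2", "members": [
            {"firstname": "Mickey", "lastname": "Moose"},
            {"firstname": "Minny", "lastname": "Moose"},
        ]},
    ]
}

df = pd.concat(
    [pd.concat([pd.Series(m) for m in t["members"]], axis=1) for t in data["teams"]],
    keys=[t["teamname"] for t in data["teams"]],
)
df.index = df.index.set_names("teamname", level=0)
df.columns.name = "member"

wide = df.T                                    # rows: member; columns: (teamname, attribute)
stacked = wide.stack(0)                        # rows: (member, teamname); columns: attribute
result = stacked.swaplevel(0, 1).sort_index()  # rows: (teamname, member)

print(list(result.index.names))  # ['teamname', 'member']
print(result)
```

Depending on the pandas version, stack may emit a FutureWarning about its new implementation; for this data the resulting table should be the same either way.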

To get team name and member as actual columns, just reset the index.

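    # Assigning to df.index.levels[0].name raises an error in recent pandas versions,
    # so the outer level is named with set_names instead.
    df.index = df.index.set_names('teamname', level=0)
    df.columns.name = 'member'
    df.T.stack(0).swaplevel(0, 1).sort_index().reset_index()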

The whole thing

    import json
    import pandas as pd

    json_text = """{
    "teams": [
      {
        "teamname": "1",
        "members": [
          {
            "firstname": "John",
            "lastname": "Doe",
            "orgname": "Anon",
            "phone": "916-555-1234",
            "mobile": "",
            "email": "john.doe@wildlife.net"
          },
          {
            "firstname": "Jane",
            "lastname": "Doe",
            "orgname": "Anon",
            "phone": "916-555-4321",
            "mobile": "916-555-7890",
            "email": "jane.doe@wildlife.net"
          }
        ]
      },
      {
        "teamname": "2",
        "members": [
          {
            "firstname": "Mickey",
            "lastname": "Moose",
            "orgname": "Moosers",
            "phone": "916-555-0000",
            "mobile": "916-555-1111",
            "email": "mickey.moose@wildlife.net"
          },
          {
            "firstname": "Minny",
            "lastname": "Moose",
            "orgname": "Moosers",
            "phone": "916-555-2222",
            "mobile": "",
            "email": "minny.moose@wildlife.net"
          }
        ]
      }
    ]
    }"""

    data = json.loads(json_text)

    # One column per member within each team, stacked with the team name as the outer index level.
    df = pd.concat(
        [
            pd.concat([pd.Series(m) for m in t['members']], axis=1) for t in data['teams']
        ], keys=[t['teamname'] for t in data['teams']]
    )

    # Assigning to df.index.levels[0].name raises an error in recent pandas versions,
    # so the outer level is named with set_names instead.
    df.index = df.index.set_names('teamname', level=0)
    df.columns.name = 'member'

    # Members as rows, attributes as columns, team name and member number as regular columns.
    df.T.stack(0).swaplevel(0, 1).sort_index().reset_index()