Flatten double nested JSON
Use pandas.json_normalize (imported from pandas.io.json in pandas versions before 1.0)

    json_normalize(data, record_path=['teams', 'members'], meta=[['teams', 'teamname']])

output:

                           email  firstname  lastname        mobile  orgname         phone  teams.teamname
    0      john.doe@wildlife.net       John       Doe                   Anon  916-555-1234               1
    1      jane.doe@wildlife.net       Jane       Doe  916-555-7890     Anon  916-555-4321               1
    2  mickey.moose@wildlife.net     Mickey     Moose  916-555-1111  Moosers  916-555-0000               2
    3   minny.moose@wildlife.net      Minny     Moose                Moosers  916-555-2222               2
Explanation
    import pandas as pd
    # pandas >= 1.0; on older versions use: from pandas.io.json import json_normalize
    from pandas import json_normalize
I've only learned how to use the json_normalize function recently so my explanation might not be right.
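The snippets in this answer assume that `data` is the question's JSON already parsed into a Python dict. A minimal sketch of that setup, with the sample trimmed to the fields needed to follow along (the variable name `json_text` is just a local choice):

```python
import json

# Sample JSON from the question, reduced to two teams of two members each.
json_text = """
{"teams": [
  {"teamname": "1", "members": [
    {"firstname": "John", "lastname": "Doe", "orgname": "Anon",
     "phone": "916-555-1234", "mobile": "", "email": "john.doe@wildlife.net"},
    {"firstname": "Jane", "lastname": "Doe", "orgname": "Anon",
     "phone": "916-555-4321", "mobile": "916-555-7890", "email": "jane.doe@wildlife.net"}
  ]},
  {"teamname": "2", "members": [
    {"firstname": "Mickey", "lastname": "Moose", "orgname": "Moosers",
     "phone": "916-555-0000", "mobile": "916-555-1111", "email": "mickey.moose@wildlife.net"},
    {"firstname": "Minny", "lastname": "Moose", "orgname": "Moosers",
     "phone": "916-555-2222", "mobile": "", "email": "minny.moose@wildlife.net"}
  ]}
]}
"""

# json.loads turns the text into nested dicts and lists.
data = json.loads(json_text)

# The top level is a dict with a single key, 'teams', holding a list of teams.
print(list(data))
print(len(data["teams"]))
```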
Start with what I'm calling 'Layer 0'
    json_normalize(data)

output:

                                                   teams
    0  [{'teamname': '1', 'members': [{'firstname': '...

There is 1 column and 1 row. Everything is inside the 'teams' column.
Look into what I'm calling 'Layer 1' by using record_path=
    json_normalize(data, record_path='teams')

output:

                                                 members teamname
    0  [{'firstname': 'John', 'lastname': 'Doe', 'org...        1
    1  [{'firstname': 'Mickey', 'lastname': 'Moose', ...        2

In Layer 1 we have flattened 'teamname', but there is more inside 'members'.
Look into Layer 2 with record_path=. The notation is unintuitive at first. I now remember it by ['layer','deeperlayer'] where the result is layer.deeperlayer.
    json_normalize(data, record_path=['teams', 'members'])

output:

                           email  firstname  lastname        mobile  orgname         phone
    0      john.doe@wildlife.net       John       Doe                   Anon  916-555-1234
    1      jane.doe@wildlife.net       Jane       Doe  916-555-7890     Anon  916-555-4321
    2  mickey.moose@wildlife.net     Mickey     Moose  916-555-1111  Moosers  916-555-0000
    3   minny.moose@wildlife.net      Minny     Moose                Moosers  916-555-2222
Excuse my output, I don't know how to make tables in a response.
Finally we add in Layer 1 columns using meta=
    json_normalize(data, record_path=['teams', 'members'], meta=[['teams', 'teamname']])

output:

                           email  firstname  lastname        mobile  orgname         phone  teams.teamname
    0      john.doe@wildlife.net       John       Doe                   Anon  916-555-1234               1
    1      jane.doe@wildlife.net       Jane       Doe  916-555-7890     Anon  916-555-4321               1
    2  mickey.moose@wildlife.net     Mickey     Moose  916-555-1111  Moosers  916-555-0000               2
    3   minny.moose@wildlife.net      Minny     Moose                Moosers  916-555-2222               2

Notice that meta= needs a list of lists (meta=[[...]]) to reference a Layer 1 field. If we wanted one column from Layer 0 and another from Layer 1, we could do this:
json_normalize(data,record_path=['layer1','layer2'],meta=['layer0',['layer0','layer1']])
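To make the placeholder names above concrete, here is a small sketch that pulls one meta column from the top level and one from each team. The `league` key is hypothetical (the question's JSON has no top-level field besides `teams`), and the call assumes pandas 1.0 or later, where the function is available as `pd.json_normalize`.

```python
import pandas as pd

# Hypothetical variant of the question's data with an extra top-level field.
data = {
    "league": "Wildlife",  # hypothetical Layer 0 field, not in the original JSON
    "teams": [
        {"teamname": "1", "members": [{"firstname": "John"}, {"firstname": "Jane"}]},
        {"teamname": "2", "members": [{"firstname": "Mickey"}]},
    ],
}

df = pd.json_normalize(
    data,
    record_path=["teams", "members"],        # one row per member
    meta=["league", ["teams", "teamname"]],  # top-level field, then a per-team field
)

# Each member row now carries its league and its team name.
print(df)
```

A plain string in meta= names a key at the top level, while a nested list such as ['teams', 'teamname'] walks down the same path that record_path= follows.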
The result of json_normalize is a pandas DataFrame.
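Since the result is an ordinary DataFrame, the usual methods apply to it. As a rough illustration (the new column name `teamname` and the per-team count are my own choices, and the data is a trimmed copy of the question's sample), the dotted meta column can be renamed and then used for grouping:

```python
import pandas as pd

# Trimmed copy of the question's sample data: first names only.
data = {
    "teams": [
        {"teamname": "1", "members": [{"firstname": "John"}, {"firstname": "Jane"}]},
        {"teamname": "2", "members": [{"firstname": "Mickey"}, {"firstname": "Minny"}]},
    ]
}

df = pd.json_normalize(
    data,
    record_path=["teams", "members"],
    meta=[["teams", "teamname"]],
)

# Rename the dotted meta column to something easier to type.
df = df.rename(columns={"teams.teamname": "teamname"})

# Count the members in each team.
counts = df.groupby("teamname")["firstname"].count()
print(counts)
```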
This is one way to do it. Should give you some ideas.
    df = pd.concat(
        [
            pd.concat([pd.Series(m) for m in t['members']], axis=1)
            for t in data['teams']
        ],
        keys=[t['teamname'] for t in data['teams']]
    )

                                         0                         1
    1 email          john.doe@wildlife.net     jane.doe@wildlife.net
      firstname                       John                      Jane
      lastname                         Doe                       Doe
      mobile                                            916-555-7890
      orgname                         Anon                      Anon
      phone                   916-555-1234              916-555-4321
    2 email      mickey.moose@wildlife.net  minny.moose@wildlife.net
      firstname                     Mickey                     Minny
      lastname                       Moose                     Moose
      mobile                  916-555-1111
      orgname                      Moosers                   Moosers
      phone                   916-555-0000              916-555-2222
To get a nice table with team name and members as rows, all attributes in columns:
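    # Name the outer index level; assigning to df.index.levels[0].name raises in pandas >= 1.0.
    df.index = df.index.set_names('teamname', level=0)
    df.columns.name = 'member'
    df.T.stack(0).swaplevel(0, 1).sort_index()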
To get team name and member as actual columns, just reset the index.
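    # Name the outer index level; assigning to df.index.levels[0].name raises in pandas >= 1.0.
    df.index = df.index.set_names('teamname', level=0)
    df.columns.name = 'member'
    df.T.stack(0).swaplevel(0, 1).sort_index().reset_index()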
The whole thing
    import json
    import pandas as pd

    json_text = """
    {
      "teams": [
        {
          "teamname": "1",
          "members": [
            {
              "firstname": "John",
              "lastname": "Doe",
              "orgname": "Anon",
              "phone": "916-555-1234",
              "mobile": "",
              "email": "john.doe@wildlife.net"
            },
            {
              "firstname": "Jane",
              "lastname": "Doe",
              "orgname": "Anon",
              "phone": "916-555-4321",
              "mobile": "916-555-7890",
              "email": "jane.doe@wildlife.net"
            }
          ]
        },
        {
          "teamname": "2",
          "members": [
            {
              "firstname": "Mickey",
              "lastname": "Moose",
              "orgname": "Moosers",
              "phone": "916-555-0000",
              "mobile": "916-555-1111",
              "email": "mickey.moose@wildlife.net"
            },
            {
              "firstname": "Minny",
              "lastname": "Moose",
              "orgname": "Moosers",
              "phone": "916-555-2222",
              "mobile": "",
              "email": "minny.moose@wildlife.net"
            }
          ]
        }
      ]
    }
    """

    data = json.loads(json_text)

    # One column per member, one block of rows per team, keyed by team name.
    df = pd.concat(
        [
            pd.concat([pd.Series(m) for m in t['members']], axis=1)
            for t in data['teams']
        ],
        keys=[t['teamname'] for t in data['teams']]
    )

    # Name the outer index level; assigning to df.index.levels[0].name raises in pandas >= 1.0.
    df.index = df.index.set_names('teamname', level=0)
    df.columns.name = 'member'

    # Members become rows, attributes become columns, team name and member become regular columns.
    df.T.stack(0).swaplevel(0, 1).sort_index().reset_index()
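For comparison, the same kind of flat table can also be built without concat or stack by walking the two levels in a plain list comprehension. This is a sketch of a different route, not what the answer above does, and it uses a trimmed copy of the sample data (first and last names only) to stay short:

```python
import pandas as pd

# Trimmed copy of the question's sample data.
data = {
    "teams": [
        {"teamname": "1", "members": [
            {"firstname": "John", "lastname": "Doe"},
            {"firstname": "Jane", "lastname": "Doe"},
        ]},
        {"teamname": "2", "members": [
            {"firstname": "Mickey", "lastname": "Moose"},
            {"firstname": "Minny", "lastname": "Moose"},
        ]},
    ]
}

# One dict per member, with the parent team's name copied into it.
rows = [
    {"teamname": team["teamname"], **member}
    for team in data["teams"]
    for member in team["members"]
]

df = pd.DataFrame(rows)
print(df)
```

The comprehension makes the two nesting levels explicit, which can be easier to adapt when teams have different numbers of members.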