Pandas read nested json
You can use json_normalize:
import json

with open('myJson.json') as data_file:
    data = json.load(data_file)

df = pd.json_normalize(data, 'locations', ['date', 'number', 'name'],
                       record_prefix='locations_')
print (df)
  locations_arrTime locations_arrTimeDiffMin locations_depTime  \
0                                                        06:32
1             06:37                        1             06:40
2             08:24                        1

  locations_depTimeDiffMin           locations_name locations_platform  \
0                        0  Spital am Pyhrn Bahnhof                  2
1                        0  Windischgarsten Bahnhof                  2
2                                   Linz/Donau Hbf               1A-B

  locations_stationIdx locations_track number    name        date
0                    0          R 3932         R 3932  01.10.2016
1                    1                         R 3932  01.10.2016
2                    2                         R 3932  01.10.2016
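Since the full contents of myJson.json aren't shown here, this is a minimal runnable sketch of the same call with an inline dict of the assumed shape (the two abbreviated stop entries are illustrative, not the real data):

```python
import pandas as pd

# Hypothetical inline record mirroring the assumed structure of myJson.json
data = {
    "number": "",
    "date": "01.10.2016",
    "name": "R 3932",
    "locations": [
        {"name": "Spital am Pyhrn Bahnhof", "depTime": "06:32", "stationIdx": "0"},
        {"name": "Windischgarsten Bahnhof", "arrTime": "06:37", "stationIdx": "1"},
    ],
}

# Second argument: path to the list of records to flatten (one output row per stop).
# Third argument: top-level metadata fields repeated on every flattened row.
df = pd.json_normalize(data, "locations", ["date", "number", "name"],
                       record_prefix="locations_")
print(df[["locations_name", "date"]])
```

Keys missing from an individual record (here, arrTime in the first stop) simply come out as NaN in the corresponding column.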
EDIT:
You can use read_json, extract name from the locations column with the DataFrame constructor, and finally groupby with apply and join:
df = pd.read_json("myJson.json")
df.locations = pd.DataFrame(df.locations.values.tolist())['name']
df = df.groupby(['date','name','number'])['locations'].apply(','.join).reset_index()
print (df)
         date    name number                                          locations
0  2016-01-10  R 3932         Spital am Pyhrn Bahnhof,Windischgarsten Bahnho...
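The groupby step can be shown in isolation. Below, a hypothetical flat frame stands in for the result of reading myJson.json: one row per stop, with the trip-level fields repeated on each row:

```python
import pandas as pd

# Hypothetical per-stop frame (assumed data, mirroring the example above)
df = pd.DataFrame({
    "date": ["01.10.2016"] * 3,
    "name": ["R 3932"] * 3,
    "number": [""] * 3,
    "locations": ["Spital am Pyhrn Bahnhof",
                  "Windischgarsten Bahnhof",
                  "Linz/Donau Hbf"],
})

# Collapse the per-stop rows back to one row per trip,
# joining the stop names into a single comma-separated string
out = df.groupby(["date", "name", "number"])["locations"].apply(",".join).reset_index()
print(out)
```

apply(",".join) works here because each group's locations column is an iterable of strings, which is exactly what str.join accepts.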
A possible alternative to pandas.json_normalize is to build your own dataframe by extracting only the selected keys and values from the nested dictionary. The main reason for doing this is that json_normalize gets slow for very large json files (and might not always produce the output you want).
So, here is an alternative way to flatten the nested dictionary in pandas using glom. The aim is to extract selected keys and values from the nested dictionary and save them in separate columns of the pandas dataframe.
Here is a step by step guide: https://medium.com/@enrico.alemani/flatten-nested-dictionaries-in-pandas-using-glom-7948345c88f5
import pandas as pd
from glom import glom
from ast import literal_eval

target = {
    "number": "",
    "date": "01.10.2016",
    "name": "R 3932",
    "locations": {
        "depTimeDiffMin": "0",
        "name": "Spital am Pyhrn Bahnhof",
        "arrTime": "",
        "depTime": "06:32",
        "platform": "2",
        "stationIdx": "0",
        "arrTimeDiffMin": "",
        "track": "R 3932"
    }
}

# Import data
df = pd.DataFrame([str(target)], columns=['target'])

# Extract id keys and save value into a separate pandas column
df['id'] = df['target'].apply(lambda row: glom(literal_eval(row), 'locations.name'))
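If you'd rather avoid the extra glom dependency, the same dotted-path extraction can be sketched with a small helper that walks the nested dict directly. Everything below (the records list, the deep_get helper, the column names) is an illustrative assumption, not part of the original answer:

```python
import pandas as pd

# Hypothetical rows shaped like the 'target' dict above
records = [
    {"date": "01.10.2016", "name": "R 3932",
     "locations": {"name": "Spital am Pyhrn Bahnhof", "depTime": "06:32"}},
    {"date": "01.10.2016", "name": "R 3932",
     "locations": {"name": "Windischgarsten Bahnhof", "depTime": "06:40"}},
]

def deep_get(d, path):
    """Follow a dotted key path (e.g. 'locations.name') through nested dicts."""
    for key in path.split("."):
        d = d[key]
    return d

# Build the dataframe from only the selected nested keys
df = pd.DataFrame({
    "id": [deep_get(r, "locations.name") for r in records],
    "depTime": [deep_get(r, "locations.depTime") for r in records],
})
print(df)
```

This mirrors glom's spec syntax for the simple case of plain dotted paths; glom is still the better choice once you need defaults, wildcards, or deeper restructuring.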