Pandas read nested json


You can use json_normalize:

    import json
    import pandas as pd

    with open('myJson.json') as data_file:
        data = json.load(data_file)

    # Expand the 'locations' list into rows; repeat date, number and name on each row
    df = pd.json_normalize(data, 'locations', ['date', 'number', 'name'],
                           record_prefix='locations_')
    print(df)

      locations_arrTime locations_arrTimeDiffMin locations_depTime  \
    0                                                        06:32
    1             06:37                        1             06:40
    2             08:24                        1

      locations_depTimeDiffMin           locations_name locations_platform  \
    0                        0  Spital am Pyhrn Bahnhof                  2
    1                        0  Windischgarsten Bahnhof                  2
    2                                    Linz/Donau Hbf               1A-B

      locations_stationIdx locations_track number    name        date
    0                    0          R 3932         R 3932  01.10.2016
    1                    1                         R 3932  01.10.2016
    2                   22                         R 3932  01.10.2016
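To try this without the file, here is a minimal sketch that runs json_normalize on an in-memory dictionary. The dictionary is an assumption: it is a trimmed guess at the shape of myJson.json, reconstructed from the output above, not the actual file. The second argument is the record path (the list that becomes rows) and the third is the top-level metadata repeated on every row. The record_prefix keeps each station's name from colliding with the train's name column.

```python
import pandas as pd

# Assumed, trimmed stand-in for the contents of myJson.json
data = {
    "number": "",
    "date": "01.10.2016",
    "name": "R 3932",
    "locations": [
        {"name": "Spital am Pyhrn Bahnhof", "depTime": "06:32", "platform": "2"},
        {"name": "Windischgarsten Bahnhof", "depTime": "06:40", "platform": "2"},
    ],
}

df = pd.json_normalize(
    data,
    record_path="locations",          # one output row per item in this list
    meta=["date", "number", "name"],  # top-level values repeated on each row
    record_prefix="locations_",       # avoids a clash between the two "name" keys
)
print(df.columns.tolist())
```

Without the prefix, both the record and the metadata would try to fill a column called name, so some prefix is needed for this particular layout.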

EDIT:

You can also use read_json, extract each location's name with the DataFrame constructor, and finally join the names per train with groupby and apply:

    df = pd.read_json("myJson.json")
    df.locations = pd.DataFrame(df.locations.values.tolist())['name']
    df = df.groupby(['date','name','number'])['locations'].apply(','.join).reset_index()
    print(df)

            date    name number                                          locations
    0 2016-01-10  R 3932         Spital am Pyhrn Bahnhof,Windischgarsten Bahnho...
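The final groupby step can be checked in isolation. This sketch assumes the frame already has one row per station with the station name in the locations column. The values are typed in by hand from the output earlier in this answer, not read from myJson.json.

```python
import pandas as pd

# Hand-built frame: one row per station, train-level fields repeated
df = pd.DataFrame({
    "date": ["01.10.2016"] * 3,
    "name": ["R 3932"] * 3,
    "number": [""] * 3,
    "locations": [
        "Spital am Pyhrn Bahnhof",
        "Windischgarsten Bahnhof",
        "Linz/Donau Hbf",
    ],
})

# Collapse to one row per train, joining the station names with commas
joined = (
    df.groupby(["date", "name", "number"])["locations"]
      .apply(",".join)
      .reset_index()
)
print(joined)
```

Each group is passed to ",".join as a Series of strings, so the result has one row per (date, name, number) combination.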


A possible alternative to pandas.json_normalize is to build your own dataframe by extracting only the selected keys and values from the nested dictionary. The main reason for doing this is that json_normalize can get slow for very large JSON files (and might not always produce the output you want).

So, here is an alternative way to flatten the nested dictionary in pandas using glom. The aim is to extract selected keys and values from the nested dictionary and save them in separate columns of the pandas dataframe.

Here is a step-by-step guide: https://medium.com/@enrico.alemani/flatten-nested-dictionaries-in-pandas-using-glom-7948345c88f5

    import pandas as pd
    from glom import glom
    from ast import literal_eval

    target = {
        "number": "",
        "date": "01.10.2016",
        "name": "R 3932",
        "locations":
            {
                "depTimeDiffMin": "0",
                "name": "Spital am Pyhrn Bahnhof",
                "arrTime": "",
                "depTime": "06:32",
                "platform": "2",
                "stationIdx": "0",
                "arrTimeDiffMin": "",
                "track": "R 3932"
            }
    }

    # Import data
    df = pd.DataFrame([str(target)], columns=['target'])

    # Extract id keys and save value into a separate pandas column
    df['id'] = df['target'].apply(lambda row: glom(literal_eval(row), 'locations.name'))
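When locations is a list of stops rather than a single dictionary, the same select-only-what-you-need idea can be sketched with a plain list comprehension and no extra library. This is an illustration under assumptions, not part of the glom guide: the record below is a trimmed, assumed input, and extract_rows and the output column names (train, station) are hypothetical names chosen for the example.

```python
import pandas as pd

# Assumed input: same fields as above, but "locations" is a list of stops
record = {
    "number": "",
    "date": "01.10.2016",
    "name": "R 3932",
    "locations": [
        {"name": "Spital am Pyhrn Bahnhof", "depTime": "06:32", "stationIdx": "0"},
        {"name": "Windischgarsten Bahnhof", "depTime": "06:40", "stationIdx": "1"},
    ],
}


def extract_rows(rec):
    """Return one flat dict per location, keeping only the selected keys."""
    return [
        {
            "train": rec["name"],
            "date": rec["date"],
            "station": loc.get("name", ""),    # empty string if the key is missing
            "depTime": loc.get("depTime", ""),
        }
        for loc in rec.get("locations", [])
    ]


df = pd.DataFrame(extract_rows(record))
print(df)
```

Because only the listed keys are copied, unused fields such as stationIdx never reach the dataframe.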