How to flatten a pandas dataframe with some columns as json?


Here's a solution that uses json_normalize() again, with custom functions that convert the data into a format json_normalize() understands.

import ast
import pandas as pd

def only_dict(d):
    '''
    Convert json string representation of dictionary to a python dict
    '''
    return ast.literal_eval(d)

def list_of_dicts(ld):
    '''
    Create a mapping from each entry's 'pos' to its other value after
    converting the json string of a list to a python list
    '''
    # Look up 'pos' by key instead of by position in d.values(), so the
    # result does not depend on key order (assumes one other key per entry)
    return {d['pos']: next(v for k, v in d.items() if k != 'pos')
            for d in ast.literal_eval(ld)}

# pd.json_normalize is the top-level name (pandas >= 1.0) for pandas.io.json.json_normalize
A = pd.json_normalize(df['columnA'].apply(only_dict).tolist()).add_prefix('columnA.')
B = pd.json_normalize(df['columnB'].apply(list_of_dicts).tolist()).add_prefix('columnB.pos.')

Finally, join the DFs on the common index to get:

df[['id', 'name']].join([A, B])

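To make the steps above easier to try, here is a minimal end-to-end sketch. The sample frame is hypothetical: the keys `dist`, `time` and `value`, and all the values, are assumptions made for illustration; only the column names and the `pos` prefix come from the answer. The helper here looks up keys by name.

```python
import ast

import pandas as pd

# Hypothetical sample data: both nested columns hold strings, not objects.
df = pd.DataFrame({
    'id': [1, 2],
    'name': ['John', 'Mike'],
    'columnA': ['{"dist": "600", "time": "0:12.10"}',
                '{"dist": "500", "time": "0:11.50"}'],
    'columnB': ['[{"pos": "1st", "value": "500"}, {"pos": "2nd", "value": "300"}]',
                '[{"pos": "1st", "value": "400"}, {"pos": "2nd", "value": "200"}]'],
})

def only_dict(d):
    # Parse the string form of a dict into a real dict.
    return ast.literal_eval(d)

def list_of_dicts(ld):
    # Assumed keys: 'pos' names the new column, 'value' fills it.
    return {d['pos']: d['value'] for d in ast.literal_eval(ld)}

A = pd.json_normalize(df['columnA'].apply(only_dict).tolist()).add_prefix('columnA.')
B = pd.json_normalize(df['columnB'].apply(list_of_dicts).tolist()).add_prefix('columnB.pos.')

# A and B share df's default integer index, so a plain index join lines them up.
flat = df[['id', 'name']].join([A, B])
print(flat.columns.tolist())
```

With this sample, the result has one row per original row and one column per nested key, prefixed with the name of the column it came from.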


EDIT: As per the comment by @MartijnPieters, the recommended way to decode the JSON strings is json.loads(), which is much faster than ast.literal_eval() if you know the data source is JSON.
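A small sketch of that swap, assuming the columns hold valid JSON strings (the sample values are made up). It also shows one practical difference beyond speed: json.loads() accepts JSON literals such as true and null, which ast.literal_eval() rejects.

```python
import ast
import json

import pandas as pd

# Hypothetical one-row frame whose columns hold JSON strings.
df = pd.DataFrame({
    'columnA': ['{"dist": "600", "time": "0:12.10"}'],
    'columnB': ['[{"pos": "1st", "value": "500"}, {"pos": "total", "value": "1000"}]'],
})

# Drop-in replacement for the ast.literal_eval-based parsing.
parsed_a = df['columnA'].apply(json.loads)
parsed_b = df['columnB'].apply(json.loads)
print(parsed_a[0], type(parsed_b[0]).__name__)

# JSON-only literals parse with json.loads but not with ast.literal_eval.
raw = '{"ok": true, "missing": null}'
decoded = json.loads(raw)
try:
    ast.literal_eval(raw)
    literal_eval_failed = False
except ValueError:
    literal_eval_failed = True
print(decoded, literal_eval_failed)
```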


The quickest seems to be:

import json
import pandas as pd

json_struct = json.loads(df.to_json(orient="records"))
df_flat = pd.json_normalize(json_struct)  # pd.io.json.json_normalize in pandas < 1.0
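A hedged sketch of what this round trip produces, assuming the nested columns already hold Python objects rather than JSON strings (the sample row is hypothetical). In this sketch the dict column is expanded into dotted columns, while the list column is left as a list, so columnB would still need one of the other approaches.

```python
import json

import pandas as pd

# Hypothetical frame: columnA holds a dict, columnB holds a list of dicts.
df = pd.DataFrame({
    'id': [1],
    'name': ['John'],
    'columnA': [{'dist': '600', 'time': '0:12.10'}],
    'columnB': [[{'pos': '1st', 'value': '500'}, {'pos': '2nd', 'value': '300'}]],
})

# Serialise every row to JSON, then parse back into plain dicts and lists.
json_struct = json.loads(df.to_json(orient='records'))
df_flat = pd.json_normalize(json_struct)

print(sorted(df_flat.columns))
print(df_flat.loc[0, 'columnB'])
```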


Create a custom function to flatten columnB, then use pd.concat:

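import pandas as pd

def flatten(js):
    # One Series per row: index taken from 'pos', values from the remaining column
    return pd.DataFrame(js).set_index('pos').squeeze()

pd.concat([df.drop(['columnA', 'columnB'], axis=1),
           df.columnA.apply(pd.Series),
           df.columnB.apply(flatten)], axis=1)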

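For reference, a runnable sketch of the concat approach on a hypothetical frame whose nested columns already hold Python objects (the keys `dist`, `time` and `value` are assumptions). The add_prefix() calls are an addition not in the answer, included only so the column names match the json_normalize() approach.

```python
import pandas as pd

# Hypothetical frame: nested columns hold parsed objects, not strings.
df = pd.DataFrame({
    'id': [1, 2],
    'name': ['John', 'Mike'],
    'columnA': [{'dist': '600', 'time': '0:12.10'},
                {'dist': '500', 'time': '0:11.50'}],
    'columnB': [[{'pos': '1st', 'value': '500'}, {'pos': '2nd', 'value': '300'}],
                [{'pos': '1st', 'value': '400'}, {'pos': '2nd', 'value': '200'}]],
})

def flatten(js):
    # Two-column frame (pos, value) -> Series of values indexed by pos.
    # Note: squeeze() returns a scalar if the list has a single entry.
    return pd.DataFrame(js).set_index('pos').squeeze()

flat = pd.concat([df.drop(['columnA', 'columnB'], axis=1),
                  df.columnA.apply(pd.Series).add_prefix('columnA.'),
                  df.columnB.apply(flatten).add_prefix('columnB.pos.')], axis=1)
print(flat.columns.tolist())
```

Applying a function that returns a Series to each row is convenient but builds one small object per row, so it may be noticeably slower than json_normalize() on large frames.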