
From JSON file to Numpy Array


Assuming you've successfully loaded that JSON into Python, here's one way to create the Numpy array you want. My code has a minimal definition of ObjectId so that it won't raise a NameError on ObjectId entries.

The expression sorted(d["B"].items()) produces a list of (key, value) tuples from the contents of a "B" dictionary, sorted by key. We then extract just the values from those tuples into a list and concatenate that list onto a one-element list containing the value of the "A" item, so each row holds the "A" value followed by the "B" values in key order.

    import numpy as np

    # Minimal stand-in so that the ObjectId(...) entries don't raise a NameError
    class ObjectId(object):
        def __init__(self, objectid):
            self.objectid = objectid

        def __repr__(self):
            return 'ObjectId("{}")'.format(self.objectid)

    data = [
        {
            "_id" : ObjectId("57065024c3d1132426c4dd53"),
            "B" : {
                "BA" : 14,
                "BB" : 23,
                "BC" : 32,
                "BD" : 41
            },
            "A" : 50
        },
        {
            "_id" : ObjectId("57065024c3d1132426c4dd53"),
            "A" : 1,
            "B" : {
                "BA" : 1,
                "BB" : 2,
                "BC" : 3,
                "BD" : 4
            }
        }
    ]

    # One row per document: the "A" value, then the "B" values in key order
    array2 = np.array([[d["A"]] + [v for _, v in sorted(d["B"].items())] for d in data])
    print(array2)

output

    [[50 14 23 32 41]
     [ 1  1  2  3  4]]
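If the list comprehension is hard to read, it may help to unpack it for a single record. This is only an illustrative sketch: the names pairs, values and row are mine, and the "_id" field is left out because it plays no part in building the row.

```python
# One record shaped like the entries in `data` (the "_id" field is omitted here).
# The "B" keys are deliberately written out of order to show the effect of sorted().
d = {"A": 50, "B": {"BD": 41, "BA": 14, "BC": 32, "BB": 23}}

pairs = sorted(d["B"].items())   # (key, value) tuples, ordered by key
values = [v for _, v in pairs]   # keep only the values, still in key order
row = [d["A"]] + values          # one-element list with "A", then the "B" values

print(pairs)  # [('BA', 14), ('BB', 23), ('BC', 32), ('BD', 41)]
print(row)    # [50, 14, 23, 32, 41]
```

Sorting by key is what keeps the columns aligned across records, since the "B" keys need not appear in the same order in every document.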


The flatdict module can sometimes be useful when working with MongoDB data structures. It will handle flattening the nested dictionary structure for you:

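    import flatdict
    import numpy as np

    # `data` is the list of documents defined earlier
    columns = []
    for d in data:
        flat = flatdict.FlatDict(d)
        del flat['_id']
        # Sort the flattened items by key and keep only the values
        columns.append([item[1] for item in sorted(flat.items(), key=lambda item: item[0])])
    np.vstack(columns)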

Of course this can be solved without flatdict too.
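For example, a small recursive helper can do the flattening. This is a rough sketch rather than a drop-in replacement for flatdict: the flatten function and its ":" separator are my own illustrative choices, and the "_id" values are stored as plain strings here so the snippet runs on its own.

```python
import numpy as np

def flatten(d, parent_key="", sep=":"):
    """Hypothetical helper: return a flat dict whose keys join nested keys with `sep`."""
    flat = {}
    for key, value in d.items():
        full_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dictionaries, carrying the key prefix along
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat

# Same values as the example data, with "_id" as a plain string (an assumption
# made only to keep this snippet self-contained).
data = [
    {"_id": "57065024c3d1132426c4dd53", "B": {"BA": 14, "BB": 23, "BC": 32, "BD": 41}, "A": 50},
    {"_id": "57065024c3d1132426c4dd53", "A": 1, "B": {"BA": 1, "BB": 2, "BC": 3, "BD": 4}},
]

rows = []
for d in data:
    flat = flatten(d)
    del flat["_id"]
    # Sort by flattened key ("A", "B:BA", "B:BB", ...) and keep only the values
    rows.append([value for _, value in sorted(flat.items())])

array = np.vstack(rows)
print(array)
```

Unlike the first approach, this does not hard-code the "A" and "B" keys, so it should also cope with deeper nesting, as long as every document flattens to the same set of keys.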