From JSON file to NumPy Array
Assuming you've successfully loaded that JSON into Python, here's one way to create the NumPy array you want. My code includes a minimal definition of `ObjectId` so that it won't raise a `NameError` on the `ObjectId` entries.
`sorted(d["B"].items())` produces a list of (key, value) tuples from the contents of a "B" dictionary, sorted by key. We then extract just the values from those tuples into a list, and concatenate that list onto a one-element list containing the value of the "A" item.
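To make those two steps concrete, here they are applied to the first record on its own. This is a standalone sketch: `record` holds that record's "A" and "B" fields copied from the data below (without `_id`, which the row does not use), and `shuffled` is a hypothetical reordering of the same "B" dict, included only to show that sorting by key fixes the column order.

```python
# The "A" and "B" fields of the first record ("_id" is not needed for the row).
record = {"A": 50, "B": {"BA": 14, "BB": 23, "BC": 32, "BD": 41}}

# Step 1: (key, value) tuples of the nested "B" dict, sorted by key.
pairs = sorted(record["B"].items())
print(pairs)  # [('BA', 14), ('BB', 23), ('BC', 32), ('BD', 41)]

# Step 2: keep only the values and prepend the "A" value.
row = [record["A"]] + [v for _, v in pairs]
print(row)  # [50, 14, 23, 32, 41]

# Hypothetical copy of the same "B" dict with keys inserted in a different order:
# sorting by key still yields the values in BA, BB, BC, BD order.
shuffled = {"BD": 41, "BB": 23, "BA": 14, "BC": 32}
print([v for _, v in sorted(shuffled.items())])  # [14, 23, 32, 41]
```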
```python
import numpy as np

class ObjectId(object):
    """Minimal stand-in for MongoDB's ObjectId so the data below can be evaluated."""
    def __init__(self, objectid):
        self.objectid = objectid

    def __repr__(self):
        return 'ObjectId("{}")'.format(self.objectid)

data = [
    {
        "_id" : ObjectId("57065024c3d1132426c4dd53"),
        "B" : {"BA" : 14, "BB" : 23, "BC" : 32, "BD" : 41},
        "A" : 50
    },
    {
        "_id" : ObjectId("57065024c3d1132426c4dd53"),
        "A" : 1,
        "B" : {"BA" : 1, "BB" : 2, "BC" : 3, "BD" : 4}
    }
]

# Each row: the "A" value followed by the "B" values in key order.
array2 = np.array([[d["A"]] + [v for _, v in sorted(d["B"].items())] for d in data])
print(array2)
```
Output:

```
[[50 14 23 32 41]
 [ 1  1  2  3  4]]
```
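The sorted-items approach relies on every "B" dict having exactly the same keys. If a record is missing a key, its row comes out shorter (and, depending on the NumPy version, `np.array` either raises an error or falls back to an object array); if a record has a different key, values silently shift into the wrong columns. A hedged alternative, assuming you know the expected nested keys up front, is to index by an explicit key list. `B_KEYS`, `to_row` and the `fill=0` default for missing keys are illustrative choices, not part of the original answer, and `_id` is left out because it is not used in the array:

```python
import numpy as np

# Hypothetical: the expected nested keys, in the column order you want.
B_KEYS = ["BA", "BB", "BC", "BD"]

def to_row(d, fill=0):
    """Return the "A" value followed by the "B" values in B_KEYS order.

    Keys missing from d["B"] are replaced by `fill` (an assumed default).
    """
    return [d["A"]] + [d["B"].get(k, fill) for k in B_KEYS]

# Same values as the question's records, without "_id".
records = [
    {"A": 50, "B": {"BA": 14, "BB": 23, "BC": 32, "BD": 41}},
    {"A": 1, "B": {"BA": 1, "BB": 2, "BC": 3, "BD": 4}},
]
array3 = np.array([to_row(d) for d in records])
print(array3)
```

With an explicit key list every row has the same length and column order by construction, at the cost of having to update `B_KEYS` if the schema changes.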
The third-party flatdict module can sometimes be useful when working with MongoDB data structures. It handles flattening the nested dictionary structure for you:
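```python
import flatdict
import numpy as np

# `data` is the list of records defined above.
rows = []
for d in data:
    flat = flatdict.FlatDict(d)  # nested keys become delimited, e.g. "B:BA"
    del flat['_id']              # the id is not part of the array
    # Sort by the flattened key so every record yields its values in the same column order.
    rows.append([item[1] for item in sorted(flat.items(), key=lambda item: item[0])])
print(np.vstack(rows))
```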
Of course this can be solved without flatdict too.
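A minimal sketch of the flatdict-free route, assuming the same record layout: a small recursive helper flattens nested dicts into delimited keys, then the rows are built and stacked as before. The helper name `flatten` and the ":" separator are our own choices, and `_id` is a plain string here instead of an `ObjectId` so the block runs on its own.

```python
import numpy as np

def flatten(d, parent_key="", sep=":"):
    """Flatten nested dicts into a single level, joining keys with `sep`."""
    flat = {}
    for key, value in d.items():
        new_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat

# Same records as in the question; "_id" is a plain string stand-in.
data = [
    {"_id": "57065024c3d1132426c4dd53", "B": {"BA": 14, "BB": 23, "BC": 32, "BD": 41}, "A": 50},
    {"_id": "57065024c3d1132426c4dd53", "A": 1, "B": {"BA": 1, "BB": 2, "BC": 3, "BD": 4}},
]

rows = []
for d in data:
    flat = flatten(d)
    flat.pop("_id", None)  # drop the id, mirroring `del flat['_id']`
    # Sorting by flattened key puts "A" before "B:BA", "B:BB", ...
    rows.append([value for _, value in sorted(flat.items())])

array4 = np.vstack(rows)
print(array4)
```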