How to extract multiple JSON objects from one file?


Update: I wrote a solution that doesn't require reading the entire file in one go. It's too big for a Stack Overflow answer, but can be found here: jsonstream.

You can use json.JSONDecoder.raw_decode to decode arbitrarily big strings of "stacked" JSON (so long as they can fit in memory). raw_decode stops once it has a valid object and returns the position of the first character that wasn't part of the parsed object. It's not documented, but you can pass this position back to raw_decode and it will start parsing again from that position. Unfortunately, the Python json module doesn't accept strings that have leading whitespace. So we need to search for the first non-whitespace part of your document.

from json import JSONDecoder, JSONDecodeError
import re

NOT_WHITESPACE = re.compile(r'[^\s]')

def decode_stacked(document, pos=0, decoder=JSONDecoder()):
    while True:
        match = NOT_WHITESPACE.search(document, pos)
        if not match:
            return
        pos = match.start()

        try:
            obj, pos = decoder.raw_decode(document, pos)
        except JSONDecodeError:
            # do something sensible if there's some error
            raise
        yield obj

s = """{"a": 1}     [1,   2]"""

for obj in decode_stacked(s):
    print(obj)

prints:

{'a': 1}
[1, 2]
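The snippet above decodes an in-memory string; applying the same raw_decode loop to a file just means reading the file's contents first (the whole file still has to fit in memory). A self-contained sketch, using a temporary file in place of a real input file:

```python
import json
import os
import re
import tempfile

# Create a throwaway file of "stacked" JSON purely for the demo.
stacked = '{"a": 1}\n[1, 2]\n{"b": {"c": 3}}'
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    f.write(stacked)
    path = f.name

decoder = json.JSONDecoder()
NOT_WHITESPACE = re.compile(r'[^\s]')

with open(path) as f:
    document = f.read()  # the entire file must fit in memory

objs = []
pos = 0
while True:
    match = NOT_WHITESPACE.search(document, pos)
    if not match:
        break  # nothing but whitespace left
    # raw_decode returns the parsed object and the index just past it
    obj, pos = decoder.raw_decode(document, match.start())
    objs.append(obj)

print(objs)  # [{'a': 1}, [1, 2], {'b': {'c': 3}}]
os.remove(path)
```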


Use a JSON array, in the format:

[
  {"ID":"12345","Timestamp":"20140101", "Usefulness":"Yes",
   "Code":[{"event1":"A","result":"1"},…]},
  {"ID":"1A35B","Timestamp":"20140102", "Usefulness":"No",
   "Code":[{"event1":"B","result":"1"},…]},
  {"ID":"AA356","Timestamp":"20140103", "Usefulness":"No",
   "Code":[{"event1":"B","result":"0"},…]},
  ...
]

Then import it into your python code

import json

with open('file.json') as json_file:
    data = json.load(json_file)

Now data is a list of dictionaries, one for each of the elements.

You can access it easily, e.g.:

data[0]["ID"]
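Since the question is tagged pandas, it's worth noting that a list of dictionaries like this drops straight into a DataFrame. A small sketch — the inline JSON string here is a stand-in for the contents of file.json:

```python
import json
import pandas as pd

# Stand-in for json.load(open('file.json')); only the top-level keys shown.
data = json.loads(
    '[{"ID": "12345", "Timestamp": "20140101", "Usefulness": "Yes"},'
    ' {"ID": "1A35B", "Timestamp": "20140102", "Usefulness": "No"}]'
)

df = pd.DataFrame(data)   # one row per object, one column per top-level key
print(df.loc[0, "ID"])    # same element as data[0]["ID"]
```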


So, as was mentioned in a couple of comments, containing the data in an array is simpler, but the solution does not scale well in terms of efficiency as the data set size increases. You should really only load everything into an array when you need random access to objects; otherwise, generators are the way to go. Below I have prototyped a reader function which reads each JSON object individually and returns a generator.

The basic idea is to signal the reader to split on the newline character "\n" (or "\r\n" on Windows). Python does this for you when you iterate over a file object line by line.

import json

def json_reader(filename):
    with open(filename) as f:
        for line in f:
            yield json.loads(line)

However, this method only really works when the file is written as you have it -- with each object separated by a newline character. Below I wrote an example of a writer that takes a list of JSON objects and saves each one on its own line.

def json_writer(file, json_objects):
    with open(file, "w") as f:
        for jsonobj in json_objects:
            jsonstr = json.dumps(jsonobj)
            f.write(jsonstr + "\n")

You could also do the same operation with file.writelines() and a list comprehension:

...
    json_strs = [json.dumps(j) + "\n" for j in json_objects]
    f.writelines(json_strs)
...

And if you wanted to append the data instead of writing a new file just change open(file, "w") to open(file, "a").
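The append variant can be sketched as follows; the file path and the append_json helper are hypothetical names used just for illustration:

```python
import json
import os
import tempfile

# Illustrative path; the file does not exist yet.
path = os.path.join(tempfile.mkdtemp(), "log.jsonl")

def append_json(file, obj):
    # "a" creates the file if it's missing and appends otherwise
    with open(file, "a") as f:
        f.write(json.dumps(obj) + "\n")

append_json(path, {"ID": "12345"})
append_json(path, {"ID": "1A35B"})

with open(path) as f:
    lines = f.read().splitlines()
print(len(lines))  # 2 -- one line per appended object
```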

In the end, I find this helps a great deal, not only with readability when I open JSON files in a text editor, but also in terms of using memory more efficiently.

On that note, if you change your mind at some point and want a list out of the reader, Python allows you to pass the generator to list() and it will populate the list for you. In other words, just write

lst = list(json_reader(file))
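Putting the pieces together, a minimal round trip through json_writer and json_reader (both redefined here so the snippet runs on its own; the file path is illustrative):

```python
import json
import os
import tempfile

def json_writer(file, json_objects):
    # One JSON object per line, as in the answer above
    with open(file, "w") as f:
        for jsonobj in json_objects:
            f.write(json.dumps(jsonobj) + "\n")

def json_reader(filename):
    with open(filename) as f:
        for line in f:
            yield json.loads(line)

path = os.path.join(tempfile.mkdtemp(), "objects.jsonl")  # illustrative path
objects = [{"ID": "12345"}, {"ID": "1A35B"}, {"ID": "AA356"}]

json_writer(path, objects)
lst = list(json_reader(path))  # drain the generator into a list

print(lst == objects)  # True -- the objects survive the round trip
```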