Opening A large JSON file


You want an incremental JSON parser such as yajl, used through one of its Python bindings. An incremental parser reads as little of the input as possible and invokes a callback whenever something meaningful is decoded. For example, to pull only the numbers out of a big JSON file:

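    from yajl import YajlContentHandler, YajlParser  # yajl-py

    list_of_numbers = []

    class ContentHandler(YajlContentHandler):
        # Called for each number in the input; convert val to float.
        def yajl_number(self, ctx, val):
            list_of_numbers.append(float(val))

    parser = YajlParser(ContentHandler())
    parser.parse(some_file)  # some_file: an already opened file object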

See http://pykler.github.com/yajl-py/ for more info.


I found another Python wrapper around the yajl library: ijson.

It works better for me than yajl-py for the following reasons:

  • yajl-py did not detect the yajl library on my system, and I had to hack the code to make it work
  • ijson's code is more compact and easier to use
  • ijson works with both yajl v1 and yajl v2, and it even has a pure Python replacement for yajl
  • ijson has a very nice ObjectBuilder, which helps extract not just events but meaningful sub-objects from the parsed stream, at the level you specify


I found yajl (and hence ijson) to be much slower than the standard json module when reading a large data file from local disk. Here is a module that claims to perform better than yajl/ijson when used with Cython, though it is still slower than json:

http://pietrobattiston.it/jsaone

As the author points out, an incremental parser may outperform json when the file is received over the network, since it can start parsing before the whole file has arrived.