
How to read big json?


Although your question doesn't specify this detail, you may want to make sure that loading the entire JSON document into memory is actually what you want. It looks like RJSONIO is a DOM-based API.
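
For context, the DOM-style call parses the whole document into one in-memory R structure in a single step; a minimal sketch (the file name here is hypothetical):

library(RJSONIO)

# Builds the complete R structure for the entire document at once
doc <- fromJSON("big.json")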

What computation do you need to do? Can you use a streaming parser? An example of a SAX-like streaming parser for JSON is yajl.
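
yajl itself is a C library, but for newline-delimited JSON the same chunk-at-a-time idea can be sketched in plain R; the file name and chunk size below are hypothetical:

library(rjson)

# One JSON record per line (NDJSON); parse a bounded chunk at a time
con <- file("big.ndjson", open = "r")
while (length(lines <- readLines(con, n = 1000)) > 0) {
  records <- lapply(lines, fromJSON)
  # process 'records' here; they can be discarded before the next chunk
}
close(con)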


Even though the question is very old, this might be of use for someone with a similar problem.

The function jsonlite::stream_in() lets you set pagesize to control the number of lines read at a time, and you can supply a custom handler function that is applied to each such subset. This allows you to work with very large JSON files without reading everything into memory at once.

library(jsonlite)

# 'con' is a connection to a file with one JSON record per line (NDJSON)
stream_in(con, pagesize = 5000, handler = function(x) {
  # Do something with the data here
})
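
As a self-contained sketch of the handler mechanism (writing iris out as NDJSON first, and using an environment as one possible way to accumulate results across pages):

library(jsonlite)

tmp <- tempfile(fileext = ".ndjson")
stream_out(iris, file(tmp))

totals <- new.env()
totals$n <- 0
stream_in(file(tmp), pagesize = 50, handler = function(x) {
  # 'x' holds only the current page of records as a data frame
  totals$n <- totals$n + nrow(x)
})
totals$n  # 150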


This is about speed rather than memory: even for the quite small iris dataset (only 7088 bytes), the RJSONIO package is an order of magnitude slower than rjson. Don't use method = 'R' unless you really have to! Note the different units in the two sets of results.

library(rjson) # library(RJSONIO)
library(plyr)
library(microbenchmark)

x <- toJSON(iris)
(op <- microbenchmark(
  CJ = toJSON(iris),
  RJ = toJSON(iris, method = 'R'),
  JC = fromJSON(x),
  JR = fromJSON(x, method = 'R')
))

# for rjson on this machine...
Unit: microseconds
  expr        min          lq     median          uq        max
1   CJ    491.470    496.5215    501.467    537.6295    561.437
2   JC    242.079    249.8860    259.562    274.5550    325.885
3   JR 167673.237 170963.4895 171784.270 172132.7540 190310.582
4   RJ    912.666    925.3390    957.250   1014.2075   1153.494

# for RJSONIO on the same machine...
Unit: milliseconds
  expr      min       lq   median       uq      max
1   CJ 7.338376 7.467097 7.563563 7.639456 8.591748
2   JC 1.186369 1.234235 1.247235 1.265922 2.165260
3   JR 1.196690 1.238406 1.259552 1.278455 2.325789
4   RJ 7.353977 7.481313 7.586960 7.947347 9.364393