Compressing A Series of JSON Objects While Maintaining Serial Reading?


Just use a gzip.GzipFile() object and treat it like a regular file; write JSON objects line by line, and read them line by line.

The object takes care of compression transparently and buffers reads, decompressing chunks as needed.

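    import gzip
    import json

    # writing: GzipFile is a binary file on Python 3, so encode each line
    with gzip.GzipFile(jsonfilename, 'w') as outfile:
        for obj in objects:
            outfile.write((json.dumps(obj) + '\n').encode('utf-8'))

    # reading: json.loads() accepts the raw bytes of each line (Python 3.6+)
    with gzip.GzipFile(jsonfilename, 'r') as infile:
        for line in infile:
            obj = json.loads(line)
            # process obj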
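As a variation on the snippet above, `gzip.open()` in text mode (`'wt'` / `'rt'`) takes care of text encoding itself, and wrapping the reader in a generator lets you stop early without parsing the whole file. This is only a sketch: the helper names `write_jsonl_gz` and `iter_jsonl_gz`, the sample data, and the file name are made up for illustration.

```python
import gzip
import itertools
import json
import os
import tempfile


def write_jsonl_gz(path, objects):
    """Write one JSON document per line into a gzip-compressed file."""
    with gzip.open(path, 'wt', encoding='utf-8') as outfile:
        for obj in objects:
            # json.dumps() without indent never emits a newline, so one line == one object
            outfile.write(json.dumps(obj) + '\n')


def iter_jsonl_gz(path):
    """Yield objects one at a time, decompressing as the file is read."""
    with gzip.open(path, 'rt', encoding='utf-8') as infile:
        for line in infile:
            yield json.loads(line)


# Hypothetical sample data and file location.
objects = [{'id': i, 'name': 'item %d' % i} for i in range(1000)]
path = os.path.join(tempfile.mkdtemp(), 'objects.jsonl.gz')
write_jsonl_gz(path, objects)

# Take only the first three objects; the remaining lines are never parsed.
first_three = list(itertools.islice(iter_jsonl_gz(path), 3))
print(first_three)
```

Because `iter_jsonl_gz` is a generator, memory use stays bounded by the size of a single object plus the decompression buffer, no matter how large the file is.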

This has the added advantage that the compression algorithm can exploit repetition across objects (repeated keys, for example) to improve the compression ratio.
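To get a feel for that effect, here is a rough comparison of one gzip stream over all objects against compressing each object separately. The records are invented sample data, and the exact sizes will depend on your data and zlib version, so treat it as a sketch rather than a benchmark.

```python
import gzip
import json

# Hypothetical records that share the same keys, as a series of JSON objects often does.
objects = [{'user_id': i, 'status': 'active', 'country': 'NL'} for i in range(1000)]
lines = [(json.dumps(obj) + '\n').encode('utf-8') for obj in objects]

# One gzip stream over all lines: later objects can back-reference earlier ones.
stream_size = len(gzip.compress(b''.join(lines)))

# One gzip stream per object: each one pays the gzip header and trailer
# and cannot reuse anything seen in the other objects.
per_object_size = sum(len(gzip.compress(line)) for line in lines)

raw_size = sum(len(line) for line in lines)
print('raw:', raw_size, 'single stream:', stream_size, 'per object:', per_object_size)
```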


You might want to try an incremental JSON parser, such as jsaone.

That is, create a single JSON document containing all your objects, and parse it like this:

    import gzip
    import jsaone

    with gzip.GzipFile(file_path, 'r') as f_in:
        for key, val in jsaone.load(f_in):
            ...
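The snippet above assumes the file already holds a single JSON object. One way to produce such a file with only the standard library is sketched below; the key scheme (`obj0`, `obj1`, ...), the sample data, and the file name are assumptions made for illustration.

```python
import gzip
import json
import os
import tempfile

# Hypothetical sample data, keyed so that all objects live in one JSON document.
objects = [{'id': i, 'name': 'item %d' % i} for i in range(100)]
combined = {'obj%d' % i: obj for i, obj in enumerate(objects)}

file_path = os.path.join(tempfile.mkdtemp(), 'objects.json.gz')

# json.dump() writes text, so open the gzip file in text mode.
with gzip.open(file_path, 'wt', encoding='utf-8') as f_out:
    json.dump(combined, f_out)

# Sanity check with the standard parser, which loads the whole document at once.
with gzip.open(file_path, 'rt', encoding='utf-8') as f_in:
    restored = json.load(f_in)
```

The standard `json.load()` used for the check reads everything into memory, which is the step an incremental parser is meant to avoid for large files.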

This is quite similar to Martin's answer; it wastes slightly more space but may be slightly more convenient to work with.

EDIT: oh, by the way, it's probably fair to clarify that I wrote jsaone.