
Performance of gzipped json vs efficient binary serialization


Serialising with json+gzip uses about 25% more space than binary+gzip for full-precision numbers and for objects. For limited-precision numbers (4 significant digits) the serialised sizes are roughly the same. So for small-scale applications json+gzip seems good enough in terms of data size, even when sending an array of records where each record fully spells out the field names and values (the common way of storing data in JavaScript).

Source for the experiment below: https://github.com/csiz/gzip-json-performance

Numbers

I picked a million floating-point (64-bit) numbers. I assume these numbers come from some natural source, so I generated them from an exponential distribution and then rounded them to 4 significant digits. Because JSON writes down the whole decimal representation, I thought storing large numbers might incur a bigger cost (e.g. storing 123456.000000 vs 0.123456), so I checked both cases. I also checked serialising numbers that haven't been rounded.
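
For concreteness, here's roughly how such a dataset can be generated. This is a minimal sketch assuming numpy; the round_significant helper is my own illustration, and the linked repo may do this differently.

import numpy as np

rng = np.random.default_rng(0)

def round_significant(x, digits=4):
    # Round each value to `digits` significant digits.
    # Exponential samples are strictly positive, so log10 is safe here.
    exponent = np.floor(np.log10(x))
    factor = 10.0 ** (digits - 1 - exponent)
    return np.round(x * factor) / factor

small = round_significant(rng.exponential(scale=1.0, size=1_000_000))
large = round_significant(rng.exponential(scale=1_000_000.0, size=1_000_000))
full_precision = rng.exponential(scale=1.0, size=1_000_000)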

Compressed JSON is 9% larger than compressed binary when serialising small numbers (order of magnitude around 1.0, so only a few digits to write down):

json          3.29 MB    json/raw      43%
binary        3.03 MB    binary/raw    40%
json/binary   1.09

Compressed JSON is 17% smaller than compressed binary when serialising large numbers (order of magnitude around 1,000,000, so more digits to write down):

json          2.58 MB    json/raw      34%
binary        3.10 MB    binary/raw    41%
json/binary   0.83

Compressed JSON is 22% larger than compressed binary when serialising full-precision doubles:

json          8.90 MB    json/raw      117%
binary        7.27 MB    binary/raw    95%
json/binary   1.22
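
The measurement itself is only a few lines. Here is a minimal sketch assuming Python's json, gzip and numpy; the full experiment is in the repo linked above:

import gzip, json
import numpy as np

# Stand-in dataset; the experiment uses the exponential samples described above.
values = np.random.default_rng(0).exponential(size=1_000_000)

raw = values.tobytes()                                  # 8 bytes per double
as_json = json.dumps(values.tolist()).encode("utf-8")   # decimal text

json_gz = gzip.compress(as_json)
binary_gz = gzip.compress(raw)

print("json+gzip:  ", len(json_gz))
print("binary+gzip:", len(binary_gz))
print("json/binary:", len(json_gz) / len(binary_gz))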

Objects

For objects, I serialise them the usual lazy way in JSON: each object is stored as a complete record with the field names and values. The "choice" enumeration has its value fully spelled out.

[  {    "small number": 0.1234,    "large number": 1234000,    "choice": "two"  },  ...]

For the efficient binary representation, I vectorise the objects: I store the number of objects, then a contiguous vector of the small numbers, another for the large numbers, and one for the choice enum. In this case I assume the enum values are known and fixed, so I store just the index into the enum.

n = 1e6
small number = binary([0.1234, ...])
large number = binary([1234000, ...])
choice = binary([2, ...])  # indexes into the enum ["zero", "one", ..., "four"]
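
Both layouts are easy to reproduce. Here is a sketch assuming Python's json, gzip, struct and numpy; the field names and enum follow the example above, and the data itself is made up:

import gzip, json, struct
import numpy as np

choices = ["zero", "one", "two", "three", "four"]
n = 1_000_000
rng = np.random.default_rng(0)
small = np.round(rng.exponential(size=n), 4)        # ~4 significant digits near 1.0
large = np.round(rng.exponential(scale=1e6, size=n), -3)
choice_idx = rng.integers(0, len(choices), size=n, dtype=np.uint8)

# Lazy JSON: every record repeats the field names and the enum string.
records = [
    {"small number": s, "large number": l, "choice": choices[c]}
    for s, l, c in zip(small.tolist(), large.tolist(), choice_idx.tolist())
]
json_gz = gzip.compress(json.dumps(records).encode("utf-8"))

# Vectorised binary: a count, then one contiguous vector per field.
binary = (struct.pack("<q", n)
          + small.tobytes()        # float64 vector
          + large.tobytes()        # float64 vector
          + choice_idx.tobytes())  # uint8 enum indexes
binary_gz = gzip.compress(binary)

print("json/binary:", len(json_gz) / len(binary_gz))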

Compressed JSON is 27% larger than compressed binary when storing objects:

json          8.36 MB    json/raw      44%
binary        6.59 MB    binary/raw    35%
json/binary   1.27