Which is the best way to compress json to store in a memory based store like redis or memcache? [python]



We just use gzip as a compressor.

import gzip
import cStringIO

def decompressStringToFile(value, outputFile):
  """
  decompress the given string value (which must be valid compressed gzip
  data) and write the result in the given open file.
  """
  stream = cStringIO.StringIO(value)
  decompressor = gzip.GzipFile(fileobj=stream, mode='r')
  while True:  # until EOF
    chunk = decompressor.read(8192)
    if not chunk:
      decompressor.close()
      outputFile.close()
      return
    outputFile.write(chunk)

def compressFileToString(inputFile):
  """
  read the given open file, compress the data and return it as a string.
  """
  stream = cStringIO.StringIO()
  compressor = gzip.GzipFile(fileobj=stream, mode='w')
  while True:  # until EOF
    chunk = inputFile.read(8192)
    if not chunk:  # EOF?
      compressor.close()
      return stream.getvalue()
    compressor.write(chunk)

In our use case we store the result as files, as you can imagine. To work with in-memory strings only, you can use a cStringIO.StringIO() object as a replacement for the file as well.
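On Python 3 the same idea needs no file objects at all: the standard library's gzip.compress and gzip.decompress operate directly on bytes. A minimal sketch (function names are my own, not from the answer above):

```python
import gzip
import json

def compress_json(obj) -> bytes:
    # serialize to JSON, encode to UTF-8, then gzip the bytes
    return gzip.compress(json.dumps(obj).encode("utf-8"))

def decompress_json(blob: bytes):
    # reverse the steps: gunzip, decode UTF-8, parse JSON
    return json.loads(gzip.decompress(blob).decode("utf-8"))

data = {"user": "alice", "scores": [1, 2, 3] * 100}
blob = compress_json(data)
assert decompress_json(blob) == data
```

The resulting bytes can be stored as a Redis or memcache value as-is, since both accept binary values.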


Based on @Alfe's answer above, here is a version that keeps the contents in memory (for network I/O tasks). I also made a few changes to support Python 3.

import gzip
from io import BytesIO

def decompressBytesToString(inputBytes):
  """
  decompress the given byte array (which must be valid
  compressed gzip data) and return the decoded text (utf-8).
  """
  bio = BytesIO()
  stream = BytesIO(inputBytes)
  decompressor = gzip.GzipFile(fileobj=stream, mode='r')
  while True:  # until EOF
    chunk = decompressor.read(8192)
    if not chunk:
      decompressor.close()
      bio.seek(0)
      return bio.read().decode("utf-8")
    bio.write(chunk)

def compressStringToBytes(inputString):
  """
  read the given string, encode it in utf-8,
  compress the data and return it as a byte array.
  """
  bio = BytesIO(inputString.encode("utf-8"))
  stream = BytesIO()
  compressor = gzip.GzipFile(fileobj=stream, mode='w')
  while True:  # until EOF
    chunk = bio.read(8192)
    if not chunk:  # EOF?
      compressor.close()
      return stream.getvalue()
    compressor.write(chunk)

To test the compression try:

inputString = "asdf" * 1000
len(inputString)
len(compressStringToBytes(inputString))
decompressBytesToString(compressStringToBytes(inputString))


If you want it to be fast, try lz4. If you want it to compress better, go for lzma.
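lz4 requires the third-party lz4 package, but lzma ships in the standard library, so the ratio end of the tradeoff is easy to try. A stdlib-only sketch comparing zlib (the DEFLATE algorithm that gzip wraps) against lzma on a repetitive JSON payload:

```python
import json
import lzma
import zlib

payload = json.dumps([{"id": i, "tag": "sample"} for i in range(1000)]).encode("utf-8")

fast = zlib.compress(payload)   # DEFLATE: quick, moderate ratio
small = lzma.compress(payload)  # LZMA: slower, usually a better ratio

print(len(payload), len(fast), len(small))
assert zlib.decompress(fast) == payload
assert lzma.decompress(small) == payload
```

On repetitive data like this, both shrink the payload dramatically; lzma typically produces the smaller output at a noticeably higher CPU cost, which is the tradeoff the sentence above describes.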

Are there any other, better ways to compress json to save memory in redis (while also ensuring lightweight decoding afterwards)?

How good a candidate would be msgpack [http://msgpack.org/]?

Msgpack is relatively fast and has a smaller memory footprint. But ujson is generally faster for me. You should compare them on your data, measure the compression and decompression rates and the compression ratio.
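Since msgpack and ujson are third-party packages, here is a small stdlib-only harness for the comparison suggested above, using json alone and json+zlib as the two contenders; msgpack.packb/unpackb or ujson.dumps/loads would plug into the same benchmark function. The helper name and payload are my own illustration:

```python
import json
import time
import zlib

def benchmark(name, dumps, loads, obj, rounds=200):
    # measure encode time, decode time, and encoded size for one codec
    t0 = time.perf_counter()
    for _ in range(rounds):
        encoded = dumps(obj)
    t1 = time.perf_counter()
    for _ in range(rounds):
        loads(encoded)
    t2 = time.perf_counter()
    return name, len(encoded), t1 - t0, t2 - t1

obj = [{"id": i, "name": "user%d" % i, "active": i % 2 == 0} for i in range(500)]

results = [
    benchmark("json", lambda o: json.dumps(o).encode(),
              lambda b: json.loads(b.decode()), obj),
    benchmark("json+zlib", lambda o: zlib.compress(json.dumps(o).encode()),
              lambda b: json.loads(zlib.decompress(b).decode()), obj),
]
for name, size, enc, dec in results:
    print(f"{name:10s} size={size:6d} encode={enc:.4f}s decode={dec:.4f}s")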

Shall I consider options like pickle as well?

Consider both pickle (cPickle in particular) and marshal. They are fast. But remember that they are not secure or scalable, and you pay for the speed with the added responsibility.
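A minimal sketch of the pickle route combined with zlib compression. The security caveat above is real: pickle.loads can execute arbitrary code from a crafted payload, so only ever unpickle data your own application wrote.

```python
import pickle
import zlib

obj = {"ids": list(range(1000)), "label": "cache entry"}

# pickle handles arbitrary Python objects, not just JSON-compatible ones,
# but NEVER unpickle data from an untrusted source: a crafted payload
# can run arbitrary code during loading.
blob = zlib.compress(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))
restored = pickle.loads(zlib.decompress(blob))
assert restored == obj
```

marshal is similarly fast but its format is tied to the Python version that wrote it, which matters if the cache outlives an interpreter upgrade.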