Which is the best way to compress json to store in a memory based store like redis or memcache? [python]



We just use gzip as a compressor.

import gzip
import cStringIO

def decompressStringToFile(value, outputFile):
  """
  decompress the given string value (which must be valid compressed gzip
  data) and write the result in the given open file.
  """
  stream = cStringIO.StringIO(value)
  decompressor = gzip.GzipFile(fileobj=stream, mode='r')
  while True:  # until EOF
    chunk = decompressor.read(8192)
    if not chunk:
      decompressor.close()
      outputFile.close()
      return
    outputFile.write(chunk)

def compressFileToString(inputFile):
  """
  read the given open file, compress the data and return it as a string.
  """
  stream = cStringIO.StringIO()
  compressor = gzip.GzipFile(fileobj=stream, mode='w')
  while True:  # until EOF
    chunk = inputFile.read(8192)
    if not chunk:  # EOF?
      compressor.close()
      return stream.getvalue()
    compressor.write(chunk)

In our use case we store the result as files, as you can imagine. To work with in-memory strings only, you can use a cStringIO.StringIO() object as a replacement for the file as well.
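On Python 3 the same idea needs no file objects at all: the standard library's gzip.compress and gzip.decompress operate directly on bytes. A minimal sketch (function names are my own, not from the answer above):

```python
import gzip
import json

def compress_json(obj) -> bytes:
    # serialize to JSON, encode to UTF-8, then gzip the bytes
    return gzip.compress(json.dumps(obj).encode("utf-8"))

def decompress_json(blob: bytes):
    # reverse the steps: gunzip, decode UTF-8, parse JSON
    return json.loads(gzip.decompress(blob).decode("utf-8"))

data = {"user": "alice", "scores": [1, 2, 3] * 100}
blob = compress_json(data)
assert decompress_json(blob) == data
```

The resulting bytes can be stored as a Redis or memcache value as-is, since both accept binary values.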


Based on @Alfe's answer above, here is a version that keeps the contents in memory (for network I/O tasks). I also made a few changes to support Python 3.

import gzip
from io import BytesIO

def decompressBytesToString(inputBytes):
  """
  decompress the given byte array (which must be valid
  compressed gzip data) and return the decoded text (utf-8).
  """
  bio = BytesIO()
  stream = BytesIO(inputBytes)
  decompressor = gzip.GzipFile(fileobj=stream, mode='r')
  while True:  # until EOF
    chunk = decompressor.read(8192)
    if not chunk:
      decompressor.close()
      bio.seek(0)
      return bio.read().decode("utf-8")
    bio.write(chunk)

def compressStringToBytes(inputString):
  """
  read the given string, encode it in utf-8,
  compress the data and return it as a byte array.
  """
  bio = BytesIO(inputString.encode("utf-8"))
  stream = BytesIO()
  compressor = gzip.GzipFile(fileobj=stream, mode='w')
  while True:  # until EOF
    chunk = bio.read(8192)
    if not chunk:  # EOF?
      compressor.close()
      return stream.getvalue()
    compressor.write(chunk)

To test the compression try:

inputString = "asdf" * 1000
len(inputString)
len(compressStringToBytes(inputString))
decompressBytesToString(compressStringToBytes(inputString))


If you want it to be fast, try lz4. If you want it to compress better, go for lzma.
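lz4 requires the third-party lz4 package, but lzma ships in the standard library, so the ratio end of the tradeoff is easy to try. A stdlib-only sketch comparing zlib (the DEFLATE algorithm that gzip wraps) against lzma on a repetitive JSON payload:

```python
import json
import lzma
import zlib

payload = json.dumps([{"id": i, "tag": "sample"} for i in range(1000)]).encode("utf-8")

fast = zlib.compress(payload)   # DEFLATE: quick, moderate ratio
small = lzma.compress(payload)  # LZMA: slower, usually a better ratio

print(len(payload), len(fast), len(small))
assert zlib.decompress(fast) == payload
assert lzma.decompress(small) == payload
```

On repetitive data like this, both shrink the payload dramatically; lzma typically produces the smaller output at a noticeably higher CPU cost, which is the tradeoff the sentence above describes.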

Are there any other, better ways to compress json to save memory in redis (while also ensuring lightweight decoding afterwards)?

How good a candidate would be msgpack [http://msgpack.org/]?

Msgpack is relatively fast and has a smaller memory footprint. But ujson is generally faster for me. You should compare them on your data, measure the compression and decompression rates and the compression ratio.
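Since msgpack and ujson are third-party packages, here is a small stdlib-only harness for the comparison suggested above, using json alone and json+zlib as the two contenders; msgpack.packb/unpackb or ujson.dumps/loads would plug into the same benchmark function. The helper name and payload are my own illustration:

```python
import json
import time
import zlib

def benchmark(name, dumps, loads, obj, rounds=200):
    # measure encode time, decode time, and encoded size for one codec
    t0 = time.perf_counter()
    for _ in range(rounds):
        encoded = dumps(obj)
    t1 = time.perf_counter()
    for _ in range(rounds):
        loads(encoded)
    t2 = time.perf_counter()
    return name, len(encoded), t1 - t0, t2 - t1

obj = [{"id": i, "name": "user%d" % i, "active": i % 2 == 0} for i in range(500)]

results = [
    benchmark("json", lambda o: json.dumps(o).encode(),
              lambda b: json.loads(b.decode()), obj),
    benchmark("json+zlib", lambda o: zlib.compress(json.dumps(o).encode()),
              lambda b: json.loads(zlib.decompress(b).decode()), obj),
]
for name, size, enc, dec in results:
    print(f"{name:10s} size={size:6d} encode={enc:.4f}s decode={dec:.4f}s")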

Shall I consider options like pickle as well?

Consider both pickle (cPickle in particular) and marshal. They are fast. But remember that they are not secure or scalable, and you pay for the speed with the added responsibility.
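A minimal sketch of the pickle route combined with zlib compression. The security caveat above is real: pickle.loads can execute arbitrary code from a crafted payload, so only ever unpickle data your own application wrote.

```python
import pickle
import zlib

obj = {"ids": list(range(1000)), "label": "cache entry"}

# pickle handles arbitrary Python objects, not just JSON-compatible ones,
# but NEVER unpickle data from an untrusted source: a crafted payload
# can run arbitrary code during loading.
blob = zlib.compress(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))
restored = pickle.loads(zlib.decompress(blob))
assert restored == obj
```

marshal is similarly fast but its format is tied to the Python version that wrote it, which matters if the cache outlives an interpreter upgrade.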