Fastest way to store a numpy array in redis


I don't know if it is fastest, but you could try something like this...

Storing a Numpy array in Redis goes like this - see function toRedis():

  • get the shape of the Numpy array and encode it
  • append the raw bytes of the Numpy array to the encoded shape
  • store the result in Redis under the supplied key

Retrieving a Numpy array goes like this - see function fromRedis():

  • retrieve from Redis the encoded bytes stored under the supplied key
  • extract the shape of the Numpy array from the first 8 bytes
  • interpret the remaining bytes as a Numpy array and reshape it to the original shape
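The byte layout described by the two lists can be checked without a running Redis server. Below is a minimal sketch that only builds and parses the bytes; the helper names `encode`/`decode` and the `HEADER` constant are hypothetical, and it assumes a 2-D `np.uint16` array like the 80x80 example in this answer:

```python
import struct

import numpy as np

HEADER = struct.Struct('>II')  # two big-endian uint32 values: rows, cols (8 bytes)


def encode(a: np.ndarray) -> bytes:
    """Hypothetical helper: 8-byte shape header followed by the raw array bytes."""
    h, w = a.shape
    return HEADER.pack(h, w) + a.tobytes()


def decode(encoded: bytes, dtype=np.uint16) -> np.ndarray:
    """Hypothetical helper: read the header, then view the remaining bytes as an array."""
    h, w = HEADER.unpack_from(encoded, 0)
    return np.frombuffer(encoded, dtype=dtype, offset=HEADER.size).reshape(h, w)


a = np.arange(6, dtype=np.uint16).reshape(2, 3)
blob = encode(a)
print(len(blob))  # 8-byte header + 6 elements * 2 bytes = 20
print(decode(blob))
```

The bytes returned by `encode()` are exactly what would be passed to `r.set()`, so the same check applies to the value read back with `r.get()`.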

#!/usr/bin/env python3
import struct

import numpy as np
import redis


def toRedis(r, a, n):
    """Store given Numpy array 'a' in Redis under key 'n'"""
    h, w = a.shape
    # Encode the 2-D shape as two big-endian unsigned 32-bit ints (8 bytes)
    shape = struct.pack('>II', h, w)
    encoded = shape + a.tobytes()

    # Store encoded data in Redis
    r.set(n, encoded)


def fromRedis(r, n):
    """Retrieve Numpy array from Redis key 'n'"""
    encoded = r.get(n)
    h, w = struct.unpack('>II', encoded[:8])
    # Skip the 8-byte shape header; the dtype must match the stored array,
    # otherwise frombuffer() defaults to float64 and misreads the data
    a = np.frombuffer(encoded, dtype=np.uint16, offset=8).reshape(h, w)
    return a


# Create 80x80 numpy array to store
a0 = np.arange(6400, dtype=np.uint16).reshape(80, 80)

# Redis connection
r = redis.Redis(host='localhost', port=6379, db=0)

# Store array a0 in Redis under name 'a0array'
toRedis(r, a0, 'a0array')

# Retrieve from Redis
a1 = fromRedis(r, 'a0array')

np.testing.assert_array_equal(a0, a1)

You could add more flexibility by encoding the dtype of the Numpy array along with the shape. I didn't do that because you may already know that all your arrays are of one specific type, in which case the extra code would just make it bigger and harder to read for no good reason.
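A minimal sketch of that idea, assuming a hypothetical header layout: the dtype's `.str` code (e.g. `'<u2'`, `'<f4'`) padded to 8 bytes, then the number of dimensions, then each dimension, all as big-endian uint32. The helper names `pack_array`/`unpack_array` are illustrative only, and the sketch assumes plain numeric dtypes (object arrays cannot be round-tripped through raw bytes):

```python
import struct

import numpy as np

PREFIX = '>8sI'  # hypothetical header prefix: 8-byte dtype code + ndim (12 bytes)


def pack_array(a: np.ndarray) -> bytes:
    """Hypothetical helper: header (dtype code, ndim, dims) followed by raw C-order data."""
    code = a.dtype.str.encode('ascii')  # e.g. b'<u2' for little-endian uint16
    if len(code) > 8:
        raise ValueError(f'dtype code too long for this header: {a.dtype.str}')
    # struct pads the '8s' field with null bytes when the code is shorter
    header = struct.pack(PREFIX, code, a.ndim) + struct.pack(f'>{a.ndim}I', *a.shape)
    return header + a.tobytes()


def unpack_array(blob: bytes) -> np.ndarray:
    """Hypothetical helper: inverse of pack_array()."""
    code, ndim = struct.unpack_from(PREFIX, blob, 0)
    offset = struct.calcsize(PREFIX)
    shape = struct.unpack_from(f'>{ndim}I', blob, offset)
    offset += 4 * ndim
    dtype = np.dtype(code.rstrip(b'\0').decode('ascii'))
    return np.frombuffer(blob, dtype=dtype, offset=offset).reshape(shape)


a = np.linspace(0, 1, 24, dtype=np.float32).reshape(2, 3, 4)
b = unpack_array(pack_array(a))
print(b.dtype, b.shape)  # float32 (2, 3, 4)
```

Using `dtype.str` rather than the dtype's name keeps the byte order explicit, which matters if the writer and the reader run on machines with different endianness.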

Rough benchmark on a modern iMac:

80x80 Numpy array of np.uint16   => 58 microseconds to write
200x200 Numpy array of np.uint16 => 88 microseconds to write

Keywords: Python, Numpy, Redis, array, serialise, serialize, key, incr, unique


You could also consider using msgpack-numpy, which provides "encoding and decoding routines that enable the serialization and deserialization of numerical and array data types provided by numpy using the highly efficient msgpack format." -- see https://msgpack.org/.

Quick proof-of-concept:

import msgpack
import msgpack_numpy as m
import numpy as np
from redis import Redis

m.patch()               # Important line to monkey-patch for numpy support!

r = Redis('127.0.0.1')

# Create an array, then use msgpack to serialize it
d_orig = np.array([1, 2, 3, 4])
d_orig_packed = m.packb(d_orig)

# Set the data in redis
r.set('d', d_orig_packed)

# Retrieve and unpack the data
d_out = m.unpackb(r.get('d'))

# Check they match (values and dtype)
assert np.array_equal(d_orig, d_out)
assert d_orig.dtype == d_out.dtype

On my machine, msgpack runs much quicker than using struct:

In: %timeit struct.pack('4096L', *np.arange(0, 4096))
1000 loops, best of 3: 443 µs per loop

In: %timeit m.packb(np.arange(0, 4096))
The slowest run took 7.74 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 32.6 µs per loop
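The `struct.pack` timing above packs each of the 4096 elements individually, whereas the first answer only uses `struct` for the 8-byte shape header and copies the data with `tobytes()`. A rough way to compare those two ways of producing the same bytes on your own machine is sketched below. It swaps the original `'4096L'` format for `'4096q'` (an assumption, so that the output size matches `int64` on every platform), and the numbers it prints will vary by machine:

```python
import struct
import timeit

import numpy as np

N = 200  # repetitions per measurement
arr = np.arange(0, 4096, dtype=np.int64)


def per_element() -> bytes:
    # Every element is converted to an integer and packed one by one;
    # '4096q' = 4096 signed 64-bit ints in native byte order
    return struct.pack('4096q', *arr)


def whole_buffer() -> bytes:
    # One copy of the array's underlying memory, no per-element Python work
    return arr.tobytes()


t_struct = timeit.timeit(per_element, number=N)
t_tobytes = timeit.timeit(whole_buffer, number=N)

print(f'struct.pack: {t_struct / N * 1e6:.1f} µs per call')
print(f'tobytes():   {t_tobytes / N * 1e6:.1f} µs per call')
```

Both functions return identical bytes (native byte order in both cases), so only the speed differs.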


You can check Mark Setchell's answer for how to actually write the bytes into Redis. Below I rewrite the functions fromRedis and toRedis so that they handle arrays with any number of dimensions and also store the array's dtype along with its shape.

import numpy as np


def toRedis(arr: np.ndarray) -> bytes:
    """Serialize an array as b'<dtype>|<shape>|<raw data>'.

    Assumes str(arr.dtype) contains no '|' (true for plain numeric dtypes).
    """
    arr_dtype = str(arr.dtype).encode('utf-8')
    arr_shape = ','.join(str(a) for a in arr.shape).encode('utf-8')
    sep = b'|'
    arr_bytes = arr.tobytes()  # C-order bytes, same as arr.ravel().tobytes()
    # Concatenating bytes objects gives plain bytes, ready for r.set()
    return arr_dtype + sep + arr_shape + sep + arr_bytes


def fromRedis(serialized_arr: bytes) -> np.ndarray:
    """Rebuild an array from the bytes produced by toRedis()."""
    sep = b'|'
    i_0 = serialized_arr.find(sep)
    i_1 = serialized_arr.find(sep, i_0 + 1)
    arr_dtype = serialized_arr[:i_0].decode('utf-8')
    shape_str = serialized_arr[i_0 + 1:i_1].decode('utf-8')
    # An empty shape string means a 0-d array
    arr_shape = tuple(int(a) for a in shape_str.split(',')) if shape_str else ()
    arr_bytes = serialized_arr[i_1 + 1:]
    return np.frombuffer(arr_bytes, dtype=arr_dtype).reshape(arr_shape)