NumPy arrays with SQLite


Why not give redis a try?

Drivers for your two platforms of interest are available: Python (redis, via the Python Package Index) and R (rredis, via CRAN).

The genius of redis is not that it will magically recognize the NumPy data type and let you insert and extract multi-dimensional NumPy arrays as if they were native redis datatypes. Rather, its genius is the remarkable ease with which you can create such an interface yourself with just a few lines of code.
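To illustrate, here is a minimal sketch of such an interface. The helper names store_array and load_array are my own invention, and it assumes a redis server running on localhost with the redis-py client installed:

import numpy as np
from redis import Redis

r = Redis()    # assumes a redis server on localhost:6379

def store_array(r, key, arr):
    # record dtype and shape alongside the raw bytes so the
    # array can be reconstructed exactly on retrieval
    r.set(key + ':dtype', str(arr.dtype))
    r.set(key + ':shape', ','.join(map(str, arr.shape)))
    r.set(key, arr.tobytes())

def load_array(r, key):
    dtype = np.dtype(r.get(key + ':dtype').decode())
    shape = tuple(map(int, r.get(key + ':shape').decode().split(',')))
    # frombuffer returns a read-only view; call .copy() if you need to mutate
    return np.frombuffer(r.get(key), dtype=dtype).reshape(shape)

A round trip via store_array(r, 'A', A) followed by load_array(r, 'A') returns the array with its original shape and dtype intact.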

There are several tutorials on redis in Python; the one on the DeGizmo blog is particularly good.

import numpy as NP

# create some data
A = NP.random.randint(0, 10, 40).reshape(8, 5)

# a couple of utility functions to (i) manipulate NumPy arrays prior to
# insertion into the redis db for more compact storage, and (ii) to restore
# the original NumPy data types upon retrieval from the redis db
fnx = lambda v: ''.join(map(str, v))
# redis returns bytes in Python 3, so decode before converting back to ints
fnx2 = lambda v: list(map(int, v.decode()))

# start the redis server (e.g., from a bash prompt)
$> cd /usr/local/bin      # default install directory for 'nix
$> redis-server           # starts the redis server

# start the redis client:
from redis import Redis
r0 = Redis(db=0, port=6379, host='localhost')    # same as: r0 = Redis()

# to insert items using the redis 'string' datatype, call 'set' on the
# database, r0, and just pass in a key and the item to insert
r0.set('k1', A[0,:])

# for row-wise insertion of the 2D array into redis, iterate over the array:
for c in range(A.shape[0]):
    r0.set("k{0}".format(c), fnx(A[c,:]))

# or, to insert all rows at once, use 'mset' ('multi set') and pass in a
# key-value mapping:
x = dict(("k{0}".format(i), fnx(row)) for i, row in enumerate(A.tolist()))
r0.mset(x)

# to retrieve a row, pass its key to 'get'
>>> r0.get('k0')
'63295'

# retrieve the entire array from redis:
kx = r0.keys('*')    # returns all keys in the redis database, r0
for key in kx:
    r0.get(key)

# to retrieve it in its original form (sort the keys to preserve row order):
A = []
for key in sorted(kx):
    A.append(fnx2(r0.get(key)))

>>> A = NP.array(A)
>>> A
array([[ 6.,  2.,  3.,  3.,  9.],
       [ 4.,  9.,  6.,  2.,  3.],
       [ 3.,  7.,  9.,  5.,  0.],
       [ 5.,  2.,  6.,  3.,  4.],
       [ 7.,  1.,  5.,  0.,  2.],
       [ 8.,  6.,  1.,  5.,  8.],
       [ 1.,  7.,  6.,  4.,  9.],
       [ 6.,  4.,  1.,  3.,  6.]])


Doug's suggestion to use redis is quite good, but I think his code is a bit complicated and, as a result, rather slow. For my purposes, I had to serialize+write and then grab+deserialize a square matrix of about a million floats in less than a tenth of a second, so I did this:

For writing:

import numpy as np
from redis import Redis

rs = Redis()    # redis client

snapshot = np.random.randn(1024, 1024)
serialized = snapshot.tobytes()
rs.set('snapshot_key', serialized)

Then for reads:

s = rs.get('snapshot_key')
# frombuffer defaults to float64; pass dtype= explicitly if the stored
# array used a different type
deserialized = np.frombuffer(s).astype(np.float32)
rank = np.sqrt(deserialized.size).astype(int)
snap = deserialized.reshape(rank, rank)

You can do some basic performance testing in IPython with %time, but neither tobytes nor frombuffer takes more than a few milliseconds.
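For example, from an IPython session (%timeit is usually more convenient than %time for micro-benchmarks like this; the timings it reports will of course vary by machine):

In [1]: import numpy as np

In [2]: snapshot = np.random.randn(1024, 1024)

In [3]: %timeit snapshot.tobytes()

In [4]: serialized = snapshot.tobytes()

In [5]: %timeit np.frombuffer(serialized)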


This question is a bit older, but is there any reason you can't just do a fetchall() instead of iterating over the cursor, and then initialize the NumPy array on declaration?
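For anyone wondering what that looks like in practice, here is a rough sketch using the standard-library sqlite3 module; the database file, table, and column names are made up for illustration:

import sqlite3
import numpy as np

conn = sqlite3.connect('example.db')    # hypothetical database file
cur = conn.cursor()

# fetch every row at once instead of iterating over the cursor,
# then build the array directly from the list of row tuples
cur.execute('SELECT c1, c2, c3, c4, c5 FROM matrix_rows')
A = np.array(cur.fetchall(), dtype=float)

conn.close()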