Scipy sparse... arrays? Scipy sparse... arrays? python python

Scipy sparse... arrays?


Use a scipy.sparse format that is row or column based: csc_matrix and csr_matrix.

These use efficient, C implementations under the hood (including multiplication), and transposition is a no-op (esp. if you call transpose(copy=False)), just like with numpy arrays.

EDIT: some timings via ipython:

import numpy, scipy.sparsen = 100000x = (numpy.random.rand(n) * 2).astype(int).astype(float) # 50% sparse vectorx_csr = scipy.sparse.csr_matrix(x)x_dok = scipy.sparse.dok_matrix(x.reshape(x_csr.shape))

Now x_csr and x_dok are 50% sparse:

print repr(x_csr)<1x100000 sparse matrix of type '<type 'numpy.float64'>'        with 49757 stored elements in Compressed Sparse Row format>

And the timings:

timeit numpy.dot(x, x)10000 loops, best of 3: 123 us per looptimeit x_dok * x_dok.T1 loops, best of 3: 1.73 s per looptimeit x_csr.multiply(x_csr).sum()1000 loops, best of 3: 1.64 ms per looptimeit x_csr * x_csr.T100 loops, best of 3: 3.62 ms per loop

So it looks like I told a lie. Transposition is very cheap, but there is no efficient C implementation of csr * csc (in the latest scipy 0.9.0). A new csr object is constructed in each call :-(

As a hack (though scipy is relatively stable these days), you can do the dot product directly on the sparse data:

timeit numpy.dot(x_csr.data, x_csr.data)10000 loops, best of 3: 62.9 us per loop

Note this last approach does a numpy dense multiplication again. The sparsity is 50%, so it's actually faster than dot(x, x) by a factor of 2.


You could create a subclass of one of the existing 2d sparse arrays

from scipy.sparse import dok_matrixclass sparse1d(dok_matrix):    def __init__(self, v):        dok_matrix.__init__(self, (v,))    def dot(self, other):        return dok_matrix.dot(self, other.transpose())[0,0]a=sparse1d((1,2,3))b=sparse1d((4,5,6))print a.dot(b)


I'm not sure that it is really much better or faster, but you could do this to avoid using the transpose:

Asp.multiply(Bsp).sum()

This just takes the element-by-element product of the two matrices and sums the products. You could make a subclass of whatever matrix format you are using that has the above statement as the dot product.

However, it is probably just easier to tranpose them:

Asp*Bsp.T

That doesn't seem like so much to do, but you could also make a subclass and modify the mul() method.