
Populate a Pandas SparseDataFrame from a SciPy Sparse Matrix


A direct conversion is not supported ATM. Contributions are welcome!

Try this. It should be OK on memory, since a SparseSeries is much like a csc_matrix (for one column) and is pretty space efficient:

```python
import numpy as np
import pandas as pd
from scipy.sparse import csc_matrix

# row was not shown in the original session; these values reproduce the output below
row = np.array([0, 2, 2, 0, 1, 2])
col = np.array([0, 0, 1, 2, 2, 2])
data = np.array([1, 2, 3, 4, 5, 6], dtype='float64')
m = csc_matrix((data, (row, col)), shape=(3, 3))
# m -> <3x3 sparse matrix of type '<type 'numpy.float64'>'
#        with 6 stored elements in Compressed Sparse Column format>

# Build one SparseSeries per row of m (pandas < 1.0 only)
df = pd.SparseDataFrame([pd.SparseSeries(m[i].toarray().ravel())
                         for i in np.arange(m.shape[0])])
# df ->
#    0  1  2
# 0  1  0  4
# 1  0  0  5
# 2  2  3  6

type(df)
# -> pandas.sparse.frame.SparseDataFrame
```
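SparseSeries and SparseDataFrame no longer exist in pandas 1.0 and later. A minimal sketch of the same "one sparse 1-D object per slice" idea for newer pandas, assuming pandas >= 0.25 (where `pd.arrays.SparseArray.from_spmatrix` is available): because `m` is in CSC format, slicing whole columns is the cheap direction, so this version builds columns instead of rows and never calls `toarray()`.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csc_matrix

# Same 3x3 example matrix as above.
row = np.array([0, 2, 2, 0, 1, 2])
col = np.array([0, 0, 1, 2, 2, 2])
data = np.array([1, 2, 3, 4, 5, 6], dtype="float64")
m = csc_matrix((data, (row, col)), shape=(3, 3))

# Slice one column at a time (cheap for CSC) and wrap it as a SparseArray.
# from_spmatrix expects a sparse matrix with exactly one column.
columns = {
    j: pd.arrays.SparseArray.from_spmatrix(m[:, [j]])
    for j in range(m.shape[1])
}
df = pd.DataFrame(columns)
```

Each column ends up with dtype `Sparse[float64, 0]`, so `df.sparse` accessor methods such as `to_dense()` and `to_coo()` work on the result.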


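As of pandas v0.20.0 you can pass a SciPy sparse matrix directly to the SparseDataFrame constructor. (SparseDataFrame was deprecated in v0.25.0 and removed in v1.0.0; on current pandas, pd.DataFrame.sparse.from_spmatrix does the same job.)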

An example from the pandas docs:

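```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Random 1000x5 matrix with roughly 90% of the entries set to zero
arr = np.random.random(size=(1000, 5))
arr[arr < .9] = 0

sp_arr = csr_matrix(arr)
sdf = pd.SparseDataFrame(sp_arr)  # pandas >= 0.20 and < 1.0 only
```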
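For pandas 0.25 and later, a sketch of the same docs example using the `DataFrame.sparse` accessor instead of the removed SparseDataFrame. The seeded generator is an assumption added only so the run is repeatable; the round trip through `to_coo()` shows that nothing is lost in the conversion.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Same setup as the docs example, with a seeded generator for repeatability.
rng = np.random.default_rng(0)
arr = rng.random(size=(1000, 5))
arr[arr < .9] = 0
sp_arr = csr_matrix(arr)

# Each column becomes a SparseArray with fill value 0.
sdf = pd.DataFrame.sparse.from_spmatrix(sp_arr)

# Convert back to a SciPy COO matrix and measure the fraction of stored values.
coo = sdf.sparse.to_coo()
density = sdf.sparse.density
```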


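A much shorter version, with the caveat that it produces an ordinary dense DataFrame rather than a sparse one: m.toarray() materializes every zero, so this only works when the densified matrix fits in memory.

```python
df = pd.DataFrame(m.toarray())
```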
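To make that trade-off concrete, a rough sketch comparing the dense shortcut with a sparse DataFrame on a mostly-zero matrix, assuming pandas >= 0.25. The 1000x50 shape and roughly 1% density are arbitrary choices for illustration; exact byte counts for the sparse version depend on how many values happen to be non-zero.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# A mostly-zero matrix: roughly 1% of the entries are non-zero.
rng = np.random.default_rng(0)
arr = rng.random(size=(1000, 50))
arr[arr < .99] = 0
m = csr_matrix(arr)

# Dense: toarray() allocates every zero before pandas sees the data.
dense_df = pd.DataFrame(m.toarray())

# Sparse: only the stored values and their row positions are kept per column.
sparse_df = pd.DataFrame.sparse.from_spmatrix(m)

dense_bytes = dense_df.memory_usage(index=False).sum()
sparse_bytes = sparse_df.memory_usage(index=False).sum()
```

Both frames hold the same values; the dense one always costs 8 bytes per float64 cell, while the sparse one scales with the number of non-zeros.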