Is there an efficient way of concatenating scipy.sparse matrices?

Is there an efficient way of concatenating scipy.sparse matrices?


scipy.sparse now provides hstack and vstack for concatenating matrices horizontally and vertically, respectively.
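A minimal sketch of the two calls, assuming two small made-up CSR matrices `a` and `b` (the names and values are only for illustration). Passing `format="csr"` asks for the result in a known format instead of whatever the function picks by default.

```python
import numpy as np
from scipy import sparse

# Two illustrative 2 x 2 sparse matrices (hypothetical data).
a = sparse.csr_matrix(np.array([[1, 0], [0, 2]]))
b = sparse.csr_matrix(np.array([[0, 3], [4, 0]]))

# hstack places b to the right of a; both need the same number of rows.
side_by_side = sparse.hstack([a, b], format="csr")  # shape (2, 4)

# vstack places b below a; both need the same number of columns.
stacked = sparse.vstack([a, b], format="csr")       # shape (4, 2)
```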


Using hstack, vstack, or concatenate can be dramatically slower than concatenating the inner data objects themselves. The reason is that hstack/vstack convert the sparse matrices to COO format, which can be very slow when a matrix is very large and not already in COO format. Here is code for concatenating CSC matrices by columns; a similar method can be used for CSR matrices:

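import numpy as np
from scipy.sparse import csc_matrix

def concatenate_csc_matrices_by_columns(matrix1, matrix2):
    # Both inputs must be CSC matrices with the same number of rows.
    new_data = np.concatenate((matrix1.data, matrix2.data))
    new_indices = np.concatenate((matrix1.indices, matrix2.indices))
    # Shift matrix2's column pointers past matrix1's entries and drop the
    # leading 0, which would duplicate matrix1's final pointer.
    new_ind_ptr = matrix2.indptr[1:] + len(matrix1.data)
    new_ind_ptr = np.concatenate((matrix1.indptr, new_ind_ptr))
    # Pass the shape explicitly so trailing all-zero rows are not dropped.
    shape = (matrix1.shape[0], matrix1.shape[1] + matrix2.shape[1])
    return csc_matrix((new_data, new_indices, new_ind_ptr), shape=shape)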
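The CSR analogue mentioned above could look like the sketch below. It stacks by rows rather than by columns, because CSR stores one pointer per row. The function name `concatenate_csr_matrices_by_rows` and the sample matrices are hypothetical, and the sketch assumes both inputs are already CSR with no unused trailing entries in `data`.

```python
import numpy as np
from scipy.sparse import csr_matrix, vstack

def concatenate_csr_matrices_by_rows(matrix1, matrix2):
    """Stack two CSR matrices vertically by joining their internal arrays."""
    if matrix1.shape[1] != matrix2.shape[1]:
        raise ValueError("both matrices must have the same number of columns")
    new_data = np.concatenate((matrix1.data, matrix2.data))
    new_indices = np.concatenate((matrix1.indices, matrix2.indices))
    # Shift matrix2's row pointers past matrix1's stored entries and drop
    # the leading 0, which would duplicate matrix1's final pointer.
    shifted = matrix2.indptr[1:] + len(matrix1.data)
    new_indptr = np.concatenate((matrix1.indptr, shifted))
    shape = (matrix1.shape[0] + matrix2.shape[0], matrix1.shape[1])
    return csr_matrix((new_data, new_indices, new_indptr), shape=shape)

# Illustrative inputs (made-up values).
top = csr_matrix(np.array([[1, 0, 2], [0, 0, 3]]))
bottom = csr_matrix(np.array([[0, 4, 0]]))
result = concatenate_csr_matrices_by_rows(top, bottom)

# Cross-check against the library routine.
reference = vstack([top, bottom], format="csr")
```

Comparing `result.toarray()` with `reference.toarray()` is a cheap way to confirm the pointer arithmetic before relying on it for large matrices.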


Okay, I found the answer. Using scipy.sparse.coo_matrix is much faster than using lil_matrix. I converted the matrices to COO (painless and fast) and then just concatenated the data, rows, and columns after adding the right offsets.

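import numpy as np
import scipy.sparse

# Block layout: [[m1S, bridgeS], [bridgeTS, m2S]], all in COO format.
# np.concatenate is used here in place of the scipy.concatenate alias.
data = np.concatenate((m1S.data, bridgeS.data, bridgeTS.data, m2S.data))
# Blocks in the bottom row are shifted down by the height of m1S.
rows = np.concatenate((m1S.row, bridgeS.row,
                       bridgeTS.row + m1S.shape[0], m2S.row + m1S.shape[0]))
# Blocks in the right column are shifted right by the width of m1S.
cols = np.concatenate((m1S.col, bridgeS.col + m1S.shape[1],
                       bridgeTS.col, m2S.col + m1S.shape[1]))
combined = scipy.sparse.coo_matrix(
    (data, (rows, cols)),
    shape=(m1S.shape[0] + m2S.shape[0], m1S.shape[1] + m2S.shape[1]))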
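For comparison, the same 2 x 2 block layout can also be written with `scipy.sparse.bmat`, which assembles a matrix from a grid of sparse blocks. The sketch below uses small made-up blocks named after the variables in the snippet above and checks the manual index-offset approach against `bmat`. It illustrates the layout only and makes no claim about relative speed.

```python
import numpy as np
from scipy import sparse

# Hypothetical small blocks, named after the variables in the snippet above.
m1S = sparse.coo_matrix(np.array([[1, 0], [0, 2]]))  # 2 x 2, top-left
bridgeS = sparse.coo_matrix(np.array([[3], [0]]))    # 2 x 1, top-right
bridgeTS = sparse.coo_matrix(np.array([[0, 4]]))     # 1 x 2, bottom-left
m2S = sparse.coo_matrix(np.array([[5]]))             # 1 x 1, bottom-right

# Manual assembly: offset each block's row/col indices, then concatenate.
data = np.concatenate((m1S.data, bridgeS.data, bridgeTS.data, m2S.data))
rows = np.concatenate((m1S.row, bridgeS.row,
                       bridgeTS.row + m1S.shape[0], m2S.row + m1S.shape[0]))
cols = np.concatenate((m1S.col, bridgeS.col + m1S.shape[1],
                       bridgeTS.col, m2S.col + m1S.shape[1]))
manual = sparse.coo_matrix(
    (data, (rows, cols)),
    shape=(m1S.shape[0] + m2S.shape[0], m1S.shape[1] + m2S.shape[1]))

# Library equivalent: describe the block grid and let bmat do the offsets.
blocks = sparse.bmat([[m1S, bridgeS], [bridgeTS, m2S]], format="coo")
```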