scipy csr_matrix: understand indptr

Maybe this explanation can help you understand the concept:

data is an array containing all the nonzero elements of the sparse matrix.
indices is an array mapping each element in data to its column in the sparse matrix.
indptr then maps the elements of data and indices to the rows of the sparse matrix. This is done with the following reasoning:
1. If the sparse matrix has M rows, indptr is an array containing M+1 elements.
2. For row i, the slice indptr[i]:indptr[i+1] gives the positions of the elements to take from data and indices for row i. So if indptr[i]=k and indptr[i+1]=l, the data corresponding to row i is data[k:l], at columns indices[k:l]. This is the tricky part, and I hope the following example helps with understanding it.
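A minimal sketch of that slicing rule with plain NumPy arrays (the values are illustrative). The helpers `row_of` and `get_entry` are hypothetical names for this illustration, not SciPy functions, and `get_entry` assumes a row has no duplicate column indices (SciPy sums duplicates when converting to dense).

```python
import numpy as np
from scipy.sparse import csr_matrix

# CSR arrays for a 3x3 matrix (illustrative values)
indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])


def row_of(i):
    """Return (columns, values) stored for row i: indices[k:l] and data[k:l]."""
    k, l = indptr[i], indptr[i + 1]
    return indices[k:l], data[k:l]


def get_entry(i, j):
    """Return A[i, j] by searching row i's column indices for j (0 if absent).

    Hypothetical helper; assumes no duplicate column indices within a row.
    """
    cols, vals = row_of(i)
    hits = np.flatnonzero(cols == j)
    return vals[hits[0]] if hits.size else 0


cols, vals = row_of(2)
print(cols, vals)        # [0 1 2] [4 5 6]
print(get_entry(0, 2))   # 2
print(get_entry(1, 0))   # 0 (nothing stored there)

# Cross-check every entry against SciPy's own conversion
A = csr_matrix((data, indices, indptr), shape=(3, 3)).toarray()
assert all(get_entry(i, j) == A[i, j] for i in range(3) for j in range(3))
```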

EDIT: I replaced the numbers in data with letters to avoid confusion in the following example.

Note: the values in indptr are necessarily non-decreasing, because each successive entry of indptr (the next row) points to the next values in data and indices, which belong to that row. Two consecutive entries are equal only when a row is empty.
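One way to see these constraints concretely, as a sketch: `np.diff(indptr)` is the number of stored elements in each row, so a decreasing indptr would imply a negative count. The arrays are those of Example 1 below; `check_indptr` is a hypothetical helper, and the dense nonzero count is only a cross-check.

```python
import numpy as np
from scipy.sparse import csr_matrix

indptr = np.array([0, 1, 2, 2, 2, 3])
indices = np.array([1, 0, 2])
data = np.array([1, 8, 7])


def check_indptr(indptr, nnz, n_rows):
    """Basic CSR invariants on indptr (hypothetical helper, not SciPy API)."""
    assert len(indptr) == n_rows + 1        # M rows -> M + 1 entries
    assert indptr[0] == 0                   # first row starts at data[0]
    assert indptr[-1] == nnz                # last row ends at len(data)
    assert np.all(np.diff(indptr) >= 0)     # non-decreasing; equal neighbours = empty row


check_indptr(indptr, nnz=len(data), n_rows=5)

row_counts = np.diff(indptr)                # stored elements per row
print(row_counts)   # [1 1 0 0 1]

# Cross-check against a count of nonzeros in the dense version
mat = csr_matrix((data, indices, indptr), shape=(5, 3))
assert np.array_equal(row_counts, np.count_nonzero(mat.toarray(), axis=1))
```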


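Sure, the elements of indptr are in ascending (more precisely, non-decreasing) order. But how should the behavior of indptr be explained? In short: wherever an element indptr[i+1] is the same as the preceding one indptr[i] (i.e. it doesn't increase), row i of the sparse matrix has no nonzero elements, so that row index is skipped.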
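Put differently, row i is empty exactly when indptr[i] == indptr[i + 1]. A short sketch that lists those rows from the indptr of Example 1 (plain NumPy, nothing SciPy-specific):

```python
import numpy as np

indptr = np.array([0, 1, 2, 2, 2, 3])   # indptr of Example 1 (5 rows)

# Position i of this comparison is True when row i stores nothing
empty_rows = np.flatnonzero(indptr[1:] == indptr[:-1])
print(empty_rows)   # [2 3]
```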

The following example illustrates the above interpretation of indptr elements:

Example 1) Imagine this matrix:

    array([[0, 1, 0],
           [8, 0, 0],
           [0, 0, 0],
           [0, 0, 0],
           [0, 0, 7]])

    from scipy.sparse import csr_matrix

    mat1 = csr_matrix(([1, 8, 7], [1, 0, 2], [0, 1, 2, 2, 2, 3]), shape=(5, 3))
    mat1.indptr
    # array([0, 1, 2, 2, 2, 3], dtype=int32)
    mat1.todense()  # to get the corresponding dense matrix
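To connect Example 1 back to the slicing rule, here is a sketch that rebuilds the dense matrix by hand, one row at a time, and compares it with SciPy's result. The Python loop is only for illustration; it assumes no duplicate column indices within a row (fancy assignment would not sum them).

```python
import numpy as np
from scipy.sparse import csr_matrix

data = np.array([1, 8, 7])
indices = np.array([1, 0, 2])
indptr = np.array([0, 1, 2, 2, 2, 3])
n_rows, n_cols = 5, 3

dense = np.zeros((n_rows, n_cols), dtype=data.dtype)
for i in range(n_rows):
    start, end = indptr[i], indptr[i + 1]
    # For rows 2 and 3, start == end, so this assignment writes nothing
    dense[i, indices[start:end]] = data[start:end]

mat1 = csr_matrix((data, indices, indptr), shape=(n_rows, n_cols))
assert np.array_equal(dense, mat1.toarray())
print(dense)
```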

Example 2) Converting an array to a csr_matrix (the case when the dense matrix already exists):

    import numpy as np
    from scipy.sparse import csr_matrix

    arr = np.array([[0, 0, 0],
                    [8, 0, 0],
                    [0, 5, 4],
                    [0, 0, 0],
                    [0, 0, 7]])
    mat2 = csr_matrix(arr)
    mat2.indptr
    # array([0, 0, 1, 3, 3, 4], dtype=int32)
    mat2.indices
    # array([0, 1, 2, 2], dtype=int32)
    mat2.data
    # array([8, 5, 4, 7])  (same dtype as arr)
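For the reverse direction, a minimal sketch of how the three arrays could be computed from a dense array, row by row. This is an illustrative re-implementation (the function name `dense_to_csr` is hypothetical), not SciPy's internal code; the result is checked against `csr_matrix(arr)`.

```python
import numpy as np
from scipy.sparse import csr_matrix


def dense_to_csr(arr):
    """Build (data, indices, indptr) from a 2-D dense array, row by row."""
    data, indices, indptr = [], [], [0]
    for row in arr:
        cols = np.flatnonzero(row)          # column positions of the nonzeros
        indices.extend(cols.tolist())
        data.extend(row[cols].tolist())
        indptr.append(len(data))            # where the next row will start
    return np.array(data), np.array(indices), np.array(indptr)


arr = np.array([[0, 0, 0],
                [8, 0, 0],
                [0, 5, 4],
                [0, 0, 0],
                [0, 0, 7]])

data, indices, indptr = dense_to_csr(arr)
print(indptr)    # [0 0 1 3 3 4]

mat2 = csr_matrix(arr)
assert np.array_equal(indptr, mat2.indptr)
assert np.array_equal(indices, mat2.indices)
assert np.array_equal(data, mat2.data)
```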


    import numpy as np
    from scipy.sparse import csr_matrix

    indptr = np.array([0, 2, 3, 6])
    indices = np.array([0, 2, 2, 0, 1, 2])
    data = np.array([1, 2, 3, 4, 5, 6])
    csr_matrix((data, indices, indptr), shape=(3, 3)).toarray()
    # array([[1, 0, 2],
    #        [0, 0, 3],
    #        [4, 5, 6]])

The example above is taken from the SciPy documentation.

The data array contains the nonzero elements of the sparse matrix, traversed row-wise.
The indices array gives the column number of each nonzero element in data.
For example: column 0 for the first element in data (1), column 2 for the second element (2), and so on up to the last element of data. Hence data and indices always have the same length.
The indptr array indicates where each row starts in data (and indices): indptr[i] is the position of the first element of row i, and indptr[i+1] is where the next row starts. Its size is one more than the number of rows.
For example: the first element of indptr is 0, meaning the first element of row 0 is data[0], i.e. 1; the second element of indptr is 2, meaning the first element of row 1 is data[2], i.e. 3; and the third element of indptr is 3, meaning the first element of row 2 is data[3], i.e. 4. The last element, 6, equals len(data) and marks the end of row 2.
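Another way to read indptr, as a sketch: indptr[i] counts the nonzeros in rows 0 through i-1, so it is the cumulative sum of the per-row counts with a leading 0. Using the matrix from the documentation example:

```python
import numpy as np
from scipy.sparse import csr_matrix

A = np.array([[1, 0, 2],
              [0, 0, 3],
              [4, 5, 6]])

counts_per_row = np.count_nonzero(A, axis=1)             # [2 1 3]
indptr = np.concatenate(([0], np.cumsum(counts_per_row)))
print(indptr)   # [0 2 3 6]

mat = csr_matrix(A)
assert np.array_equal(indptr, mat.indptr)
# For a non-empty row i, data[indptr[i]] is that row's first stored element
assert mat.data[indptr[2]] == 4
```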
Hope you get the point.
