Counting non-zero elements within each row and within each column of a 2D NumPy array


    import numpy as np

    a = np.array([[1, 0, 1],
                  [2, 3, 4],
                  [0, 0, 7]])

    columns = (a != 0).sum(0)
    rows    = (a != 0).sum(1)

The expression (a != 0) is a boolean array of the same shape as the original a, with True wherever the element is non-zero.

The .sum(x) method sums the elements along axis x. Because True counts as 1 and False as 0, the sum of a boolean array is the number of True elements.

The variables columns and rows contain the number of non-zero (element != 0) values in each column/row of your original array:

    columns = np.array([2, 1, 3])
    rows    = np.array([2, 3, 1])
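As an alternative sketch, NumPy also provides np.count_nonzero, which accepts an axis argument in NumPy 1.12 and later (older versions only return the total count). The snippet below reuses the example array and checks that both approaches agree; the names columns_cn and rows_cn are just illustrative.

```python
import numpy as np

a = np.array([[1, 0, 1],
              [2, 3, 4],
              [0, 0, 7]])

# Boolean-mask approach: compare against zero, then sum along an axis.
columns = (a != 0).sum(axis=0)
rows = (a != 0).sum(axis=1)

# Built-in alternative (the axis argument assumes NumPy >= 1.12).
columns_cn = np.count_nonzero(a, axis=0)
rows_cn = np.count_nonzero(a, axis=1)

print(columns, rows)
print(columns_cn, rows_cn)
```

Both give the same counts, so the choice is mostly a matter of readability.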

EDIT: The whole code could look like this (with a few simplifications to your original code):

    from numpy import zeros, loadtxt

    ANOVAInputMatrixValuesArray = zeros([len(TestIDs), 9], float)
    for j, TestID in enumerate(TestIDs):
        ReadOrWrite = 'Read'
        fileName = inputFileName
        directory = GetCurrentDirectory(...)  # arguments that return the correct directory
        # use directory or filename to get the CSV file?
        with open(directory, 'r') as csvfile:
            ANOVAInputMatrixValuesArray[j, :] = loadtxt(csvfile, comments='TestId', delimiter=';', usecols=(2,))[:9]

    nonZeroCols = (ANOVAInputMatrixValuesArray != 0).sum(0)
    nonZeroRows = (ANOVAInputMatrixValuesArray != 0).sum(1)

EDIT 2:

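To get the mean of the non-zero values in each column/row, use the following:

    colMean = a.sum(0) / (a != 0).sum(0)
    rowMean = a.sum(1) / (a != 0).sum(1)

(On Python 2 with integer arrays, convert a to float first, because / would otherwise floor the result.)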

What do you want to do if there are no non-zero elements in a column/row? Then we can adapt the code to solve such a problem.
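As written, the division yields nan (with a runtime warning) for a column or row that has no non-zero elements. One possible way to handle that case explicitly, assuming NaN is an acceptable result there (an assumption, not something stated in the question), is to divide only where the count is non-zero. The helper name nonzero_mean is hypothetical and not part of NumPy.

```python
import numpy as np

def nonzero_mean(a, axis):
    """Mean of the non-zero entries along `axis`; NaN where there are none.

    Hypothetical helper for illustration only.
    """
    counts = (a != 0).sum(axis)
    sums = a.sum(axis)
    # Start from NaN and overwrite only the positions with a non-zero count,
    # so no division by zero is ever evaluated.
    out = np.full(counts.shape, np.nan)
    np.divide(sums, counts, out=out, where=counts != 0)
    return out

a = np.array([[1, 0, 1],
              [0, 0, 0],
              [0, 0, 7]])

row_mean = nonzero_mean(a, 1)
col_mean = nonzero_mean(a, 0)
print(row_mean)
print(col_mean)
```

If a different fill value is preferred (for example 0), only the np.full call would need to change.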


A fast way to count nonzero elements per row in a scipy sparse matrix m is:

np.diff(m.tocsr().indptr)

The indptr attribute of a CSR matrix indicates the indices within the data corresponding to the boundaries between rows. So calculating the difference between each entry will provide the number of non-zero elements in each row.
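To make the role of indptr concrete, here is a small sketch on a made-up matrix (the array dense is an assumption for illustration). Note that np.diff(indptr) counts stored entries, so explicitly stored zeros would be included unless m.eliminate_zeros() is called first.

```python
import numpy as np
from scipy import sparse

dense = np.array([[1, 0, 1],
                  [2, 3, 4],
                  [0, 0, 0],
                  [0, 0, 7]])
m = sparse.csr_matrix(dense)  # building from a dense array drops the zeros

# indptr[i]:indptr[i + 1] is the slice of m.data that belongs to row i.
row_boundaries = m.indptr
row_nonzeros = np.diff(m.indptr)

print(row_boundaries)
print(row_nonzeros)
```

Here row 2 is empty, so its two boundaries coincide and the difference is zero.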

Similarly, for the number of nonzero elements in each column, use:

np.diff(m.tocsc().indptr)

If the data is already in the appropriate form, these will run in O(m.shape[0]) and O(m.shape[1]) respectively, rather than O(m.getnnz()) in Marat and Finn's solutions.

If you need both row and column nonzero counts, and, say, m is already a CSR, you might use:

    row_nonzeros = np.diff(m.indptr)
    # minlength keeps trailing all-zero columns in the result
    col_nonzeros = np.bincount(m.indices, minlength=m.shape[1])

which is not asymptotically faster than first converting to CSC (which is O(m.getnnz())) to get col_nonzeros, but is faster because of implementation details.
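One detail worth checking with np.bincount: it returns only max(index) + 1 bins, so trailing all-zero columns are dropped unless minlength is passed. A minimal sketch on a made-up CSR matrix whose last column is empty (the data is an assumption for illustration):

```python
import numpy as np
from scipy import sparse

dense = np.array([[1, 0, 1, 0],
                  [2, 3, 4, 0],
                  [0, 0, 7, 0]])
m = sparse.csr_matrix(dense)

# Rows: differences between consecutive row boundaries.
row_nonzeros = np.diff(m.indptr)

# Columns: count how often each column index is stored.
col_short = np.bincount(m.indices)                       # misses the empty last column
col_nonzeros = np.bincount(m.indices, minlength=m.shape[1])

print(row_nonzeros)
print(col_short, col_nonzeros)
```

With minlength set to m.shape[1], the result always has one entry per column.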


A faster way is to make a copy of your matrix with ones in place of the stored values, then sum along rows or columns:

    # copy=True so that X itself is not overwritten when it is already CSC
    X_clone = X.tocsc(copy=True)
    X_clone.data = np.ones(X_clone.data.shape)
    NumNonZeroElementsByColumn = X_clone.sum(0)
    NumNonZeroElementsByRow = X_clone.sum(1)

That worked 50 times faster for me than Finn Årup Nielsen's solution (1 second against 53).

edit: You may need to convert NumNonZeroElementsByColumn into a 1-dimensional array with:

    np.array(NumNonZeroElementsByColumn)[0]

The row counts come back as a column vector, so the equivalent there is np.array(NumNonZeroElementsByRow)[:, 0].
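Putting this answer together as a runnable sketch on a tiny made-up matrix (X here is an assumption for illustration): the copy is made explicitly so the original values survive, and np.asarray(...).ravel() flattens both the 1 x n and the n x 1 result to 1-D.

```python
import numpy as np
from scipy import sparse

X = sparse.csr_matrix(np.array([[1.5, 0.0, 2.0],
                                [0.0, 0.0, 3.0]]))

# Copy explicitly: tocsc() may return the same object if X is already CSC.
X_clone = X.tocsc(copy=True)
X_clone.data = np.ones(X_clone.data.shape)

# Summing the ones counts the stored entries along each axis.
by_column = np.asarray(X_clone.sum(0)).ravel()
by_row = np.asarray(X_clone.sum(1)).ravel()

print(by_column, by_row)
```

The counts come back as floats because the ones array is float; cast with astype(int) if integer counts are needed.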