Counting non-zero elements within each row and within each column of a 2D NumPy array
import numpy as np
a = np.array([[1, 0, 1], [2, 3, 4], [0, 0, 7]])
columns = (a != 0).sum(0)
rows = (a != 0).sum(1)
The expression (a != 0) is a boolean array of the same shape as the original a, and it contains True for all non-zero elements.
The .sum(x) method sums the elements over axis x. The sum of True/False elements is the number of True elements.
The variables columns and rows contain the number of non-zero (element != 0) values in each column/row of your original array:
columns = np.array([2, 1, 3])
rows = np.array([2, 3, 1])
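As a cross-check, np.count_nonzero also accepts an axis argument (in reasonably recent NumPy versions, 1.12 and later as far as I know) and should give the same counts as the boolean-mask approach. A minimal sketch using the same example array; the names columns_cn and rows_cn are only for illustration:

```python
import numpy as np

a = np.array([[1, 0, 1], [2, 3, 4], [0, 0, 7]])

# Boolean-mask approach from the answer: True counts as 1 when summed.
columns = (a != 0).sum(0)
rows = (a != 0).sum(1)

# Built-in alternative; axis=0 counts down each column, axis=1 across each row.
columns_cn = np.count_nonzero(a, axis=0)
rows_cn = np.count_nonzero(a, axis=1)

print(columns, rows)
print(columns_cn, rows_cn)
```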
EDIT: The whole code could look like this (with a few simplifications in your original code):
ANOVAInputMatrixValuesArray = zeros([len(TestIDs), 9], float)
for j, TestID in enumerate(TestIDs):
    ReadOrWrite = 'Read'
    fileName = inputFileName
    directory = GetCurrentDirectory(arguments that return correct directory)
    # use directory or filename to get the CSV file?
    with open(directory, 'r') as csvfile:
        ANOVAInputMatrixValuesArray[j,:] = loadtxt(csvfile, comments='TestId', delimiter=';', usecols=(2,))[:9]
nonZeroCols = (ANOVAInputMatrixValuesArray != 0).sum(0)
nonZeroRows = (ANOVAInputMatrixValuesArray != 0).sum(1)
EDIT 2:
To get the mean of the non-zero values in each column/row, use the following:
colMean = a.sum(0) / (a != 0).sum(0)
rowMean = a.sum(1) / (a != 0).sum(1)
What do you want to do if there are no non-zero elements in a column/row? Then we can adapt the code to solve such a problem.
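One possible answer to that question, offered only as a sketch: return NaN for any column/row that has no non-zero elements, instead of dividing by zero. The helper name nonzero_mean and the choice of NaN as the fill value are assumptions made for illustration, not part of the original code.

```python
import numpy as np

def nonzero_mean(arr, axis):
    """Mean of the non-zero entries along `axis`; NaN where there are none."""
    counts = (arr != 0).sum(axis)
    sums = arr.sum(axis)
    # Start from NaN and only divide where at least one non-zero entry exists.
    out = np.full(sums.shape, np.nan)
    np.divide(sums, counts, out=out, where=counts != 0)
    return out

# Example array whose last row is entirely zero (hypothetical data).
a = np.array([[1, 0, 1], [2, 3, 4], [0, 0, 0]], dtype=float)

col_mean = nonzero_mean(a, 0)
row_mean = nonzero_mean(a, 1)
print(col_mean)
print(row_mean)
```

If a different convention suits your data better (for example 0 instead of NaN), only the fill value passed to np.full needs to change.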
A fast way to count nonzero elements per row in a scipy sparse matrix m is:
np.diff(m.tocsr().indptr)
The indptr attribute of a CSR matrix holds the offsets into the data array that mark the boundaries between rows. Taking the difference between consecutive entries therefore gives the number of stored (non-zero) elements in each row.
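To make the role of indptr concrete, here is a small sketch on a 3x3 example. The dense array is hypothetical sample data; csr_matrix built from a dense array stores only the non-zero entries.

```python
import numpy as np
from scipy.sparse import csr_matrix

dense = np.array([[1, 0, 1], [2, 3, 4], [0, 0, 7]])
m = csr_matrix(dense)

# indptr has one more entry than there are rows; row i occupies
# m.data[m.indptr[i]:m.indptr[i + 1]].
indptr = m.indptr

# Consecutive differences are the per-row counts of stored entries.
row_nonzeros = np.diff(indptr)

print(indptr)
print(row_nonzeros)
```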
Similarly, for the number of nonzero elements in each column, use:
np.diff(m.tocsc().indptr)
If the data is already in the appropriate format, these will run in O(m.shape[0]) and O(m.shape[1]) respectively, rather than the O(m.getnnz()) of Marat's and Finn's solutions.
If you need both row and column nonzero counts, and, say, m is already a CSR, you might use:
row_nonzeros = np.diff(m.indptr)
# minlength keeps trailing all-zero columns in the result (count 0)
col_nonzeros = np.bincount(m.indices, minlength=m.shape[1])
which is not asymptotically faster than first converting to CSC (which is O(m.getnnz())) to get col_nonzeros, but is faster because of implementation details.
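A short sketch comparing the two ways of getting column counts from a CSR matrix. The sample array is hypothetical and deliberately has an all-zero last column and last row; passing minlength=m.shape[1] to np.bincount is a precaution added here so that trailing empty columns still appear in the result with a count of 0.

```python
import numpy as np
from scipy.sparse import csr_matrix

dense = np.array([[1, 0, 0], [2, 3, 0], [0, 0, 0]])
m = csr_matrix(dense)

# Row counts straight from the CSR row pointers.
row_nonzeros = np.diff(m.indptr)

# Column counts: m.indices holds the column index of every stored entry.
col_nonzeros = np.bincount(m.indices, minlength=m.shape[1])

# Same column counts via an explicit conversion to CSC.
col_via_csc = np.diff(m.tocsc().indptr)

print(row_nonzeros, col_nonzeros, col_via_csc)
```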
A faster way is to clone your matrix, replacing the stored values with ones, and then sum over rows or columns:
# copy=True so that X itself is left untouched if it is already CSC
X_clone = X.tocsc(copy=True)
X_clone.data = np.ones(X_clone.data.shape)
NumNonZeroElementsByColumn = X_clone.sum(0)
NumNonZeroElementsByRow = X_clone.sum(1)
For me, that ran about 50 times faster than Finn Årup Nielsen's solution (1 second against 53).
Edit: You may need to convert NumNonZeroElementsByColumn into a 1-dimensional array with:
np.array(NumNonZeroElementsByColumn)[0]
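Putting the ones-clone approach together end to end, as a sketch on hypothetical sample data. Two details here are my own choices rather than part of the answer: copy=True in tocsc, so the original X is not modified, and np.asarray(...).ravel() to flatten both the row-wise and column-wise results in the same way.

```python
import numpy as np
from scipy.sparse import csr_matrix

X = csr_matrix(np.array([[1, 0, 1], [2, 3, 4], [0, 0, 7]]))

# Clone in CSC format and overwrite every stored value with 1.
X_clone = X.tocsc(copy=True)
X_clone.data = np.ones(X_clone.data.shape)

# Summing the ones counts the stored entries; flatten to 1-D arrays.
by_column = np.asarray(X_clone.sum(0)).ravel()
by_row = np.asarray(X_clone.sum(1)).ravel()

print(by_column)
print(by_row)
```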