Counting non-zero elements within each row and within each column of a 2D NumPy array

python arrays count numpy

import numpy as np

a = np.array([[1, 0, 1],
              [2, 3, 4],
              [0, 0, 7]])

columns = (a != 0).sum(0)
rows    = (a != 0).sum(1)

The expression (a != 0) is a boolean array of the same shape as the original a; it contains True for every non-zero element and False everywhere else.

The .sum(x) method sums the elements along axis x. Because True counts as 1 and False as 0, the sum of a boolean array is the number of True elements.

The variables columns and rows contain the number of non-zero (element != 0) values in each column/row of your original array:

columns = np.array([2, 1, 3])
rows    = np.array([2, 3, 1])
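For comparison, here is a minimal sketch of the same counts using np.count_nonzero with its axis argument. It assumes a NumPy version recent enough to support axis there (1.12 or later, as far as I know), and the names cols_alt and rows_alt are made up for this example:

```python
import numpy as np

a = np.array([[1, 0, 1],
              [2, 3, 4],
              [0, 0, 7]])

# Count non-zero entries down each column (axis 0) and across each row (axis 1)
cols_alt = np.count_nonzero(a, axis=0)
rows_alt = np.count_nonzero(a, axis=1)

print(cols_alt)  # [2 1 3]
print(rows_alt)  # [2 3 1]
```

Both approaches should give the same numbers; which one reads better is mostly a matter of taste.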

EDIT: The whole code could look like this (with a few simplifications in your original code):

ANOVAInputMatrixValuesArray = zeros([len(TestIDs), 9], float)
for j, TestID in enumerate(TestIDs):
    ReadOrWrite = 'Read'
    fileName = inputFileName
    directory = GetCurrentDirectory(arguments that return correct directory)
    # use directory or filename to get the CSV file?
    with open(directory, 'r') as csvfile:
        ANOVAInputMatrixValuesArray[j,:] = loadtxt(csvfile, comments='TestId', delimiter=';', usecols=(2,))[:9]

nonZeroCols = (ANOVAInputMatrixValuesArray != 0).sum(0)
nonZeroRows = (ANOVAInputMatrixValuesArray != 0).sum(1)

EDIT 2:

To get the mean of the non-zero values in each column/row, use the following:

colMean = a.sum(0) / (a != 0).sum(0)
rowMean = a.sum(1) / (a != 0).sum(1)

Note that a column/row with no non-zero elements leads to a division by zero here. What result do you want in that case? The code can then be adapted accordingly.
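One possible way to handle the empty case, sketched under the assumption that NaN is an acceptable result for a row without non-zero elements (that choice is mine, not something the question specifies). The names sums, counts and row_mean are illustrative:

```python
import numpy as np

a = np.array([[1, 0, 1],
              [2, 3, 4],
              [0, 0, 0]])   # last row has no non-zero elements

sums = a.sum(1)
counts = (a != 0).sum(1)

# Divide only where counts > 0; all other positions keep the NaN
# that the output array was pre-filled with.
row_mean = np.divide(sums, counts,
                     out=np.full(counts.shape, np.nan),
                     where=counts > 0)

print(row_mean)  # [ 1.  3. nan]
```

The same pattern works for columns by switching the axis from 1 to 0. If 0 is a better placeholder than NaN, pre-fill the output with np.zeros instead.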


A fast way to count nonzero elements per row in a scipy sparse matrix m is:

np.diff(m.tocsr().indptr)

The indptr attribute of a CSR matrix holds the positions within the data array that mark the boundaries between rows. The difference between consecutive entries therefore gives the number of stored (non-zero) elements in each row.
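To make the role of indptr concrete, here is a small sketch on a toy matrix (the matrix and the variable names are just for illustration):

```python
import numpy as np
from scipy.sparse import csr_matrix

dense = np.array([[1, 0, 1],
                  [2, 3, 4],
                  [0, 0, 7]])
m = csr_matrix(dense)

# Row i occupies data[indptr[i]:indptr[i + 1]]
print(m.indptr)                   # [0 2 5 6]

row_nonzeros = np.diff(m.indptr)
print(row_nonzeros)               # [2 3 1]
```

Keep in mind that this counts stored entries. If the matrix may contain explicitly stored zeros, calling m.eliminate_zeros() first should remove them, so that the counts match the true non-zero counts.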

Similarly, for the number of nonzero elements in each column, use:

np.diff(m.tocsc().indptr)

If the data is already in the appropriate form, these will run in O(m.shape[0]) and O(m.shape[1]) respectively, rather than O(m.getnnz()) in Marat and Finn's solutions.

If you need both row and column nonzero counts, and, say, m is already a CSR matrix, you might use:

row_nonzeros = np.diff(m.indptr)
# minlength keeps trailing all-zero columns in the result
col_nonzeros = np.bincount(m.indices, minlength=m.shape[1])

which is not asymptotically faster than first converting to CSC (which is O(m.getnnz())) to get col_nonzeros, but is faster because of implementation details.
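A short sketch of the combined row/column counts on a CSR matrix whose last column is empty. It passes minlength to np.bincount, because otherwise the result is only as long as the largest column index that actually occurs. The toy matrix is an assumption made for the example:

```python
import numpy as np
from scipy.sparse import csr_matrix

# The last column contains no non-zero elements
dense = np.array([[1, 0, 0, 0],
                  [2, 3, 0, 0],
                  [0, 0, 7, 0]])
m = csr_matrix(dense)

row_nonzeros = np.diff(m.indptr)
# m.indices holds the column index of every stored element
col_nonzeros = np.bincount(m.indices, minlength=m.shape[1])

print(row_nonzeros)  # [1 2 1]
print(col_nonzeros)  # [2 1 1 0]
```

Without minlength, the column result in this example would have only three entries instead of four.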


A faster way is to copy your matrix, replace every stored value with one, and then sum along rows or columns:

# copy=True so that X itself is not modified when it is already CSC
X_clone = X.tocsc(copy=True)
X_clone.data = np.ones(X_clone.data.shape)
NumNonZeroElementsByColumn = X_clone.sum(0)
NumNonZeroElementsByRow = X_clone.sum(1)

That ran about 50 times faster for me than Finn Årup Nielsen's solution (1 second versus 53).

Edit: You may need to convert NumNonZeroElementsByColumn into a 1-dimensional array with:

np.array(NumNonZeroElementsByColumn)[0]
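Putting the pieces together, a self-contained sketch of this approach might look as follows. The example matrix, the use of X.copy() and the names X_ones, col_counts and row_counts are my own choices for illustration, not part of the original answer:

```python
import numpy as np
from scipy.sparse import csr_matrix

X = csr_matrix(np.array([[1.5, 0.0, 2.0],
                         [0.0, 0.0, 4.0]]))

# Work on a copy so the values in X stay untouched
X_ones = X.copy()
X_ones.data = np.ones_like(X_ones.data)

# sum() returns a 2-D result; flatten it into a 1-D array
col_counts = np.asarray(X_ones.sum(axis=0)).ravel()
row_counts = np.asarray(X_ones.sum(axis=1)).ravel()

print(col_counts)  # [1. 0. 2.]
print(row_counts)  # [2. 1.]
```

The counts come back as floats here because the data array is float; cast with .astype(int) if integer counts are needed.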
