Numpy sum running length of non-zero values

python arrays performance numpy vectorization

This post lists a vectorized approach which basically consists of two steps:

Initialize a zeros vector of the same size as input vector, x and set ones at places corresponding to non-zeros of x.
Next up, in that vector, we need to put minus of runlengths of each island right after the ending/stop positions for each "island". The intention is to use cumsum again later on, which would result in sequential numbers for the "islands" and zeros elsewhere.

Here's the implementation -

import numpy as np#Append zeros at the start and end of input array, xxa = np.hstack([[0],x,[0]])# Get an array of ones and zeros, with ones for nonzeros of x and zeros elsewherexa1 =(xa!=0)+0# Find consecutive differences on xa1xadf = np.diff(xa1)# Find start and stop+1 indices and thus the lengths of "islands" of non-zerosstarts = np.where(xadf==1)[0]stops_p1 = np.where(xadf==-1)[0]lens = stops_p1 - starts# Mark indices where "minus ones" are to be put for applying cumsumput_m1 = stops_p1[[stops_p1 < x.size]]# Setup vector with ones for nonzero x's, "minus lens" at stops +1 & zeros elsewherevec = xa1[1:-1] # Note: this will change xa1, but it's okay as not needed anymorevec[put_m1] = -lens[0:put_m1.size]# Perform cumsum to get the desired outputout = vec.cumsum()

Sample run -

In [116]: xOut[116]: array([ 0. ,  2.3,  1.2,  4.1,  0. ,  0. ,  5.3,  0. ,  1.2,  3.1,  0. ])In [117]: outOut[117]: array([0, 1, 2, 3, 0, 0, 1, 0, 1, 2, 0], dtype=int32)

Runtime tests -

Here's some runtimes tests comparing the proposed approach against the other itertools.groupby based approach -

In [21]: N = 1000000    ...: x = np.random.rand(1,N)    ...: x[x>0.5] = 0.0    ...: x = x.ravel()    ...: In [19]: %timeit sumrunlen_vectorized(x)10 loops, best of 3: 19.9 ms per loopIn [20]: %timeit sumrunlen_loopy(x)1 loops, best of 3: 2.86 s per loop

python arrays performance numpy vectorization

You can use itertools.groupby and np.hstack :

>>> import numpy as np>>> x = np.array([2.3, 1.2, 4.1 , 0.0, 0.0, 5.3, 0, 1.2, 3.1])>>> from itertools import groupby>>> np.hstack([[i if j!=0 else j for i,j in enumerate(g,1)] for _,g in groupby(x,key=lambda x: x!=0)])array([ 1.,  2.,  3.,  0.,  0.,  1.,  0.,  1.,  2.])

We can group the array elements based on non-zero elements then use a list comprehension and enumerate to replace the non-zero sub-arrays with those index then flatten the list with np.hstack.

python arrays performance numpy vectorization

This sub-problem came up in Kick Start 2021 Round A for me. My solution:

def current_run_len(a):    a_ = np.hstack([0, a != 0, 0])  # first in starts and last in stops defined    d = np.diff(a_)    starts = np.where(d == 1)[0]    stops = np.where(d == -1)[0]    a_[stops + 1] = -(stops - starts)  # +1 for behind-last    return a_[1:-1].cumsum()

In fact, the problem also required a version where you count down consecutive sequences. Thus here another version with an optional keyword argument which does the same for rev=False:

def current_run_len(a, rev=False):    a_ = np.hstack([0, a != 0, 0])  # first in starts and last in stops defined    d = np.diff(a_)    starts = np.where(d == 1)[0]    stops = np.where(d == -1)[0]    if rev:        a_[starts] = -(stops - starts)        cs = -a_.cumsum()[:-2]    else:        a_[stops + 1] = -(stops - starts)  # +1 for behind-last        cs = a_.cumsum()[1:-1]    return cs

Results:

a = np.array([1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1])print('a                             = ', a)print('current_run_len(a)            = ', current_run_len(a))print('current_run_len(a, rev=True)  = ', current_run_len(a, rev=True))

a                             =  [1 1 1 1 0 0 0 1 1 0 1 0 0 0 1]current_run_len(a)            =  [1 2 3 4 0 0 0 1 2 0 1 0 0 0 1]current_run_len(a, rev=True)  =  [4 3 2 1 0 0 0 2 1 0 1 0 0 0 1]

For an array that consists of 0s and 1s only, you can simplify [0, a != 0, 0] to [0, a, 0]. But the version as-posted also works for arbitrary non-zero numbers.

CodeHunter

Numpy sum running length of non-zero values

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last