Python/NumPy: implementing a running sum (but not quite)


You can start with a convolution, choose the values that exceed 1, and finally use a "dilation":

b = numpy.convolve(a, [1, 1, 1], mode="same") > 1
b = b | numpy.r_[0, b[:-1]] | numpy.r_[b[1:], 0]

Since this avoids the Python loop, it should be faster than your approach, but I didn't do timings.
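As a minimal, self-contained sketch (the example input is made up, and a is assumed to be a one-dimensional array of 0s and 1s):

import numpy

a = numpy.array([0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0])

# Positions whose centred 3-element window sums to at least 2 ...
b = numpy.convolve(a, [1, 1, 1], mode="same") > 1
# ... then "dilate" the result by one position to the left and to the right.
b = b | numpy.r_[0, b[:-1]] | numpy.r_[b[1:], 0]

print(b.astype(bool))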

An alternative is to use a second convolution for the dilation: convolving the boolean result with the same kernel gives a nonzero value wherever there is a True within one position, so comparing against 0 yields the dilated mask:

kernel = [1, 1, 1]
b = numpy.convolve(a, kernel, mode="same") > 1
b = numpy.convolve(b, kernel, mode="same") > 0
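To check that the second convolution really behaves like the shift-based dilation above, a quick comparison (again with a made-up 0/1 input array) could look like this:

import numpy

a = numpy.array([0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0])
kernel = [1, 1, 1]

hits = numpy.convolve(a, kernel, mode="same") > 1
dilated_by_shift = hits | numpy.r_[False, hits[:-1]] | numpy.r_[hits[1:], False]
dilated_by_convolve = numpy.convolve(hits, kernel, mode="same") > 0

print(numpy.array_equal(dilated_by_shift, dilated_by_convolve))  # True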

If you have SciPy available, yet another option for the dilation is

b = numpy.convolve(a, [1, 1, 1], mode="same") > 1
b = scipy.ndimage.morphology.binary_dilation(b)
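As a small illustration of what the dilation step does (binary_dilation is also reachable directly as scipy.ndimage.binary_dilation; with its default 1-D structuring element it marks every position that touches a True):

import numpy
import scipy.ndimage

mask = numpy.array([False, False, True, False, False, True, True, False])
print(scipy.ndimage.binary_dilation(mask))
# [False  True  True  True  True  True  True  True]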

Edit: By doing some timings, I found that this solution seems to be fastest for large arrays:

b = numpy.convolve(a, kernel) > 1
b[:-1] |= b[1:]  # Shift and "smearing" to the *left* (smearing with b[1:] |= b[:-1] does not work)
b[:-1] |= b[1:]  # … and again!
b = b[:-2]

For an array of one million entries, it was more than 200 times faster than your original approach on my machine. As pointed out by EOL in the comments, this solution might be considered a bit fragile, though, since it depends on implementation details of NumPy.
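If you want to reproduce rough timings yourself, a sketch along the following lines should work (the loop-based approach from the question is not reproduced here and the array contents are made up, so this only compares the vectorized variants against each other):

import timeit

setup = """
import numpy
numpy.random.seed(0)
a = numpy.random.randint(0, 2, size=1_000_000)
kernel = [1, 1, 1]
"""

double_convolve = """
b = numpy.convolve(a, kernel, mode="same") > 1
b = numpy.convolve(b, kernel, mode="same") > 0
"""

convolve_and_smear = """
b = numpy.convolve(a, kernel) > 1
b[:-1] |= b[1:]
b[:-1] |= b[1:]
b = b[:-2]
"""

for name, stmt in [("double convolve", double_convolve),
                   ("convolve + smear", convolve_and_smear)]:
    print(name, timeit.timeit(stmt, setup=setup, number=10) / 10)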


You can calculate the "convolution" sums in an efficient way with:

>>> a0 = a[:-2]
>>> a1 = a[1:-1]
>>> a2 = a[2:]
>>> a_large_sum = a0 + a1 + a2 > 1

Updating b can then be done efficiently by writing something that means "at least one of the three neighboring a_large_sum values is True": you first extend your a_large_sum array back to the same number of elements as a (padding to the right, to the left and right, and to the left, respectively):

>>> a_large_sum_0 = np.hstack([a_large_sum, [False, False]])
>>> a_large_sum_1 = np.hstack([[False], a_large_sum, [False]])
>>> a_large_sum_2 = np.hstack([[False, False], a_large_sum])

You then obtain b in an efficient way:

>>> b = a_large_sum_0 | a_large_sum_1 | a_large_sum_2

This gives the same result that you obtain, but in a very efficient way, by leveraging NumPy's fast internal loops.
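Put together, the steps above could be wrapped in a small helper (the function name and the example input are made up for illustration; a is again assumed to be a 1-D array of 0s and 1s):

import numpy as np

def mark_dense_windows(a):
    # Sums of all 3-element windows starting at each position.
    a0, a1, a2 = a[:-2], a[1:-1], a[2:]
    a_large_sum = a0 + a1 + a2 > 1
    # Extend back to len(a) and OR the three shifted copies together.
    a_large_sum_0 = np.hstack([a_large_sum, [False, False]])
    a_large_sum_1 = np.hstack([[False], a_large_sum, [False]])
    a_large_sum_2 = np.hstack([[False, False], a_large_sum])
    return a_large_sum_0 | a_large_sum_1 | a_large_sum_2

a = np.array([0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0])
print(mark_dense_windows(a))  # marks the same positions as the convolution-based variants above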

PS: This approach is essentially the same as Sven's first solution, but is way more pedestrian than Sven's elegant code; it is as fast, however. Sven's second solution (double convolve()) is even more elegant, and it is twice as fast.


You might also like to have a look at NumPy's stride_tricks. Using Sven's timing setup (see link in Sven's answer), I found that for (very) large arrays, this is also a fast way to do what you want (i.e. with your definition of a):

shape = (len(a)-2, 3)
strides = a.strides + a.strides
a_strided = numpy.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
b = np.r_[numpy.sum(a_strided, axis=-1) > 1, False, False]
b[2:] |= b[1:-1] | b[:-2]

After the edit (see comments below), it is no longer the fastest way.

This creates a specially strided view on your original array. The data in a is not copied, but is simply viewed in a new way. Basically, we make a new array in which the last axis contains the sub-arrays that we want to sum (i.e. the three elements you want to sum). This way, we can easily sum them with the numpy.sum() call near the end.

The last element of this new shape therefore has to be 3, and the first element will be the length of the old a minus 2 (because a full 3-element window can only start up to the third-to-last element).

The strides tuple contains the strides, in bytes, that the new array a_strided needs to take to get to the next element in each of the dimensions of shape. If you set both equal to the element stride of a, it means that a_strided[0, 1] and a_strided[1, 0] will both be a[1], which is exactly what we want. In a normal (contiguous) 2-D array this would not be the case (the first stride would be the item size times the length of a row, i.e. shape[1]), but in this case we can make good use of it.

Not sure if I explained this all really well, but just print out a_strided and you'll see what the result is and how easy this makes the operation.
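For example (a small, made-up 0/1 array, just to show what the strided view looks like):

import numpy
from numpy.lib import stride_tricks

a = numpy.array([0, 1, 1, 0, 0, 1, 0, 1])

shape = (len(a) - 2, 3)
strides = a.strides + a.strides
a_strided = stride_tricks.as_strided(a, shape=shape, strides=strides)

print(a_strided)
# Each row is one overlapping 3-element window of a:
# [[0 1 1]
#  [1 1 0]
#  [1 0 0]
#  [0 0 1]
#  [0 1 0]
#  [1 0 1]]
print(numpy.sum(a_strided, axis=-1) > 1)  # which windows contain at least two 1s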