
Find large number of consecutive values fulfilling condition in a numpy array


Here's a numpy-based solution.

I think (?) it should be faster than the other options. Hopefully it's fairly clear.

However, it does require twice as much memory as the various generator-based solutions. As long as you can hold a single temporary copy of your data in memory (for the diff), plus a boolean array of the same length as your data (one byte per element), it should be pretty efficient...

import numpy as np

def main():
    # Generate some random data
    x = np.cumsum(np.random.random(1000) - 0.5)
    condition = np.abs(x) < 1

    # Print the start and stop indices of each region where the absolute
    # values of x are below 1, and the min and max of each of these regions
    for start, stop in contiguous_regions(condition):
        segment = x[start:stop]
        print(start, stop)
        print(segment.min(), segment.max())

def contiguous_regions(condition):
    """Finds contiguous True regions of the boolean array "condition".
    Returns a 2D array where the first column is the start index of the
    region and the second column is the end index (exclusive)."""
    # Find the indices of changes in "condition"
    d = np.diff(condition)
    idx, = d.nonzero()

    # We need to start things after the change in "condition". Therefore,
    # we'll shift the index by 1 to the right.
    idx += 1

    if condition[0]:
        # If the start of condition is True, prepend a 0
        idx = np.r_[0, idx]

    if condition[-1]:
        # If the end of condition is True, append the length of the array
        idx = np.r_[idx, condition.size]

    # Reshape the result into two columns
    idx.shape = (-1, 2)
    return idx

main()
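Since the question asks for long runs of consecutive values, the two-column output makes it easy to filter regions by length. A minimal sketch (the minimum length of 20 is an illustrative choice, not part of the original answer):

# Keep only regions at least 20 samples long
regions = contiguous_regions(condition)
lengths = regions[:, 1] - regions[:, 0]
long_regions = regions[lengths >= 20]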


There is a very convenient solution to this using scipy.ndimage. For an array:

import numpy as np
import scipy.ndimage

a = np.array([1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0])

which can be the result of a condition applied to another array, finding the contiguous regions is as simple as:

regions = scipy.ndimage.find_objects(scipy.ndimage.label(a)[0])
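For the example array above, scipy.ndimage.label assigns label 1 to the first run of ones and label 2 to the second, so regions comes back as a list of slice tuples, roughly [(slice(0, 4),), (slice(7, 10),)].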

Then any function can be applied to those regions, e.g.:

[np.sum(a[r]) for r in regions]
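To tie this back to the question of finding long runs, the regions can also be filtered by their length. A small self-contained sketch (the run-length threshold of 3 is an assumption for illustration):

import numpy as np
import scipy.ndimage

a = np.array([1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0])
labeled, num_features = scipy.ndimage.label(a)
regions = scipy.ndimage.find_objects(labeled)

# Keep only runs of at least 3 consecutive nonzero values
# (the threshold of 3 is illustrative, not from the original answer)
long_regions = [r for r in regions if a[r].size >= 3]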


Slightly sloppy, but simple and fast-ish, if you don't mind using scipy:

from scipy.ndimage import gaussian_filter

sigma = 3
threshold = 1

# "data" is the 1D signal being searched; smooth it, then threshold
above_threshold = gaussian_filter(data, sigma=sigma) > threshold

The idea is that quiet portions of the data will smooth down to low amplitude, and loud regions won't. Tune 'sigma' to affect how long a 'quiet' region must be; tune 'threshold' to affect how quiet it must be. This slows down for large sigma, at which point using FFT-based smoothing might be faster.

This has the added benefit that single 'hot pixels' won't disrupt your silence-finding, so you're a little less sensitive to certain types of noise.
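For large sigma, the FFT-based smoothing mentioned above might look like the following sketch, which builds an explicit Gaussian kernel and convolves with scipy.signal.fftconvolve (truncating the kernel at 4 sigma mirrors scipy.ndimage's default; data is assumed to be the same 1D array as before):

import numpy as np
from scipy.signal import fftconvolve

sigma = 50          # a large sigma, where FFT convolution can pay off
threshold = 1

# Build a normalized Gaussian kernel truncated at 4 sigma
radius = int(4 * sigma)
t = np.arange(-radius, radius + 1)
kernel = np.exp(-t**2 / (2.0 * sigma**2))
kernel /= kernel.sum()

# Smooth via FFT-based convolution, then threshold as before
above_threshold = fftconvolve(data, kernel, mode='same') > threshold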