find length of sequences of identical values in a numpy array (run length encoding) find length of sequences of identical values in a numpy array (run length encoding) python python

find length of sequences of identical values in a numpy array (run length encoding)


Fully numpy vectorized and generic RLE for any array (works with strings, booleans etc too).

Outputs tuple of run lengths, start positions, and values.

import numpy as npdef rle(inarray):        """ run length encoding. Partial credit to R rle function.             Multi datatype arrays catered for including non Numpy            returns: tuple (runlengths, startpositions, values) """        ia = np.asarray(inarray)                # force numpy        n = len(ia)        if n == 0:             return (None, None, None)        else:            y = ia[1:] != ia[:-1]               # pairwise unequal (string safe)            i = np.append(np.where(y), n - 1)   # must include last element posi            z = np.diff(np.append(-1, i))       # run lengths            p = np.cumsum(np.append(0, z))[:-1] # positions            return(z, p, ia[i])

Pretty fast (i7):

xx = np.random.randint(0, 5, 1000000)%timeit yy = rle(xx)100 loops, best of 3: 18.6 ms per loop

Multiple data types:

rle([True, True, True, False, True, False, False])Out[8]: (array([3, 1, 1, 2]), array([0, 3, 4, 5]), array([ True, False,  True, False], dtype=bool))rle(np.array([5, 4, 4, 4, 4, 0, 0]))Out[9]: (array([1, 4, 2]), array([0, 1, 5]), array([5, 4, 0]))rle(["hello", "hello", "my", "friend", "okay", "okay", "bye"])Out[10]: (array([2, 1, 1, 2, 1]), array([0, 2, 3, 4, 6]), array(['hello', 'my', 'friend', 'okay', 'bye'],        dtype='|S6'))

Same results as Alex Martelli above:

xx = np.random.randint(0, 2, 20)xxOut[60]: array([1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1])am = runs_of_ones_array(xx)tb = rle(xx)amOut[63]: array([4, 5, 2, 5])tb[0][tb[2] == 1]Out[64]: array([4, 5, 2, 5])%timeit runs_of_ones_array(xx)10000 loops, best of 3: 28.5 µs per loop%timeit rle(xx)10000 loops, best of 3: 38.2 µs per loop

Slightly slower than Alex (but still very fast), and much more flexible.


While not numpy primitives, itertools functions are often very fast, so do give this one a try (and measure times for various solutions including this one, of course):

def runs_of_ones(bits):  for bit, group in itertools.groupby(bits):    if bit: yield sum(group)

If you do need the values in a list, just can use list(runs_of_ones(bits)), of course; but maybe a list comprehension might be marginally faster still:

def runs_of_ones_list(bits):  return [sum(g) for b, g in itertools.groupby(bits) if b]

Moving to "numpy-native" possibilities, what about:

def runs_of_ones_array(bits):  # make sure all runs of ones are well-bounded  bounded = numpy.hstack(([0], bits, [0]))  # get 1 at run starts and -1 at run ends  difs = numpy.diff(bounded)  run_starts, = numpy.where(difs > 0)  run_ends, = numpy.where(difs < 0)  return run_ends - run_starts

Again: be sure to benchmark solutions against each others in realistic-for-you examples!


Here is a solution using only arrays: it takes an array containing a sequence of bools and counts the length of the transitions.

>>> from numpy import array, arange>>> b = array([0,0,0,1,1,1,0,0,0,1,1,1,1,0,0], dtype=bool)>>> sw = (b[:-1] ^ b[1:]); print sw[False False  True False False  True False False  True False False False  True False]>>> isw = arange(len(sw))[sw]; print isw[ 2  5  8 12]>>> lens = isw[1::2] - isw[::2]; print lens[3 4]

sw contains a true where there is a switch, isw converts them in indexes. The items of isw are then subtracted pairwise in lens.

Notice that if the sequence started with an 1 it would count the length of the 0s sequences: this can be fixed in the indexing to compute lens. Also, I have not tested corner cases such sequences of length 1.


Full function that returns start positions and lengths of all True-subarrays.

import numpy as npdef count_adjacent_true(arr):    assert len(arr.shape) == 1    assert arr.dtype == np.bool    if arr.size == 0:        return np.empty(0, dtype=int), np.empty(0, dtype=int)    sw = np.insert(arr[1:] ^ arr[:-1], [0, arr.shape[0]-1], values=True)    swi = np.arange(sw.shape[0])[sw]    offset = 0 if arr[0] else 1    lengths = swi[offset+1::2] - swi[offset:-1:2]    return swi[offset:-1:2], lengths

Tested for different bool 1D-arrays (empty array; single/multiple elements; even/odd lengths; started with True/False; with only True/False elements).