Sliding window of M-by-N shape numpy.ndarray

You can do a vectorized sliding window in numpy using fancy indexing.

>>> import numpy as np
>>> a = np.array([[0, 1], [10, 11], [20, 21], [30, 31], [40, 41], [50, 51]])   # define our 2d numpy array
>>> a
array([[ 0,  1],
       [10, 11],
       [20, 21],
       [30, 31],
       [40, 41],
       [50, 51]])
>>> a = a.flatten()   # flattened numpy array
>>> a
array([ 0,  1, 10, 11, 20, 21, 30, 31, 40, 41, 50, 51])
>>> indexer = np.arange(6)[None, :] + 2*np.arange(4)[:, None]   # sliding window indices
>>> indexer
array([[ 0,  1,  2,  3,  4,  5],
       [ 2,  3,  4,  5,  6,  7],
       [ 4,  5,  6,  7,  8,  9],
       [ 6,  7,  8,  9, 10, 11]])
>>> a[indexer]   # values of a over the sliding window
array([[ 0,  1, 10, 11, 20, 21],
       [10, 11, 20, 21, 30, 31],
       [20, 21, 30, 31, 40, 41],
       [30, 31, 40, 41, 50, 51]])
>>> np.sum(a[indexer], axis=1)   # sum of values in a under each window
array([ 63, 123, 183, 243])

An explanation of what this code is doing:

The expression np.arange(6)[None, :] creates a row vector 0 through 5, and 2*np.arange(4)[:, None] creates a column vector 0, 2, 4, 6. Broadcasting their sum gives a 4x6 index matrix: each row (six indices long) represents one window, and the four rows are the four windows. The factor of 2 makes the window slide 2 elements at a time, i.e. one original row (one pair of values) at a time, since each row of a holds two values. Using integer (fancy) indexing, you can apply this index matrix to the flattened array and then compute aggregates such as sum along axis=1.
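The same idea can be generalized to any 2D array, window height, and step. A minimal sketch, assuming a 2D input; window_by_fancy_index is a hypothetical helper name, not a NumPy function:

```python
import numpy as np

def window_by_fancy_index(a, width=3, step=1):
    """Hypothetical helper: each output row holds `width` consecutive rows of `a`, flattened."""
    n_rows, n_cols = a.shape
    flat = a.ravel()  # row-major 1D version of a
    n_windows = (n_rows - width) // step + 1
    # Row vector: offsets of the elements inside one window (width * n_cols of them)
    within = np.arange(width * n_cols)[None, :]
    # Column vector: offset of the first element of each window in the flat array
    starts = (step * n_cols) * np.arange(n_windows)[:, None]
    # Broadcasting builds an (n_windows, width * n_cols) index matrix; fancy indexing copies
    return flat[starts + within]

a = np.array([[0, 1], [10, 11], [20, 21], [30, 31], [40, 41], [50, 51]])
w = window_by_fancy_index(a, width=3)
print(w.sum(axis=1))  # [ 63 123 183 243]
```

With step=2 the windows advance two original rows at a time, so only two windows fit in this six-row array.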

python numpy time-series sliding-window

In [1]: import numpy as np
In [2]: a = np.array([[0, 1], [10, 11], [20, 21], [30, 31], [40, 41], [50, 51]])
In [3]: w = np.hstack((a[:-2], a[1:-1], a[2:]))
In [4]: w
Out[4]:
array([[ 0,  1, 10, 11, 20, 21],
       [10, 11, 20, 21, 30, 31],
       [20, 21, 30, 31, 40, 41],
       [30, 31, 40, 41, 50, 51]])

You could write this as a function like so:

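import numpy as np

def window_stack(a, stepsize=1, width=3):
    n = a.shape[0]
    # Stack `width` row-shifted slices side by side; newer NumPy versions require
    # np.hstack to receive a sequence (list/tuple), not a generator
    return np.hstack([a[i:1 + n + i - width:stepsize] for i in range(0, width)])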

This doesn't really depend on the shape of the original array, as long as a.ndim == 2. Note that I never use either length in the interactive version. The second dimension of the shape is irrelevant; each row can be as long as you want. Thanks to @Jaime's suggestion, you can do it without checking the shape at all:

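def window_stack(a, stepsize=1, width=3):
    # The stop index 1 + i - width is negative, except for the last slice where it is 0;
    # "or None" turns that 0 into None, meaning "up to the end"
    return np.hstack([a[i:1 + i - width or None:stepsize] for i in range(0, width)])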
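A quick check of the claim that the second dimension is irrelevant, using the shape-free version on rows of length 3 and with stepsize=2 (the array b is just illustrative data):

```python
import numpy as np

def window_stack(a, stepsize=1, width=3):
    # Shape-free variant: negative stop indices, with 0 mapped to None for the last slice
    return np.hstack([a[i:1 + i - width or None:stepsize] for i in range(width)])

# Rows of length 3 instead of 2: the second dimension does not matter
b = np.arange(18).reshape(6, 3)
print(window_stack(b, width=2).shape)  # (5, 6)

# stepsize=2 keeps every other window: rows 0-2 and rows 2-4
print(window_stack(b, stepsize=2))
```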


One solution is

np.lib.stride_tricks.as_strided(a, shape=(4,6), strides=(8,4)), assuming a has 4-byte elements (e.g. dtype=np.int32).

Using strides is intuitive when you start thinking in terms of pointers/addresses.

The as_strided() function takes three arguments that matter here (it also accepts the optional subok and writeable flags):

data
shape
strides

data is the input array we operate on (in NumPy's signature this parameter is named x).

To use as_strided() to implement a sliding window, we must compute the shape of the output beforehand. In the question, the output shape is (4,6). If the shape or strides are wrong, we end up reading garbage values: as_strided does no bounds checking and simply moves a pointer by a number of bytes that depends on the data type.
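The output shape follows from a little arithmetic. A sketch, assuming windows of width consecutive rows that advance step rows at a time (width and step are illustrative names, not as_strided parameters):

```python
import numpy as np

a = np.array([[0, 1], [10, 11], [20, 21], [30, 31], [40, 41], [50, 51]])
width = 3  # rows per window, as in the question
step = 1   # rows to advance between windows

n_rows, n_cols = a.shape
n_windows = (n_rows - width) // step + 1   # how many full windows fit
out_shape = (n_windows, width * n_cols)    # each window flattened into one row
print(out_shape)  # (4, 6)
```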

Determining the correct value of strides is essential to get the expected results. Before calculating strides, find out the memory occupied by each element using arr.itemsize (for a contiguous array this equals arr.strides[-1]). In this example, each element is assumed to occupy 4 bytes (e.g. int32); with int64 it would be 8. NumPy arrays are created in row-major (C) order by default, so the first element of the next row sits right next to the last element of the current row.
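You can inspect the element size and the strides directly. A small sketch, assuming the array is created with dtype=np.int32 so the numbers match the 4-byte example:

```python
import numpy as np

a = np.array([[0, 1], [10, 11], [20, 21], [30, 31], [40, 41], [50, 51]], dtype=np.int32)
print(a.itemsize)               # 4: bytes per element
print(a.strides)                # (8, 4): bytes to the next row, bytes to the next column
print(a.flags['C_CONTIGUOUS'])  # True: row-major layout
```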

For example:

0, 1 | 10, 11 | ...

10 is right next to 1.

Imagine the 2D array reshaped to 1D (this is valid because the data is stored in row-major order). The first element of each row in the output is then every other element of the 1D array, i.e. the even-indexed elements:

0, 10, 20, 30, ...

Therefore, the number of bytes we need to move to get from 0 to 10, from 10 to 20, and so on is 2 * (size of one element), so the row stride is 2 * 4 bytes = 8 bytes. Within a given output row, all the elements are adjacent in our imaginary 1D array, so to get the next element in a row we move by exactly one element size. The column stride is therefore 4 bytes.

Hence, strides=(8, 4).
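Putting shape and strides together, here is a sketch that computes both from the array instead of hard-coding (8, 4), so it also works for 8-byte dtypes. It assumes a is C-contiguous; width and step are illustrative names, and writeable=False guards against writing through the overlapping windows:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.array([[0, 1], [10, 11], [20, 21], [30, 31], [40, 41], [50, 51]])
width, step = 3, 1
n_rows, n_cols = a.shape
n_windows = (n_rows - width) // step + 1

row_stride = step * n_cols * a.itemsize  # bytes from one window's start to the next
col_stride = a.itemsize                  # bytes between adjacent elements in a window

windows = as_strided(a, shape=(n_windows, width * n_cols),
                     strides=(row_stride, col_stride), writeable=False)
print(windows.strides)      # (8, 4) for int32, (16, 8) for int64
print(windows.sum(axis=1))  # [ 63 123 183 243]
```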

An alternate explanation: the output has shape (4,6) and a column stride of 4 bytes. So the first row starts at byte offset 0 and collects 6 elements, each 4 bytes apart. After the first row is collected, the second row starts 8 bytes after the start of the first row, the third row starts 8 bytes after the start of the second row, and so on.
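The same reasoning can be written as a byte-offset formula, assuming 4-byte elements as above. The element at output position (i, j) lives at

$$\text{offset}(i, j) = 8i + 4j \ \text{bytes}, \qquad \text{flat index} = \frac{8i + 4j}{4} = 2i + j, \qquad 0 \le i < 4,\ 0 \le j < 6,$$

so output row i reads elements 2i through 2i + 5 of the flattened array, which is exactly the index matrix np.arange(6)[None, :] + 2*np.arange(4)[:, None] used in the fancy-indexing approach.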

The shape determines the number of rows and columns we need. The strides define the memory step to the start of the next row and the step between elements within a row.
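If you are on NumPy 1.20 or later, sliding_window_view computes the shape and strides for you and returns a read-only view built on as_strided. This is a different, library-provided route to the same result, sketched here under that version assumption:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

a = np.array([[0, 1], [10, 11], [20, 21], [30, 31], [40, 41], [50, 51]])
width = 3  # rows per window

# Window over both axes: shape (6 - 3 + 1, 2 - 2 + 1, 3, 2) == (4, 1, 3, 2)
v = sliding_window_view(a, window_shape=(width, a.shape[1]))
# Flatten each (3, 2) window into one row of 6 values (reshape copies here)
windows = v.reshape(v.shape[0], -1)
print(windows)

# A step larger than one row: slice the view before reshaping
every_other = v[::2].reshape(-1, width * a.shape[1])
```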
