
dataframe representation of a rolling window


We could use NumPy to get views into those sliding windows with its esoteric strided tricks. If you plan to use this new window dimension for a reduction such as matrix multiplication, this is ideal. If for some reason you need a 2D output, you have to reshape at the end, which, depending on the memory layout of df.values, may create a copy.

Thus, the implementation would look something like this -

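```python
from numpy.lib.stride_tricks import as_strided as strided

def get_sliding_window(df, W, return2D=0):
    a = df.values                  # underlying (m, n) NumPy array
    s0, s1 = a.strides             # byte steps along rows and along columns
    m, n = a.shape
    # Window k starts at row k: step s0 between windows and s0 between rows
    # inside a window, so the windows overlap without copying any data
    out = strided(a, shape=(m-W+1, W, n), strides=(s0, s0, s1))
    if return2D == 1:
        # Flatten each window into a single row
        return out.reshape(a.shape[0]-W+1, -1)
    else:
        return out
```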
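A small pure-NumPy sketch of the stride arithmetic used above (the array a and the window size W are illustrative, not taken from the question). For a row-major float64 array of shape (5, 2), moving one row costs s0 = 16 bytes; the window view reuses that same step both between windows and between rows inside a window, so window k starts exactly at row k and nothing is duplicated. writeable=False is added here only as a precaution, since writing through overlapping windows would change several windows at once.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# Illustrative 5x2 float64 array (C-contiguous, so strides are (16, 8) bytes)
a = np.arange(10, dtype=np.float64).reshape(5, 2)
W = 3
m, n = a.shape
s0, s1 = a.strides

# Axis 0 steps to the next window (one row), axis 1 steps to the next row
# inside a window (also one row), axis 2 steps to the next column
windows = as_strided(a, shape=(m - W + 1, W, n), strides=(s0, s0, s1),
                     writeable=False)

print(a.strides)        # (16, 8)
print(windows.strides)  # (16, 16, 8)
print(windows.shape)    # (3, 3, 2)

# Window k is exactly rows k .. k+W-1 of the original array
for k in range(m - W + 1):
    assert np.array_equal(windows[k], a[k:k + W])
```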

Sample run for 2D/3D output -

```
In [68]: df
Out[68]: 
      A     B
0  0.44  0.41
1  0.46  0.47
2  0.46  0.02
3  0.85  0.82
4  0.78  0.76

In [70]: get_sliding_window(df, 3,return2D=1)
Out[70]: 
array([[ 0.44,  0.41,  0.46,  0.47,  0.46,  0.02],
       [ 0.46,  0.47,  0.46,  0.02,  0.85,  0.82],
       [ 0.46,  0.02,  0.85,  0.82,  0.78,  0.76]])
```

Here's what the 3D views output would look like -

```
In [69]: get_sliding_window(df, 3,return2D=0)
Out[69]: 
array([[[ 0.44,  0.41],
        [ 0.46,  0.47],
        [ 0.46,  0.02]],

       [[ 0.46,  0.47],
        [ 0.46,  0.02],
        [ 0.85,  0.82]],

       [[ 0.46,  0.02],
        [ 0.85,  0.82],
        [ 0.78,  0.76]]])
```
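Whether the return2D=1 reshape actually copies depends on the memory layout of the underlying array. The check below is a self-contained sketch; the helper windows3d is hypothetical and simply mirrors get_sliding_window on a plain ndarray. For a row-major array the flattened windows can still be described by two strides, so reshape returns a view; for a column-major array (often the case for the array behind a DataFrame) NumPy has to copy.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def windows3d(a, W):
    # Hypothetical helper: same strided construction as get_sliding_window
    m, n = a.shape
    s0, s1 = a.strides
    return as_strided(a, shape=(m - W + 1, W, n), strides=(s0, s0, s1),
                      writeable=False)

W = 3
a_c = np.arange(10, dtype=np.float64).reshape(5, 2)  # row-major (C order)
a_f = np.asfortranarray(a_c)                         # same values, column-major
m = a_c.shape[0]

# Flatten each window into one row, as return2D=1 does
flat_c = windows3d(a_c, W).reshape(m - W + 1, -1)
flat_f = windows3d(a_f, W).reshape(m - W + 1, -1)

print(np.shares_memory(flat_c, a_c))  # True: reshape could stay a view
print(np.shares_memory(flat_f, a_f))  # False: reshape had to copy
# Both layouts give the same values: row k is window k flattened row by row
assert np.array_equal(flat_c, flat_f)
```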

Let's time the 3D views output for various window sizes -

```
In [331]: df = pd.DataFrame(np.random.rand(1000, 3).round(2))

In [332]: %timeit get_3d_shfted_array(df,2) # @Yakym Pirozhenko's soln
10000 loops, best of 3: 47.9 µs per loop

In [333]: %timeit get_sliding_window(df,2)
10000 loops, best of 3: 39.2 µs per loop

In [334]: %timeit get_3d_shfted_array(df,5) # @Yakym Pirozhenko's soln
10000 loops, best of 3: 89.9 µs per loop

In [335]: %timeit get_sliding_window(df,5)
10000 loops, best of 3: 39.4 µs per loop

In [336]: %timeit get_3d_shfted_array(df,15) # @Yakym Pirozhenko's soln
1000 loops, best of 3: 258 µs per loop

In [337]: %timeit get_sliding_window(df,15)
10000 loops, best of 3: 38.8 µs per loop
```

Let's verify that we are indeed getting views -

```
In [338]: np.may_share_memory(get_sliding_window(df,2), df.values)
Out[338]: True
```

The nearly constant timings of get_sliding_window across window sizes show the benefit of returning a view instead of copying: no data is moved, so the cost does not grow with the window size.
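On NumPy 1.20 or newer, numpy.lib.stride_tricks.sliding_window_view builds the same overlapping views, does the shape bookkeeping for you, and returns a read-only view by default, which guards against accidentally writing through overlapping windows. It appends the window axis last, so np.moveaxis is used below to get the (m-W+1, W, n) layout of get_sliding_window. This sketch swaps in that library function; the random frame and W = 5 are illustrative.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import as_strided, sliding_window_view

df = pd.DataFrame(np.random.rand(1000, 3).round(2))
W = 5
a = df.to_numpy()

# Window axis is appended last: shape (m-W+1, n, W)
win = sliding_window_view(a, W, axis=0)
# Move it next to the first axis: shape (m-W+1, W, n), like get_sliding_window
win = np.moveaxis(win, -1, 1)

# Same windows as the manual as_strided construction
m, n = a.shape
s0, s1 = a.strides
manual = as_strided(a, shape=(m - W + 1, W, n), strides=(s0, s0, s1),
                    writeable=False)

assert win.shape == (m - W + 1, W, n)
assert np.array_equal(win, manual)
print(np.shares_memory(win, a))  # True: still a view, no copy
print(win.flags.writeable)       # False: read-only by default
```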


Disclaimers:

First, I would not call the method you provide clunky. It is readable, and you can easily generalize it to any window size with a list comprehension. At the same time, this is a somewhat open-ended question that may have many solutions, including your own.

/Disclaimers

Here is one other method that I think qualifies under your description:

Use np.dstack on df.values. One benefit over the existing approach is construction speed.

```python
import pandas as pd
import numpy as np
from io import StringIO

df = pd.read_csv(StringIO('''      A     B     C
a  0.44  0.41  0.46
b  0.47  0.46  0.02
c  0.85  0.82  0.78
d  0.76  0.93  0.83
e  0.88  0.93  0.72
f  0.12  0.15  0.20
g  0.44  0.10  0.28
h  0.61  0.09  0.84
i  0.74  0.87  0.69
j  0.38  0.23  0.44'''), sep=r'\s+')

window = 2

def get_3d_shfted_array(df, window=window):
    rows = df.values
    n_windows = len(rows) - window + 1
    # Slice i holds rows i .. i+n_windows-1; stacking the slices along a new
    # third axis gives shape (n_windows, n_cols, window).
    # np.dstack needs a sequence (list/tuple), not a generator.
    res = np.dstack([rows[i:i + n_windows] for i in range(window)])
    return res
# 100000 loops, best of 3: 15.5 µs per loop

res  = get_3d_shfted_array(df)
zero = res[..., 0]
one  = res[..., 1]

# current method
def get_multiindexed_array(df, window=window):
    return pd.concat([df, df.shift(-1)], axis=1, keys=[0, 1]).dropna()
# 1000 loops, best of 3: 928 µs per loop
```
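As noted in the disclaimer, the current concat/shift method generalizes to any window size with a list comprehension. A minimal sketch, where the name get_multiindexed_array_any and the toy frame are hypothetical: frame i is df shifted up by i rows, so after dropna row k holds rows k .. k+window-1. One caveat of this design is that dropna also drops any window touching a NaN that was already present in df.

```python
import numpy as np
import pandas as pd

def get_multiindexed_array_any(df, window):
    # Hypothetical generalization of get_multiindexed_array:
    # frame i holds the rows shifted up by i, so row k of the result
    # contains rows k, k+1, ..., k+window-1 of df
    shifted = [df.shift(-i) for i in range(window)]
    return pd.concat(shifted, axis=1, keys=list(range(window))).dropna()

df = pd.DataFrame(np.arange(12, dtype=float).reshape(6, 2), columns=["A", "B"])
out = get_multiindexed_array_any(df, 3)
print(out.shape)             # (4, 6)
print(out[2]["A"].tolist())  # [4.0, 6.0, 8.0, 10.0]

# The flat values match the (n_windows, window, n_cols) sliding windows
a = df.to_numpy()
expected = np.stack([a[k:k + 3] for k in range(len(a) - 3 + 1)])
assert np.array_equal(out.to_numpy().reshape(-1, 3, 2), expected)
```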