dataframe representation of a rolling window

python performance pandas numpy

We can use NumPy to get views into those sliding windows with its somewhat esoteric stride tricks (as_strided). If you are using the new window dimension for some reduction, such as matrix multiplication, this is ideal because no data is copied. If for some reason you want a 2D output, a reshape is needed at the end, and that will typically create a copy.

Thus, the implementation would look something like this -

    from numpy.lib.stride_tricks import as_strided as strided

    def get_sliding_window(df, W, return2D=0):
        a = df.values
        s0, s1 = a.strides
        m, n = a.shape
        # Reuse the row stride for the window axis so consecutive windows overlap
        out = strided(a, shape=(m-W+1, W, n), strides=(s0, s0, s1))
        if return2D == 1:
            return out.reshape(a.shape[0]-W+1, -1)
        else:
            return out
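As a possible alternative on NumPy 1.20 or later, the same 3D layout can be sketched with sliding_window_view, which computes the strides itself and returns a read-only view by default. The helper name get_sliding_window_safe and the sample frame below are illustrative assumptions, not part of the answer above.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

def get_sliding_window_safe(df, W):
    """Hypothetical helper: windows of shape (m-W+1, W, n) as a read-only view."""
    a = df.to_numpy()
    # sliding_window_view appends the window axis last: shape (m-W+1, n, W)
    windows = sliding_window_view(a, W, axis=0)
    # Move the window axis to the middle to match the as_strided layout
    return windows.transpose(0, 2, 1)

# Sample data chosen for illustration only
df = pd.DataFrame({"A": [0.44, 0.46, 0.46, 0.85, 0.78],
                   "B": [0.41, 0.47, 0.02, 0.82, 0.76]})
out = get_sliding_window_safe(df, 3)
print(out.shape)  # (3, 3, 2)
```

The trade-off is a dependency on a newer NumPy in exchange for not having to get the shape and strides right by hand.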

Sample run for 2D/3D output -

    In [68]: df
    Out[68]:
          A     B
    0  0.44  0.41
    1  0.46  0.47
    2  0.46  0.02
    3  0.85  0.82
    4  0.78  0.76

    In [70]: get_sliding_window(df, 3,return2D=1)
    Out[70]:
    array([[ 0.44,  0.41,  0.46,  0.47,  0.46,  0.02],
           [ 0.46,  0.47,  0.46,  0.02,  0.85,  0.82],
           [ 0.46,  0.02,  0.85,  0.82,  0.78,  0.76]])

Here's what the 3D view output looks like -

    In [69]: get_sliding_window(df, 3,return2D=0)
    Out[69]:
    array([[[ 0.44,  0.41],
            [ 0.46,  0.47],
            [ 0.46,  0.02]],

           [[ 0.46,  0.47],
            [ 0.46,  0.02],
            [ 0.85,  0.82]],

           [[ 0.46,  0.02],
            [ 0.85,  0.82],
            [ 0.78,  0.76]]])
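To illustrate the kind of reduction mentioned earlier, here is a minimal sketch that reduces over the window axis of the 3D view: a rolling mean, and a weighted rolling sum via einsum. The weights are made up for the example, and the array stands in for df.values.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# Stand-in for df.values from the sample run
a = np.array([[0.44, 0.41],
              [0.46, 0.47],
              [0.46, 0.02],
              [0.85, 0.82],
              [0.78, 0.76]])
W = 3
m, n = a.shape
s0, s1 = a.strides
windows = as_strided(a, shape=(m - W + 1, W, n), strides=(s0, s0, s1))

# Rolling mean per column: reduce over the window axis (axis=1)
rolling_mean = windows.mean(axis=1)

# Weighted rolling sum: contract the window axis with a weight vector
weights = np.array([0.2, 0.3, 0.5])  # illustrative weights (assumption)
weighted = np.einsum("iwn,w->in", windows, weights)

print(rolling_mean.shape, weighted.shape)  # (3, 2) (3, 2)
```

Only the reduced result is allocated here; the windows themselves are never materialized.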

Let's time the 3D view output for various window sizes -

    In [331]: df = pd.DataFrame(np.random.rand(1000, 3).round(2))

    In [332]: %timeit get_3d_shfted_array(df,2) # @Yakym Pirozhenko's soln
    10000 loops, best of 3: 47.9 µs per loop

    In [333]: %timeit get_sliding_window(df,2)
    10000 loops, best of 3: 39.2 µs per loop

    In [334]: %timeit get_3d_shfted_array(df,5) # @Yakym Pirozhenko's soln
    10000 loops, best of 3: 89.9 µs per loop

    In [335]: %timeit get_sliding_window(df,5)
    10000 loops, best of 3: 39.4 µs per loop

    In [336]: %timeit get_3d_shfted_array(df,15) # @Yakym Pirozhenko's soln
    1000 loops, best of 3: 258 µs per loop

    In [337]: %timeit get_sliding_window(df,15)
    10000 loops, best of 3: 38.8 µs per loop

Let's verify that we are indeed getting views -

    In [338]: np.may_share_memory(get_sliding_window(df,2), df.values)
    Out[338]: True

The almost constant timings with get_sliding_window even across various window sizes suggest the huge benefit of getting the view instead of copying.
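One way to see the view behaviour directly, sketched on a small stand-in array rather than the DataFrame above: a write to the base array shows up in every window that overlaps that row. Passing writeable=False is an optional safeguard assumed here, since writing through overlapping windows is easy to get wrong.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(10, dtype=float).reshape(5, 2)  # stand-in for df.values
W = 3
m, n = a.shape
s0, s1 = a.strides
# writeable=False guards against accidental writes through overlapping windows
windows = as_strided(a, shape=(m - W + 1, W, n), strides=(s0, s0, s1),
                     writeable=False)

# Row 1 of `a` sits at position 1 of window 0 and position 0 of window 1
a[1, 0] = 99.0
print(windows[0, 1, 0], windows[1, 0, 0])  # 99.0 99.0
print(np.may_share_memory(windows, a))     # True
```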


Disclaimers:

First, I would not call the method you provide clunky. It is readable, and you can easily generalize it to any window size with a list comprehension. At the same time, this is a somewhat open-ended question that may have many solutions, including your own.

/Disclaimers

Here is one other method that I think qualifies under your description:

Use np.dstack on df.values. One benefit over the existing approach is construction speed.

    import pandas as pd
    import numpy as np
    from io import StringIO

    # A regex separator requires the Python parsing engine
    df = pd.read_csv(StringIO('''      A     B     C
    a  0.44  0.41  0.46
    b  0.47  0.46  0.02
    c  0.85  0.82  0.78
    d  0.76  0.93  0.83
    e  0.88  0.93  0.72
    f  0.12  0.15  0.20
    g  0.44  0.10  0.28
    h  0.61  0.09  0.84
    i  0.74  0.87  0.69
    j  0.38  0.23  0.44'''), sep=r' +', engine='python')

    window = 2

    def get_3d_shfted_array(df, window=window):
        rows = df.values
        n = len(rows) - window + 1  # number of complete windows
        # np.dstack needs a sequence (not a generator) on recent NumPy;
        # slicing up to i + n keeps the last window, which rows[i:i-window] dropped
        res = np.dstack([rows[i:i + n] for i in range(window)])
        return res

    # 100000 loops, best of 3: 15.5 µs per loop

    res  = get_3d_shfted_array(df)
    zero = res[...,0]  # unshifted rows
    one  = res[...,1]  # rows shifted by one

    # current method
    def get_multiindexed_array(df, window=window):
        return pd.concat([df, df.shift(-1)], axis=1, keys=[0, 1]).dropna()

    # 1000 loops, best of 3: 928 µs per loop
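Note that np.dstack places the window axis last, giving shape (windows, columns, window), whereas a strided view of shape (m-W+1, W, n) places it in the middle. The following is a small sketch of how the two layouts relate, using a stand-in array and its own slicing, which keeps every complete window. The names are illustrative.

```python
import numpy as np

rows = np.arange(12, dtype=float).reshape(6, 2)  # stand-in for df.values
window = 3
n_windows = len(rows) - window + 1

# Stack shifted slices along a new last axis: (n_windows, n_cols, window)
stacked = np.dstack([rows[i:i + n_windows] for i in range(window)])

# Move the window axis to the middle: (n_windows, window, n_cols)
as_windows = stacked.transpose(0, 2, 1)

print(stacked.shape)     # (4, 2, 3)
print(as_windows.shape)  # (4, 3, 2)
```

After the transpose, as_windows[i] holds rows i through i + window - 1, but unlike a strided view the data has been copied by dstack.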
