Is it possible to use pandas.DataFrame.rolling with a step greater than 1?


You can still use rolling; it just takes a little extra work to assign the results to the right index.

Here, by = 2:

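    import numpy as np

    by = 2
    # Compute the rolling mean for every row, but assign it only to every
    # `by`-th row (odd positions here); the other rows of 'New' stay NaN.
    df.loc[df.index[np.arange(len(df)) % by == 1], 'New'] = df.Price.rolling(window=4).mean()
    df

        Price    New
    0      63    NaN
    1      92    NaN
    2      92    NaN
    3       5  63.00
    4      90    NaN
    5       3  47.50
    6      81    NaN
    7      98  68.00
    8     100    NaN
    9      58  84.25
    10     38    NaN
    11     15  52.75
    12     75    NaN
    13     19  36.75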


If the data size is not too large, here is an easy way:

    by = 2
    win = 4
    start = 3  # index of your first valid value
    df.rolling(win).mean()[start::by]  # calculate everything, then keep what you need
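For readers on a recent pandas, here is a minimal sketch of the same idea using the `step` argument of `rolling`. It assumes pandas 1.5.0 or later, where `step` is documented as equivalent to slicing the result with `[::step]`. The sample `Price` values are taken from the first answer; the variable names `stepped` and `sliced` are only illustrative.

```python
import pandas as pd

# Sample data copied from the table in the first answer.
df = pd.DataFrame(
    {"Price": [63, 92, 92, 5, 90, 3, 81, 98, 100, 58, 38, 15, 75, 19]}
)
by = 2
win = 4

# Assumption: pandas >= 1.5.0, where rolling() accepts a `step` argument.
# Only every `by`-th window is evaluated.
stepped = df.rolling(win, step=by).mean()

# Reference result: evaluate every window, then keep every `by`-th row.
sliced = df.rolling(win).mean()[::by]

print(stepped)
print(sliced)
```

Note that the stepped result starts at row 0, so it keeps rows 0, 2, 4, and so on. That is a different phase from `[start::by]` with `start = 3`, which keeps rows 3, 5, 7, and so on.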


I know it has been a long time since the question was asked, but I ran into the same problem. When dealing with long time series, you really want to avoid calculating values you are not interested in. Since the pandas rolling method did not implement a step argument at the time of writing, I wrote a workaround using numpy.

It is basically a combination of the solution in this link and the indexing proposed by BENY.

    import numpy as np


    def apply_rolling_data(data, col, function, window, step=1, labels=None):
        """Perform a rolling window analysis on the column `col` of `data`.

        Given a dataframe `data` holding time series, call `function` on
        sections of length `window` of the data in column `col`. Append
        the results to `data` as new columns named by `labels`.

        Parameters
        ----------
        data : DataFrame
            Data to be analyzed. The dataframe must store time series
            columnwise, i.e., each column represents a time series and
            each row a time index.
        col : str
            Name of the column of `data` to be analyzed.
        function : callable
            Function called on each window. It receives the window as a
            numpy array and must return either a number or a
            fixed-length sequence of numbers.
        window : int
            Length of the window used for the analysis.
        step : int
            Step between two consecutive windows.
        labels : str or list of str
            Name(s) of the output column(s), one per value returned by
            `function`. If None, defaults to 'metric_0', 'metric_1', ...

        Returns
        -------
        data : DataFrame
            Input dataframe with added columns holding the result of the
            analysis performed.
        """
        x = _strided_app(data[col].to_numpy(), window, step)
        rolled = np.apply_along_axis(function, 1, x)
        # A scalar-valued `function` gives a 1-D result; make it 2-D so
        # that each output column maps to exactly one label.
        rolled = rolled.reshape(len(rolled), -1)
        if labels is None:
            labels = [f"metric_{i}" for i in range(rolled.shape[1])]
        elif isinstance(labels, str):
            labels = [labels]
        for label in labels:
            data[label] = np.nan
        # Rows where a window ends: skip the first window-1 rows, then
        # take every `step`-th row.
        mask = np.zeros(len(data), dtype=bool)
        mask[window - 1::step] = True
        data.loc[mask, labels] = rolled
        return data


    def _strided_app(a, L, S):  # Window len = L, stride len/step size = S
        """Return a strided 2-D view of `a` with one window per row."""
        nrows = ((a.size - L) // S) + 1
        n = a.strides[0]
        return np.lib.stride_tricks.as_strided(
            a, shape=(nrows, L), strides=(S * n, n))
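As a cross-check on the strided-window idea, here is a minimal sketch that builds the same windows with `numpy.lib.stride_tricks.sliding_window_view` instead of `as_strided`. It assumes NumPy 1.20 or later. The helper name `rolling_apply_step` is hypothetical and not part of the answer above, and the prices are the sample values from the first answer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_apply_step(values, function, window, step=1):
    """Apply `function` to every `step`-th window of length `window`.

    Hypothetical helper for illustration only.
    """
    a = np.asarray(values)
    # Read-only view of shape (len(a) - window + 1, window); slicing with
    # [::step] keeps every `step`-th window without copying the data.
    windows = sliding_window_view(a, window)[::step]
    # `function` is called only on the windows that were kept.
    return np.array([function(w) for w in windows])


prices = [63, 92, 92, 5, 90, 3, 81, 98, 100, 58, 38, 15, 75, 19]
means = rolling_apply_step(prices, np.mean, window=4, step=2)
print(means)
```

Here the windows start at positions 0, 2, 4, and so on, so they end at rows 3, 5, 7, and so on, the same rows that are filled in the first answer's `New` column. Unlike `as_strided`, `sliding_window_view` checks the window shape against the array, which makes it harder to read past the end of the buffer by mistake.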