Pandas: rolling mean by time interval Pandas: rolling mean by time interval python python

Pandas: rolling mean by time interval


In the meantime, a time-window capability was added. See this link.

In [1]: df = DataFrame({'B': range(5)})In [2]: df.index = [Timestamp('20130101 09:00:00'),   ...:             Timestamp('20130101 09:00:02'),   ...:             Timestamp('20130101 09:00:03'),   ...:             Timestamp('20130101 09:00:05'),   ...:             Timestamp('20130101 09:00:06')]In [3]: dfOut[3]:                      B2013-01-01 09:00:00  02013-01-01 09:00:02  12013-01-01 09:00:03  22013-01-01 09:00:05  32013-01-01 09:00:06  4In [4]: df.rolling(2, min_periods=1).sum()Out[4]:                        B2013-01-01 09:00:00  0.02013-01-01 09:00:02  1.02013-01-01 09:00:03  3.02013-01-01 09:00:05  5.02013-01-01 09:00:06  7.0In [5]: df.rolling('2s', min_periods=1).sum()Out[5]:                        B2013-01-01 09:00:00  0.02013-01-01 09:00:02  1.02013-01-01 09:00:03  3.02013-01-01 09:00:05  3.02013-01-01 09:00:06  7.0


What about something like this:

First resample the data frame into 1D intervals. This takes the mean of the values for all duplicate days. Use the fill_method option to fill in missing date values. Next, pass the resampled frame into pd.rolling_mean with a window of 3 and min_periods=1 :

pd.rolling_mean(df.resample("1D", fill_method="ffill"), window=3, min_periods=1)            favorable  unfavorable     otherenddate2012-10-25   0.495000     0.485000  0.0250002012-10-26   0.527500     0.442500  0.0325002012-10-27   0.521667     0.451667  0.0283332012-10-28   0.515833     0.450000  0.0358332012-10-29   0.488333     0.476667  0.0383332012-10-30   0.495000     0.470000  0.0383332012-10-31   0.512500     0.460000  0.0291672012-11-01   0.516667     0.456667  0.0266672012-11-02   0.503333     0.463333  0.0333332012-11-03   0.490000     0.463333  0.0466672012-11-04   0.494000     0.456000  0.0433332012-11-05   0.500667     0.452667  0.0366672012-11-06   0.507333     0.456000  0.0233332012-11-07   0.510000     0.443333  0.013333

UPDATE: As Ben points out in the comments, with pandas 0.18.0 the syntax has changed. With the new syntax this would be:

df.resample("1d").sum().fillna(0).rolling(window=3, min_periods=1).mean()


I just had the same question but with irregularly spaced datapoints. Resample is not really an option here. So I created my own function. Maybe it will be useful for others too:

from pandas import Series, DataFrameimport pandas as pdfrom datetime import datetime, timedeltaimport numpy as npdef rolling_mean(data, window, min_periods=1, center=False):    ''' Function that computes a rolling mean    Parameters    ----------    data : DataFrame or Series           If a DataFrame is passed, the rolling_mean is computed for all columns.    window : int or string             If int is passed, window is the number of observations used for calculating              the statistic, as defined by the function pd.rolling_mean()             If a string is passed, it must be a frequency string, e.g. '90S'. This is             internally converted into a DateOffset object, representing the window size.    min_periods : int                  Minimum number of observations in window required to have a value.    Returns    -------    Series or DataFrame, if more than one column        '''    def f(x):        '''Function to apply that actually computes the rolling mean'''        if center == False:            dslice = col[x-pd.datetools.to_offset(window).delta+timedelta(0,0,1):x]                # adding a microsecond because when slicing with labels start and endpoint                # are inclusive        else:            dslice = col[x-pd.datetools.to_offset(window).delta/2+timedelta(0,0,1):                         x+pd.datetools.to_offset(window).delta/2]        if dslice.size < min_periods:            return np.nan        else:            return dslice.mean()    data = DataFrame(data.copy())    dfout = DataFrame()    if isinstance(window, int):        dfout = pd.rolling_mean(data, window, min_periods=min_periods, center=center)    elif isinstance(window, basestring):        idx = Series(data.index.to_pydatetime(), index=data.index)        for colname, col in data.iterkv():            result = idx.apply(f)            result.name = colname            dfout = dfout.join(result, how='outer')    if dfout.columns.size == 1:        dfout = dfout.ix[:,0]    return dfout# Exampleidx = [datetime(2011, 2, 7, 0, 0),       datetime(2011, 2, 7, 0, 1),       datetime(2011, 2, 7, 0, 1, 30),       datetime(2011, 2, 7, 0, 2),       datetime(2011, 2, 7, 0, 4),       datetime(2011, 2, 7, 0, 5),       datetime(2011, 2, 7, 0, 5, 10),       datetime(2011, 2, 7, 0, 6),       datetime(2011, 2, 7, 0, 8),       datetime(2011, 2, 7, 0, 9)]idx = pd.Index(idx)vals = np.arange(len(idx)).astype(float)s = Series(vals, index=idx)rm = rolling_mean(s, window='2min')