Pandas: rolling mean by time interval
In the meantime, a time-window capability was added. See this link.
In [1]: df = DataFrame({'B': range(5)})In [2]: df.index = [Timestamp('20130101 09:00:00'), ...: Timestamp('20130101 09:00:02'), ...: Timestamp('20130101 09:00:03'), ...: Timestamp('20130101 09:00:05'), ...: Timestamp('20130101 09:00:06')]In [3]: dfOut[3]: B2013-01-01 09:00:00 02013-01-01 09:00:02 12013-01-01 09:00:03 22013-01-01 09:00:05 32013-01-01 09:00:06 4In [4]: df.rolling(2, min_periods=1).sum()Out[4]: B2013-01-01 09:00:00 0.02013-01-01 09:00:02 1.02013-01-01 09:00:03 3.02013-01-01 09:00:05 5.02013-01-01 09:00:06 7.0In [5]: df.rolling('2s', min_periods=1).sum()Out[5]: B2013-01-01 09:00:00 0.02013-01-01 09:00:02 1.02013-01-01 09:00:03 3.02013-01-01 09:00:05 3.02013-01-01 09:00:06 7.0
What about something like this:
First resample the data frame into 1D intervals. This takes the mean of the values for all duplicate days. Use the fill_method
option to fill in missing date values. Next, pass the resampled frame into pd.rolling_mean
with a window of 3 and min_periods=1 :
pd.rolling_mean(df.resample("1D", fill_method="ffill"), window=3, min_periods=1) favorable unfavorable otherenddate2012-10-25 0.495000 0.485000 0.0250002012-10-26 0.527500 0.442500 0.0325002012-10-27 0.521667 0.451667 0.0283332012-10-28 0.515833 0.450000 0.0358332012-10-29 0.488333 0.476667 0.0383332012-10-30 0.495000 0.470000 0.0383332012-10-31 0.512500 0.460000 0.0291672012-11-01 0.516667 0.456667 0.0266672012-11-02 0.503333 0.463333 0.0333332012-11-03 0.490000 0.463333 0.0466672012-11-04 0.494000 0.456000 0.0433332012-11-05 0.500667 0.452667 0.0366672012-11-06 0.507333 0.456000 0.0233332012-11-07 0.510000 0.443333 0.013333
UPDATE: As Ben points out in the comments, with pandas 0.18.0 the syntax has changed. With the new syntax this would be:
df.resample("1d").sum().fillna(0).rolling(window=3, min_periods=1).mean()
I just had the same question but with irregularly spaced datapoints. Resample is not really an option here. So I created my own function. Maybe it will be useful for others too:
from pandas import Series, DataFrameimport pandas as pdfrom datetime import datetime, timedeltaimport numpy as npdef rolling_mean(data, window, min_periods=1, center=False): ''' Function that computes a rolling mean Parameters ---------- data : DataFrame or Series If a DataFrame is passed, the rolling_mean is computed for all columns. window : int or string If int is passed, window is the number of observations used for calculating the statistic, as defined by the function pd.rolling_mean() If a string is passed, it must be a frequency string, e.g. '90S'. This is internally converted into a DateOffset object, representing the window size. min_periods : int Minimum number of observations in window required to have a value. Returns ------- Series or DataFrame, if more than one column ''' def f(x): '''Function to apply that actually computes the rolling mean''' if center == False: dslice = col[x-pd.datetools.to_offset(window).delta+timedelta(0,0,1):x] # adding a microsecond because when slicing with labels start and endpoint # are inclusive else: dslice = col[x-pd.datetools.to_offset(window).delta/2+timedelta(0,0,1): x+pd.datetools.to_offset(window).delta/2] if dslice.size < min_periods: return np.nan else: return dslice.mean() data = DataFrame(data.copy()) dfout = DataFrame() if isinstance(window, int): dfout = pd.rolling_mean(data, window, min_periods=min_periods, center=center) elif isinstance(window, basestring): idx = Series(data.index.to_pydatetime(), index=data.index) for colname, col in data.iterkv(): result = idx.apply(f) result.name = colname dfout = dfout.join(result, how='outer') if dfout.columns.size == 1: dfout = dfout.ix[:,0] return dfout# Exampleidx = [datetime(2011, 2, 7, 0, 0), datetime(2011, 2, 7, 0, 1), datetime(2011, 2, 7, 0, 1, 30), datetime(2011, 2, 7, 0, 2), datetime(2011, 2, 7, 0, 4), datetime(2011, 2, 7, 0, 5), datetime(2011, 2, 7, 0, 5, 10), datetime(2011, 2, 7, 0, 6), datetime(2011, 2, 7, 0, 8), datetime(2011, 2, 7, 0, 9)]idx = pd.Index(idx)vals = np.arange(len(idx)).astype(float)s = Series(vals, index=idx)rm = rolling_mean(s, window='2min')