Use center in pandas rolling when using a time-series


Try the following (tested with pandas==0.23.3):

series.rolling('7D', min_periods=1, closed='left').sum().shift(-84, freq='h')

This centers the rolling sum within the 7-day window by shifting the result back by 3.5 days, while still letting you define the window size with a 'datetimelike' string. Because shift() only takes an integer number of periods, the 3.5-day offset is expressed as 84 hours.

This will produce your desired output:

series.rolling('7D', min_periods=1, closed='left').sum().shift(-84, freq='h')['2014-01-01':].head(10)

2014-01-01 12:00:00    4.0
2014-01-02 12:00:00    5.0
2014-01-03 12:00:00    6.0
2014-01-04 12:00:00    7.0
2014-01-05 12:00:00    7.0
2014-01-06 12:00:00    7.0
2014-01-07 12:00:00    7.0
2014-01-08 12:00:00    7.0
2014-01-09 12:00:00    7.0
2014-01-10 12:00:00    7.0
Freq: D, dtype: float64

Note that the rolling sum is assigned to the center of the 7-day windows (using midnight to midnight timestamps), so the centered timestamp includes '12:00:00'.

Another option (as you show at the end of your question) is to resample the data so that it has a regular datetime frequency, then use an integer window size (window=7) with center=True. However, you state that other parts of your code benefit from defining the window with a 'datetimelike' string, so this option may not be ideal.
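To check that the two options above agree on evenly spaced data, here is a minimal sketch. It assumes a daily series of ones like the one implied by the output shown earlier (the construction of `series` is an assumption, not taken from the question), and compares the shifted time-based sum with an integer window using center=True.

```python
import numpy as np
import pandas as pd

# Assumed sample data: one observation of 1 per day
series = pd.Series(1, index=pd.date_range('2014-01-01', '2014-04-01', freq='D'))

# Option 1: time-based window, then shift the labels back by 3.5 days (84 hours)
shifted = (
    series.rolling('7D', min_periods=1, closed='left')
    .sum()
    .shift(-84, freq='h')
)
shifted = shifted.loc['2014-01-01':]  # drop labels that fall before the data starts

# Option 2: integer window with center=True on the regular daily index
centered = series.rolling(7, min_periods=1, center=True).sum()

# The shifted result is labelled at 12:00, so compare by position, not by label.
# It also stops a few days before the end of the series.
same = np.array_equal(shifted.to_numpy(), centered.iloc[:len(shifted)].to_numpy())
print(same, len(centered) - len(shifted))
```

Note that the shifted version has fewer rows than the centered one: the last few days of the series never become the center of a window that has already been evaluated.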


You could try resampling your series/DataFrame in order to convert the offset window to a fixed-width window.

import pandas as pd

# Parameters
window_timedelta = '7D'
resample_timedelta = '1D'

# Convert the offset to an integer window size
window_size = pd.Timedelta(window_timedelta) // pd.Timedelta(resample_timedelta)

# Resample the series to a regular frequency
# (for a DataFrame with a 'datetime' column, pass on='datetime' to resample)
series_res = series.resample(resample_timedelta).first()

# Perform the centered sum on the resampled series
window_sum = series_res.rolling(window_size, center=True, min_periods=1).sum()

Note: the first() trick in the resampling only works if you know that you have at most one point per day. If you have more, you can replace it with sum() or whatever aggregation is relevant to your data.

Note 2: the NaN values introduced for missing dates will not cause the sum to be NaN, because pandas ignores them while summing.
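As an illustration of the two notes, here is a small sketch on made-up data: a hypothetical series with two missing days and a 3-day window (both chosen only to keep the output short). The resampling fills the gap with NaN, and the centered sum still returns a number for every day.

```python
import pandas as pd

# Hypothetical irregular data: 2014-01-03 and 2014-01-04 are missing
index = pd.to_datetime(
    ['2014-01-01', '2014-01-02', '2014-01-05', '2014-01-06', '2014-01-07']
)
series = pd.Series(1.0, index=index)

window_timedelta = '3D'
resample_timedelta = '1D'

# Integer number of resampled steps per window (3 here)
window_size = pd.Timedelta(window_timedelta) // pd.Timedelta(resample_timedelta)

# At most one point per day, so first() is enough; missing days become NaN
series_res = series.resample(resample_timedelta).first()

# NaN entries are skipped by the sum; min_periods=1 keeps partial windows
window_sum = series_res.rolling(window_size, center=True, min_periods=1).sum()
print(window_sum)
```

The days inside the gap get a sum of 1.0 because each of their windows still contains one real observation.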


From pandas version 1.3, this is* directly possible with pandas.

* Or will be: the work is merged, but 1.3 was not yet released at the time of writing; I tested the lines below against the pandas main branch.

import pandas as pd

series = pd.Series(1, index=pd.date_range('2014-01-01', '2014-04-01', freq='D'))
series.rolling(7, min_periods=1, center=True).sum().head(10)

Output is as expected:

2014-01-01    4.0
2014-01-02    5.0
2014-01-03    6.0
2014-01-04    7.0
2014-01-05    7.0
2014-01-06    7.0
2014-01-07    7.0
2014-01-08    7.0
2014-01-09    7.0
2014-01-10    7.0
Freq: D, dtype: float64
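The snippet above uses an integer window. For the 'datetimelike' window string that the question asks about, a minimal sketch might look like the following. It assumes pandas 1.3 or later, where center=True is accepted together with an offset-based window; as far as I know, earlier versions reject this combination.

```python
import pandas as pd

series = pd.Series(1, index=pd.date_range('2014-01-01', '2014-04-01', freq='D'))

# Assumption: pandas >= 1.3, which supports center=True with a time-based window
centered = series.rolling('7D', min_periods=1, center=True).sum()
print(centered.head(10))
```

On this evenly spaced daily series the first values should match the integer-window output shown above, and the labels stay at midnight rather than moving to 12:00 as they do with the shift() approach.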