Python regularise irregular time series with linear interpolation


You can do this with traces. First, create a TimeSeries with your irregular measurements like you would a dictionary:

from datetime import datetime, timedelta
import traces

ts = traces.TimeSeries([
    (datetime(1992, 8, 27, 7, 46, 48), 28.0),
    (datetime(1992, 8, 27, 8, 0, 48), 28.2),
    ...
    (datetime(1992, 8, 27, 9, 3, 48), 30.0),
])

Then regularize using the sample method:

ts.sample(
    sampling_period=timedelta(minutes=15),
    start=datetime(1992, 8, 27, 8),
    end=datetime(1992, 8, 27, 9),
    interpolate='linear',
)

This results in the following regularized version, where the gray dots are the original data and the orange is the regularized version with linear interpolation.

[Figure: time series with linear interpolation]

The interpolated values are:

1992-08-27 08:00:00    28.189
1992-08-27 08:15:00    28.286
1992-08-27 08:30:00    28.377
1992-08-27 08:45:00    28.848
1992-08-27 09:00:00    29.891


The same result that @mstringer gets can be achieved purely in pandas. The trick is to first resample by second, using interpolation to fill in the intermediate values (.resample('s').interpolate()), and then downsample to 15-minute periods (.resample('15T').asfreq()).

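import io
import pandas as pd

data = io.StringIO('''\
Values
1992-08-27 07:46:48,28.0
1992-08-27 08:00:48,28.2
1992-08-27 08:33:48,28.4
1992-08-27 08:43:48,28.8
1992-08-27 08:48:48,29.0
1992-08-27 08:51:48,29.2
1992-08-27 08:53:48,29.6
1992-08-27 08:56:48,29.8
1992-08-27 09:03:48,30.0
''')

# The header has one fewer field than the data rows, so the timestamps become the index.
# read_csv(..., squeeze=True) was removed in pandas 2.0; .squeeze("columns") replaces it.
s = pd.read_csv(data).squeeze("columns")
s.index = pd.to_datetime(s.index)

# Interpolate at 1-second resolution, then keep only the 15-minute marks.
# ('15T' is the 15-minute alias; newer pandas versions prefer '15min'.)
res = s.resample('s').interpolate().resample('15T').asfreq().dropna()
print(res)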

Output:

1992-08-27 08:00:00    28.188571
1992-08-27 08:15:00    28.286061
1992-08-27 08:30:00    28.376970
1992-08-27 08:45:00    28.848000
1992-08-27 09:00:00    29.891429
Freq: 15T, Name: Values, dtype: float64
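
If resampling to every second feels wasteful for a long series, one possible alternative (a sketch, not part of the answer above) is to add the target timestamps to the index, interpolate with method='time', and then keep only the target timestamps. The names target, combined and regular are made up for illustration, and the data is the same sample series typed in by hand.

```python
import pandas as pd

# Same irregular measurements as in the example above.
s = pd.Series(
    [28.0, 28.2, 28.4, 28.8, 29.0, 29.2, 29.6, 29.8, 30.0],
    index=pd.to_datetime([
        "1992-08-27 07:46:48", "1992-08-27 08:00:48", "1992-08-27 08:33:48",
        "1992-08-27 08:43:48", "1992-08-27 08:48:48", "1992-08-27 08:51:48",
        "1992-08-27 08:53:48", "1992-08-27 08:56:48", "1992-08-27 09:03:48",
    ]),
    name="Values",
)

# Regular grid we want values on (illustrative choice of start and end).
target = pd.date_range("1992-08-27 08:00:00", "1992-08-27 09:00:00", freq="15min")

# Insert the grid points as NaN rows between the real measurements.
combined = s.reindex(s.index.union(target))

# method="time" weights by the actual time gaps, not by row position.
regular = combined.interpolate(method="time").reindex(target)
print(regular)
```

This only ever holds the original points plus the grid points in memory, so it should scale better than a per-second intermediate series when the measurements span days or months.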


It takes a bit of work, but try this out. The basic idea is to find the two timestamps closest to each resample point, one before and one after, and interpolate between them. np.searchsorted is used to find the first timestamp at or after each resample point.

import numpy as np
import pandas as pd

# df is assumed to be a DataFrame with a sorted DatetimeIndex and a 'Values' column
# empty frame with desired index (the first bin starts before the first measurement, so skip it)
rs = pd.DataFrame(index=df.resample('15min').asfreq().index[1:])

# array of indexes corresponding with closest timestamp after resample
idx_after = np.searchsorted(df.index.values, rs.index.values)

# values and timestamp before/after resample
rs['after'] = df.loc[df.index[idx_after], 'Values'].values
rs['before'] = df.loc[df.index[idx_after - 1], 'Values'].values
rs['after_time'] = df.index[idx_after]
rs['before_time'] = df.index[idx_after - 1]

# calculate new weighted value: each neighbour is weighted by the distance
# to the other one, so the closer timestamp gets the larger weight
rs['span'] = (rs['after_time'] - rs['before_time'])
rs['before_weight'] = (rs['after_time'] - rs.index) / rs['span']
# I got errors here unless I turn the index to a series
rs['after_weight'] = (pd.Series(data=rs.index, index=rs.index) - rs['before_time']) / rs['span']

rs['Values'] = rs.eval('before * before_weight + after * after_weight')

After all that, hopefully the right answer:

In [161]: rs['Values']
Out[161]:
1992-08-27 08:00:00    28.188571
1992-08-27 08:15:00    28.286061
1992-08-27 08:30:00    28.376970
1992-08-27 08:45:00    28.848000
1992-08-27 09:00:00    29.891429
Freq: 15T, Name: Values, dtype: float64
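
A quick way to sanity-check any hand-rolled before/after weighting is to compare it against np.interp, which performs the same piecewise-linear interpolation on plain numbers. The sketch below assumes the sample data from this thread and converts timestamps to seconds since the first measurement; times, values, target and interp are illustrative names, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Irregular measurements from the question (typed in by hand for this sketch).
times = pd.to_datetime([
    "1992-08-27 07:46:48", "1992-08-27 08:00:48", "1992-08-27 08:33:48",
    "1992-08-27 08:43:48", "1992-08-27 08:48:48", "1992-08-27 08:51:48",
    "1992-08-27 08:53:48", "1992-08-27 08:56:48", "1992-08-27 09:03:48",
])
values = np.array([28.0, 28.2, 28.4, 28.8, 29.0, 29.2, 29.6, 29.8, 30.0])

# Regular 15-minute grid to interpolate onto.
target = pd.date_range("1992-08-27 08:00:00", "1992-08-27 09:00:00", freq="15min")

# np.interp needs numeric x coordinates, so use seconds since the first sample.
t0 = times[0]
x = np.asarray((times - t0).total_seconds())
x_new = np.asarray((target - t0).total_seconds())

interp = pd.Series(np.interp(x_new, x, values), index=target, name="Values")
print(interp)
```

If the weighted result and np.interp disagree, a common culprit is swapped weights: the value at the earlier timestamp has to be multiplied by the distance to the later timestamp, and vice versa.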