Python regularise irregular time series with linear interpolation
You can do this with traces. First, create a TimeSeries
with your irregular measurements like you would a dictionary:
ts = traces.TimeSeries([ (datetime(1992, 8, 27, 7, 46, 48), 28.0), (datetime(1992, 8, 27, 8, 0, 48), 28.2), ... (datetime(1992, 8, 27, 9, 3, 48), 30.0),])
Then regularize using the sample
method:
ts.sample( sampling_period=timedelta(minutes=15), start=datetime(1992, 8, 27, 8), end=datetime(1992, 8, 27, 9), interpolate='linear',)
This results in the following regularized version, where the gray dots are the original data and the orange is the regularized version with linear interpolation.
The interpolated values are:
1992-08-27 08:00:00 28.189 1992-08-27 08:15:00 28.286 1992-08-27 08:30:00 28.3771992-08-27 08:45:00 28.8481992-08-27 09:00:00 29.891
The same result that @mstringer gets can be achieved purely in pandas. The trick is to first resample by second, using interpolation to fill in the intermediate values (.resample('s').interpolate()
), and then upsample in 15-minute periods (.resample('15T').asfreq()
).
import ioimport pandas as pddata = io.StringIO('''\Values1992-08-27 07:46:48,28.0 1992-08-27 08:00:48,28.2 1992-08-27 08:33:48,28.4 1992-08-27 08:43:48,28.8 1992-08-27 08:48:48,29.0 1992-08-27 08:51:48,29.2 1992-08-27 08:53:48,29.6 1992-08-27 08:56:48,29.8 1992-08-27 09:03:48,30.0''')s = pd.read_csv(data, squeeze=True)s.index = pd.to_datetime(s.index)res = s.resample('s').interpolate().resample('15T').asfreq().dropna()print(res)
Output:
1992-08-27 08:00:00 28.1885711992-08-27 08:15:00 28.2860611992-08-27 08:30:00 28.3769701992-08-27 08:45:00 28.8480001992-08-27 09:00:00 29.891429Freq: 15T, Name: Values, dtype: float64
It takes a bit of work, but try this out. Basic idea is find the closest two timestamps to each resample point and interpolate. np.searchsorted
is used to find dates closest to the resample point.
# empty frame with desired indexrs = pd.DataFrame(index=df.resample('15min').iloc[1:].index)# array of indexes corresponding with closest timestamp after resampleidx_after = np.searchsorted(df.index.values, rs.index.values)# values and timestamp before/after resamplers['after'] = df.loc[df.index[idx_after], 'Values'].valuesrs['before'] = df.loc[df.index[idx_after - 1], 'Values'].valuesrs['after_time'] = df.index[idx_after]rs['before_time'] = df.index[idx_after - 1]#calculate new weighted valuers['span'] = (rs['after_time'] - rs['before_time'])rs['after_weight'] = (rs['after_time'] - rs.index) / rs['span']# I got errors here unless I turn the index to a seriesrs['before_weight'] = (pd.Series(data=rs.index, index=rs.index) - rs['before_time']) / rs['span']rs['Values'] = rs.eval('before * before_weight + after * after_weight')
After all that, hopefully the right answer:
In [161]: rs['Values']Out[161]: 1992-08-27 08:00:00 28.0114291992-08-27 08:15:00 28.3139391992-08-27 08:30:00 28.2230301992-08-27 08:45:00 28.9520001992-08-27 09:00:00 29.908571Freq: 15T, Name: Values, dtype: float64