Python regularise irregular time series with linear interpolation

python pandas time-series linear-interpolation

You can do this with traces. First, create a TimeSeries with your irregular measurements like you would a dictionary:

ts = traces.TimeSeries([    (datetime(1992, 8, 27, 7, 46, 48), 28.0),    (datetime(1992, 8, 27, 8, 0, 48), 28.2),    ...    (datetime(1992, 8, 27, 9, 3, 48), 30.0),])

Then regularize using the sample method:

ts.sample(    sampling_period=timedelta(minutes=15),    start=datetime(1992, 8, 27, 8),    end=datetime(1992, 8, 27, 9),    interpolate='linear',)

This results in the following regularized version, where the gray dots are the original data and the orange is the regularized version with linear interpolation.

The interpolated values are:

1992-08-27 08:00:00    28.189 1992-08-27 08:15:00    28.286  1992-08-27 08:30:00    28.3771992-08-27 08:45:00    28.8481992-08-27 09:00:00    29.891

python pandas time-series linear-interpolation

The same result that @mstringer gets can be achieved purely in pandas. The trick is to first resample by second, using interpolation to fill in the intermediate values (.resample('s').interpolate()), and then upsample in 15-minute periods (.resample('15T').asfreq()).

import ioimport pandas as pddata = io.StringIO('''\Values1992-08-27 07:46:48,28.0  1992-08-27 08:00:48,28.2  1992-08-27 08:33:48,28.4  1992-08-27 08:43:48,28.8  1992-08-27 08:48:48,29.0  1992-08-27 08:51:48,29.2  1992-08-27 08:53:48,29.6  1992-08-27 08:56:48,29.8  1992-08-27 09:03:48,30.0''')s = pd.read_csv(data, squeeze=True)s.index = pd.to_datetime(s.index)res = s.resample('s').interpolate().resample('15T').asfreq().dropna()print(res)

Output:

1992-08-27 08:00:00    28.1885711992-08-27 08:15:00    28.2860611992-08-27 08:30:00    28.3769701992-08-27 08:45:00    28.8480001992-08-27 09:00:00    29.891429Freq: 15T, Name: Values, dtype: float64

python pandas time-series linear-interpolation

It takes a bit of work, but try this out. Basic idea is find the closest two timestamps to each resample point and interpolate. np.searchsorted is used to find dates closest to the resample point.

# empty frame with desired indexrs = pd.DataFrame(index=df.resample('15min').iloc[1:].index)# array of indexes corresponding with closest timestamp after resampleidx_after = np.searchsorted(df.index.values, rs.index.values)# values and timestamp before/after resamplers['after'] = df.loc[df.index[idx_after], 'Values'].valuesrs['before'] = df.loc[df.index[idx_after - 1], 'Values'].valuesrs['after_time'] = df.index[idx_after]rs['before_time'] = df.index[idx_after - 1]#calculate new weighted valuers['span'] = (rs['after_time'] - rs['before_time'])rs['after_weight'] = (rs['after_time'] - rs.index) / rs['span']# I got errors here unless I turn the index to a seriesrs['before_weight'] = (pd.Series(data=rs.index, index=rs.index) - rs['before_time']) / rs['span']rs['Values'] = rs.eval('before * before_weight + after * after_weight')

After all that, hopefully the right answer:

In [161]: rs['Values']Out[161]: 1992-08-27 08:00:00    28.0114291992-08-27 08:15:00    28.3139391992-08-27 08:30:00    28.2230301992-08-27 08:45:00    28.9520001992-08-27 09:00:00    29.908571Freq: 15T, Name: Values, dtype: float64

CodeHunter

Python regularise irregular time series with linear interpolation

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last