
Time series database with time-weighted-average aggregation function for irregular time series?


OpenTSDB performs aggregation across all series in a query at the time(s) implied by the query. For any series not having a data value at a timestamp, it linearly interpolates a value from the values before and after. It does this "upsampling" at query time -- the original data is always stored as it was when it arrived. You can perform a trailing windowed time average, but not an exponentially weighted moving average (I believe that is what you meant by time-weighted?)
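
For illustration, a downsampled query through OpenTSDB's HTTP API looks roughly like this (sys.cpu.user is a placeholder metric name; check the aggregators documentation linked below for the exact syntax):

GET /api/query?start=2018/05/01-00:00:00&end=2018/05/01-04:00:00&m=avg:1h-avg:sys.cpu.user

Here 1h-avg downsamples each series to hourly averages, and the first avg then aggregates across the downsampled series, linearly interpolating where a series has no value at a given timestamp.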

http://opentsdb.net/docs/build/html/user_guide/query/aggregators.html

(I should add, that's not a blanket recommendation for OpenTSDB as the db you should use, I'm just responding to your question)


I recently had to provide a solution for weighted average over irregular samples for our own SCADA/IoT product, with data stored in PostgreSQL. If you'd like to roll your own, here's how you can do it.

Let's assume the following table:

create table samples (
  stamp  timestamptz,
  series integer,
  value  float
);

insert into samples values
  ('2018-04-30 23:00:00+02', 1, 12.3),
  ('2018-05-01 01:45:00+02', 1, 22.2),
  ('2018-05-01 02:13:00+02', 1, 21.6),
  ('2018-05-01 02:26:00+02', 1, 14.9),
  ('2018-05-01 03:02:00+02', 1, 16.9);

To calculate a regular (evenly spaced), time-weighted average, we'll need to do the following (a short worked example follows the list):

  • "Partition" the irregular samples into regular periods
  • Determine how long each sample was held (its duration)
  • Calculate a weight for each sample (its duration divided by the period)
  • Sum up value times weight for each period
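
For example, take the period from 01:00:00 to 02:00:00 in the sample data above: the value 12.3 (the last sample before the period starts) is held for the first 45 minutes, so its weight is 45/60 = 0.75, and the value 22.2 arriving at 01:45:00 is held for the remaining 15 minutes, weight 0.25. The weighted average for that period is therefore 12.3 * 0.75 + 22.2 * 0.25 = 14.775.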

Before presenting the code, we'll make the following assumptions:

  • The weighted average is calculated for a given time range, with a given period.
  • We don't need to deal with null values, which would make the solution slightly more complicated (namely when it comes to calculating weights).
  • The code is written for PostgreSQL using two techniques: common table expressions and window functions. If you use another DB, you might need to write it differently.

1. Converting irregular samples into regular periods

Suppose we are interested in calculating an hourly weighted average for the time range between 2018-05-01 00:00:00+02 and 2018-05-01 04:00:00+02 for series 1. We'll start by querying for the given time range, adding an aligned stamp:

select
  stamp,
  to_timestamp(extract(epoch from stamp)::integer / 3600 * 3600)
    as stamp_aligned,
  value
from samples
where
  series = 1 and
  stamp >= '2018-05-01 00:00:00+02' and
  stamp <= '2018-05-01 04:00:00+02';

Which gives us:

         stamp          |     stamp_aligned      | value
------------------------+------------------------+-------
 2018-05-01 01:45:00+02 | 2018-05-01 01:00:00+02 |  22.2
 2018-05-01 02:13:00+02 | 2018-05-01 02:00:00+02 |  21.6
 2018-05-01 02:26:00+02 | 2018-05-01 02:00:00+02 |  14.9
 2018-05-01 03:02:00+02 | 2018-05-01 03:00:00+02 |  16.9
(4 rows)
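
As an aside, for whole-hour periods the same alignment can also be written with PostgreSQL's built-in date_trunc. A minimal sketch (note that on timestamptz values date_trunc truncates in the session time zone, so this assumes a whole-hour UTC offset; the epoch arithmetic above has the advantage of generalizing to arbitrary period lengths):

select
  stamp,
  date_trunc('hour', stamp) as stamp_aligned,
  value
from samples
where
  series = 1 and
  stamp >= '2018-05-01 00:00:00+02' and
  stamp <= '2018-05-01 04:00:00+02';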

Notice that:

  • From the result we can't tell the value at 00:00:00, nor at 01:00:00.
  • The stamp_aligned column tells us which time period each record belongs to, but the result is still missing a value at the beginning of each period.

To solve these problems, we'll query for the last known value before the given time range, and add records for round hours, which we'll later fill with correct values:

with
t_values as (
  select * from (
    -- select last value prior to time range
    (select
      stamp,
      to_timestamp(extract(epoch from stamp)::integer / 3600 * 3600)
        as stamp_aligned,
      value,
      false as filled_in
    from samples
    where
      series = 1 and
      stamp < '2018-05-01 00:00:00+02'
    order by
      stamp desc
    limit 1)
    union
    -- select records from given time range
    (select
      stamp,
      to_timestamp(extract(epoch from stamp)::integer / 3600 * 3600)
        as stamp_aligned,
      value,
      false as filled_in
    from samples
    where
      series = 1 and
      stamp >= '2018-05-01 00:00:00+02' and
      stamp <= '2018-05-01 04:00:00+02'
    order by
      stamp)
    union
    -- select all regular periods for given time range
    (select
      stamp,
      stamp as stamp_aligned,
      null as value,
      true as filled_in
    from generate_series(
      '2018-05-01 00:00:00+02',
      '2018-05-01 04:00:00+02',
      interval '3600 seconds'
    ) stamp)
  ) states
  order by stamp
)
select * from t_values;

Which gives us (¤ denotes a null value):

         stamp          |     stamp_aligned      | value | filled_in
------------------------+------------------------+-------+-----------
 2018-04-30 23:00:00+02 | 2018-04-30 23:00:00+02 |  12.3 | f
 2018-05-01 00:00:00+02 | 2018-05-01 00:00:00+02 |     ¤ | t
 2018-05-01 01:00:00+02 | 2018-05-01 01:00:00+02 |     ¤ | t
 2018-05-01 01:45:00+02 | 2018-05-01 01:00:00+02 |  22.2 | f
 2018-05-01 02:00:00+02 | 2018-05-01 02:00:00+02 |     ¤ | t
 2018-05-01 02:13:00+02 | 2018-05-01 02:00:00+02 |  21.6 | f
 2018-05-01 02:26:00+02 | 2018-05-01 02:00:00+02 |  14.9 | f
 2018-05-01 03:00:00+02 | 2018-05-01 03:00:00+02 |     ¤ | t
 2018-05-01 03:02:00+02 | 2018-05-01 03:00:00+02 |  16.9 | f
 2018-05-01 04:00:00+02 | 2018-05-01 04:00:00+02 |     ¤ | t
(10 rows)

So we have at least one record for each time period, but we still need to fill in values for the filled-in records:

with
t_values as (
  ...
),
-- since records generated using generate_series do not contain values,
-- we need to copy the value from the last non-generated record.
t_with_filled_in_values as (
  -- the outer query serves to remove any record prior to the given
  -- time range
  select *
  from (
    select
      stamp,
      stamp_aligned,
      -- fill in value from last non-filled record (the first record
      -- having the same filled_in_partition value)
      (case when filled_in then
        first_value(value) over (partition by filled_in_partition
        order by stamp) else value end) as value
    from (
      select
        stamp,
        stamp_aligned,
        value,
        filled_in,
        -- this field is incremented on every non-filled record
        sum(case when filled_in then 0 else 1 end)
          over (order by stamp) as filled_in_partition
      from
        t_values
    ) t_filled_in_partition
  ) t_filled_in_values
  -- we wrap the filling-in query in order to remove any record before the
  -- beginning of the given time range
  where stamp >= '2018-05-01 00:00:00+02'
  order by stamp
)
select * from t_with_filled_in_values;

Which gives us the following:

         stamp          |     stamp_aligned      | value
------------------------+------------------------+-------
 2018-05-01 00:00:00+02 | 2018-05-01 00:00:00+02 |  12.3
 2018-05-01 01:00:00+02 | 2018-05-01 01:00:00+02 |  12.3
 2018-05-01 01:45:00+02 | 2018-05-01 01:00:00+02 |  22.2
 2018-05-01 02:00:00+02 | 2018-05-01 02:00:00+02 |  22.2
 2018-05-01 02:13:00+02 | 2018-05-01 02:00:00+02 |  21.6
 2018-05-01 02:26:00+02 | 2018-05-01 02:00:00+02 |  14.9
 2018-05-01 03:00:00+02 | 2018-05-01 03:00:00+02 |  14.9
 2018-05-01 03:02:00+02 | 2018-05-01 03:00:00+02 |  16.9
 2018-05-01 04:00:00+02 | 2018-05-01 04:00:00+02 |  16.9
(9 rows)

So we're all good: we've added records with correct values for all round hours, and we've removed the first record, which gave us the value at the beginning of the time range but lay outside of it. Now we are ready for the next step.

2. Calculating weighted average

We'll continue by calculating the duration for each record:

with
t_values as (
  ...
),
t_with_filled_in_values as (
  ...
),
t_with_weight as (
  select
    stamp,
    stamp_aligned,
    value,
    -- use window to get stamp from next record in order to calculate
    -- the duration of the record which, divided by the period, gives
    -- us the weight.
    coalesce(extract(epoch from (lead(stamp)
      over (order by stamp) - stamp)), 3600)::float / 3600 as weight
  from t_with_filled_in_values
  order by stamp
)
select * from t_with_weight;

Which gives us:

         stamp          |     stamp_aligned      | value |       weight
------------------------+------------------------+-------+--------------------
 2018-05-01 00:00:00+02 | 2018-05-01 00:00:00+02 |  12.3 |                  1
 2018-05-01 01:00:00+02 | 2018-05-01 01:00:00+02 |  12.3 |               0.75
 2018-05-01 01:45:00+02 | 2018-05-01 01:00:00+02 |  22.2 |               0.25
 2018-05-01 02:00:00+02 | 2018-05-01 02:00:00+02 |  22.2 |  0.216666666666667
 2018-05-01 02:13:00+02 | 2018-05-01 02:00:00+02 |  21.6 |  0.216666666666667
 2018-05-01 02:26:00+02 | 2018-05-01 02:00:00+02 |  14.9 |  0.566666666666667
 2018-05-01 03:00:00+02 | 2018-05-01 03:00:00+02 |  14.9 | 0.0333333333333333
 2018-05-01 03:02:00+02 | 2018-05-01 03:00:00+02 |  16.9 |  0.966666666666667
 2018-05-01 04:00:00+02 | 2018-05-01 04:00:00+02 |  16.9 |                  1
(9 rows)
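
Because the boundary records make the samples cover each period exactly, the weights within every period should sum to 1; that is why the final step below can use sum(value * weight) directly instead of dividing by sum(weight). A quick sanity check, reusing the same CTEs:

with
t_values as (
  ...
),
t_with_filled_in_values as (
  ...
),
t_with_weight as (
  ...
)
select
  stamp_aligned,
  sum(weight) as weight_sum  -- expect 1 for every period
from t_with_weight
group by stamp_aligned
order by stamp_aligned;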

All that's left is to sum it up:

with
t_values as (
  ...
),
t_with_filled_in_values as (
  ...
),
t_with_weight as (
  ...
)
select
  stamp_aligned as stamp,
  sum(value * weight) as avg
from t_with_weight
group by stamp_aligned
order by stamp_aligned;

The result (note that the row for 04:00:00 simply carries the last known value forward over one full period, a consequence of the coalesce fallback above):

         stamp          |       avg
------------------------+------------------
 2018-05-01 00:00:00+02 |             12.3
 2018-05-01 01:00:00+02 |           14.775
 2018-05-01 02:00:00+02 | 17.9333333333333
 2018-05-01 03:00:00+02 | 16.8333333333333
 2018-05-01 04:00:00+02 |             16.9
(5 rows)

You can find the complete code in this gist.


The weighted time average aggregator (wtavg) is supported in Axibase Time Series Database: http://axibase.com/products/axibase-time-series-database/visualization/widgets/configuring-the-widgets/aggregators/

wtavg weights older samples at a linearly decreasing rate relative to the current time.
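
The linked page doesn't spell out the exact formula, so as an illustration only (not necessarily Axibase's precise definition), a linear time weighting over a window of length T typically has the form:

  wtavg = sum(w_i * v_i) / sum(w_i),  where w_i = 1 - (t_now - t_i) / T

so the most recent sample gets full weight and a sample at the far end of the window gets a weight approaching zero.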

This aggregator is supported in the REST API, the SQL layer, and the rule engine.

EDIT 2016-06-15T12:52:00Z: Supported interpolation functions:

  1. LINEAR
  2. PREVIOUS
  3. NEXT
  4. VALUE(v)
  5. NONE

Disclosure: I work for Axibase.