Merging different granularity time series in influxdb Merging different granularity time series in influxdb database database

Merging different granularity time series in influxdb


NB. Just a clarification on InfluxDB terminology. You're probably storing trade and tick data in different measurements(analogous to a table). Series is a subdivision withing a measurement based on tag values. e.g

   Time       Ask        Bid     Market00:00.763     100         99    bar-baz

is one series

   Time       Ask        Bid     Market00:01.010     101         99    foo-bar

is another series(assuming you are storing Market name/id as a tag and not a field)

Answer

  1. InfluxQL https://docs.influxdata.com/influxdb/v1.7/query_language/spec/ - I can't think of a way to achieve what you need with InfluxQL (Influx Query Language) as it does not support joins.Perhaps what you could do on the client side is instead of requesting all tick data for a period and discarding most of it, make a request per trade and market to get exactly the (the most recent with respect to the trade) ask/bid datapoint that you need. Something like:
function merge(trades, market)  points = <empty list>  for next_trade in trades    quote = db.query("select last(ask), last(bid) from tick_data where time<=next_trade.timestamp and Market=market and time>next_trade.timestamp - 1m")    // or to get a list per market with one query    // quote_per_market = db.query("select last(ask), last(bid) from tick_data where time<=next_trade.timestamp group by Market")    points = points + join(next_trade, quote)  return points

Of course you'd have the overhead of querying the database more frequently but depending on the number of trades and your resource constraints it may be more efficient. NB. A potential pitfall here is that ask and bid retrieved this way are not retrieved as a pair but independently and while they are returned as a pair it could happen that they have different timestamps. If for some timestamp for some reason you only have an ask or a bid price you might run into this problem. However, as long as you write them in pairs and have no missing data it should be ok.

  1. Flux https://www.influxdata.com/products/flux/ - Flux is a more sophisticated query language that is part of Influxdb 1.7 and 2 that allows you to do joins and operations across different measurements. I can't give you any examples yet but it's worth having a look at.

  2. Other (relational) Times Series DBs that you could have a look at that would also allow you to do joins are CrateDB https://crate.io/ or Postgres + TimescaleDB https://www.timescale.com/products