Clickhouse as time-series storage
There are more than one ways to use CH as a time series database.My personal preference is to use one string array for metric names and one Float64 array for metric values.
This is a sample time series table:
CREATE TABLE ts1( entity String, ts UInt64, -- timestamp, milliseconds from January 1 1970 m Array(String), -- names of the metrics v Array(Float32), -- values of the metrics d Date MATERIALIZED toDate(round(ts/1000)), -- auto generate date from ts column dt DateTime MATERIALIZED toDateTime(round(ts/1000)) -- auto generate date time from ts column) ENGINE = MergeTree(d, entity, 8192)
Here we are loading two metrics (load, temperature) for an entity(cpu):
INSERT INTO ts1(entity, ts, m, v) VALUES ('cpu', 1509232010254, ['load','temp'], [0.85, 68])
And querying cpu load:
SELECT entity, dt, ts, v[indexOf(m, 'load')] AS loadFROM ts1 WHERE entity = 'cpu'┌─entity─┬──────────────────dt─┬────────────ts─┬─load─┐│ cpu │ 2017-10-28 23:06:50 │ 1509232010254 │ 0.85 │└────────┴─────────────────────┴───────────────┴──────┘
Get data as array of tuples:
SELECT entity, dt, ts, arrayMap((mm, vv) -> (mm, vv), m, v) AS metricsFROM ts1 ┌─entity─┬──────────────────dt─┬────────────ts─┬─metrics─────────────────────┐│ cpu │ 2017-10-28 23:06:50 │ 1509232010254 │ [('load',0.85),('temp',68)] │└────────┴─────────────────────┴───────────────┴─────────────────────────────┘
Get data as rows of tuples:
SELECT entity, dt, ts, arrayJoin(arrayMap((mm, vv) -> (mm, vv), m, v)) AS metricFROM ts1 ┌─entity─┬──────────────────dt─┬────────────ts─┬─metric────────┐│ cpu │ 2017-10-28 23:06:50 │ 1509232010254 │ ('load',0.85) ││ cpu │ 2017-10-28 23:06:50 │ 1509232010254 │ ('temp',68) │└────────┴─────────────────────┴───────────────┴───────────────┘
Get rows with the metric you want:
SELECT entity, dt, ts, arrayJoin(arrayMap((mm, vv) -> (mm, vv), m, v)) AS metricsFROM ts1 WHERE metrics.1 = 'load'┌─entity─┬──────────────────dt─┬────────────ts─┬─metrics───────┐│ cpu │ 2017-10-28 23:06:50 │ 1509232010254 │ ('load',0.85) │└────────┴─────────────────────┴───────────────┴───────────────┘
Get metric names and values as columns:
SELECT entity, dt, ts, arrayJoin(arrayMap((mm, vv) -> (mm, vv), m, v)) AS metric, metric.1 AS metric_name, metric.2 AS metric_valueFROM ts1 ┌─entity─┬──────────────────dt─┬────────────ts─┬─metric────────┬─metric_name─┬─metric_value─┐│ cpu │ 2017-10-28 23:06:50 │ 1509232010254 │ ('load',0.85) │ load │ 0.85 ││ cpu │ 2017-10-28 23:06:50 │ 1509232010254 │ ('temp',68) │ temp │ 68 │└────────┴─────────────────────┴───────────────┴───────────────┴─────────────┴──────────────┘
Since CH has lots of useful date and time functions, along with higher order functions and tuples, I think it's almost a natural time-series database.
It would probably be better to modify your schema to have 4 columns:
"some_entity_id", "timestamp", "metric_name", "metric_value"
You can include "metric_name" in the MergeTree index, to improve performance when searching for a specific metric of an entity. Test with and without it, to see if it's useful for the kind of queries you make.
did you see https://clickhouse.yandex/reference_en.html#ALTER ?
it's used only for *MergeTree clickhouse table engine