Extrapolate daily historical values from a table that only records when a value changes (PostgreSQL 9.3)


You could achieve this with correlated subqueries and LATERAL (available since PostgreSQL 9.3):

SELECT sub.date, sub.location_id, score
FROM (SELECT *
      FROM dw_dim_date
      CROSS JOIN (SELECT DISTINCT location_id FROM score_history) s
      WHERE date >= '2019-01-01'::date) sub,
LATERAL (SELECT score
         FROM score_history sc
         WHERE sc.happened_at::date <= sub.date
           AND sc.location_id = sub.location_id
         ORDER BY happened_at DESC
         LIMIT 1) l,
LATERAL (SELECT MIN(happened_at::date) m1, MAX(happened_at::date) m2
         FROM score_history sc
         WHERE sc.location_id = sub.location_id) lm
WHERE sub.date BETWEEN lm.m1 AND lm.m2
ORDER BY location_id, date;

db<>fiddle demo

How it works:

1) sub — the cross join of all dates (from 2019-01-01 onward) with every distinct location_id

2) l — a LATERAL subquery selecting the most recent score at or before each date, per location

3) lm — a LATERAL subquery selecting the min/max date per location, used for filtering

4) the WHERE clause restricts each location to the date range for which it actually has data; this could be relaxed if needed
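For reference, a minimal schema this query assumes; the column names are inferred from the query itself, so adjust them to your actual tables:

```sql
-- Hypothetical minimal schema matching the query above.
CREATE TABLE score_history (
    location_id integer   NOT NULL,  -- which location the score belongs to
    happened_at timestamp NOT NULL,  -- when the score changed
    score       integer   NOT NULL   -- the new score value
);

-- A calendar dimension table with one row per day.
CREATE TABLE dw_dim_date (
    date date PRIMARY KEY
);
```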


I think you can try something like this. The main things I changed are wrapping values in DATE() and using another SO answer for the date lookup:

SELECT
  dw_dim_date.date,
  (
    SELECT
      score
    FROM
      score_history
    WHERE
      DATE(score_history.happened_at) <= dw_dim_date.date
    ORDER BY
      score_history.happened_at DESC
    LIMIT
      1
  ) AS last_score
FROM
  dw_dim_date
WHERE
  dw_dim_date.date >= DATE('2019-01-01')

This uses the SQL method from here to find the nearest past data to the one requested: PostgreSQL return exact or closest date to queried date
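The query above returns a single series across all rows; if you also need one series per location (as in the other answers), the same correlated-subquery pattern can be extended by cross joining the distinct locations and correlating on location_id. A sketch, assuming the table and column names used elsewhere on this page:

```sql
-- Sketch: per-location variant of the correlated-subquery approach.
SELECT
  dw_dim_date.date,
  locs.location_id,
  (
    SELECT score
    FROM score_history
    WHERE DATE(score_history.happened_at) <= dw_dim_date.date
      AND score_history.location_id = locs.location_id  -- correlate per location
    ORDER BY score_history.happened_at DESC
    LIMIT 1
  ) AS last_score
FROM dw_dim_date
CROSS JOIN (SELECT DISTINCT location_id FROM score_history) locs
WHERE dw_dim_date.date >= DATE('2019-01-01');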


WITH
max_per_day_location AS (
    SELECT
        SH.happened_at::DATE AS day,
        SH.location_id,
        max(SH.happened_at) AS happened_at
    FROM
        score_history SH
    GROUP BY
        SH.happened_at::DATE,
        SH.location_id
),
date_location AS (
    SELECT DISTINCT
        DD."date",
        SH.location_id
    FROM
        dw_dim_date DD,
        max_per_day_location SH
),
value_partition AS (
    SELECT
        DD."date",
        DD.location_id,
        SH.score,
        SH.happened_at,
        MPD.happened_at AS hap2,
        sum(CASE WHEN score IS NULL THEN 0 ELSE 1 END) OVER
            (PARTITION BY DD.location_id ORDER BY "date", SH.happened_at DESC) AS value_partition
    FROM
        date_location DD
        LEFT JOIN score_history SH
            ON DD."date" = SH.happened_at::DATE
            AND DD.location_id = SH.location_id
        LEFT JOIN max_per_day_location MPD
            ON SH.happened_at = MPD.happened_at
    WHERE NOT (MPD.happened_at IS NULL
               AND
               SH.happened_at IS NOT NULL)
    ORDER BY
        DD."date"
),
final AS (
    SELECT
        "date",
        location_id,
        first_value(score) OVER w
    FROM
        value_partition
    WINDOW w AS (PARTITION BY location_id, value_partition
                 ORDER BY happened_at ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    ORDER BY "date"
)
SELECT DISTINCT * FROM final ORDER BY location_id, date;

I'm sure there are less verbose ways to do this.

I've got a SQLFiddle with some test data here: http://sqlfiddle.com/#!17/9d122/1

The main thing that makes this work is building a "value partition" to access the previous non-null value.
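The value-partition trick can be seen in miniature on a single series. A running count of non-NULL values groups each NULL row with the last non-NULL row before it, so first_value() can carry that score forward. A self-contained sketch with made-up data:

```sql
-- Illustration of the "value partition" gap-fill trick (hypothetical data).
SELECT day, score,
       first_value(score) OVER (PARTITION BY grp ORDER BY day) AS filled
FROM (
    SELECT day, score,
           -- running count of non-NULL scores: NULL rows keep the previous count,
           -- so each group (grp) starts at a non-NULL row
           sum(CASE WHEN score IS NULL THEN 0 ELSE 1 END)
               OVER (ORDER BY day) AS grp
    FROM (VALUES
        ('2019-01-01'::date, 10),
        ('2019-01-02'::date, NULL),
        ('2019-01-03'::date, NULL),
        ('2019-01-04'::date, 20)
    ) AS t(day, score)
) s
ORDER BY day;
-- days 2 and 3 are filled with 10; day 4 starts a new group with 20
```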

The date_location subquery just makes a single row per location_id per day since that is the base "row level" desired in the output.

The max_per_day_location subquery is used to filter out the earlier entries for location/day combos that have multiple scores and only keep the last one for that day.
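As an aside (not part of the original answer), Postgres also offers DISTINCT ON, which expresses the same "last score per location per day" step more directly than a GROUP BY plus self-join. A sketch:

```sql
-- Sketch: equivalent of max_per_day_location, keeping the whole last row
-- per (location_id, day) instead of just the max timestamp.
SELECT DISTINCT ON (location_id, happened_at::date)
       location_id,
       happened_at::date AS day,
       happened_at,
       score
FROM score_history
ORDER BY location_id, happened_at::date, happened_at DESC;
```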