Extrapolate daily historical values from a table that only records when a value changes (Postgresql 9.3)
You could achieve it with usage of correlated subqueries and LATERAL
:
SELECT sub.date, sub.location_id, scoreFROM (SELECT * FROM dw_dim_date CROSS JOIN (SELECT DISTINCT location_id FROM score_history) s WHERE date >= '2019-01-01'::date) sub,LATERAL(SELECT score FROM score_history sc WHERE sc.happened_at::date <= sub.date AND sc.location_id = sub.location_id ORDER BY happened_at DESC LIMIT 1) l,LATERAL(SELECT MIN(happened_at::date) m1, MAX(happened_at::date) m2 FROM score_history sc WHERE sc.location_id = sub.location_id) lmWHERE sub.date BETWEEN lm.m1 AND lm.m2ORDER BY location_id, date;
How it works:
1) s
(it is cross join of all dates per location_id)
2) l
(selecting score per location)
3) lm
(selecting min/max date per location for filtering)
4) WHERE
filter dates on range that is available, it could be relaxed if needed
I think you can try something like this. The main things I changed are wrapping things in DATE() and using another SO answer for the date finder:
SELECT dw_dim_date.date, ( SELECT score FROM score_history WHERE DATE(score_history.happened_at) <= dw_dim_date.date ORDER BY score_history.happened_at DESC LIMIT 1 ) as last_scoreFROM dw_dim_dateWHERE dw_dim_date.date >= DATE('2019-01-01')
This uses the SQL method from here to find the nearest past data to the one requested: PostgreSQL return exact or closest date to queried date
WITHmax_per_day_location AS (SELECT SH.happened_at::DATE as day, SH.location_id, max(SH.happened_at) as happened_atFROM score_history SHGROUP BY SH.happened_at::DATE, SH.location_id),date_location AS (SELECT DISTINCT DD."date", SH.location_idFROM dw_dim_date DD, max_per_day_location SH),value_partition AS (SELECT DD."date", DD.location_id, SH.score, SH.happened_at, MPD.happened_at as hap2, sum(case when score is null then 0 else 1 end) OVER (PARTITION BY DD.location_id ORDER BY "date", SH.happened_at desc) AS value_partitionFROM date_location DD LEFT JOIN score_history SH ON DD."date" = SH.happened_at::DATE AND DD.location_id = SH.location_id LEFT join max_per_day_location MPD ON SH.happened_at = MPD.happened_atWHERE NOT (MPD.happened_at IS NULL AND SH.happened_at IS NOT NULL)ORDER BY DD."date"),final AS (SELECT "date", location_id, first_value(score) over wFROM value_partitionWINDOW w AS (PARTITION BY location_id, value_partition ORDER BY happened_at rows between unbounded preceding and unbounded following)order by "date")SELECT DISTINCT * FROM final ORDER BY location_id, date;
I'm sure there are less verbose ways to do this.
I've got a SQLFiddle with some test data here:http://sqlfiddle.com/#!17/9d122/1
The main thing that makes this work is making a "value partition" to access the previous non null value. More here:
The date_location
subquery just makes a single row per location_id per day since that is the base "row level" desired in the output.
The max_per_day_location
subquery is used to filter out the earlier entries for location/day combos that have multiple scores and only keep the last one for that day.