Aggregating the most recent joined records per week Aggregating the most recent joined records per week postgresql postgresql

Aggregating the most recent joined records per week


You need one data item per week and goal (before aggregating counts per company). That's a plain CROSS JOIN between generate_series() and goals. The (possibly) expensive part is to get the current state from updates for each. Like @Paul already suggested, a LATERAL join seems like the best tool. Do it only for updates, though, and use a faster technique with LIMIT 1.

And simplify date handling with date_trunc().

SELECT w_start     , g.company_id     , count(*) FILTER (WHERE u.status = 'green') AS green_count     , count(*) FILTER (WHERE u.status = 'amber') AS amber_count     , count(*) FILTER (WHERE u.status = 'red')   AS red_countFROM   generate_series(date_trunc('week', NOW() - interval '2 months')                     , date_trunc('week', NOW())                     , interval '1 week') w_startCROSS  JOIN goals gLEFT   JOIN LATERAL (   SELECT status   FROM   updates   WHERE  goal_id = g.id   AND    created_at < w_start   ORDER  BY created_at DESC   LIMIT  1   ) u ON trueGROUP  BY w_start, g.company_idORDER  BY w_start, g.company_id;

To make this fast you need a multicolumn index:

CREATE INDEX updates_special_idx ON updates (goal_id, created_at DESC, status);

Descending order for created_at is best, but not strictly necessary. Postgres can scan indexes backwards almost exactly as fast. (Not applicable for inverted sort order of multiple columns, though.)

Index columns in that order. Why?

And the third column status is only appended to allow fast index-only scans on updates. Related case:

1k goals for 9 weeks (your interval of 2 months overlaps with at least 9 weeks) only require 9k index look-ups for the 2nd table of only 1k rows. For small tables like this, performance shouldn't be much of a problem. But once you have a couple of thousand more in each table, performance will deteriorate with sequential scans.

w_start represents the start of each week. Consequently, counts are for the start of the week. You can still extract year and week (or any other details represent your week), if you insist:

   EXTRACT(isoyear from w_start) AS year , EXTRACT(week    from w_start) AS week

Best with ISOYEAR, like @Paul explained.

SQL Fiddle.

Related:


This seems like a good use for LATERAL joins:

SELECT  EXTRACT(ISOYEAR FROM s) AS year,        EXTRACT(WEEK FROM s) AS week,        u.company_id,        COUNT(u.goal_id) FILTER (WHERE u.status = 'green') AS green_count,        COUNT(u.goal_id) FILTER (WHERE u.status = 'amber') AS amber_count,        COUNT(u.goal_id) FILTER (WHERE u.status = 'red') AS red_countFROM    generate_series(NOW() - INTERVAL '2 months', NOW(), '1 week') s(w)LEFT OUTER JOIN LATERAL (  SELECT  DISTINCT ON (g.company_id, u2.goal_id) g.company_id, u2.goal_id, u2.status  FROM    updates u2  INNER JOIN goals g  ON      g.id = u2.goal_id  WHERE   u2.created_at <= s.w  ORDER BY g.company_id, u2.goal_id, u2.created_at DESC) u ON trueWHERE   u.company_id IS NOT NULLGROUP BY year, week, u.company_idORDER BY u.company_id, year, week;

Btw I am extracting ISOYEAR not YEAR to ensure I get sensible results around the beginning of January. For instance EXTRACT(YEAR FROM '2016-01-01 08:49:56.734556-08') is 2016 but EXTRACT(WEEK FROM '2016-01-01 08:49:56.734556-08') is 53!

EDIT: You should test on your real data, but I feel like this ought to be faster:

SELECT  year,        week,        company_id,        COUNT(goal_id) FILTER (WHERE last_status = 'green') AS green_count,        COUNT(goal_id) FILTER (WHERE last_status = 'amber') AS amber_count,        COUNT(goal_id) FILTER (WHERE last_status = 'red') AS red_countFROM    (  SELECT  EXTRACT(ISOYEAR FROM s) AS year,          EXTRACT(WEEK FROM s) AS week,          u.company_id,          u.goal_id,          (array_agg(u.status ORDER BY u.created_at DESC))[1] AS last_status  FROM    generate_series(NOW() - INTERVAL '2 months', NOW(), '1 week') s(t)  LEFT OUTER JOIN (     SELECT  g.company_id, u2.goal_id, u2.created_at, u2.status    FROM    updates u2    INNER JOIN goals g     ON      g.id = u2.goal_id  ) u   ON      s.t >= u.created_at  WHERE   u.company_id IS NOT NULL  GROUP BY year, week, u.company_id, u.goal_id) xGROUP BY year, week, company_idORDER BY company_id, year, week;

Still no window functions though. :-) Also you can speed it up a bit more by replacing (array_agg(...))[1] with a real first function. You'll have to define that yourself, but there are implementations on the Postgres wiki that are easy to Google for.


I use PostgreSQL 9.3. I'm interested in your question. I examined your data structure. Than I create the following tables.

Data structure

I insert the following records;

Company

Company records

Goals

Goals records

Updates

Updates Records

After that I wrote the following query, for correction

SELECT c.id company_id, c.name company_name, u.status goal_status,          EXTRACT(week from u.created_at) goal_status_week,         EXTRACT(year from u.created_at) AS goal_status_year FROM company cINNER JOIN goals g ON g.company_id = c.id INNER JOIN updates u ON u.goal_id = g.idORDER BY goal_status_year DESC, goal_status_week DESC;

I get the following results;Inner Sql result

At last I merge this query with week series

SELECT             gs.company_id,             gs.company_name,             gs.goal_status,             EXTRACT(year from w) AS year,        EXTRACT(week from w) AS week,             COUNT(gs.*) cntFROM generate_series(NOW() - INTERVAL '3 MONTHS', NOW(), '1 week') wLEFT JOIN(SELECT c.id company_id, c.name company_name, u.status goal_status,              EXTRACT(week from u.created_at) goal_status_week,       EXTRACT(year from u.created_at) AS goal_status_year FROM company cINNER JOIN goals g ON g.company_id = c.id INNER JOIN updates u ON u.goal_id = g.id ) gs ON gs.goal_status_week = EXTRACT(week from w) AND gs.goal_status_year = EXTRACT(year from w)GROUP BY company_id, company_name, goal_status, year, weekORDER BY  year DESC, week DESC;

I get this result

Final result

Have a good day.