Aggregating the most recent joined records per week

sql postgresql greatest-n-per-group

You need one data item per week and goal (before aggregating counts per company). That's a plain CROSS JOIN between generate_series() and goals. The (possibly) expensive part is to get the current state from updates for each. Like @Paul already suggested, a LATERAL join seems like the best tool. Do it only for updates, though, and use a faster technique with LIMIT 1.

And simplify date handling with date_trunc().

SELECT w_start     , g.company_id     , count(*) FILTER (WHERE u.status = 'green') AS green_count     , count(*) FILTER (WHERE u.status = 'amber') AS amber_count     , count(*) FILTER (WHERE u.status = 'red')   AS red_countFROM   generate_series(date_trunc('week', NOW() - interval '2 months')                     , date_trunc('week', NOW())                     , interval '1 week') w_startCROSS  JOIN goals gLEFT   JOIN LATERAL (   SELECT status   FROM   updates   WHERE  goal_id = g.id   AND    created_at < w_start   ORDER  BY created_at DESC   LIMIT  1   ) u ON trueGROUP  BY w_start, g.company_idORDER  BY w_start, g.company_id;

To make this fast you need a multicolumn index:

CREATE INDEX updates_special_idx ON updates (goal_id, created_at DESC, status);

Descending order for created_at is best, but not strictly necessary. Postgres can scan indexes backwards almost exactly as fast. (Not applicable for inverted sort order of multiple columns, though.)

Index columns in that order. Why?

Multicolumn index and performance

And the third column status is only appended to allow fast index-only scans on updates. Related case:

Slow index scans in large table

1k goals for 9 weeks (your interval of 2 months overlaps with at least 9 weeks) only require 9k index look-ups for the 2nd table of only 1k rows. For small tables like this, performance shouldn't be much of a problem. But once you have a couple of thousand more in each table, performance will deteriorate with sequential scans.

w_start represents the start of each week. Consequently, counts are for the start of the week. You can still extract year and week (or any other details represent your week), if you insist:

   EXTRACT(isoyear from w_start) AS year , EXTRACT(week    from w_start) AS week

Best with ISOYEAR, like @Paul explained.

SQL Fiddle.

sql postgresql greatest-n-per-group

This seems like a good use for LATERAL joins:

SELECT  EXTRACT(ISOYEAR FROM s) AS year,        EXTRACT(WEEK FROM s) AS week,        u.company_id,        COUNT(u.goal_id) FILTER (WHERE u.status = 'green') AS green_count,        COUNT(u.goal_id) FILTER (WHERE u.status = 'amber') AS amber_count,        COUNT(u.goal_id) FILTER (WHERE u.status = 'red') AS red_countFROM    generate_series(NOW() - INTERVAL '2 months', NOW(), '1 week') s(w)LEFT OUTER JOIN LATERAL (  SELECT  DISTINCT ON (g.company_id, u2.goal_id) g.company_id, u2.goal_id, u2.status  FROM    updates u2  INNER JOIN goals g  ON      g.id = u2.goal_id  WHERE   u2.created_at <= s.w  ORDER BY g.company_id, u2.goal_id, u2.created_at DESC) u ON trueWHERE   u.company_id IS NOT NULLGROUP BY year, week, u.company_idORDER BY u.company_id, year, week;

Btw I am extracting ISOYEAR not YEAR to ensure I get sensible results around the beginning of January. For instance EXTRACT(YEAR FROM '2016-01-01 08:49:56.734556-08') is 2016 but EXTRACT(WEEK FROM '2016-01-01 08:49:56.734556-08') is 53!

EDIT: You should test on your real data, but I feel like this ought to be faster:

SELECT  year,        week,        company_id,        COUNT(goal_id) FILTER (WHERE last_status = 'green') AS green_count,        COUNT(goal_id) FILTER (WHERE last_status = 'amber') AS amber_count,        COUNT(goal_id) FILTER (WHERE last_status = 'red') AS red_countFROM    (  SELECT  EXTRACT(ISOYEAR FROM s) AS year,          EXTRACT(WEEK FROM s) AS week,          u.company_id,          u.goal_id,          (array_agg(u.status ORDER BY u.created_at DESC))[1] AS last_status  FROM    generate_series(NOW() - INTERVAL '2 months', NOW(), '1 week') s(t)  LEFT OUTER JOIN (     SELECT  g.company_id, u2.goal_id, u2.created_at, u2.status    FROM    updates u2    INNER JOIN goals g     ON      g.id = u2.goal_id  ) u   ON      s.t >= u.created_at  WHERE   u.company_id IS NOT NULL  GROUP BY year, week, u.company_id, u.goal_id) xGROUP BY year, week, company_idORDER BY company_id, year, week;

Still no window functions though. :-) Also you can speed it up a bit more by replacing (array_agg(...))[1] with a real first function. You'll have to define that yourself, but there are implementations on the Postgres wiki that are easy to Google for.

sql postgresql greatest-n-per-group

I use PostgreSQL 9.3. I'm interested in your question. I examined your data structure. Than I create the following tables.

I insert the following records;

Company

Goals

Updates

After that I wrote the following query, for correction

SELECT c.id company_id, c.name company_name, u.status goal_status,          EXTRACT(week from u.created_at) goal_status_week,         EXTRACT(year from u.created_at) AS goal_status_year FROM company cINNER JOIN goals g ON g.company_id = c.id INNER JOIN updates u ON u.goal_id = g.idORDER BY goal_status_year DESC, goal_status_week DESC;

I get the following results;

At last I merge this query with week series

SELECT             gs.company_id,             gs.company_name,             gs.goal_status,             EXTRACT(year from w) AS year,        EXTRACT(week from w) AS week,             COUNT(gs.*) cntFROM generate_series(NOW() - INTERVAL '3 MONTHS', NOW(), '1 week') wLEFT JOIN(SELECT c.id company_id, c.name company_name, u.status goal_status,              EXTRACT(week from u.created_at) goal_status_week,       EXTRACT(year from u.created_at) AS goal_status_year FROM company cINNER JOIN goals g ON g.company_id = c.id INNER JOIN updates u ON u.goal_id = g.id ) gs ON gs.goal_status_week = EXTRACT(week from w) AND gs.goal_status_year = EXTRACT(year from w)GROUP BY company_id, company_name, goal_status, year, weekORDER BY  year DESC, week DESC;

I get this result

Have a good day.

CodeHunter

Aggregating the most recent joined records per week

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last