Find most common elements in array with a group by Find most common elements in array with a group by postgresql postgresql

Find most common elements in array with a group by


Quick and dirty:

SELECT group_name, color, count(*) AS ctFROM (   SELECT group_name, unnest(favorite_colors) AS color   FROM   tbl   ) subGROUP  BY 1,2ORDER  BY 1,3 DESC;

Better with a LATERAL JOIN

In Postgres 9.3 or later this is the cleaner form:

SELECT group_name, color, count(*) AS ctFROM   tbl t, unnest(t.favorite_colors) AS colorGROUP  BY 1,2ORDER  BY 1,3 DESC;

The above is shorthand for

...FROM tbl tJOIN LATERAL unnest(t.favorite_colors) AS color ON TRUE...

And like with any other INNER JOIN, it would exclude rows without color (favorite_colors IS NULL) - as did the first query.

To include such rows in the result, use instead:

SELECT group_name, color, count(*) AS ctFROM   tbl tLEFT   JOIN LATERAL unnest(t.favorite_colors) AS color ON TRUEGROUP  BY 1,2ORDER  BY 1,3 DESC;

You can easily aggregate the "most common" colors per group in the next step, but you'd need to define "most common colors" first ...

Most common colors

As per comment, pick colors with > 3 occurrences.

SELECT t.group_name, color, count(*) AS ctFROM   tbl t, unnest(t.favorite_colors) AS colorGROUP  BY 1,2HAVING count(*) > 3ORDER  BY 1,3 DESC;

To aggregate the top colors in an array (in descending order):

SELECT group_name, array_agg(color) AS top_colorsFROM  (   SELECT group_name, color   FROM   tbl t, unnest(t.favorite_colors) AS color   GROUP  BY 1,2   HAVING count(*) > 3   ORDER  BY 1, count(*) DESC   ) subGROUP BY 1;

-> SQLfiddle demonstrating all.