Fastest way to PostgreSQL Distinct and Format
I think you shouldn't select
distinct from this huge table. Instead try to generate a short years sequence (say from 1900 to 2100) and select from this sequence only years that exist in
acs_objects table. Result set will be the same but I think it will be faster. EXISTS subquery have to run fast on an indexed field
SELECT y FROM( select generate_series(1900,2100) as y) as tWHERE EXISTS (SELECT 1 FROM acs_objects WHERE creation_date >= DATE (''||t.y||'-01-01') AND creation_date < DATE (''||t.y + 1||'-01-01'))
In your second attempt you get distinct dates from the sub-query which you then all convert to a string representation and then you select the distinct ones. That is rather inefficient. Better is it to first extract the distinct years from the
creation_date in a sub-query and simply cast those to text in the main query:
SELECT year::textFROM ( SELECT DISTINCT extract(year FROM creation_date) AS year FROM acs_objects) AS distinct_years;
If you create an
INDEX on the table, the query should run much faster still:
CREATE INDEX really_fast ON acs_objects((extract(year FROM creation_date)));
However, this may impact other uses of your table, in particular if you have many modifying statements (insert, update, delete). And this will only work if
creation_date has a data type of
timestamp (specifically not
timestamp with timezone).
The below option looked promising because it does not use a sub-query, but it is in fact much slower (see comments below), probably because the
DISTINCT clause is applied on a string:
SELECT DISTINCT extract(year FROM creation_date)::textFROM acs_objects;
I'm no sure what you use it for. I probably would consider using a using a materialized view.
Now you can refresh the view when needed and have a very fast way to retrieve the (distinct) year list (since the data is basically static stored).
Have a look here: