Fastest way to PostgreSQL Distinct and Format
I think you shouldn't select distinct
from this huge table. Instead try to generate a short years sequence (say from 1900 to 2100) and select from this sequence only years that exist in acs_objects
table. Result set will be the same but I think it will be faster. EXISTS subquery have to run fast on an indexed field creation_date
.
SELECT y FROM( select generate_series(1900,2100) as y) as tWHERE EXISTS (SELECT 1 FROM acs_objects WHERE creation_date >= DATE (''||t.y||'-01-01') AND creation_date < DATE (''||t.y + 1||'-01-01'))
In your second attempt you get distinct dates from the sub-query which you then all convert to a string representation and then you select the distinct ones. That is rather inefficient. Better is it to first extract the distinct years from the creation_date
in a sub-query and simply cast those to text in the main query:
SELECT year::textFROM ( SELECT DISTINCT extract(year FROM creation_date) AS year FROM acs_objects) AS distinct_years;
If you create an INDEX
on the table, the query should run much faster still:
CREATE INDEX really_fast ON acs_objects((extract(year FROM creation_date)));
However, this may impact other uses of your table, in particular if you have many modifying statements (insert, update, delete). And this will only work if creation_date
has a data type of date
or timestamp
(specifically not timestamp with timezone
).
The below option looked promising because it does not use a sub-query, but it is in fact much slower (see comments below), probably because the DISTINCT
clause is applied on a string:
SELECT DISTINCT extract(year FROM creation_date)::textFROM acs_objects;
I'm no sure what you use it for. I probably would consider using a using a materialized view.
Now you can refresh the view when needed and have a very fast way to retrieve the (distinct) year list (since the data is basically static stored).
Have a look here: