postgresql COUNT(DISTINCT ...) very slow

You can use this:

SELECT COUNT(*) FROM (SELECT DISTINCT column_name FROM table_name) AS temp;

This is much faster than:

COUNT(DISTINCT column_name)

performance postgresql count distinct

-- My default settings (this is basically a single-session machine, so work_mem is pretty high)SET effective_cache_size='2048MB';SET work_mem='16MB';\echo originalEXPLAIN ANALYZESELECT        COUNT (distinct val) as aantalFROM one        ;\echo group by+count(*)EXPLAIN ANALYZESELECT        distinct val       -- , COUNT(*)FROM oneGROUP BY val;\echo with CTEEXPLAIN ANALYZEWITH agg AS (    SELECT distinct val    FROM one    GROUP BY val    )SELECT COUNT (*) as aantalFROM agg        ;

Results:

original                                                      QUERY PLAN                                                      ---------------------------------------------------------------------------------------------------------------------- Aggregate  (cost=36448.06..36448.07 rows=1 width=4) (actual time=1766.472..1766.472 rows=1 loops=1)   ->  Seq Scan on one  (cost=0.00..32698.45 rows=1499845 width=4) (actual time=31.371..185.914 rows=1499845 loops=1) Total runtime: 1766.642 ms(3 rows)group by+count(*)                                                         QUERY PLAN                                                         ---------------------------------------------------------------------------------------------------------------------------- HashAggregate  (cost=36464.31..36477.31 rows=1300 width=4) (actual time=412.470..412.598 rows=1300 loops=1)   ->  HashAggregate  (cost=36448.06..36461.06 rows=1300 width=4) (actual time=412.066..412.203 rows=1300 loops=1)         ->  Seq Scan on one  (cost=0.00..32698.45 rows=1499845 width=4) (actual time=26.134..166.846 rows=1499845 loops=1) Total runtime: 412.686 ms(4 rows)with CTE                                                             QUERY PLAN                                                             ------------------------------------------------------------------------------------------------------------------------------------ Aggregate  (cost=36506.56..36506.57 rows=1 width=0) (actual time=408.239..408.239 rows=1 loops=1)   CTE agg     ->  HashAggregate  (cost=36464.31..36477.31 rows=1300 width=4) (actual time=407.704..407.847 rows=1300 loops=1)           ->  HashAggregate  (cost=36448.06..36461.06 rows=1300 width=4) (actual time=407.320..407.467 rows=1300 loops=1)                 ->  Seq Scan on one  (cost=0.00..32698.45 rows=1499845 width=4) (actual time=24.321..165.256 rows=1499845 loops=1)       ->  CTE Scan on agg  (cost=0.00..26.00 rows=1300 width=0) (actual time=407.707..408.154 rows=1300 loops=1)     Total runtime: 408.300 ms    (7 rows)

The same plan as for the CTE could probably also be produced by other methods (window functions)

performance postgresql count distinct

If your count(distinct(x)) is significantly slower than count(x) then you can speed up this query by maintaining x value counts in different table, for example table_name_x_counts (x integer not null, x_count int not null), using triggers. But your write performance will suffer and if you update multiple x values in single transaction then you'd need to do this in some explicit order to avoid possible deadlock.

CodeHunter

postgresql COUNT(DISTINCT ...) very slow

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last