Does EXCEPT execute faster than a JOIN when the table columns are the same
There is no way anyone can tell you that EXCEPT
will always or never out-perform an equivalent OUTER JOIN
. The optimizer will choose an appropriate execution plan regardless of how you write your intent.
That said, here is my guideline:
Use EXCEPT
when at least one of the following is true:
- The query is more readable (this will almost always be true).
- Performance is improved.
And BOTH of the following are true:
- The query produces semantically identical results, and you can demonstrate this through sufficient regression testing, including all edge cases.
- Performance is not degraded (again, in all edge cases, as well as environmental changes such as clearing buffer pool, updating statistics, clearing plan cache, and restarting the service).
It is important to note that it can be a challenge to write an equivalent EXCEPT
query as the JOIN
becomes more complex and/or you are relying on duplicates in part of the columns but not others. Writing a NOT EXISTS
equivalent, while slightly less readable than EXCEPT
should be far more trivial to accomplish - and will often lead to a better plan (but note that I would never say ALWAYS
or NEVER
, except in the way I just did).
In the following example, the LEFT JOIN
is faster than EXCEPT
by 70%(PostgreSQL 9.4.3)
Example:
There are three tables. suppliers
, parts
, shipments
.We need to get all parts not supplied by any supplier in London.
Database(has indexes on all involved columns):
CREATE TABLE suppliers ( id bigint primary key, city character varying NOT NULL);CREATE TABLE parts ( id bigint primary key, name character varying NOT NULL,);CREATE TABLE shipments ( id bigint primary key, supplier_id bigint NOT NULL, part_id bigint NOT NULL);
Records count:
db=# SELECT COUNT(*) FROM suppliers; count--------- 1281280(1 row)db=# SELECT COUNT(*) FROM parts; count--------- 1280000(1 row)db=# SELECT COUNT(*) FROM shipments; count--------- 1760161(1 row)
Query using EXCEPT
.
SELECT parts.* FROM partsEXCEPTSELECT parts.* FROM parts LEFT JOIN shipments ON (parts.id = shipments.part_id) LEFT JOIN suppliers ON (shipments.supplier_id = suppliers.id) WHERE suppliers.city = 'London';-- Execution time: 3327.728 ms
Query using LEFT JOIN
with table, returned by subquery.
SELECT parts.* FROM parts LEFT JOIN ( SELECT parts.id FROM parts LEFT JOIN shipments ON (parts.id = shipments.part_id) LEFT JOIN suppliers ON (shipments.supplier_id = suppliers.id) WHERE suppliers.city = 'London' ) AS subquery_tbl ON (parts.id = subquery_tbl.id)WHERE subquery_tbl.id IS NULL;-- Execution time: 1136.393 ms