Postgresql UNION takes 10 times as long as running the individual queries
With regards to your "extra work" question. Yes. Union not only combines the two queries but also goes through and removes duplicates. It's the same as using a distinct statement.
For this reason, especially combined with your except statements "union all" would likely be faster.
Read more here:http://www.postgresql.org/files/documentation/books/aw_pgsql/node80.html
In addition to combining the results of the first and second query, UNION
by default also removes duplicate records. (see http://www.postgresql.org/docs/8.1/static/sql-select.html). The extra work involved in checking for duplicate records between the two queries is probably responsible for the extra time. In this situation there should not be any duplicate records so the extra work looking for duplicates can be avoided by specifying UNION ALL
.
SELECT * FROM tableA EXCEPT SELECT * FROM tableBUNION ALLSELECT * FROM tableB EXCEPT SELECT * FROM tableA;
I don't think your code returns resultset you intend it to. I rather think you want to do this:
SELECT * FROM ( SELECT * FROM tableA EXCEPT SELECT * FROM tableB ) AS T1UNION SELECT * FROM ( SELECT * FROM tableB EXCEPT SELECT * FROM tableA ) AS T2;
In other words, you want the set of mutually exclusive members. If so, you need to read up on relational operator precedence in SQL ;) And when you have, you may realise the above can be rationalised to:
SELECT * FROM tableA UNION SELECT * FROM tableBEXCEPT SELECT * FROM tableA INTERSECTSELECT * FROM tableB;
FWIW, using subqueries (derived tables T1
and T2
) to explicitly show (what would otherwise be implicit) relational operator precedence, your original query is this:
SELECT * FROM ( SELECT * FROM ( SELECT * FROM tableA EXCEPT SELECT * FROM tableB ) AS T2 UNION SELECT * FROM tableB ) AS T1EXCEPT SELECT * FROM tableA;
The above can be relationalised to:
SELECT * FROM tableB EXCEPT SELECT * FROM tableA;
...and I think not what is intended.