Query with ORDER BY is 13 times as slow when I add LIMIT 1
Ok, this is a pretty classic case.
Whenever you use `LIMIT` (or the like, such as `FETCH FIRST ... ROWS ONLY`), the optimizer attempts to optimize the query so that fetching only the first row(s) is as fast as possible. That means the optimizer has a preference for execution plans where the first cost value is low, not the second one shown in the execution plan. Remember: the two cost values shown by PostgreSQL (e.g., `cost=48.150..6416.240`) are the startup cost (48.150) and the total execution cost (6416.240).
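As an illustration (the plan and the numbers below are made up, not taken from the question), notice how the `Limit` node's total cost is derived from its child's cost, scaled down by the fraction of rows it expects to fetch:

```sql
EXPLAIN SELECT * FROM table_1 ORDER BY created_at DESC LIMIT 1;

--  Limit  (cost=0.43..0.48 rows=1 width=244)
--    ->  Index Scan Backward using table_1_created_at_idx on table_1
--          (cost=0.43..6416.24 rows=134656 width=244)
```

The planner picks this plan because the `Limit` node looks extremely cheap, even though the underlying index scan would be expensive if it had to run to completion.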
The "problem" here is that you have an index which supports your
ORDER BY clause. So, PostgreSQL thinks that it can just go through this index (in reverse order due to the
DESC modifier in your query) and check for each row in the other table whether it satisfies the other
WHERE clause or not. The problem is that the optimizer has no way of knowing whether that will be one of the very first rows or rather one at the end (according to the
ORDER BY). The optimizer does an arbitrary guess an believes the matching row will be more towards the begin than the end. This optimistic estimate is then used to calculate the cost value which turns out to be too optimistic so that PostgreSQL finally settles down on a bad execution plan.
When you change the `ORDER BY ... DESC` to `ORDER BY ... ASC`, the optimizer makes the same arbitrary but optimistic estimate, which happens to be more correct in that case; hence you get a better execution time.
However, from an optimization perspective, the root cause is that the optimizer estimates that 2,491 rows will match `tango = 650727`. If the optimizer correctly estimated that this condition hits just a few rows, the problem would likely not occur. The `WHERE` clause is sufficiently trivial that a good estimate should be no problem. So, the main question is: how are your statistics on that table?
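To see what the planner currently believes about that column, you can query the `pg_stats` view. A minimal sketch (I'm assuming here that `tango` lives on `join_table`; adjust the names if not):

```sql
-- What does the planner think about the tango column?
-- (table/column names assumed from the question)
SELECT null_frac, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'join_table'
  AND attname   = 'tango';
```

If `650727` does not appear in `most_common_vals`, the planner falls back to a generic estimate, which is where the 2,491-row guess can come from.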
There are several ways to cope with this problem:
- Update your statistics (`ANALYZE`) and see if that alone helps.
- Increase the number of most common values stored for that column (`ALTER TABLE ... ALTER COLUMN ... SET STATISTICS`). This also increases the sample size used to gather the statistics, which means `ANALYZE` takes longer but yields more accurate results. A sketch of both steps follows this list.
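A minimal sketch of both steps, assuming the names from the question:

```sql
-- Step 1: refresh the statistics and re-test the query.
ANALYZE join_table;

-- Step 2: if that is not enough, raise the per-column statistics target
-- (default 100, maximum 10000), then analyze again so the new target
-- takes effect.
ALTER TABLE join_table ALTER COLUMN tango SET STATISTICS 1000;
ANALYZE join_table;
```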
In theory, this should be enough to fix that problem. However, other options are:
- If you don't need the index on `created_at` for other reasons (such as other queries), get rid of it.
- Rewrite the query so that the bad execution plan is no longer an option. In particular, it would be great if you could write the query so that the `ORDER BY` clause uses the same table as the `WHERE` clause: if you are lucky, you might have a column in `join_table` that has the same order as `table_1.created_at`, so that it makes no difference which of them you order by (see the sketch after this list). However, be careful: this is easy to get wrong (e.g., sequential numbers filled from sequences might have outliers).
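For example, a hypothetical rewrite along those lines (the join condition and the `jt.id` column are assumptions, since the question doesn't show the full schema; verify that the ordering really matches before relying on this):

```sql
-- Order by a join_table column instead of table_1.created_at, so the
-- WHERE and ORDER BY clauses use the same table. Only valid if
-- join_table.id really increases in the same order as
-- table_1.created_at!
SELECT t1.*
FROM table_1 t1
JOIN join_table jt ON jt.table_1_id = t1.id
WHERE jt.tango = 650727
ORDER BY jt.id DESC
LIMIT 1;
```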
Although you are only adding `LIMIT 1`, any change to a query can affect its execution plan and the indexes used.
To fix your issue, since you say that your query performs well when the order is `ASC`: it seems the index created on `table_1.created_at` is `ASC`. I know that in DB2 you can specify, when creating an index, that it should be bi-directional (`ASC`/`DESC`). I guess PostgreSQL has something similar; if not, you can create two indexes on the same column, one sorted `DESC` and another sorted `ASC`.
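For what it's worth, PostgreSQL does let you specify a sort order at index creation time. A minimal sketch using the column from the question (the index name is made up; adjust to your naming convention):

```sql
-- Index sorted descending, matching the ORDER BY ... DESC in the query.
CREATE INDEX table_1_created_at_desc_idx
    ON table_1 (created_at DESC);
```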