How do I force Postgres to use a particular index? How do I force Postgres to use a particular index? postgresql postgresql

How do I force Postgres to use a particular index?


Assuming you're asking about the common "index hinting" feature found in many databases, PostgreSQL doesn't provide such a feature. This was a conscious decision made by the PostgreSQL team. A good overview of why and what you can do instead can be found here. The reasons are basically that it's a performance hack that tends to cause more problems later down the line as your data changes, whereas PostgreSQL's optimizer can re-evaluate the plan based on the statistics. In other words, what might be a good query plan today probably won't be a good query plan for all time, and index hints force a particular query plan for all time.

As a very blunt hammer, useful for testing, you can use the enable_seqscan and enable_indexscan parameters. See:

These are not suitable for ongoing production use. If you have issues with query plan choice, you should see the documentation for tracking down query performance issues. Don't just set enable_ params and walk away.

Unless you have a very good reason for using the index, Postgres may be making the correct choice. Why?

  • For small tables, it's faster to do sequential scans.
  • Postgres doesn't use indexes when datatypes don't match properly, you may need to include appropriate casts.
  • Your planner settings might be causing problems.

See also this old newsgroup post.


Probably the only valid reason for using

set enable_seqscan=false

is when you're writing queries and want to quickly see what the query plan would actually be were there large amounts of data in the table(s). Or of course if you need to quickly confirm that your query is not using an index simply because the dataset is too small.


Fix your query planner configuration

This problem typically happens when the query planner's estimated cost of an index scan is too high and doesn't correctly reflect reality. To fix this, you need to lower the random_page_cost configuration parameter. From the Postgres documentation:

Reducing this value [...] will cause the system to prefer index scans; raising it will make index scans look relatively more expensive.

You can do a quick test whether this will actually make Postgres use the index:

EXPLAIN <query>;              # Uses sequential scanSET random_page_cost = 1;EXPLAIN <query>;              # May use index scan now

You can restore the default value with SET random_page_cost = DEFAULT; again. You can change the global default permanently with ALTER SYSTEM SET random_page_cost = 1;

Background

Index scans require non-sequential disk page fetches. Postgres uses random_page_cost to estimate the cost of such non-sequential fetches in relation to sequential fetches. The default value is 4.0, thus assuming an average cost factor of 4 compared to sequential fetches (taking caching effects into account).

The problem however is that this default value is unsuitable in the following scenarios:

1) Cached indices

If an index is already cached in RAM, then an index scan will always be significantly faster than a sequential scan. The query planner however doesn't exactly know which parts of the index are already cached, and thus might make an incorrect decision. The Postgres documentation says:

If your data is likely to be completely in cache, [...] decreasing random_page_cost can be appropriate.

So, how do you know whether "your data is likely to be cached"? Well, if a specific index is frequently used, and if the system has sufficient RAM, then data is likely to be cached eventually, and random_page_cost should be set to a lower value. You'll have to experiment with different values and see what works for you.

You could also use the pg_prewarm extension for explicit data caching.

2) Solid-state drives

As per the documentation:

Storage that has a low random read cost relative to sequential, e.g., solid-state drives, might also be better modeled with a lower value for random_page_cost, e.g., 1.1.

This slide from a speak at PostgresConf 2018 also says that random_page_cost should be set to something between 1.0 and 2.0 for solid-state drives.