Postgresql: How do I ensure that indexes are in memory


Those seq scans make it look like you didn't run ANALYZE on the table after importing your data.

http://www.postgresql.org/docs/current/static/sql-analyze.html

During normal operation there's no need to schedule VACUUM ANALYZE runs yourself, because autovacuum kicks in periodically. It does matter after massive writes, though, such as bulk imports.
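For example, after a bulk load you might run something like this (the urls table name is hypothetical, not from your question):

```sql
-- After a bulk import, refresh the planner's statistics so it stops
-- assuming the table is tiny:
ANALYZE urls;

-- Or reclaim dead space and refresh statistics in one pass:
VACUUM ANALYZE urls;
```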

On a slightly related note, see the reversed-index tip on Pavel's PostgreSQL Tricks site if you ever need to run queries anchored at the end of the string rather than the beginning, e.g. LIKE '%.com':

http://postgres.cz/wiki/PostgreSQL_SQL_Tricks_I#section_20
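The gist of the trick is a functional index on the reversed string, so a suffix match becomes a prefix match that a btree can satisfy. A minimal sketch, with hypothetical table/column names:

```sql
-- reverse() is built into PostgreSQL since 9.1; Pavel's page shows how
-- to define it yourself on older versions.
CREATE INDEX urls_url_rev_idx
    ON urls (reverse(url) text_pattern_ops);

-- A suffix match on url becomes a prefix match on the reversed value,
-- which the index can satisfy ('%.com' reversed is 'moc.%'):
SELECT *
FROM urls
WHERE reverse(url) LIKE 'moc.%';
```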


Regarding your actual questions, be wary that some of the suggestions in the post you linked to are dubious at best. Changing the cost of index use is frequently dubious, and disabling seq scans is downright silly. (Sometimes it is cheaper to seq scan a table than it is to use an index.)

With that being said:

  1. Postgres primarily caches indexes based on how often they're used, and it will not use an index at all if the stats suggest it shouldn't -- hence the need to ANALYZE after an import. Giving Postgres plenty of memory will, of course, increase the likelihood that your indexes stay in memory too, but keep the above point in mind: an index that goes unused won't be cached.
  2. and 3. Full text search works fine (a minimal index sketch follows this list).
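Here's the usual full text search setup, sketched with hypothetical documents/body names rather than anything from your schema:

```sql
-- A GIN index over a tsvector expression:
CREATE INDEX documents_body_fts_idx
    ON documents
    USING gin (to_tsvector('english', body));

-- The query must use the same to_tsvector expression for the index
-- to be considered:
SELECT *
FROM documents
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'orange');
```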

For further reading on fine-tuning, see the manual and:

http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

Two last notes on your schema:

  1. Last I checked, bigint (bigserial in your case) was slower than plain int. (This was a while ago, so the difference might now be negligible on modern, 64-bit servers.) Unless you foresee actually needing more than roughly 2.1 billion entries (the limit of a signed 32-bit int), int is plenty and takes less space.
  2. From an implementation standpoint, the only difference between a varchar(300) and a varchar without a specified length (or text, for that matter) is an extra check constraint on the length. If you don't actually need the data to fit within that size and the limit is only there out of habit, your inserts and updates will run a bit faster without the constraint (see the sketch after this list).
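Dropping the length check is a one-line change; table and column names here are hypothetical:

```sql
-- Changing varchar(300) to text removes the length constraint.
-- Whether this requires a table rewrite depends on your PostgreSQL
-- version.
ALTER TABLE urls ALTER COLUMN url TYPE text;
```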


Unless your database's collation is C or POSIX, an ordinary btree index cannot efficiently satisfy an anchored LIKE query. You may have to declare a btree index with the varchar_pattern_ops operator class to benefit.
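For example (table and column names are hypothetical):

```sql
-- The *_pattern_ops operator classes make a btree index usable for
-- left-anchored LIKE patterns under non-C locales:
CREATE INDEX urls_url_like_idx
    ON urls (url varchar_pattern_ops);

-- This left-anchored pattern can now use the index:
SELECT * FROM urls WHERE url LIKE 'orange%';
```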


The problem is that you're getting hit with a full table scan for each of those lookups ("index in memory" isn't really the issue). Each time you run one of those queries, the database visits every single row, which is what causes the high disk usage. The PostgreSQL docs on operator classes and index types are worth a read here. With that advice you should be able to get prefix lookups working fine, i.e. cases where you're matching something like 'orange%'.
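A quick way to check whether a given query still scans the whole table, using the same hypothetical urls table as above:

```sql
-- EXPLAIN ANALYZE runs the query and shows the actual plan:
EXPLAIN ANALYZE
SELECT * FROM urls WHERE url LIKE 'orange%';
-- Before fixing the index you'll see "Seq Scan on urls" in the plan;
-- afterwards, an index or bitmap index scan should appear instead.
```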

Full text search is nice for more natural text, like written documents, but it might be more difficult to get it working well for URL searching. There was also a thread on the mailing lists a few months back that might have more domain-specific information for what you're trying to do.