Slow simple update query on PostgreSQL database with 3 million rows Slow simple update query on PostgreSQL database with 3 million rows postgresql postgresql

Slow simple update query on PostgreSQL database with 3 million rows


I have to update tables of 1 or 2 billion rows with various values for each rows. Each run makes ~100 millions changes (10%).My first try was to group them in transaction of 300K updates directly on a specific partition as Postgresql not always optimize prepared queries if you use partitions.

  1. Transactions of bunch of "UPDATE myTable SET myField=value WHEREmyId=id"
    Gives 1,500 updates/sec. which means each run wouldtake at least 18 hours.
  2. HOT updates solution as described here with FILLFACTOR=50. Gives1,600 updates/sec. I use SSD's so it's a costly improvement as itdoubles the storage size.
  3. Insert in a temporary table of updated value and merge them afterwith UPDATE...FROM Gives 18,000 updates/sec. if I do a VACUUMfor each partition; 100,000 up/s otherwise. Cooool.
    Here is thesequence of operations:

CREATE TEMP TABLE tempTable (id BIGINT NOT NULL, field(s) to be updated,CONSTRAINT tempTable_pkey PRIMARY KEY (id));

Accumulate a bunch of updates in a buffer depending of available RAMWhen it's filled, or need to change of table/partition, or completed:

COPY tempTable FROM buffer;UPDATE myTable a SET field(s)=value(s) FROM tempTable b WHERE a.id=b.id;COMMIT;TRUNCATE TABLE tempTable;VACUUM FULL ANALYZE myTable;

That means a run now takes 1.5h instead of 18h for 100 millions updates, vacuum included. To save time, it's not necessary to make a vacuum FULL at the end but even a fast regular vacuum is usefull to control your transaction ID on the database and not get unwanted autovacuum during rush hours.


Take a look at this answer: PostgreSQL slow on a large table with arrays and lots of updates

First start with a better FILLFACTOR, do a VACUUM FULL to force table rewrite and check the HOT-updates after your UPDATE-query:

SELECT n_tup_hot_upd, * FROM pg_stat_user_tables WHERE relname = 'myTable';

HOT updates are much faster when you have a lot of records to update. More information about HOT can be found in this article.

Ps. You need version 8.3 or better.


After waiting 35 min. for my UPDATE query to finish (and still didn't) I decided to try something different. So what I did was a command:

CREATE TABLE table2 AS SELECT   all the fields of table1 except the one I wanted to update, 0 as theFieldToUpdatefrom myTable

Then add indexes, then drop the old table and rename the new one to take its place. That took only 1.7 min. to process plus some extra time to recreate the indexes and constraints. But it did help! :)

Of course that did work only because nobody else was using the database. I would need to lock the table first if this was in a production environment.