Postgres pg_dump dumps database in a different order every time Postgres pg_dump dumps database in a different order every time postgresql postgresql

Postgres pg_dump dumps database in a different order every time


Here is a handy script for pre-processing pg_dump output to make it more suitable for diffing and storing in version control:

https://github.com/akaihola/pgtricks

pg_dump_splitsort.py splits the dump into the following files:

  • 0000_prologue.sql: everything up to the first COPY
  • 0001_<schema>.<table>.sql
    .
    .
    NNNN_<schema>.<table>.sql: data for each table sorted by the first field
  • 9999_epilogue.sql: everything after the last COPY

The files for table data are numbered so a simple sorted concatenation of all files can be used to re-create the database:

$ cat *.sql | psql <database>

I've found that a good way to take a quick look at differences between dumps is to use the meld tool on the whole directory:

$ meld old-dump/ new-dump/

Storing the dump in version control also gives a decent view on the differences. Here's how to configure git to use color in diffs:

# ~/.gitconfig[color]        diff = true[color "diff"]        frag = white blue bold        meta = white green bold        commit = white red bold

Note: If you have created/dropped/renamed tables, remember to delete all .sql files before post-processing the new dump.


It's worth distinguishing schema and data here. The schema is dumped in a fairly deterministic order, most objects alphabetically, constrained by inter-object dependencies. There are some limited cases where the order is not fully constrained and may appear random to an outside observer, but that may get fixed in the next version.

The data on the other hand is dumped in disk order. This is usually what you want, because you want dumps to be fast and not use insane amounts of resources to do sorting. What you might be observing is that when you "modify the DB" you are doing an UPDATE, which will actually delete the old value and append the new value at the end. And that will of course upset your diff strategy.

A tool that might be more suitable for your purpose is pg_comparator.


It is impossible to force pg_dump to dump data in any particular order, as it dumps data in disk order - it is much faster this way.

You can use "-a -d" options for pg_dump and then "sort" output, but newlines in data will make sorted output unusable. But for basic comparison, whether anything changed, it would suffice.