Postgres pg_dump dumps database in a different order every time
Here is a handy script for pre-processing pg_dump
output to make it more suitable for diffing and storing in version control:
https://github.com/akaihola/pgtricks
pg_dump_splitsort.py
splits the dump into the following files:
0000_prologue.sql
: everything up to the first COPY0001_<schema>.<table>.sql
.
.NNNN_<schema>.<table>.sql
: data for each table sorted by the first field9999_epilogue.sql
: everything after the last COPY
The files for table data are numbered so a simple sorted concatenation of all files can be used to re-create the database:
$ cat *.sql | psql <database>
I've found that a good way to take a quick look at differences between dumps is to use the meld
tool on the whole directory:
$ meld old-dump/ new-dump/
Storing the dump in version control also gives a decent view on the differences. Here's how to configure git to use color in diffs:
# ~/.gitconfig[color] diff = true[color "diff"] frag = white blue bold meta = white green bold commit = white red bold
Note: If you have created/dropped/renamed tables, remember to delete all .sql
files before post-processing the new dump.
It's worth distinguishing schema and data here. The schema is dumped in a fairly deterministic order, most objects alphabetically, constrained by inter-object dependencies. There are some limited cases where the order is not fully constrained and may appear random to an outside observer, but that may get fixed in the next version.
The data on the other hand is dumped in disk order. This is usually what you want, because you want dumps to be fast and not use insane amounts of resources to do sorting. What you might be observing is that when you "modify the DB" you are doing an UPDATE, which will actually delete the old value and append the new value at the end. And that will of course upset your diff strategy.
A tool that might be more suitable for your purpose is pg_comparator.
It is impossible to force pg_dump to dump data in any particular order, as it dumps data in disk order - it is much faster this way.
You can use "-a -d" options for pg_dump and then "sort" output, but newlines in data will make sorted output unusable. But for basic comparison, whether anything changed, it would suffice.