Postgres pg_dump dumps database in a different order every time

php database linux postgresql dump

Here is a handy script for pre-processing pg_dump output to make it more suitable for diffing and storing in version control:

https://github.com/akaihola/pgtricks

pg_dump_splitsort.py splits the dump into the following files:

0000_prologue.sql: everything up to the first COPY
0001_<schema>.<table>.sql
.
.
NNNN_<schema>.<table>.sql: data for each table sorted by the first field
9999_epilogue.sql: everything after the last COPY

The files for table data are numbered so a simple sorted concatenation of all files can be used to re-create the database:

$ cat *.sql | psql <database>

I've found that a good way to take a quick look at differences between dumps is to use the meld tool on the whole directory:

$ meld old-dump/ new-dump/

Storing the dump in version control also gives a decent view on the differences. Here's how to configure git to use color in diffs:

# ~/.gitconfig[color]        diff = true[color "diff"]        frag = white blue bold        meta = white green bold        commit = white red bold

Note: If you have created/dropped/renamed tables, remember to delete all .sql files before post-processing the new dump.

php database linux postgresql dump

It's worth distinguishing schema and data here. The schema is dumped in a fairly deterministic order, most objects alphabetically, constrained by inter-object dependencies. There are some limited cases where the order is not fully constrained and may appear random to an outside observer, but that may get fixed in the next version.

The data on the other hand is dumped in disk order. This is usually what you want, because you want dumps to be fast and not use insane amounts of resources to do sorting. What you might be observing is that when you "modify the DB" you are doing an UPDATE, which will actually delete the old value and append the new value at the end. And that will of course upset your diff strategy.

A tool that might be more suitable for your purpose is pg_comparator.

php database linux postgresql dump

It is impossible to force pg_dump to dump data in any particular order, as it dumps data in disk order - it is much faster this way.

You can use "-a -d" options for pg_dump and then "sort" output, but newlines in data will make sorted output unusable. But for basic comparison, whether anything changed, it would suffice.

CodeHunter

Postgres pg_dump dumps database in a different order every time

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last