PostgreSQL: Defining a primary key on a large database PostgreSQL: Defining a primary key on a large database database database

PostgreSQL: Defining a primary key on a large database


Just did this exact test for a rather medium-large DB (200GB+), bigserial won by quite a large margin. It was faster to generate, faster to join, less code, smaller footprint. Because of the way postgres stores it, a bigint is negligible compared to a normal int. You'll run out of storage space from your content long before you ever have to worry about overflowing the bigint. Having done the computed hash vs bigint - surrogate bigint all the way.


I would choose to use a surrogate key, ie. a key that isn't part of the business data of your application. The additional space requirements of an additional 64-bit integer when you're dealing with upto 50 kilobytes of text per record is negligible. You will actually be using less space as soon as you're starting using this key as a foreign key in other tables.

Using a hash of the data stored in a record is a very bad candidate for a primary key, should the data on which the hash is based ever change. You will then have changed the primary key as well, resulting in updates all over the place if you have relations from other tables to this one.

PS. A similar question has been asked and answered here before.

Here's another nice write-up about the topic:http://www.agiledata.org/essays/keys.html


You'd have to have an awful lot of records before your primary key integer ran out.

The integer will be faster for joins than a 64 character string primary key would be. Also it is much easier for people writing queries to deal with.

If a collision is ever possible, you can't use the hash as your primary key. Primary keys must be guarnateed to be unique by definintion.

I've seen hundreds of production databases for different corporations and government entities and not one used a hash primary key. Think there might be a reason?

But, it seems stupid and a waste of disc space, because the field wouldn't serve any purpose but to be a primary key.

Since a surrogate primary key should always be meaningless except as a primary key, I'm not sure what your objection would be.