Distributed primary key - UUID, simple auto increment or custom sequential values?


in broad terms, there are two approaches.

  1. you use sequential values. these may be divided up into ranges, interleaved between nodes, and so on. they are the most space-efficient approach, but require coordination between whoever is generating the keys.

  2. you use random values (this includes UUIDs). these are much simpler but require more space. from the "birthday problem" we know that if you need to store N values then a random key must be chosen from a range of (more than) N*N values - http://en.wikipedia.org/wiki/Birthday_problem. working backwards, a 64 bit integer can hold about 32 bits of data if used as a random key - that's about 4 billion values. but at that size the chance of at least one collision is already roughly 40%. you want a much lower probability, so a practical limit is around 10 million entries, where the chance of any collision is roughly 3 in a million.
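the birthday arithmetic above can be checked with a few lines of python. this is only a sketch using the standard approximation p ~= 1 - exp(-n(n-1)/2d) for n keys drawn from d possible values; the function name `collision_probability` is made up for illustration.

```python
import math

def collision_probability(n: int, bits: int) -> float:
    """Approximate chance that n random keys of the given width contain a duplicate."""
    space = 2 ** bits
    # birthday approximation: p ~= 1 - exp(-n*(n-1) / (2*space))
    # expm1 keeps precision when the probability is tiny
    return -math.expm1(-n * (n - 1) / (2 * space))

print(collision_probability(2 ** 32, 64))     # roughly 0.39
print(collision_probability(10_000_000, 64))  # roughly 2.7e-06
```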

so, in simple terms, if you have a 64 bit key, a random approach would work for around 10 million entries and a sequential approach for many more. in either case, that is probably more than you need.
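for the sequential side, the "interleaved" idea from option 1 could look something like the sketch below. it assumes every node is given a distinct index and that the total number of nodes is agreed up front (that agreement is the coordination cost); `interleaved_ids`, `node_index` and `node_count` are hypothetical names.

```python
import itertools

def interleaved_ids(node_index: int, node_count: int, start: int = 0):
    """Return an iterator of ids for one node: start + node_index, stepping by node_count."""
    if not 0 <= node_index < node_count:
        raise ValueError("node_index must be in range(node_count)")
    # each node owns one residue class modulo node_count, so the streams never overlap
    return itertools.count(start + node_index, node_count)

node_a = interleaved_ids(0, 3)
node_b = interleaved_ids(1, 3)
print([next(node_a) for _ in range(4)])  # [0, 3, 6, 9]
print([next(node_b) for _ in range(4)])  # [1, 4, 7, 10]
```

the catch is that adding a node later means changing `node_count` everywhere, which is why handing out ranges from a central allocator is the other common variant.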

if you have a 32 bit key then a random approach works for around a thousand values, where the chance of a collision is roughly 1 in 10,000 (a sequential approach goes to about 4 billion, as above).
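to compare key widths directly you can turn the birthday approximation around and ask how many entries fit under a collision probability you are willing to accept. a rough sketch; the name `max_entries` and the 1 in 10,000 tolerance are assumptions chosen for illustration, not recommendations.

```python
import math

def max_entries(bits: int, max_probability: float) -> int:
    """Approximate number of random keys that keeps the collision chance under max_probability."""
    space = 2 ** bits
    # invert p ~= 1 - exp(-n^2 / (2*space))  =>  n ~= sqrt(-2 * space * ln(1 - p))
    return int(math.sqrt(-2 * space * math.log1p(-max_probability)))

for bits in (32, 64):
    # entries allowed at a 1 in 10,000 tolerance, then at a 50% tolerance
    print(bits, max_entries(bits, 1e-4), max_entries(bits, 0.5))
```

with these inputs the 32 bit case comes out a little under a thousand entries at the 1 in 10,000 tolerance, and about 77 thousand at 50%.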

obviously if you have a text value then you need to adjust these numbers to the size of the text, but UUIDs are 128 bits wide (122 of them random in a version 4 UUID) and are designed to have "enough" values anyway - http://en.wikipedia.org/wiki/Universally_unique_identifier
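as a quick illustration of the space cost, here is a random UUID generated with python's standard `uuid` module, shown in both its text and binary forms. which form you store depends on your database, so treat the column types in the comments as assumptions.

```python
import uuid

key = uuid.uuid4()        # version 4: built from random bits
print(key)                # hyphenated hex text form, e.g. for a text column
print(len(str(key)))      # 36 characters as text
print(len(key.bytes))     # 16 bytes (128 bits) in the compact binary form
print(key.version)        # 4
```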

typically a database will provide a sequential ID and that is all you need. if not, the 64 bit random approach is usually simplest and worth the extra space.
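if you do go the 64 bit random route, a minimal sketch might look like this. the in-memory set stands in for the table's unique index, and `new_random_key` is a hypothetical helper; in a real system you would let the database's unique constraint reject the duplicate and retry the insert.

```python
import secrets

def new_random_key(existing: set[int], bits: int = 64, max_attempts: int = 5) -> int:
    """Pick a random key that is not already in `existing`, and record it there."""
    for _ in range(max_attempts):
        candidate = secrets.randbits(bits)  # uniform integer in [0, 2**bits)
        if candidate not in existing:       # stand-in for a unique-constraint check
            existing.add(candidate)
            return candidate
    raise RuntimeError("could not find a free key; the key space may be too small")

keys: set[int] = set()
first = new_random_key(keys)
second = new_random_key(keys)
print(first != second)  # True
```

if the key column is a signed 64 bit integer, passing `bits=63` keeps the values non-negative at the cost of one bit of key space.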