Advantages and disadvantages of GUID / UUID database keys Advantages and disadvantages of GUID / UUID database keys database database

Advantages and disadvantages of GUID / UUID database keys


Advantages:

  • Can generate them offline.
  • Makes replication trivial (as opposed to int's, which makes it REALLY hard)
  • ORM's usually like them
  • Unique across applications. So We can use the PK's from our CMS (guid) in our app (also guid) and know we are NEVER going to get a clash.

Disadvantages:

  • Larger space use, but space is cheap(er)
  • Can't order by ID to get the insert order.
  • Can look ugly in a URL, but really, WTF are you doing putting a REAL DB key in a URL!? (This point disputed in comments below)
  • Harder to do manual debugging, but not that hard.

Personally, I use them for most PK's in any system of a decent size, but I got "trained" on a system which was replicated all over the place, so we HAD to have them. YMMV.

I think the duplicate data thing is rubbish - you can get duplicate data however you do it. Surrogate keys are usually frowned upon where ever I've been working. We DO use the WordPress-like system though:

  • unique ID for the row (GUID/whatever). Never visible to the user.
  • public ID is generated ONCE from some field (e.g. the title - make it the-title-of-the-article)

UPDATE:So this one gets +1'ed a lot, and I thought I should point out a big downside of GUID PK's: Clustered Indexes.

If you have a lot of records, and a clustered index on a GUID, your insert performance will SUCK, as you get inserts in random places in the list of items (thats the point), not at the end (which is quick)

So if you need insert performance, maybe use a auto-inc INT, and generate a GUID if you want to share it with someone else (ie, show it to a user in a URL)


Why doesn't anyone mention performance? When you have multiple joins, all based on these nasty GUIDs the performance will go through the floor, been there :(


@Matt Sheppard:

Say you have a table of customers. Surely you don't want a customer to exist in the table more than once, or lots of confusion will happen throughout your sales and logistics departments (especially if the multiple rows about the customer contain different information).

So you have a customer identifier which uniquely identifies the customer and you make sure that the identifier is known by the customer (in invoices), so that the customer and the customer service people have a common reference in case they need to communicate. To guarantee no duplicated customer records, you add a uniqueness-constraint to the table, either through a primary key on the customer identifier or via a NOT NULL + UNIQUE constraint on the customer identifier column.

Next, for some reason (which I can't think of), you are asked to add a GUID column to the customer table and make that the primary key. If the customer identifier column is now left without a uniqueness-guarantee, you are asking for future trouble throughout the organization because the GUIDs will always be unique.

Some "architect" might tell you that "oh, but we handle the real customer uniqueness constraint in our app tier!". Right. Fashion regarding that general purpose programming languages and (especially) middle tier frameworks changes all the time, and will generally never out-live your database. And there is a very good chance that you will at some point need to access the database without going through the present application. == Trouble. (But fortunately, you and the "architect" are long gone, so you will not be there to clean up the mess.) In other words: Do maintain obvious constraints in the database (and in other tiers, as well, if you have the time).

In other words: There may be good reasons to add GUID columns to tables, but please don't fall for the temptation to make that lower your ambitions for consistency within the real (==non-GUID) information.