SQL: Primary key column. Artificial "Id" column vs "Natural" columns [duplicate] SQL: Primary key column. Artificial "Id" column vs "Natural" columns [duplicate] sql sql

SQL: Primary key column. Artificial "Id" column vs "Natural" columns [duplicate]

This is a bit of a religious debate. My personal preference is to have synthetic primary keys rather than natural primary keys but there are good arguments on both sides. Realistically, so long as you are consistent and reasonable, either approach can work well.

If you use natural keys, the two major downsides are the presence of composite keys and mutating primary key values. If you have composite primary keys, you'd obviously have to have multiple columns in each child table. That can get unwieldy from a data model perspective when there are many relationships among entities. But it can also cause grief for people developing queries-- it's awfully easy to create queries that use N-1 of N join conditions and get almost the right result. If you have natural keys, you'll also inevitably encounter a situation where the natural key value changes and you then have to ripple that change through many different entities-- that's vastly more complicated than changing a unique value in the table.

On the other hand, if you use synthetic keys, you're wasting space by adding additional columns, adding additional overhead to maintain an additional index, and you're increasing the risk that you'll get functionally duplicated results. It's awfully easy to either forget to create a unique constraint on the business key or to see that there is a non-unique index on the combination and just assume that it was a unique index. I actually just got bitten by this particular failing a couple days ago-- I had indexed the composite natural key (with a non-unique index) rather than creating a unique constraint. Dumb mistake but one that's relatively easy to make.

From a query writing and naming convention standpoint, I would also tend to prefer synthetic keys because it's nice to know when you're joining tables that the primary key of A is going to be A_ID and the primary key of B is going to be B_ID. That's far more self-documenting than trying to remember that the primary key of A is the combination of A_NAME and A_REVISION_NUMBER and that the primary key of B is B_CODE.

There is little or no difference between a key enforced through a PRIMARY KEY constraint and a key enforced through a UNIQUE constraint. What's important is that you enforce ALL the keys necessary from a data integrity perspective. Usually that means at least one "natural" key (a key exposed to the users/consumers of the data and used to identify the facts about the universe of discourse) per table.

Optionally you might also want to create "technical" keys to support the application and database features rather than the end user (usually called surrogate keys). That should be very much a secondary consideration however. In the interests of simplicity (and very often performance as well) it usually makes sense only to create surrogate keys where you have identified a particular need for them and not before.

It depends on your natural columns. If they are small and steadily increasing, then they are good candidates for the primary key.

  • Small - the smaller the key, the more values you can get into a single row, and the faster your index scans will be
  • Steadily increasing - produces fewer index reshuffles as the table grows, improving performance.