Surrogate vs. natural/business keys [closed] Surrogate vs. natural/business keys [closed] database database

Surrogate vs. natural/business keys [closed]


Just a few reasons for using surrogate keys:

  1. Stability: Changing a key because of a business or natural need will negatively affect related tables. Surrogate keys rarely, if ever, need to be changed because there is no meaning tied to the value.

  2. Convention: Allows you to have a standardized Primary Key column naming convention rather than having to think about how to join tables with various names for their PKs.

  3. Speed: Depending on the PK value and type, a surrogate key of an integer may be smaller, faster to index and search.


Both. Have your cake and eat it.

Remember there is nothing special about a primary key, except that it is labelled as such. It is nothing more than a NOT NULL UNIQUE constraint, and a table can have more than one.

If you use a surrogate key, you still want a business key to ensure uniqueness according to the business rules.


It appears that no one has yet said anything in support of non-surrogate (I hesitate to say "natural") keys. So here goes...

A disadvantage of surrogate keys is that they are meaningless (cited as an advantage by some, but...). This sometimes forces you to join a lot more tables into your query than should really be necessary. Compare:

select sum(t.hours)from timesheets twhere t.dept_code = 'HR'and t.status = 'VALID'and t.project_code = 'MYPROJECT'and t.task = 'BUILD';

against:

select sum(t.hours)from timesheets t     join departents d on d.dept_id = t.dept_id     join timesheet_statuses s on s.status_id = t.status_id     join projects p on p.project_id = t.project_id     join tasks k on k.task_id = t.task_idwhere d.dept_code = 'HR'and s.status = 'VALID'and p.project_code = 'MYPROJECT'and k.task_code = 'BUILD';

Unless anyone seriously thinks the following is a good idea?:

select sum(t.hours)from timesheets twhere t.dept_id = 34394and t.status_id = 89    and t.project_id = 1253and t.task_id = 77;

"But" someone will say, "what happens when the code for MYPROJECT or VALID or HR changes?" To which my answer would be: "why would you need to change it?" These aren't "natural" keys in the sense that some outside body is going to legislate that henceforth 'VALID' should be re-coded as 'GOOD'. Only a small percentage of "natural" keys really fall into that category - SSN and Zip code being the usual examples. I would definitely use a meaningless numeric key for tables like Person, Address - but not for everything, which for some reason most people here seem to advocate.

See also: my answer to another question