Should primary keys be always assigned as clustered index Should primary keys be always assigned as clustered index sql sql

Should primary keys be always assigned as clustered index


Yes, it is possible to have a non-clustered primary key, and it is possible to have a clustered key that is completely unrelated to the primary key. By default a primary keys gets to be the clustered index key too, but this is not a requirement.

The primary key is a logical concept: is the key used in your data model to reference entities.
The clustered index key is a physical concept: is the order in which you want the rows to be stored on disk.

Choosing a different clustered key is driven by a variety of factors, like key width when you desire a narrower clustered key than the primary key (because the clustered key gets replicated in every non-clustered index. Or support for frequent range scans (common in time series) when the data is frequently accessed with queries like date between '20100101' and '20100201' (a clustered index key on date would be appropriate).

This subject has been discussed here ad nauseam before, see also What column should the clustered index be put on?.


The ideal clustered index key is:

  1. Sequential
  2. Selective (no dupes, unique for each record)
  3. Narrow
  4. Used in Queries

In general it is a very bad idea to use a GUID as a clustered index key, since it leads to mucho fragmentation as rows are added.

EDIT FOR CLARITY:

PK and Clustered key are indeed separate concepts. Your PK does not need to be your clustered index key.

In practical applications in my own experience, the same field that is your PK should/would be your clustered key since it meets the same criteria listed above.


First, I have to say that I have misgivings about the choice of a GUID as the primary key for this table. I am of the opinion that EmployeeNumber would probably be a better choice, and something naturally unique about the employee would be better than that, such as an SSN (or ATIN), which employers must legally obtain anyway (at least in the US).

Putting that aside, you should never base a clustered index on a GUID column. The clustered index specifies the physical order of rows in the table. Since GUID values are (in theory) completely random, every new row will fall at a random location. This is very bad for performance. There is something called 'sequential' GUIDs, but I would consider this a bit of a hack.