Do clustered indexes have to be unique? Do clustered indexes have to be unique? database database

Do clustered indexes have to be unique?


They don't have to be unique but it certainly is encouraged.
I haven't encountered a scenario yet where I wanted to create a CI on a non-unique column.

What happens if you create a CI on a non-unique column

If the clustered index is not a unique index, SQL Server makes any duplicate keys unique by adding an internally generated value called a uniqueifier

Does this lead to bad performance?

Adding a uniqueifier certainly adds some overhead in calculating and in storing it.
If this overhead will be noticable depends on several factors.

  • How much data the table contains.
  • What is the rate of inserts.
  • How often is the CI used in a select (when no covering indexes exist, pretty much always).

Edit
as been pointed out by Remus in comments, there do exist use cases where creating a non-unique CI would be a reasonable choice. Me not having encountered one off those scenarios merely shows my own lack of exposure or competence (pick your choice).


I like to check out what The Queen of Indexing, Kimberly Tripp, has to say on the topic:

I'm going to start with my recommendation for the Clustering Key - for a couple of reasons. First, it's an easy decision to make and second, making this decision early helps to proactively prevent some types of fragmentation. If you can prevent certain types of base-table fragmentation then you can minimize some maintenance activities (some of which, in SQL Server 2000 AND less of which, in SQL Server 2005) require that your table be offline. OK, I'll get to the rebuild stuff later.....

Let's start with the key things that I look for in a clustering key:

* Unique* Narrow* Static

Why Unique?A clustering key should be unique because a clustering key (when one exists) is used as the lookup key from all non-clustered indexes. Take for example an index in the back of a book - if you need to find the data that an index entry points to - that entry (the index entry) must be unique otherwise, which index entry would be the one you're looking for? So, when you create the clustered index - it must be unique. But, SQL Server doesn't require that your clustering key is created on a unique column. You can create it on any column(s) you'd like. Internally, if the clustering key is not unique then SQL Server will “uniquify” it by adding a 4-byte integer to the data. So if the clustered index is created on something which is not unique then not only is there additional overhead at index creation, there's wasted disk space, additional costs on INSERTs and UPDATEs, and in SQL Server 2000, there's an added cost on a clustereD index rebuild (which because of the poor choice for the clustering key is now more likely).

Source: Ever-increasing clustering key debate - again!


Do clustered indexes have to be unique?

They don't, and there are times where it's better if they're not.

Consider a table with a semi-random, unique EmployeeId, and a DepartmentId for each employee: if your select statement is

SELECT * FROM EmployeeTable WHERE DepartmentId=%DepartmentValue%

then it's best for performance if the DepartmentId is the clustered index even though (or even especially because) it's not the unique index (best for performance because it ensures that all the records within a given DepartmentId are clustered).


Do you have any references?

There's Clustered Index Design Guidelines for example, which says,

With few exceptions, every table should have a clustered index defined on the column, or columns, that offer the following:

  • Can be used for frequently used queries.
  • Provide a high degree of uniqueness.
  • Can be used in range queries.

My understanding of "high degree of uniqueness" for example is that it isn't good to choose "Country" as the clusted index if most of your queries want to select the records within a given town.