How to know when to use indexes and which type?
Well, the first question is easy:
When should a clustered index be used?
Always. Period. Except for a very few, rare, edge cases. A clustered index makes a table faster, for every operation. YES! It does. See Kim Tripp's excellent The Clustered Index Debate continues for background info. She also mentions her main criteria for a clustered index:
- narrow
- static (never changes)
- unique
- if ever possible: ever increasing
INT IDENTITY fulfills this perfectly - GUID's do not. See GUID's as Primary Key for extensive background info.
Why narrow? Because the clustering key is added to each and every index page of each and every non-clustered index on the same table (in order to be able to actually look up the data row, if needed). You don't want to have VARCHAR(200) in your clustering key....
Why unique?? See above - the clustering key is the item and mechanism that SQL Server uses to uniquely find a data row. It has to be unique. If you pick a non-unique clustering key, SQL Server itself will add a 4-byte uniqueifier to your keys. Be careful of that!
Next: non-clustered indices. Basically there's one rule: any foreign key in a child table referencing another table should be indexed, it'll speed up JOINs and other operations.
Furthermore, any queries that have WHERE clauses are a good candidate - pick those first which are executed a lot. Put indices on columns that show up in WHERE clauses, in ORDER BY statements.
Next: measure your system, check the DMV's (dynamic management views) for hints about unused or missing indices, and tweak your system over and over again. It's an ongoing process, you'll never be done! See here for info on those two DMV's (missing and unused indices).
Another word of warning: with a truckload of indices, you can make any SELECT query go really really fast. But at the same time, INSERTs, UPDATEs and DELETEs which have to update all the indices involved might suffer. If you only ever SELECT - go nuts! Otherwise, it's a fine and delicate balancing act. You can always tweak a single query beyond belief - but the rest of your system might suffer in doing so. Don't over-index your database! Put a few good indices in place, check and observe how the system behaves, and then maybe add another one or two, and again: observe how the total system performance is affected by that.
Rule of thumb is primary key (implied and defaults to clustered) and each foreign key column
There is more but you could do worse than using SQL Server's missing index DMVs
An index may slow down a SELECT if the optimiser makes a bad choice, and it is possible to have too many. Too many will slow writes but it's also possible to overlap indexes
Answering the ones I can I would say that every table, no matter how small, will always benefit from at least one index as there has to be at least one way in which you are interested in looking up the data; otherwise why store it?
A general rule for adding indexes would be if you need to find data in the table using a particular field, or set of fields. This leads on to how many indexes are too many, generally the more indexes you have the slower inserts and updates will be as they also have to modify the indexes but it all depends on how you use your data. If you need fast inserts then don't use too many. In reporting "read only" type data stores you can have a number of them to make all your lookups faster.
Unfortunately there is no one rule to guide you on the number or type of indexes to use, although the query optimiser of your chosen DB can give hints based on the queries you are executing.
As to clustered indexes they are the Ace card you only get to use once, so choose carefully. It's worth calculating the selectivity of the field you are thinking of putting it on as it can be wasted to put it on something like a boolean field (contrived example) as the selectivity of the data is very low.