Table Scan and Index Scan in SQL Table Scan and Index Scan in SQL sql sql

Table Scan and Index Scan in SQL


Most query engines have a query optimizer, which tries to generate an effective query execution strategy. If indexes are available, which can make a query faster, then the query optimizer will perform an index scan or index seek, otherwise a table scan.

Example:

SELECT * FROM tbl WHERE category_id = 5;

If there is no index on category_id then a table scan will be performed, i.e. every single record in the table will be inspected for the right category_id.

If, however, category_id is indexed the things become more complicated. If the table is very large, an index seek will probably be chosen. However, if the table is small, then the optimizer might decide that a table scan is still faster, since some overhead is required to access an index. If the category_id is not selective enough, for instance if there are only two categories, scanning the table might be faster even for big tables.

Indexes are usually organized as tree structures. Finding an item in a tree is an O(log n) operation. A table scan is an O(n) operation. The speed is mainly determined by the number of disk accesses required to perform the query. Seeking the index first and then accessing the table for the found entries can generate more disk accesses for small tables.

Let us have a look at another query:

SELECT category_id FROM tbl WHERE category_id BETWEEN 10 AND 100;

Here there is another option available. An index seek might not be faster than a table scan in this situation, but, since we are only retrieving catergory_id's an index scan (not index seek) might be even faster. An index scan reads every entry of the index table instead of taking advantage of the tree structure (what the index seek does). However, since the requested information is fully contained in the index, no access to the data table will be required. Index scan is, like the table scan an O(n) operation, but since the index is usually smaller than the table, fewer disk accesses are required to scan the index than to scan the table.

The whole matter is very complicated and depends very much on the database engine. If you want to know more, read the documentation provided by the db vendor.


Table scan means iterate over all table rows.

Index scan means iterate over all index items, when item index meets search condition, table row is retrived through index.

Usualy index scan is less expensive than a table scan because index is more flat than a table.

They are lot of bibliografy about this issue. Sample:

Index access is an access method in which SQL Server uses an existing index to read and write data pages. Because index access significantly reduces the number of I/O read operations, it often outperforms a table scan.

In this method, a row is retrieved by traversing the index, using the indexed column values specified by the statement. An index scan retrieves data from an index based on the value of one or more columns in the index. To perform an index scan, Oracle searches the index for the indexed column values accessed by the statement. If the statement accesses only columns of the index, then Oracle reads the indexed column values directly from the index, rather than from the table.


As @danihp has answered the first part of the question I'll attempt to answer the second "where is it used specifically". This is for Oracle but it holds true for most RDBMS.

Let's assume we have a table my_table, which is indexed uniquely on a column id and has a second index, which is non-unique, on the column yet_another_column:

create my_table ( id varchar2(20) not null                , another_column not null                , yet_another_column                , constraint pk_my_table primary key (id)                 );create index i_my_table on my_table ( yet_another_column );

Now, if we were to select * from my_table where id = '1' this would / should do a unique index scan of the index pk_my_table. Then we re-enter the table, using the index, to return everything in my_table where id = '1'.

If the query were, instead, select id from my_table where id = 'a' then there is no need for the second stage as all the values we need are contained within the index. In this case the query would solely do an unique index scan.

Next, if our query were select * from my_table where yet_another_column = 'y' then we have an index on the column but it is not unique so we're going to have to look through the entire index to try to find all values that match our where condition, i.e. an index scan. Once again we're select columns that are not in our index so we have to re-enter the table to get them.

Lastly, if our query were select id from my_table where another_column = 'yes'. We have no index on another_column so we have to do a table scan to find the value, i.e. we have to find everything in the table where another_column = 'yes'.

Now, there might not seem to be much of a difference, between a table scan and an index scan in these instances. We still have to go and find a value in an object in the database. However, as the index is much smaller and specially designed to be scanned ( see other answers ) it is generally much faster to do an index scan if you only want a small proportion of the rows in the table. If you want say 10% of the table then this point becomes "it depends".