Create Hive index on complex column Create Hive index on complex column hadoop hadoop

Create Hive index on complex column


It is not possible to create a index on element of complex data type.The reason is hive does not provide the separate column to element of complex data type and indexing is only possible on column of a table.To Understand more clearly read below.

The goal of Hive indexing is to improve the speed of query lookup on certain columns of a table. Without an index, queries with predicates like 'WHERE tab1.col1 = 10' load the entire table or partition and process all the rows. But if an index exists for col1, then only a portion of the file needs to be loaded and processed.The improvement in query speed that an index can provide comes at the cost of additional processing to create the index and disk space to store the index.

Following is the correct way to create a index on complex data type

CREATE INDEX employees_indexON TABLE employees (address)AS ‘org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler’WITH DEFERRED REBUILDIN TABLE employees_index_tablePARTITIONED BY (country,name)COMMENT ‘index based on complex column’;

Lets understand how this program will work,Suppose we write a following query,

 select * from employees where address.street='baker';

In this baker is element of address(complex type STRUCT)
(e.g street:’baker’,city:’london’,state:’XYZ’, zip:84902)

In the above example ,query will search for address.street=’baker’ in index table instead of loading the entire table

Hope you find it useful.Thank you.


We can create index only on columns not on the elements of column.

https://cwiki.apache.org/confluence/display/Hive/IndexDev

In your given sample table address is column and address.street is element of that column.

Index on address is possible..