Using sorted tables in Hive


Have you checked the effect of set hive.enforce.bucketing=true? The related sorting property is described in http://svn.apache.org/repos/asf/hive/branches/branch-0.7/conf/hive-default.xml:

<property>
  <name>hive.enforce.sorting</name>
  <value>false</value>
  <description>Whether sorting is enforced. If true, while inserting into the table, sorting is enforced.</description>
</property>

You may also find it useful to read the implementation of org.apache.hadoop.hive.ql.parse.SemanticAnalyzer#genBucketingSortingDest:

http://svn.apache.org/repos/asf/hive/branches/branch-0.7/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java


hive.enforce.bucketing does not perform a global sort of the data set. Instead, it writes the data sorted within each bucket (in your case, 8 per partition). A query that needs globally ordered output therefore still requires a global sort step.
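
To make the difference concrete, here is a small Python sketch (not Hive code) of what "sorted within the buckets" means. It assumes a simplified model in which an integer key lands in bucket key % 8; the function names and sample keys are made up for illustration, and Hive's real hashing and file layout are not reproduced here.

```python
import heapq

NUM_BUCKETS = 8  # mirrors the "8 per partition" case discussed above


def write_buckets(rows, num_buckets=NUM_BUCKETS):
    """Assign each key to bucket key % num_buckets, then sort every bucket.

    Simplified stand-in for a bucketed, sorted insert; not Hive's actual hash.
    """
    buckets = [[] for _ in range(num_buckets)]
    for key in rows:
        buckets[key % num_buckets].append(key)
    return [sorted(bucket) for bucket in buckets]


def is_sorted(seq):
    """Return True if seq is in non-decreasing order."""
    return all(a <= b for a, b in zip(seq, seq[1:]))


rows = [23, 5, 42, 8, 16, 15, 4, 31, 9, 0, 17, 12]  # hypothetical sample keys
buckets = write_buckets(rows)

# Reading the bucket files back to back: each file is sorted, the whole is not.
concatenated = [key for bucket in buckets for key in bucket]

# The extra global sort step: a k-way merge of the already sorted buckets.
globally_sorted = list(heapq.merge(*buckets))

print(all(is_sorted(b) for b in buckets))  # every bucket is sorted on its own
print(is_sorted(concatenated))             # the concatenation is not
print(is_sorted(globally_sorted))          # the merged result is
```

Each bucket is ordered on its own, but reading them one after another interleaves key ranges, which is why a query that wants totally ordered output still has to pay for a final sort or merge.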

Hope this helps,
Nat


https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL

The CLUSTERED BY and SORTED BY creation commands do not affect how data is inserted into a table – only how it is read. This means that users must be careful to insert data correctly by specifying the number of reducers to be equal to the number of buckets, and using CLUSTER BY and SORT BY commands in their query.

Also look at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy
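
The wiki's advice about matching the number of reducers to the number of buckets can be illustrated with a rough Python simulation (again not Hive code). It assumes, purely for illustration, that both the reducer for a row and its expected bucket are chosen as key % n, and that each reducer writes exactly one file; the names reducer_outputs and matches_bucket_layout are hypothetical.

```python
from collections import defaultdict

NUM_BUCKETS = 8  # buckets declared on the (hypothetical) table


def reducer_outputs(keys, num_reducers):
    """Simulate a clustered insert: route each key to reducer key % num_reducers,
    sort within each reducer, and treat each reducer's output as one file."""
    files = defaultdict(list)
    for key in keys:
        files[key % num_reducers].append(key)
    return {reducer: sorted(ks) for reducer, ks in files.items()}


def matches_bucket_layout(files, num_buckets=NUM_BUCKETS):
    """True if file i holds only keys that a reader expects in bucket i."""
    return all(key % num_buckets == i for i, ks in files.items() for key in ks)


keys = list(range(40))  # hypothetical sample keys

good = reducer_outputs(keys, num_reducers=NUM_BUCKETS)  # reducers == buckets
bad = reducer_outputs(keys, num_reducers=5)             # reducers != buckets

print(matches_bucket_layout(good))  # files line up with the declared buckets
print(matches_bucket_layout(bad))   # files do not line up
```

With 8 reducers every output file contains exactly the keys a reader expects in that bucket. With 5 reducers the files are still individually sorted, but their contents no longer correspond to the declared bucket layout, which is the kind of incorrect insert the wiki warns about.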