Using sorted tables in Hive
Have you checked out the effect of set hive.enforce.sorting=true? From http://svn.apache.org/repos/asf/hive/branches/branch-0.7/conf/hive-default.xml:
<property> <name>hive.enforce.sorting</name> <value>false</value> <description>Whether sorting is enforced. If true, while inserting into the table, sorting is enforced. </description></property>
You may also find reading the implementation of org.apache.hadoop.hive.ql.parse.SemanticAnalyzer#genBucketingSortingDest useful.
From https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL:
The CLUSTERED BY and SORTED BY creation commands do not affect how data is inserted into a table – only how it is read. This means that users must be careful to insert data correctly by specifying the number of reducers to be equal to the number of buckets, and using CLUSTER BY and SORT BY commands in their query.
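As a rough sketch of what that means in practice, here are the two ways of populating a bucketed, sorted table. The table and column names (page_views_raw, page_views_bucketed, user_id, url) and the bucket count of 32 are made up for illustration, and I have not run this against a 0.7 install, so treat it as a starting point rather than a verified recipe.

```sql
-- Hypothetical target table: 32 buckets on user_id, each bucket sorted by user_id.
CREATE TABLE page_views_bucketed (
  user_id BIGINT,
  url     STRING
)
CLUSTERED BY (user_id) SORTED BY (user_id) INTO 32 BUCKETS;

-- Option 1: ask Hive to enforce the table's bucketing and sorting on insert.
SET hive.enforce.bucketing = true;
SET hive.enforce.sorting = true;

INSERT OVERWRITE TABLE page_views_bucketed
SELECT user_id, url
FROM page_views_raw;   -- hypothetical source table

-- Option 2: do it by hand, as the wiki describes.
-- The number of reducers must equal the number of buckets (32 here),
-- and the query itself must cluster (distribute + sort) on the bucket column.
SET hive.enforce.bucketing = false;
SET hive.enforce.sorting = false;
SET mapred.reduce.tasks = 32;

INSERT OVERWRITE TABLE page_views_bucketed
SELECT user_id, url
FROM page_views_raw
CLUSTER BY user_id;
```

With option 2, nothing checks that the reducer count or the CLUSTER BY column matches the table definition, so a mismatch silently produces a table whose files are not actually bucketed or sorted the way the metadata claims.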
Also look at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy