
Hive: Is there a better way to percentile rank a column?


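Try removing one of your derived tables. PERCENTILE can be used as a window function with an empty over (), so the decile cut points can be computed in the same subquery that passes item and characteristic through: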

select item
     , characteristic
     , case when characteristic <= char_perc[0] then 0
            when characteristic <= char_perc[1] then 1
            when characteristic <= char_perc[2] then 2
            when characteristic <= char_perc[3] then 3
            when characteristic <= char_perc[4] then 4
            when characteristic <= char_perc[5] then 5
            when characteristic <= char_perc[6] then 6
            when characteristic <= char_perc[7] then 7
            when characteristic <= char_perc[8] then 8
            else 9
       end as char_percentile_rank
from (
    select item
         , characteristic
         , PERCENTILE(BIGINT(characteristic), array(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)) over () as char_perc
    from (
        select item
             , sum(characteristic) as characteristic
        from table
        group by item
    ) t1
) t2
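
If exact percentile cut points are not required, a different technique may be worth a look: Hive's NTILE windowing function splits the ordered rows into equal-sized buckets directly. The sketch below is not equivalent to the query above. NTILE buckets by row count, so tied values can land in different buckets, whereas PERCENTILE interpolates cut points over the values. The table name my_table is a placeholder assumed for illustration.

```sql
-- Sketch only: decile buckets via NTILE instead of PERCENTILE + CASE.
-- my_table is a hypothetical table name; substitute your own.
select item
     , characteristic
       -- ntile(10) numbers buckets 1..10; subtract 1 to get 0..9
     , ntile(10) over (order by characteristic) - 1 as char_percentile_rank
from (
    select item
         , sum(characteristic) as characteristic
    from my_table
    group by item
) t1
```

This leaves a single derived table and no CASE expression, at the cost of the tie-handling difference noted above.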