
Hive Tez reducers are running super slow


I don't have the query plan yet, so something else may also be going on, but these settings are definitely limiting reducer parallelism:

set hive.exec.reducers.max=100;
set hive.exec.reducers.bytes.per.reducer=1024000000;

I'd suggest raising the maximum number of reducers and lowering the bytes per reducer. This will increase reducer parallelism:

set hive.exec.reducers.max=5000;
set hive.exec.reducers.bytes.per.reducer=67108864;
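To see why these two numbers matter, here is a rough Python model. It assumes the simplified estimate reducers = ceil(input bytes / bytes.per.reducer), clamped between 1 and hive.exec.reducers.max. On Tez the real estimate also depends on table statistics and on auto reducer parallelism. The 200 GB input size is hypothetical, and `estimate_reducers` is an illustrative helper, not a Hive API.

```python
def estimate_reducers(total_input_bytes: int, bytes_per_reducer: int, max_reducers: int) -> int:
    """Simplified model (an assumption, not Hive's exact code):
    ceil(input / bytes_per_reducer), clamped to [1, max_reducers]."""
    # Integer ceiling division avoids floating-point rounding issues.
    reducers = (total_input_bytes + bytes_per_reducer - 1) // bytes_per_reducer
    return max(1, min(max_reducers, reducers))


# Hypothetical amount of data reaching the reduce stage: 200 GB.
TOTAL_INPUT = 200 * 1024**3

old = estimate_reducers(TOTAL_INPUT, 1_024_000_000, 100)  # original settings
new = estimate_reducers(TOTAL_INPUT, 67_108_864, 5000)    # suggested settings
print(f"old settings: {old} reducers")
print(f"new settings: {new} reducers")
```

Under this model the old settings cap the stage at 100 reducers (about 2 GB each), while the new ones allow 3200 reducers of 64 MB each.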

Also, Hive 1.2.0+ provides an automatic rewrite optimization for count(distinct). Check this setting; it should be true by default:

set hive.optimize.distinct.rewrite=true;
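As I understand it, the rewrite turns a single `count(DISTINCT x)`, which funnels every value through one reducer, into a `GROUP BY x` stage followed by a `count(*)`. That spreads the de-duplication over many reducers. The Python sketch below only simulates the idea with hypothetical helper functions. It is not how Hive implements the rewrite. The trick works because hash partitioning sends each distinct value to exactly one reducer, so the per-reducer distinct counts can simply be summed.

```python
from collections import defaultdict


def count_distinct_single_reducer(values):
    # Original plan: every value goes to one reducer, which builds one big set.
    return len(set(values))


def count_distinct_rewritten(values, num_reducers):
    # Stage 1 (like GROUP BY x): hash-partition values, so each distinct
    # value lands on exactly one reducer and is de-duplicated there.
    partitions = defaultdict(set)
    for v in values:
        partitions[hash(v) % num_reducers].add(v)
    # Stage 2 (like count(*)): partitions are disjoint, so summing the
    # per-reducer distinct counts gives the global distinct count.
    return sum(len(p) for p in partitions.values())


# Hypothetical data: 100,000 rows with 1,000 distinct values.
data = [i % 1000 for i in range(100_000)]
print(count_distinct_single_reducer(data))  # 1000
print(count_distinct_rewritten(data, 16))   # 1000
```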

And if the query gets stuck on the last reducer, there is most likely skew in the join keys.
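One way to confirm skew is to check how rows would be spread across reducers by join key. The Python sketch below is a simulation under stated assumptions. Python's `hash()` stands in for Hive's partitioner, the key distribution (one hot key with 90,000 rows) is made up, and `reducer_loads`/`top_keys` are hypothetical helpers. On a real table, the equivalent check is a `GROUP BY` on the join column ordered by count.

```python
from collections import Counter


def reducer_loads(join_keys, num_reducers):
    """Rows each reducer would get if keys were hash-partitioned.
    Python's hash() is only a stand-in for Hive's real partitioner."""
    loads = Counter()
    for key in join_keys:
        loads[hash(key) % num_reducers] += 1
    return [loads[r] for r in range(num_reducers)]


def top_keys(join_keys, n=3):
    # The most frequent join keys are the usual culprits for a slow last reducer.
    return Counter(join_keys).most_common(n)


# Hypothetical join column: one "hot" key dominates the data.
keys = [0] * 90_000 + list(range(1, 10_001))
loads = reducer_loads(keys, 8)
print(loads)
print(top_keys(keys, 1))  # [(0, 90000)]
```

Here reducer 0 receives the hot key plus its share of the rest (91,250 rows) while each of the other seven reducers gets 1,250. That imbalance is the "stuck on the last reducer" pattern.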