Hive Tez reducers are running super slow
I don't have the query plan yet, so something else may also be involved, but these settings definitely limit reducer parallelism:
set hive.exec.reducers.max=100; set hive.exec.reducers.bytes.per.reducer=1024000000;
I'd suggest raising the maximum number of reducers and lowering the bytes per reducer, which will increase reducer parallelism:
set hive.exec.reducers.max=5000; set hive.exec.reducers.bytes.per.reducer=67108864;
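To see why those two values matter, here is a rough sketch of the estimate. It assumes the simplified rule reducers = min(hive.exec.reducers.max, ceil(reducer input bytes / hive.exec.reducers.bytes.per.reducer)) and a hypothetical 100 GiB of reducer input. On Tez the real number can differ, for example when hive.tez.auto.reducer.parallelism is enabled, so treat this as an illustration only:

```sql
-- Hypothetical input size: 100 GiB = 107374182400 bytes (an assumption for illustration).

-- Original settings: 107374182400 / 1024000000 is about 104.9, rounded up to 105,
-- then capped by hive.exec.reducers.max=100.
SELECT least(100, ceil(107374182400 / 1024000000)) AS reducers_before;

-- Suggested settings: 107374182400 / 67108864 = 1600, well under the 5000 cap.
SELECT least(5000, ceil(107374182400 / 67108864)) AS reducers_after;
```

Under these assumptions the estimate goes from 100 reducers handling roughly 1 GB each to about 1600 reducers handling 64 MB each.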
Also, Hive 1.2.0+ provides an automatic rewrite optimization for count(distinct). Check this setting; it should be true by default:
set hive.optimize.distinct.rewrite=true;
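And if the query gets stuck on the last reducer, there is most likely skew in the join keys: a few key values carry most of the rows, so one reducer ends up doing most of the work.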
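A quick way to check for skew is to count rows per join key on each side of the join. This is a minimal sketch; my_table and join_key are placeholder names, not taken from your query:

```sql
-- my_table and join_key are hypothetical; substitute each joined table and its key column.
SELECT join_key, count(*) AS cnt
FROM my_table
GROUP BY join_key
ORDER BY cnt DESC
LIMIT 20;
```

If one or two keys (often NULL or some default value) have far more rows than the rest, all of those rows go to the same reducer, which would explain a single reducer running long after the others finish.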