Is Hive faster than Spark?



Spark is convenient, but its SQL performance does not scale all that well.

Hive has amazing support for co-partitioned joins. When the tables you are joining have hundreds of millions to billions of rows, you will really appreciate the fine-grained join support via:

  • matching DISTRIBUTE BY and SORT BY clauses (or CLUSTER BY)
  • bucketed joins
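To illustrate why co-partitioned (bucketed) joins scale well, here is a toy in-memory sketch in Python, not Hive's actual implementation. Both tables are hashed on the join key into the same number of buckets and sorted within each bucket, so bucket k of one table only ever needs to be merged with bucket k of the other. The row shape, the function names, and the sample data are hypothetical, and Python's hash() stands in for whatever hash function Hive really uses.

```python
from typing import List, Tuple

# Hypothetical row shape: (integer join key, payload string).
Row = Tuple[int, str]
Joined = Tuple[int, str, str]


def bucketize(rows: List[Row], num_buckets: int) -> List[List[Row]]:
    """Hash rows into buckets by join key, then sort each bucket by key.

    Loosely mirrors a table declared CLUSTERED BY (key) SORTED BY (key)
    INTO num_buckets BUCKETS; Hive's real hash function may differ.
    """
    buckets: List[List[Row]] = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[hash(row[0]) % num_buckets].append(row)
    for bucket in buckets:
        bucket.sort(key=lambda r: r[0])
    return buckets


def merge_join(left: List[Row], right: List[Row]) -> List[Joined]:
    """Inner sort-merge join of two key-sorted lists, handling duplicate keys."""
    out: List[Joined] = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side and emit their cross product.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == rk:
                j_end += 1
            for _, lp in left[i:i_end]:
                for _, rp in right[j:j_end]:
                    out.append((lk, lp, rp))
            i, j = i_end, j_end
    return out


def bucketed_join(left: List[List[Row]], right: List[List[Row]]) -> List[Joined]:
    """Join bucket k of one table only with bucket k of the other.

    Valid only because both sides used the same key, hash function and
    bucket count, so matching keys always land in the same bucket index;
    no global shuffle of either table is needed.
    """
    if len(left) != len(right):
        raise ValueError("both tables must have the same number of buckets")
    result: List[Joined] = []
    for left_bucket, right_bucket in zip(left, right):
        result.extend(merge_join(left_bucket, right_bucket))
    return result


orders = [(1, "o1"), (2, "o2"), (2, "o3"), (5, "o4")]
customers = [(1, "alice"), (2, "bob"), (3, "carol")]
joined = bucketed_join(bucketize(orders, 4), bucketize(customers, 4))
print(sorted(joined))  # [(1, 'o1', 'alice'), (2, 'o2', 'bob'), (2, 'o3', 'bob')]
```

A real engine reads each bucket from its own file and can process every bucket pair in a separate task; the sketch only shows why matching bucket layouts let the join skip the shuffle.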

Hive has extensive support for metadata-only queries; Spark has had only a glimmer of it since 2.1.
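A metadata-only query is one whose answer depends only on partition columns, so it can in principle be answered from the partition list kept in the metastore instead of by scanning data files. The toy Python sketch below shows that idea; it is not Hive's actual optimizer, and the partition dictionary, the dt column, and the function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical stand-in for metastore partition metadata:
# partition value of a "dt" column -> data files stored in that partition.
partitions: Dict[str, List[str]] = {
    "2018-10-29": ["dt=2018-10-29/part-00000"],
    "2018-10-30": ["dt=2018-10-30/part-00000", "dt=2018-10-30/part-00001"],
    "2018-10-31": ["dt=2018-10-31/part-00000"],
}


def metadata_only_max(parts: Dict[str, List[str]]) -> str:
    """Answer SELECT MAX(dt) FROM t from partition names alone.

    No data file is opened. Assumes every listed partition contains rows;
    ISO-8601 date strings compare correctly as plain strings.
    """
    return max(parts)


def metadata_only_distinct(parts: Dict[str, List[str]]) -> List[str]:
    """Answer SELECT DISTINCT dt FROM t, again without reading any data file."""
    return sorted(parts)


print(metadata_only_max(partitions))       # 2018-10-31
print(metadata_only_distinct(partitions))  # ['2018-10-29', '2018-10-30', '2018-10-31']
```

The cost of such a query grows with the number of partitions, not with the volume of data, which is why it matters on tables with many large partitions.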

Spark runs out of steam quickly once the number of partitions exceeds roughly 10K. Hive does not suffer from this limitation.


Hive is just a framework that provides SQL functionality for MapReduce-type workloads.

These workloads can run on MapReduce, Tez, or Spark, with YARN managing the cluster resources.

So the real comparison is Hive on Tez vs. Hive on Spark. There is a nice article discussing this, "When to go with ETL on Hive using Tez VS When to go with Spark ETL?" (gist: use Hive on Spark if not sure).

Benchmark information (chart not reproduced; lower is better)


Fast forward to 2018: according to the following article, Hive is much faster (and more stable) than SparkSQL, especially in concurrent environments:

https://mr3.postech.ac.kr/blog/2018/10/31/performance-evaluation-0.4/

The article compares several SQL-on-Hadoop systems on the TPC-DS benchmark (1 TB, 3 TB, and 10 TB data sets) using three clusters (11, 21, and 42 nodes):

  • Hive-LLAP included in HDP (Hortonworks Data Platform) 2.6.4
  • Hive-LLAP included in HDP 3.0.1
  • Presto 0.203e (with cost-based optimization enabled)
  • Presto 0.208e (with cost-based optimization enabled)
  • SparkSQL 2.2.0 included in HDP 2.6.4
  • SparkSQL 2.3.1 included in HDP 3.0.1
  • Hive 3.1.0 running on top of Tez
  • Hive on Tez included in HDP 3.0.1
  • Hive 3.1.0 running on top of MR3 0.4
  • Hive 2.3.3 running on top of MR3 0.4

So, compared with the Hive-based systems and Presto, SparkSQL is very slow and does not scale in concurrent environments. (Note that the experiment ran SparkSQL on vanilla Spark.)