Performance of Apache Drill

hadoop hive impala apache-drill apache-tez

Some performance numbers are published at http://allegro.tech/fast-data-hackathon.html.

In general, we see Drill and Impala as comparable in performance for interactive queries. What sets Drill apart is its ability to query data without metadata definitions and its ease of use with JSON data.
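To illustrate what querying without metadata definitions looks like, here is a minimal sketch that builds the request body for Drill's REST endpoint (`POST /query.json`). The file path `/data/events.json` and the helper name `drill_query_payload` are made up for the example, and the sketch assumes the default `dfs` storage plugin is enabled.

```python
import json


def drill_query_payload(path: str, limit: int = 10) -> str:
    """Build the JSON body for a Drill REST query (hypothetical helper)."""
    # The file is referenced directly by path, quoted with backticks;
    # no table or schema has to be declared beforehand.
    sql = f"SELECT * FROM dfs.`{path}` LIMIT {limit}"
    return json.dumps({"queryType": "SQL", "query": sql})


# Assumed example path; replace with a file your Drill cluster can read.
payload = drill_query_payload("/data/events.json")
print(payload)
```

The resulting string would be sent with a `Content-Type: application/json` header to the Drillbit's web port (8047 by default).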

Note that these tests were run on much older versions of Drill, such as 0.8/0.9, which were also not configured appropriately for data locality. Drill is now at 1.1, with many improvements to SQL support (window functions, etc.) and to performance.

You cannot benchmark like this. It makes no sense, and you should never trust such a benchmark.

Everything depends on your own data. If you have JSON files, prefer Drill. If you want to query more than 1 TB, prefer Hive, and so on.

Also consider the storage format: JSON, Kudu, Parquet, or ORC.

Then comes optimization. Hive+Tez seems better for parallel queries but is very slow for a single query, whereas Impala is the opposite (MapReduce versus massively parallel processing).

Also consider your hardware resources: whether the disks are SSDs, and so on.

I recommend starting with Apache Drill and JSON files, then trying Apache Drill with Parquet or ORC.
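Since the numbers only mean something on your own data, one way to follow this advice is to time the same query against each format yourself. Below is a rough sketch of such a harness. The names `time_query` and `compare_formats`, the paths in `QUERIES`, and the `fake_run` stand-in are all hypothetical; in practice `run` would be a function that submits the SQL to your Drill cluster (over REST, JDBC, or ODBC) and waits for the result.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_query(run: Callable[[str], object], sql: str, repeats: int = 5) -> float:
    """Return the median wall-clock seconds over `repeats` runs of `sql`."""
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        run(sql)  # execute the query and wait for it to finish
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare_formats(run: Callable[[str], object],
                    queries: Dict[str, str],
                    repeats: int = 5) -> Dict[str, float]:
    """Time each labelled query and return label -> median seconds."""
    return {label: time_query(run, sql, repeats) for label, sql in queries.items()}


# Assumed paths: the same data stored once as JSON and once as Parquet.
QUERIES = {
    "json": "SELECT COUNT(*) FROM dfs.`/data/events.json`",
    "parquet": "SELECT COUNT(*) FROM dfs.`/data/events_parquet`",
}


def fake_run(sql: str) -> None:
    """Placeholder runner; replace with a real call to your Drill cluster."""
    time.sleep(0.001)


results = compare_formats(fake_run, QUERIES, repeats=3)
for label, seconds in results.items():
    print(f"{label}: {seconds:.4f} s")
```

The median is used rather than the mean so that a single slow first run (cold caches, planning overhead) does not dominate the result.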

If you want help, describe exactly what you have (data + hardware) and what you want.
