Why is Select Count(*) slower than Select * in hive

sql hadoop hive

select * from table

can run as a Map-only job, but

select count(*) from table

needs both a Map and a Reduce phase.
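A rough way to picture the difference is the toy Python model below. It is only a sketch, not anything Hive actually runs: the table is assumed to be stored as a few file splits with one mapper per split, and the names `splits`, `select_star` and `select_count` are made up for illustration.

```python
# Toy model: a table stored as three file splits, one mapper per split.
splits = [
    [(1, "a"), (2, "b")],
    [(3, "c")],
    [(4, "d"), (5, "e"), (6, "f")],
]


def select_star(splits):
    """select *: every row is passed through unchanged, so no reduce step is needed."""
    for split in splits:
        for row in split:
            yield row


def select_count(splits):
    """select count(*): each mapper counts its own split, then one reducer adds them up."""
    partial_counts = [len(split) for split in splits]  # map phase
    return sum(partial_counts)                         # reduce phase


rows = list(select_star(splits))
total = select_count(splits)
```

In the first function each split can be handled on its own. In the second, the partial counts have to be brought together in one place before an answer exists, and that extra step is the Reduce phase.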

Hope this helps.


There are three types of operations that a hive query can perform.

Here they are, from cheapest and fastest to most expensive and slowest.

A hive query can be a metadata only request.

Show tables and describe table are examples. For these queries the hive process performs a lookup in the metadata server. The metadata server is a SQL database, probably MySQL, but the actual DB is configurable.

A hive query can be an hdfs get request.

Select * from table would be an example. In this case hive can return the results by performing an hdfs operation: hadoop fs -get, more or less.

A hive query can be a Map Reduce job.

Hive has to ship the jar to hdfs, the jobtracker queues the tasks, the tasktrackers execute the tasks, and the final data is put into hdfs or shipped to the client.

The Map Reduce job has different possibilities as well.

It can be a Map only job. Take Select * from table where id > 100, for example: all of that logic can be applied on the mapper.

It can be a Map and Reduce job, for example Select min(id) from table; or Select * from table order by id;

It can also lead to multiple Map Reduce passes, but I think the above summarizes some behaviors.
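The two Map Reduce shapes can be sketched in plain Python. This is only a toy model under the assumption of one mapper per file split; it is not Hive code, and the function names and id values are invented for illustration.

```python
# Toy model of the two Map Reduce shapes; id values, one list per mapper.
splits = [[150, 20, 300], [99, 101], [7, 250]]


def map_only_filter(splits, threshold):
    """select * from table where id > threshold: each mapper filters its own split."""
    return [[i for i in split if i > threshold] for split in splits]


def map_reduce_min(splits):
    """select min(id): mappers emit a local minimum, one reducer picks the global one."""
    local_minimums = [min(split) for split in splits if split]  # map phase
    return min(local_minimums)                                   # reduce phase


def map_reduce_order_by(splits):
    """select * order by id: rows from all mappers must meet in a reducer to be sorted."""
    shuffled = [i for split in splits for i in split]  # shuffle to the reducer
    return sorted(shuffled)                            # reduce phase


filtered = map_only_filter(splits, 100)
smallest = map_reduce_min(splits)
ordered = map_reduce_order_by(splits)
```

The filter never needs to look outside its own split, so its output can stay per mapper. The minimum and the global sort both depend on rows from every split, which is why they need the shuffle and Reduce step.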


This is because the DB is using clustered primary keys, so the query searches each row for the key individually, row by agonizing row, instead of reading from an index.

Run optimize table. This will ensure that the data pages are physically stored in sorted order. This could conceivably speed up a range scan on a clustered primary key.

Create an additional non-primary index on just the change_event_id column. This will store a copy of that column in index pages, which will be much faster to scan. After creating it, check the explain plan to make sure it's using the new index.
