
Hadoop Hive slow queries


Yes, you have misinterpreted Hadoop. Hadoop, and Hive as well, are not meant for real-time workloads. They are best suited to offline, batch-processing jobs, and they are not a replacement for RDBMSs. You can do some fine tuning, but 'absolute real time' is not possible. A lot happens under the hood when you run a Hive query, which I think you may not be aware of. First of all, your Hive query gets converted into one or more corresponding MapReduce jobs, followed by steps such as split creation, record generation and mapper creation. I would never suggest Hadoop (or Hive) if you have real-time needs.
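One way to see this overhead for yourself is Hive's `EXPLAIN` statement, which prints the plan Hive compiles for a query without running it. The sketch below reuses a table and columns from the question. On the classic MapReduce execution engine the output lists the stage dependencies and one or more "Map Reduce" stages; newer Hive versions may run on Tez or Spark instead (controlled by `hive.execution.engine`), so the exact stage names can differ.

```sql
-- Show the execution plan Hive builds for this query, without executing it.
-- Each "Map Reduce" stage in the output corresponds to a job that must be
-- scheduled on the cluster, which is where much of the latency comes from.
EXPLAIN
SELECT *
FROM p_hotel_rev_agg_period
WHERE min_date < '20130701'
ORDER BY min_date DESC
LIMIT 10;
```

Even a simple filter plus `ORDER BY` like this typically compiles to at least one full job, so there is a fixed startup cost regardless of how little data is involved.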

You might wanna have a look at Impala for your real time needs.


Hive is not the appropriate tool for a real-time job, but if you want to leverage the Hadoop infrastructure with real-time or fast data access, take a look at HBase. Its value-add is all about fast access. I am not sure why you are selecting Hadoop for your solution, but HBase sits on top of HDFS, which some people like because of the inherent redundancy HDFS offers (you copy a file onto it once and it is automatically replicated). That may be one of the reasons you are looking into Hadoop.

For more info: read this question


I am not sure how new you are to Hadoop. Hive does not give you results at interactive speeds, no matter how small the tables are. In case you knew this already and are trying to tune the query, you can try the query below, which applies each filter inside a subquery before the join:

    select a.*, b.country, b.city
    from (select * from p_country_town_hotel where hotel = 'AdriaPraha') b
    inner join (select * from p_hotel_rev_agg_period where min_date < '20130701') a
      on a.key.hotel = b.hotel
    order by a.min_date desc
    limit 10;

If you know one of the tables is small enough to fit in memory, you can try a map-side join.
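A minimal sketch of a map-side join, assuming `p_country_town_hotel` is the small table. In a map-side join the small table is loaded into memory on each mapper, so the join happens in the map phase without a shuffle and reduce step. There are two common ways to request it: let Hive convert the join automatically when the small table is under a size threshold (`hive.auto.convert.join`, `hive.mapjoin.smalltable.filesize`, in bytes), or name the small table with the `MAPJOIN` hint. Note that since Hive 0.11 the hint is ignored by default (`hive.ignore.mapjoin.hint=true`), so automatic conversion is usually the one that takes effect. The threshold value below is only an example.

```sql
-- Let Hive turn the join into a map-side join automatically when the
-- smaller table's size (in bytes) is below the threshold.
SET hive.auto.convert.join=true;
SET hive.mapjoin.smalltable.filesize=25000000;

-- The MAPJOIN hint names the table to hold in memory (alias b here);
-- recent Hive versions ignore it unless hive.ignore.mapjoin.hint=false.
SELECT /*+ MAPJOIN(b) */ a.*, b.country, b.city
FROM p_hotel_rev_agg_period a
JOIN p_country_town_hotel b
  ON a.key.hotel = b.hotel
WHERE b.hotel = 'AdriaPraha'
  AND a.min_date < '20130701'
ORDER BY a.min_date DESC
LIMIT 10;
```

Running `EXPLAIN` on the query afterwards should show a "Map Join Operator" in the plan if the conversion was applied.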