Why Hive is not supporting Stored Procedure? Why Hive is not supporting Stored Procedure? hadoop hadoop

Why Hive is not supporting Stored Procedure?


First of all, Hadoop or Hive is NOT an alternative to your SQL DB. You must never consider either of these 2 to be used as a replacement of your RDBMS.

Hive was developed just to provide warehousing capabilities on top of an existing Hadoop cluster keeping in mind the large base of SQL users, both expert database designers and administrators, as well as casual users who use SQL to extract information from their data warehouses. Although it provides you a SQL like interface, it is not a SQL DB. Hive is most suited for data warehouse applications, where relatively static data is analyzed, fast response times are not required, and when the data is not changing rapidly. Simply put for offline batch processing kind of stuff.

There is nothing like stored procedures in HBase as well. But they have something called as Coprocessor which resembles stored procedures in RDBMS. To find more on Coprocessor you can go here.

And as @zsxwing has said Sqoop is just a data migration tool, nothing more. Once you switch to the NoSQL world you need to be flexible and you need to abide by the NoSQL rules.

If you could elaborate your use case a bit, maybe we can help you better.


In response to your comment :

Yes Facebook uses Hadoop and Hive and other related tool extensively. Infact Hive was developed at Facebook. But These are not the only things. Wherever they have OLTP and full transactional need, they still depend on RDBMS. One example is their Timeline feature, which uses MySQL. They have a gigantic(and awesome) pipeline which consists of a lot of things and not just Hadoop and Hive. See the picture below.

enter image description here


Hive and Hbase are not support stored procedure. However, Hive plans to support Sp (HIVE-3087) in the future. HBase has no plan about supporting Sp since it only focuses on being a Storage and more like NoSQL.

Hive UDF could implement some function of stored procedure, though it's not enough.


Hive does not have stored procedures

Hive indeed does not have any stored procedures as explained in existing answers. However, here are 2 mitigating factors:

Hive has views

Of course it is not a proper substitute for stored procedures, but with smart use of views you can perhaps remove the need for some of your procedures.

You can call hive from another program

The last time I ran into the problem that hive does not have stored procedures, I realized that the thing I wanted to do (loop over all columns) was something that I could also do in another program. As such I followed the following workflow:

  1. Run a query to get the relevant (meta) data: Python calls hive to get column names
  2. Use the information to build the query: Python takes in all column names and builds the correspondng select statements
  3. Run the resulting query: Python does a system call with hive -e
  4. Optionally, go to 2 if needed

With views and external calls, I have so far been able to work around the lack of stored procedures.