
How to Access Hive via Python?


I believe the easiest way is to use PyHive.

To install you'll need these libraries:

    pip install sasl
    pip install thrift
    pip install thrift-sasl
    pip install PyHive

Please note that although you install the library as PyHive, you import the module as pyhive, all lower-case.
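Since the pip names and the import names differ, a quick importability check can save a confusing traceback later. This is only a rough sketch: the helper name missing_modules is made up for illustration, and the import names listed (sasl, thrift, thrift_sasl, pyhive) are assumed to be the ones those four packages install.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names assumed to correspond to the four pip packages above.
REQUIRED = ["sasl", "thrift", "thrift_sasl", "pyhive"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("missing:", missing if missing else "none")
```

An empty result means all four modules were found; anything listed still needs to be installed.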

If you're on Linux, you may need to install SASL separately before running the above: install the package libsasl2-dev using apt-get, yum, or whatever package manager your distribution uses. For Windows, there are some options on GNU.org, where you can download a binary installer. On a Mac, SASL should be available if you've installed the Xcode developer tools (xcode-select --install in Terminal).

After installation, you can connect to Hive like this:

    from pyhive import hive
    conn = hive.Connection(host="YOUR_HIVE_HOST", port=PORT, username="YOU")

Now that you have the Hive connection, you have a few options for using it. You can just query it directly:

    cursor = conn.cursor()
    cursor.execute("SELECT cool_stuff FROM hive_table")
    for result in cursor.fetchall():
        use_result(result)
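Note that fetchall() pulls the whole result set into memory at once. For large tables, one possible variation is to read in batches with the standard DB-API fetchmany() call, which PyHive cursors also provide. The helper name iter_rows and the batch size are illustrative choices, and the demo uses an in-memory SQLite database purely as a stand-in so the sketch runs without a Hive server.

```python
import sqlite3  # stand-in so the sketch runs without a Hive server


def iter_rows(conn, query, batch_size=1000):
    """Yield rows one at a time, pulling batch_size rows from the cursor per call."""
    cursor = conn.cursor()
    try:
        cursor.execute(query)
        while True:
            batch = cursor.fetchmany(batch_size)
            if not batch:
                break
            for row in batch:
                yield row
    finally:
        cursor.close()


# Demo against SQLite; with Hive, pass the pyhive connection instead.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE hive_table (cool_stuff TEXT)")
demo.executemany("INSERT INTO hive_table VALUES (?)", [("a",), ("b",), ("c",)])
rows = list(iter_rows(demo, "SELECT cool_stuff FROM hive_table", batch_size=2))
```

Because iter_rows is a generator, the caller can stop early without fetching the rest of the table.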

...or use the connection to make a pandas DataFrame:

    import pandas as pd
    df = pd.read_sql("SELECT cool_stuff FROM hive_table", conn)
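If you only need named columns and not a full DataFrame, something similar can be sketched from the cursor alone, using the standard DB-API description attribute for the column names. The helper name rows_as_dicts and the extra score column are made up for illustration, and SQLite again stands in for Hive so the example runs anywhere.

```python
import sqlite3  # stand-in so the sketch runs without a Hive server


def rows_as_dicts(cursor):
    """Pair each fetched row with the column names in cursor.description."""
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE hive_table (cool_stuff TEXT, score INTEGER)")
demo.execute("INSERT INTO hive_table VALUES ('a', 1)")

cur = demo.cursor()
cur.execute("SELECT cool_stuff, score FROM hive_table")
records = rows_as_dicts(cur)
```

With a Hive cursor the same function should apply after cursor.execute(...), though the exact column names Hive reports may differ from what SQLite shows here.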


I assume you are using HiveServer2, which is why the code doesn't work.

You can use pyhs2 to access Hive correctly. Example code:

    import pyhs2

    with pyhs2.connect(host='localhost',
                       port=10000,
                       authMechanism="PLAIN",
                       user='root',
                       password='test',
                       database='default') as conn:
        with conn.cursor() as cur:
            # Show databases
            print(cur.getDatabases())

            # Execute query
            cur.execute("select * from table")

            # Return column info from query
            print(cur.getSchema())

            # Fetch table results
            for i in cur.fetch():
                print(i)

Note that you may need to install python-devel.x86_64 and cyrus-sasl-devel.x86_64 before installing pyhs2 with pip.

Hope this helps.

Reference: https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2#SettingUpHiveServer2-PythonClientDriver


The Python program below should work to access Hive tables from Python:

    import subprocess  # subprocess.getstatusoutput replaces the Python 2-only commands module

    cmd = "hive -S -e 'SELECT * FROM db_name.table_name LIMIT 1;' "
    status, output = subprocess.getstatusoutput(cmd)
    if status == 0:
        print(output)
    else:
        print("error")
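The same hive -S -e call can also be made without building a shell string, which avoids quoting problems when the query itself contains quotes. This is a minimal sketch assuming the hive executable is on your PATH; the function names build_hive_command and run_hive_query are made up for illustration.

```python
import subprocess


def build_hive_command(query, hive_bin="hive"):
    """Build the argument list for a silent (-S) one-off (-e) Hive query."""
    return [hive_bin, "-S", "-e", query]


def run_hive_query(query, hive_bin="hive"):
    """Run the query through the Hive CLI and return its standard output.

    Raises subprocess.CalledProcessError if hive exits with a non-zero
    status, and FileNotFoundError if the executable cannot be found.
    """
    completed = subprocess.run(
        build_hive_command(query, hive_bin),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # return text instead of bytes
        check=True,
    )
    return completed.stdout
```

Calling run_hive_query("SELECT * FROM db_name.table_name LIMIT 1;") would then return the CLI output as a string, leaving the error handling to the caller.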