Automating Hive with Python
One quick and dirty way to do this is to automate Hive from the command line:

    hive -e "sql command"

Something like this should work:
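
    import subprocess

    def query(self, cmd):
        """Run a hive expression"""
        # Quick and dirty: a double quote inside cmd will break this shell string.
        cmd = 'hive -e "' + cmd + '"'
        prc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                               shell=True, universal_newlines=True)
        # Wait for hive to finish and collect its output.
        stdout, stderr = prc.communicate()
        ret = [r for r in stdout.split('\n') if len(r)]
        if len(ret) == 0:
            return []
        # Tab-separated output means multiple columns: return a list of rows.
        if ret[0].find('\t') > 0:
            return [[t.strip() for t in r.split('\t')] for r in ret]
        return ret
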
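As a variation on the wrapper above, the sketch below separates running the command from parsing its output, so the parsing can be checked without a Hive installation. The names `run_hive`, `parse_hive_output`, and `hive_bin` are hypothetical. It assumes `hive` is on the PATH and prints one row per line with tab-separated columns. It also raises on a non-zero exit code rather than returning an empty list.

```python
import subprocess


def parse_hive_output(text):
    """Turn raw `hive -e` output into a list of rows (hypothetical helper)."""
    # Drop blank lines; assumes one result row per line.
    lines = [line for line in text.split('\n') if line.strip()]
    if not lines:
        return []
    # Assumes columns are tab-separated, as in the wrapper above.
    if '\t' in lines[0]:
        return [[field.strip() for field in line.split('\t')] for line in lines]
    return lines


def run_hive(sql, hive_bin='hive'):
    """Run `hive -e sql` and return parsed rows (hypothetical helper)."""
    # An argument list avoids shell quoting problems with quotes inside sql.
    prc = subprocess.Popen([hive_bin, '-e', sql],
                           stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                           universal_newlines=True)
    stdout, stderr = prc.communicate()
    if prc.returncode != 0:
        raise RuntimeError('hive exited with %d: %s' % (prc.returncode, stderr))
    return parse_hive_output(stdout)
```

With this, something like `run_hive('SHOW TABLES')` should give a flat list of names, while a multi-column SELECT should give a list of lists.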
You could also access Hive using Thrift (https://cwiki.apache.org/confluence/display/Hive/HiveClient#HiveClient-Python). It looks like pyhs2 is mostly a wrapper around using Thrift directly.
One alternative is to use the pyhs2 library to open a connection to Hive natively from within a Python process. The following is some sample code I had cobbled together to test a different use case, but it should hopefully illustrate use of this library.

    # Python 2.7
    import pyhs2
    from pyhs2.error import Pyhs2Exception

    hql = "SELECT * FROM my_table"

    # Use your own credentials and connection info here, of course.
    with pyhs2.connect(host='localhost', port=10000, authMechanism="PLAIN",
                       user="root", database="default") as db:
        with db.cursor() as cursor:
            try:
                print "Trying default database"
                cursor.execute(hql)
                for row in cursor.fetch():
                    print row
            except Pyhs2Exception as error:
                print(str(error))

Depending on what is or is not already installed on your box, you may also need to install the development headers for both libpython and libsasl2.