Hive Data to Pandas Data frame

python pandas hadoop hive

With pandas 0.24.0, pd.read_sql() accepts a DB connection, so you can pass a PyHive connection directly to pandas.read_sql():

from pyhive import hive
import pandas as pd
# open connection
conn = hive.Connection(host=host, port=20000, ...)
# query the table into a new dataframe
dataframe = pd.read_sql("SELECT id, name FROM test.example_table", conn)
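If no Hive server is at hand, the same read_sql() pattern can be tried locally. The sketch below is a stand-in under stated assumptions: Python's built-in sqlite3 module replaces PyHive (both hand pd.read_sql() a DB-API connection), the table has no "test." database prefix because SQLite has none, and the rows are made up. Newer pandas releases may warn that DB-API connections other than sqlite3 are untested, but the call itself is the same.

```python
import sqlite3
from contextlib import closing

import pandas as pd

# Stand-in for hive.Connection(...): an in-memory SQLite database (assumption)
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE example_table (id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO example_table VALUES (?, ?)",
        [(1, "alice"), (2, "bob")],
    )
    # pd.read_sql() runs the query on the connection and builds the DataFrame
    dataframe = pd.read_sql("SELECT id, name FROM example_table", conn)

# closing() has closed the connection; the DataFrame no longer depends on it
print(dataframe)
```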

The DataFrame's columns are named after the Hive table's columns. You can rename them during or after DataFrame creation if needed:

via HiveQL: SELECT id AS new_column_name ...
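via the columns attribute of the DataFrame returned by pd.read_sql(), e.g. dataframe.columns = [...] or dataframe.rename(columns={...}) (the columns argument of pd.read_sql() itself only selects columns when reading a whole table; it does not rename query results)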
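Both renaming options can be sketched side by side. As before, an in-memory sqlite3 database stands in for Hive (assumption), and the names new_id and new_name are hypothetical. The alias approach renames inside the query, while DataFrame.rename() renames after the frame is built; both end up with the same result.

```python
import sqlite3
from contextlib import closing

import pandas as pd

with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE example_table (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO example_table VALUES (1, 'alice')")

    # Option 1: rename in the query with column aliases (same syntax in HiveQL)
    aliased = pd.read_sql(
        "SELECT id AS new_id, name AS new_name FROM example_table", conn
    )

    # Option 2: read with the original names, then rename the DataFrame columns
    renamed = pd.read_sql("SELECT id, name FROM example_table", conn).rename(
        columns={"id": "new_id", "name": "new_name"}
    )

print(aliased.columns.tolist())
print(renamed.columns.tolist())
```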


You can try this (I'm pretty sure it will work):

import pandas as pd
# assumes cur is a cursor that has already run the query and provides getSchema() (e.g. pyhs2)
res = cur.getSchema()
# get the column names of the table
description = [col['columnName'] for col in res]
# strip a "table." prefix if the column name contains a period; [-1] also keeps names without one
headers = [x.split(".")[-1] for x in description]
df = pd.DataFrame(cur.fetchall(), columns=headers)
df.head(n=20)
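The header-splitting step can be tried without a cluster. This sketch assumes getSchema() returns a list of dicts with a 'columnName' key, as the code above implies; the schema, the rows, and the helper name column_headers are made up for illustration. Using split(".")[-1] rather than split(".")[1] leaves a name without a period unchanged instead of raising an IndexError.

```python
import pandas as pd


def column_headers(schema):
    """Strip a 'table.' prefix from each name in a getSchema()-style list."""
    # split(".")[-1] keeps the part after the last period and
    # returns names without a period as they are
    return [col["columnName"].split(".")[-1] for col in schema]


# Hypothetical schema and rows, shaped like the cursor output used above
schema = [
    {"columnName": "example_table.id"},
    {"columnName": "example_table.name"},
]
rows = [(1, "alice"), (2, "bob")]

df = pd.DataFrame(rows, columns=column_headers(schema))
print(df.head(n=20))
```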


I was getting an empty DataFrame because I had already fetched the data once and was trying to fetch it again. Executing the query and fetching the rows only once works:

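import pandas as pd
def hive_query_to_df(cur, query):  # hypothetical helper name
    cur.execute(query)
    # fetch the rows only once; fetching again returns no rows
    val = cur.fetchall()
    columnNames = [a['columnName'] for a in cur.getSchema()]
    df = pd.DataFrame(data=val, columns=columnNames)
    # print(df)
    return df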
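The empty-result pitfall can be reproduced with any DB-API cursor. The sketch below uses sqlite3 as a stand-in (assumption), builds the headers from cursor.description (the standard DB-API attribute) instead of the getSchema() call used above, and uses a made-up table and a hypothetical helper name fetch_dataframe. The second fetchall() comes back empty because the first one already consumed the result set.

```python
import sqlite3
from contextlib import closing

import pandas as pd


def fetch_dataframe(cur, query):
    """Execute query, fetch all rows exactly once, and wrap them in a DataFrame."""
    cur.execute(query)
    rows = cur.fetchall()  # consumes the whole result set
    # DB-API: the first item of each description entry is the column name
    names = [d[0] for d in cur.description]
    return pd.DataFrame(data=rows, columns=names)


with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE example_table (id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO example_table VALUES (?, ?)", [(1, "alice"), (2, "bob")]
    )
    cur = conn.cursor()
    df = fetch_dataframe(cur, "SELECT id, name FROM example_table")
    # Fetching again without re-executing the query returns no rows
    leftover = cur.fetchall()

print(df)
print(leftover)
```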

CodeHunter
