Spark can access Hive table from pyspark but not from spark-submit
Spark 2.x
The same problem may occur in Spark 2.x if the SparkSession has been created without enabling Hive support.
Spark 1.x
It is pretty simple. When you use the PySpark shell, and Spark has been built with Hive support, the default SQLContext implementation (the one available as sqlContext) is a HiveContext.
In your standalone application you use a plain SQLContext, which doesn't provide Hive capabilities.
Assuming the rest of the configuration is correct, just replace:

from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
with
from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)
In Spark 2.x (Amazon EMR 5+) you will run into this issue with spark-submit
if you don't enable Hive support like this:
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .master("yarn") \
    .appName("my app") \
    .enableHiveSupport() \
    .getOrCreate()
Your problem may be related to your Hive configuration. If your configuration uses a local metastore, the metastore_db directory gets created in the directory that you started your Hive server from.
Since spark-submit is launched from a different directory, it creates a new metastore_db in that directory, which does not contain information about your previous tables.
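The directory dependence can be sketched in plain Python (no Spark needed; the two directories below are hypothetical, chosen only to illustrate the point):

```python
import os

# Derby's default metastore path, "metastore_db", is relative, so it resolves
# against the current working directory. Processes launched from different
# directories therefore end up with different metastores.
relative = "metastore_db"

hive_cwd = "/home/youruser/hive"    # hypothetical: where the Hive server was started
submit_cwd = "/home/youruser/jobs"  # hypothetical: where spark-submit was run

print(os.path.join(hive_cwd, relative))    # /home/youruser/hive/metastore_db
print(os.path.join(submit_cwd, relative))  # /home/youruser/jobs/metastore_db
```

Two different paths, hence two different (and disjoint) sets of tables.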
A quick fix would be to start the Hive
server from the same directory as spark-submit
and re-create your tables.
A more permanent fix is referenced in this SO post.
You need to change your configuration in $HIVE_HOME/conf/hive-site.xml:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=/home/youruser/hive_metadata/metastore_db;create=true</value>
</property>
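To see why this works: the databaseName in that connection URL is an absolute path, so the metastore location no longer depends on the working directory. A quick plain-Python check (the URL is the one from the example above):

```python
# Parse the Derby JDBC URL from the hive-site.xml example. Everything after
# "jdbc:derby:" is a ;-separated list of key=value attributes.
url = "jdbc:derby:;databaseName=/home/youruser/hive_metadata/metastore_db;create=true"

params = dict(part.split("=", 1) for part in url.split(";")[1:])

print(params["databaseName"])                  # /home/youruser/hive_metadata/metastore_db
print(params["databaseName"].startswith("/"))  # True -> absolute, cwd-independent
```

Because the path starts with "/", every process resolves it to the same location regardless of where it was launched.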
You should now be able to run hive from any location and still find your tables.