spark importing data from oracle - java.lang.ClassNotFoundException: oracle.jdbc.driver.OracleDriver
Although you haven't mentioned which version of Spark you are using, you can try the approaches below.
The jar needs to be on the classpath of both the driver and the executors, so edit conf/spark-defaults.conf and add both of the lines below:
spark.driver.extraClassPath /home/hadoop/ojdbc7.jar
spark.executor.extraClassPath /home/hadoop/ojdbc7.jar
or
you can pass the same settings when submitting the job, as in the example below:
--conf spark.driver.extraClassPath=/home/hadoop/ojdbc7.jar
--conf spark.executor.extraClassPath=/home/hadoop/ojdbc7.jar
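For instance, a full spark-submit invocation might look like the following (your_app.py stands in for whatever application you are submitting):

spark-submit \
  --conf spark.driver.extraClassPath=/home/hadoop/ojdbc7.jar \
  --conf spark.executor.extraClassPath=/home/hadoop/ojdbc7.jar \
  your_app.py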
Add the lines below to your_spark_home_path/conf/spark-defaults.conf ('/opt/modules/extraClass/' is the directory where I put extra jars):
spark.driver.extraClassPath = /opt/modules/extraClass/ojdbc7.jar
spark.executor.extraClassPath = /opt/modules/extraClass/ojdbc7.jar
or you can simply add ojdbc7.jar to your_spark_home_path/jars.
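For example, assuming SPARK_HOME points at your Spark installation, copying the jar into place is enough (restart any running Spark sessions so the new jar is picked up):

cp /opt/modules/extraClass/ojdbc7.jar "$SPARK_HOME/jars/"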
I was having the exact same problem on an AWS EMR cluster (emr-5.31.0).
Setting spark.driver.extraClassPath and spark.executor.extraClassPath in SparkSession.builder.config(), or in spark-defaults.conf, or with the spark-submit --jars option pointing at the location of the ojdbc6.jar did not work.

I finally got it to work by passing the Maven coordinates to spark.jars.packages, and then I also had to set spark.driver.extraClassPath and spark.executor.extraClassPath to $HOME/.ivy2/jars/*.
import os
from pyspark.sql import SparkSession

spark_packages_list = [
    'io.delta:delta-core_2.11:0.6.1',
    'com.oracle.database.jdbc:ojdbc6:11.2.0.4',
]
spark_packages = ",".join(spark_packages_list)

home = os.getenv("HOME")

spark = (
    SparkSession
    .builder
    .config("spark.jars.packages", spark_packages)
    # the packages above are resolved into ~/.ivy2/jars,
    # so put that directory on both classpaths
    .config('spark.driver.extraClassPath', f"{home}/.ivy2/jars/*")
    .config('spark.executor.extraClassPath', f"{home}/.ivy2/jars/*")
    .getOrCreate()  # build (or reuse) the session
)
Then the following worked (change parameters accordingly):
host = "111.111.111.111"
port = "1234"
schema = "YourSchema"
URL = f"jdbc:oracle:thin:@{host}:{port}/{schema}"

with open(f"{home}/username.file", "r") as f:
    username = f.read()
with open(f"{home}/password.file", "r") as f:
    password = f.read()

query = "SELECT * FROM YourTable"

df = (
    spark.read.format("jdbc")
    .option("url", URL)
    .option("query", query)
    .option("user", username)
    .option("password", password)
    .load()
)

df.printSchema()
df.show()
OR
properties = {
    "user": username,
    "password": password,
}

df = spark.read.jdbc(
    url=URL,
    table="YourTable",
    properties=properties,
)

df.printSchema()
df.show()
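If the class still cannot be found at read time, it can also help to name the driver class explicitly via the driver property of the JDBC data source; this is a small variation on the snippet above, assuming the ojdbc jar is already on the classpath:

properties = {
    "user": username,
    "password": password,
    # explicitly select the Oracle JDBC driver class
    "driver": "oracle.jdbc.driver.OracleDriver",
}

df = spark.read.jdbc(url=URL, table="YourTable", properties=properties)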