spark importing data from oracle - java.lang.ClassNotFoundException: oracle.jdbc.driver.OracleDriver


Although you haven't mentioned which version of Spark you are using, you can try the options below.

The JDBC driver JAR has to be on the classpath of both the driver and the executors, so edit conf/spark-defaults.conf and add both of the lines below.

spark.driver.extraClassPath /home/hadoop/ojdbc7.jar
spark.executor.extraClassPath /home/hadoop/ojdbc7.jar

or
you can pass them when submitting the job, as in the example below:

--conf spark.driver.extraClassPath=/home/hadoop/ojdbc7.jar
--conf spark.executor.extraClassPath=/home/hadoop/ojdbc7.jar
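Put together, a full submit command might look like this (a sketch; your_job.py stands in for your own application script):

spark-submit \
  --conf spark.driver.extraClassPath=/home/hadoop/ojdbc7.jar \
  --conf spark.executor.extraClassPath=/home/hadoop/ojdbc7.jar \
  your_job.py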


Add the lines below to your_spark_home_path/conf/spark-defaults.conf; '/opt/modules/extraClass/' is the directory where I put extra JARs:

spark.driver.extraClassPath = /opt/modules/extraClass/ojdbc7.jar
spark.executor.extraClassPath = /opt/modules/extraClass/ojdbc7.jar

or you can simply add ojdbc7.jar to your_spark_home_path/jars.
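For example (assuming SPARK_HOME points at your Spark installation and the driver JAR is in /home/hadoop):

cp /home/hadoop/ojdbc7.jar "$SPARK_HOME"/jars/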


I was having the exact same problem on an AWS EMR cluster (emr-5.31.0).

Setting spark.driver.extraClassPath and spark.executor.extraClassPath in SparkSession.builder.config(), in spark-defaults.conf, or via spark-submit --jars pointing at the location of ojdbc6.jar did not work.

I finally got it to work by passing the Maven coordinates to spark.jars.packages and then I also had to set spark.driver.extraClassPath and spark.executor.extraClassPath to $HOME/.ivy2/jars/*.

import os
from pyspark.sql import SparkSession

spark_packages_list = [
    'io.delta:delta-core_2.11:0.6.1',
    'com.oracle.database.jdbc:ojdbc6:11.2.0.4',
]
spark_packages = ",".join(spark_packages_list)

home = os.getenv("HOME")

spark = (
    SparkSession
    .builder
    .config("spark.jars.packages", spark_packages)
    .config('spark.driver.extraClassPath', f"{home}/.ivy2/jars/*")
    .config('spark.executor.extraClassPath', f"{home}/.ivy2/jars/*")
    .getOrCreate()
)
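As an aside, the same Maven coordinates can also be supplied at submit time with the --packages flag (a sketch under the same assumptions; your_job.py stands in for your application):

spark-submit \
  --packages io.delta:delta-core_2.11:0.6.1,com.oracle.database.jdbc:ojdbc6:11.2.0.4 \
  --conf spark.driver.extraClassPath="$HOME/.ivy2/jars/*" \
  --conf spark.executor.extraClassPath="$HOME/.ivy2/jars/*" \
  your_job.py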

Then the following worked (change parameters accordingly):

host = "111.111.111.111"port = "1234"schema = "YourSchema"URL = f"jdbc:oracle:thin:@{host}:{port}/{schema}"with open(f"{home}/username.file", "r") as f:    username = f.read()with open(f"{home}/password.file", "r") as f:    password = f.read()query  = "SELECT * FROM YourTable"df = (spark.read.format("jdbc")    .option("url", URL)    .option("query", query)    .option("user", username)    .option("password", password)    .load())df.printSchema()df.show()

OR

properties = {
    "user": username,
    "password": password,
}
df = spark.read.jdbc(
    url=URL,
    table="YourTable",
    properties=properties,
)
df.printSchema()
df.show()
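If the class still cannot be found even with the JAR on the classpath, naming the driver class explicitly sometimes helps; "driver" is a standard Spark JDBC option, and the class below is the one from the exception in the question (a sketch of the first read variant above):

df = (spark.read.format("jdbc")
    .option("url", URL)
    .option("query", query)
    .option("user", username)
    .option("password", password)
    # name the JDBC driver class explicitly instead of relying on auto-detection
    .option("driver", "oracle.jdbc.driver.OracleDriver")
    .load()
)
df.printSchema()
df.show()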