How to integrate Apache Spark with MySQL for reading database tables as a spark dataframe? [closed]

mysql


From PySpark, this works for me:

dataframe_mysql = mySqlContext.read.format("jdbc").options(
    url="jdbc:mysql://localhost:3306/my_bd_name",
    driver="com.mysql.jdbc.Driver",
    dbtable="my_tablename",
    user="root",
    password="root").load()
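For reference, the `url` option above follows the pattern `jdbc:mysql://<host>:<port>/<database>`. A minimal sketch of assembling it (the helper name is my own, not part of Spark or PySpark):

```python
def mysql_jdbc_url(host: str, port: int, database: str) -> str:
    """Build the MySQL JDBC URL string passed to the `url` option.

    Hypothetical helper -- Spark itself just takes the final string.
    """
    return f"jdbc:mysql://{host}:{port}/{database}"

# The URL used in the snippet above:
print(mysql_jdbc_url("localhost", 3306, "my_bd_name"))
# jdbc:mysql://localhost:3306/my_bd_name
```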


With Spark 2.0.x you can use DataFrameReader and DataFrameWriter. Use SparkSession.read to access a DataFrameReader, and Dataset.write to access a DataFrameWriter.

The examples below assume you are running them in spark-shell.

read example

val prop = new java.util.Properties()
prop.put("user", "username")
prop.put("password", "yourpassword")
val url = "jdbc:mysql://host:port/db_name"
val df = spark.read.jdbc(url, "table_name", prop)
df.show()

read example 2

val jdbcDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:mysql:dbserver")
  .option("dbtable", "schema.tablename")
  .option("user", "username")
  .option("password", "password")
  .load()

(from the Spark documentation)

read example 3

If you want to read data from a query result rather than a whole table, wrap the query in parentheses and give it an alias:

val sql = """select * from db.your_table where id>1"""
val jdbcDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:mysql:dbserver")
  .option("dbtable", s"( $sql ) t")
  .option("user", "username")
  .option("password", "password")
  .load()
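What makes this work is that the JDBC source accepts a parenthesized subquery with an alias anywhere a table name is expected, so the database treats it as a derived table. A sketch of just the wrapping step (the helper name is an assumption of mine, not a Spark API):

```python
def as_dbtable(query: str, alias: str = "t") -> str:
    # Wrap a SQL query so it can be passed as the `dbtable` option:
    # "( <query> ) <alias>" reads like a derived table to the database.
    return f"( {query} ) {alias}"

print(as_dbtable("select * from db.your_table where id>1"))
# ( select * from db.your_table where id>1 ) t
```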

write example

import org.apache.spark.sql.SaveMode

val prop = new java.util.Properties()
prop.put("user", "username")
prop.put("password", "yourpassword")
val url = "jdbc:mysql://host:port/db_name"
// df is a DataFrame containing the data you want to write
df.write.mode(SaveMode.Append).jdbc(url, "table_name", prop)
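SaveMode.Append adds the rows to the table if it already exists; the other modes react differently to an existing target. A quick summary (paraphrased in my own words; check the SaveMode scaladoc for the authoritative descriptions):

```python
# How each Spark SaveMode behaves when the target table already exists
# (summary paraphrased from the SaveMode documentation)
SAVE_MODES = {
    "Append": "add the new rows to the existing data",
    "Overwrite": "replace the existing data with the DataFrame's contents",
    "Ignore": "leave the existing data untouched and skip the write",
    "ErrorIfExists": "fail with an error (the default)",
}

for mode, effect in SAVE_MODES.items():
    print(f"SaveMode.{mode}: {effect}")
```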



Using Scala, this worked for me. Use the commands below:

sudo -u root spark-shell --jars /mnt/resource/lokeshtest/guava-12.0.1.jar,/mnt/resource/lokeshtest/hadoop-aws-2.6.0.jar,/mnt/resource/lokeshtest/aws-java-sdk-1.7.3.jar,/mnt/resource/lokeshtest/mysql-connector-java-5.1.38/mysql-connector-java-5.1.38/mysql-connector-java-5.1.38-bin.jar --packages com.databricks:spark-csv_2.10:1.2.0

import org.apache.spark.sql.SQLContext

val sqlcontext = new org.apache.spark.sql.SQLContext(sc)

val dataframe_mysql = sqlcontext.read.format("jdbc")
  .option("url", "jdbc:mysql://Public_IP:3306/DB_NAME")
  .option("driver", "com.mysql.jdbc.Driver")
  .option("dbtable", "tblage")
  .option("user", "sqluser")
  .option("password", "sqluser")
  .load()

dataframe_mysql.show()