Connect to SQLite in Apache Spark
There are two options you can try:
Use JDBC directly
- Open a separate, plain JDBC connection in your Spark job
- Get the table names from the JDBC metadata
- Feed these into your `for` comprehension
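The steps above can be sketched with plain `java.sql` calls. This is a minimal sketch, assuming the xerial `sqlite-jdbc` driver is on the classpath; the database URL is a placeholder you would replace with the path to your own file:

```scala
import java.sql.DriverManager

// Open a separate, plain JDBC connection (outside Spark's DataFrame reader)
// and list all user tables via the standard JDBC metadata API.
def listSqliteTables(url: String): List[String] = {
  val conn = DriverManager.getConnection(url)
  try {
    // getTables(catalog, schemaPattern, tableNamePattern, types):
    // "%" matches every table name, and we restrict to plain tables.
    val rs = conn.getMetaData.getTables(null, null, "%", Array("TABLE"))
    Iterator
      .continually(rs)
      .takeWhile(_.next())
      .map(_.getString("TABLE_NAME"))
      .toList
  } finally conn.close()
}
```

The names this returns can then drive the `for` comprehension, with each table loaded through `sqlContext.read.format("jdbc")` as shown further down.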
Use a SQL query for the "dbtable" argument
You can specify a query as the value for the `dbtable` argument. Syntactically this query must "look" like a table, so it must be wrapped in a subquery.
In that query, get the metadata from the database:
```scala
val df = sqlContext.read.format("jdbc")
  .options(Map(
    "url" -> "jdbc:postgresql:xxx",
    "user" -> "x",
    "password" -> "x",
    "dbtable" -> "(select * from pg_tables) as t"))
  .load()
```
This example works with PostgreSQL; you have to adapt it for SQLite.
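For SQLite there is no `pg_tables`; the schema lives in the built-in `sqlite_master` table instead. A sketch of the adapted subquery (the Spark usage is shown in comments; the URL is a placeholder):

```scala
// SQLite stores its schema in the built-in sqlite_master table;
// filtering on type = 'table' yields one row per user table.
// The subquery must be aliased so it "looks" like a table to Spark.
val sqliteSchemaQuery =
  "(select tbl_name from sqlite_master where type = 'table') as t"

// Hypothetical usage with Spark's JDBC reader:
// val metaData = sqlContext.read.format("jdbc")
//   .options(Map(
//     "url" -> "jdbc:sqlite:/x.db",
//     "dbtable" -> sqliteSchemaQuery))
//   .load()
```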
Update
It seems that the JDBC driver only supports iterating over one result set at a time. However, if you materialize the list of table names using collect(), the following snippet should work:
```scala
val myTableNames = metaData.select("tbl_name").map(_.getString(0)).collect()

for (t <- myTableNames) {
  println(t.toString)
  val tableData = sqlContext.read.format("jdbc")
    .options(Map(
      "url" -> "jdbc:sqlite:/x.db",
      "dbtable" -> t))
    .load()
  tableData.show()
}
```