Reading data from a remote Hive over JDBC in Spark returns an empty result
Paul Staab replied to this issue in the Spark JIRA. Here is the solution:
Create a Hive dialect that uses backticks, the quote character HiveQL actually accepts for escaping column names:

    import org.apache.spark.sql.jdbc.JdbcDialect

    object HiveDialect extends JdbcDialect {
      override def canHandle(url: String): Boolean = url.startsWith("jdbc:hive2")
      override def quoteIdentifier(colName: String): String = s"`$colName`"
    }
Register it before making the call to spark.read.jdbc:

    import org.apache.spark.sql.jdbc.JdbcDialects

    JdbcDialects.registerDialect(HiveDialect)
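To see why the quoting matters, compare the SQL Spark generates with each quoting style. This is a plain-Python illustration of the idea (the column and table names are made up): Spark's default JDBC dialect wraps identifiers in double quotes, which HiveQL parses as string literals rather than column references, so the query returns constant strings instead of your data; backticks are parsed as identifiers.

```python
cols = ["id", "name"]

def default_quote(col):
    # Spark's default JdbcDialect wraps identifiers in double quotes.
    return '"' + col + '"'

def hive_quote(col):
    # The HiveDialect above uses backticks, which HiveQL treats as identifiers.
    return '`' + col + '`'

default_sql = "SELECT {} FROM test1".format(",".join(default_quote(c) for c in cols))
hive_sql = "SELECT {} FROM test1".format(",".join(hive_quote(c) for c in cols))

print(default_sql)  # SELECT "id","name" FROM test1  -- string literals in HiveQL
print(hive_sql)     # SELECT `id`,`name` FROM test1  -- real column references
```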
Execute spark.read.jdbc with the fetchsize option (without a positive fetchsize the Hive JDBC driver reportedly returns no rows). The call below uses PySpark syntax, as in the original answer; the dialect registration above runs on the Scala side:

    spark.read.jdbc("jdbc:hive2://localhost:10000/default", "test1",
                    properties={"driver": "org.apache.hive.jdbc.HiveDriver",
                                "fetchsize": "10"}).show()
If you use the option-based reader API instead, add the same setting to your options:

    .option("fetchsize", "10")