Spark and Hive in Hadoop 3: Difference between metastore.catalog.default and spark.sql.catalogImplementation


Catalog implementations

There are two catalog implementations:

  • in-memory, to create in-memory tables that are only available within the Spark session,
  • hive, to create persistent tables using an external Hive Metastore.

More details here.
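As a minimal Scala sketch (the app name is illustrative), the implementation is chosen when the SparkSession is built; spark.sql.catalogImplementation is a static configuration, so it cannot be changed on an already running session:

```scala
import org.apache.spark.sql.SparkSession

// spark.sql.catalogImplementation is static: set it before the session
// exists, either explicitly or via enableHiveSupport() (which sets "hive").
val spark = SparkSession.builder()
  .appName("catalog-demo") // illustrative name
  // "in-memory": tables live only for the lifetime of this session
  //.config("spark.sql.catalogImplementation", "in-memory")
  // "hive": persistent tables backed by the external Hive Metastore
  .enableHiveSupport()
  .getOrCreate()
```

With hive, saveAsTable persists table metadata in the metastore; with in-memory, the table disappears when the session ends.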

Metastore catalog

Multiple catalogs can coexist in the same Hive Metastore. For example, HDP versions 3.1.0 through 3.1.4 use separate catalogs to save Spark tables and Hive tables.
You may want to set metastore.catalog.default=hive to read Hive external tables using the Spark API. The table location in HDFS must be accessible to the user running the Spark app.
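A hedged sketch of that setup in Scala, assuming the property is forwarded to the metastore client through Spark's spark.hadoop. prefix (the database and table names below are hypothetical):

```scala
import org.apache.spark.sql.SparkSession

// Point the metastore client at the "hive" catalog so Spark can see
// tables created by Hive (relevant for HDP 3.1.0-3.1.4, where Spark
// and Hive tables live in different catalogs). The spark.hadoop.
// prefix forwards the property to the underlying Hadoop/Hive conf.
val spark = SparkSession.builder()
  .appName("read-hive-tables")
  .config("spark.hadoop.metastore.catalog.default", "hive")
  .enableHiveSupport()
  .getOrCreate()

// mydb.events is a hypothetical Hive external table; the user running
// this app needs read access to its HDFS location.
spark.sql("SELECT * FROM mydb.events LIMIT 10").show()
```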

HDP 3.1.4 documentation

You can find information on access patterns by Hive table type, read/write features, and security requirements at the following links: