
Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database


I was getting the same error while creating DataFrames in spark-shell:

Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /metastore_db.

Cause:

I found that this was happening because multiple other instances of spark-shell were already running and holding the Derby DB, so when I started yet another spark-shell and created a DataFrame in it using RDD.toDF(), it threw the error.

Solution:

I ran the ps command to find other instances of Spark-Shell:

ps -ef | grep spark-shell

and I killed them all using the kill command:

kill -9 spark-shell-processID (example: kill -9 4848)

After all the spark-shell instances were gone, I started a new spark-shell, reran my DataFrame function, and it ran just fine :)
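The find-and-kill steps above can be sketched as a short script (the grep pattern and the PID column position are assumptions about typical `ps -ef` output on Linux):

```shell
# Find lingering spark-shell processes; the '[s]park-shell' bracket trick
# keeps the grep process itself out of the match list.
pids=$(ps -ef | grep '[s]park-shell' | awk '{print $2}')

# Kill each one so the Derby lock on metastore_db is released.
for pid in $pids; do
  kill -9 "$pid"
done
```

On systems with procps installed, `pkill -9 -f spark-shell` does the same thing in one command.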


If you're running in spark-shell, you shouldn't instantiate a HiveContext; one is created automatically, called sqlContext (the name is misleading - if you compiled Spark with Hive, it will be a HiveContext). See the similar discussion here.

If you're not running in the shell, this exception means you've created more than one HiveContext in the same JVM, which isn't supported - you can only create one.


Another case where you can see the same error is the Spark REPL of an AWS Glue dev endpoint, when you try to convert a dynamic frame into a DataFrame.

There are actually several different exceptions like:

  • pyspark.sql.utils.IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':"
  • ERROR XSDB6: Another instance of Derby may have already booted the database /home/glue/metastore_db.
  • java.sql.SQLException: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader

The solution is hard to find with Google, but it is eventually described here.

The loaded REPL contains an instantiated SparkSession in a variable called spark; you just need to stop it before creating a new SparkContext:

>>> spark.stop()
>>> from pyspark.context import SparkContext
>>> from awsglue.context import GlueContext
>>>
>>> glue_context = GlueContext(SparkContext.getOrCreate())
>>> glue_frame = glue_context.create_dynamic_frame.from_catalog(database=DB_NAME, table_name=T_NAME)
>>> df = glue_frame.toDF()