How to set the YARN queue when submitting a Spark application from the Airflow SparkSubmitOperator

I can see now that the `--queue` value comes from the Airflow `spark_default` connection:

Conn Id = spark_default
Host = yarn
Extra = {"queue": "root.default"}

Go to the Admin menu > Connections in the Airflow web UI, select `spark_default`, and edit it:
change Extra from `{"queue": "root.default"}` to `{"queue": "default"}`.

This of course means an Airflow connection is required for each queue.
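If you would rather not click through the UI for every queue, the same per-queue connection can be created from the Airflow CLI (Airflow 2.x syntax; the connection id and queue name below are assumptions, adjust them to your setup):

```shell
# Hypothetical per-queue connection; pick your own id and queue name.
airflow connections add spark_default_team_queue \
    --conn-type spark \
    --conn-host yarn \
    --conn-extra '{"queue": "team_the_best_queue"}'
```

Then point the operator's `conn_id` at whichever connection carries the queue you want.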



To be clear, there are at least two ways to do this:

  1. Via the Spark connection, as Phillip answered.
  2. Via a `--conf` parameter, which Dustan mentions in a comment.

From my testing, if there's a queue set in the Connection's Extra field, that is used regardless of what you pass into the SparkSubmit conf.
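The precedence observed in that testing can be sketched as a small rule (this is an illustration of the behavior, not Airflow's actual source code):

```python
# Sketch of the observed precedence: a "queue" key in the connection's
# Extra wins over spark.yarn.queue passed in the operator's conf.
def effective_queue(connection_extra, conf):
    if "queue" in connection_extra:
        return connection_extra["queue"]
    return conf.get("spark.yarn.queue")

# Connection Extra set: it wins, conf value is ignored.
print(effective_queue({"queue": "root.default"},
                      {"spark.yarn.queue": "team_q"}))  # root.default

# No queue in Extra: the conf value is used.
print(effective_queue({}, {"spark.yarn.queue": "team_q"}))  # team_q
```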

However, if you remove queue from Extra in the Connection, and send it in the SparkSubmitOperator conf arg like below, YARN will show it properly.

conf={
    "spark.yarn.queue": "team_the_best_queue",
    "spark.submit.deployMode": "cluster",
    "spark.whatever.configs.you.have": "more_config",
}
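Under the hood, each entry in that dict ends up as a separate `--conf key=value` pair on the generated `spark-submit` command line. A minimal sketch of that mapping (the helper name is my own, not Airflow's):

```python
# Sketch: how a conf dict maps onto spark-submit's --conf flags.
conf = {
    "spark.yarn.queue": "team_the_best_queue",
    "spark.submit.deployMode": "cluster",
}

def conf_to_args(conf):
    # Each entry becomes its own "--conf key=value" pair.
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args

print(conf_to_args(conf))
# ['--conf', 'spark.yarn.queue=team_the_best_queue',
#  '--conf', 'spark.submit.deployMode=cluster']
```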