How to set the YARN queue when submitting a Spark application from the Airflow SparkSubmitOperator
I can see now that the --queue value is coming from the Airflow spark_default connection:

Conn Id = spark_default
Host = yarn
Extra = {"queue": "root.default"}
In the Airflow WebServer UI, go to Admin > Connections, select spark_default and edit it: change Extra from
{"queue": "root.default"}
to
{"queue": "default"}
This of course means an Airflow connection is required for each queue.
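Since each queue needs its own connection, it can help to keep them all in one place. A minimal sketch of what those connections' fields look like, assuming invented connection IDs (only spark_default comes from the question):

```python
import json

# Hypothetical set of Airflow Spark connections, one per YARN queue.
# spark_etl and its queue name are invented for illustration.
spark_connections = {
    "spark_default": {"conn_type": "spark", "host": "yarn",
                      "extra": json.dumps({"queue": "default"})},
    "spark_etl": {"conn_type": "spark", "host": "yarn",
                  "extra": json.dumps({"queue": "root.etl"})},
}

# The Extra field is the JSON the SparkSubmitOperator reads the queue from.
print(spark_connections["spark_etl"]["extra"])
```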
To be clear, there are at least two ways to do this:
- Via the Spark connection, as Phillip answered.
- Via the --conf parameter, which Dustan mentions in a comment.
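Either way, the setting ends up on the spark-submit command line. A hypothetical sketch of the flag the --conf route corresponds to (the queue name and application file are placeholders):

```python
# Illustrative spark-submit invocation; queue and app file are invented.
queue = "default"
cmd = [
    "spark-submit",
    "--master", "yarn",
    "--conf", f"spark.yarn.queue={queue}",
    "my_app.py",  # hypothetical application
]
print(" ".join(cmd))
```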
From my testing, if there's a queue set in the Connection's Extra field, that is used regardless of what you pass into the SparkSubmit conf. However, if you remove queue from Extra in the Connection and send it in the SparkSubmitOperator conf arg like below, YARN will show it properly.
conf={
    "spark.yarn.queue": "team_the_best_queue",
    "spark.submit.deployMode": "cluster",
    "spark.whatever.configs.you.have": "more_config",
}