
How to set amount of Spark executors?


You could also do it programmatically by setting the parameters "spark.executor.instances" and "spark.executor.cores" on the SparkConf object.

Example:

    import org.apache.spark.SparkConf;

    SparkConf conf = new SparkConf()
        // request 4 executors for the application
        .set("spark.executor.instances", "4")
        // 5 cores on each executor
        .set("spark.executor.cores", "5");

The second parameter, "spark.executor.cores", applies only to YARN and standalone mode. It allows an application to run multiple executors on the same worker, provided that the worker has enough cores.


In Spark 2.0+, use the SparkSession variable (spark) to set the number of executors from within the program:

    spark.conf.set("spark.executor.instances", 4)
    spark.conf.set("spark.executor.cores", 4)

In the above case, at most 16 tasks (4 executors with 4 cores each) will run at any given time.
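The 16-task figure is simply executors multiplied by task slots per executor. Here is a minimal sketch of that arithmetic, assuming each task uses spark.task.cpus cores (Spark's default is 1). ExecutorMath and maxConcurrentTasks are hypothetical names used for illustration, not part of the Spark API.

```java
// Hypothetical helper, not part of Spark: estimates the upper bound on
// tasks that can run at the same time for a given executor sizing.
final class ExecutorMath {
    private ExecutorMath() {}

    /**
     * @param executors        value of spark.executor.instances
     * @param coresPerExecutor value of spark.executor.cores
     * @param cpusPerTask      value of spark.task.cpus (assumed default: 1)
     */
    static int maxConcurrentTasks(int executors, int coresPerExecutor, int cpusPerTask) {
        if (executors < 0 || coresPerExecutor < 0 || cpusPerTask < 1) {
            throw new IllegalArgumentException("invalid executor sizing");
        }
        // Each executor offers floor(cores / cpusPerTask) task slots.
        return executors * (coresPerExecutor / cpusPerTask);
    }

    public static void main(String[] args) {
        // 4 executors with 4 cores each, 1 core per task
        System.out.println(maxConcurrentTasks(4, 4, 1)); // prints 16
    }
}
```

This is only an upper bound: the cluster manager may grant fewer executors than requested.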

The other option is dynamic allocation of executors, as below:

    spark.conf.set("spark.dynamicAllocation.enabled", "true")
    spark.conf.set("spark.executor.cores", 4)
    spark.conf.set("spark.dynamicAllocation.minExecutors", "1")
    spark.conf.set("spark.dynamicAllocation.maxExecutors", "5")

This way, you can let Spark decide how many executors to allocate, between the configured minimum and maximum, based on the workload of the running job.

I feel the second option works better than the first and is widely used.

Hope this will help.


OK, got it. The number of executors is not actually a Spark property itself but rather a setting of the driver used to place the job on YARN. Since I'm using the SparkSubmit class as the driver, it has the appropriate --num-executors parameter, which is exactly what I need.
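For illustration, here is a rough sketch of how the argument list for that approach might be assembled in Java. SubmitArgs is a hypothetical helper, and the class name com.example.MyJob and the jar path are placeholders, not values from the original setup. The actual hand-off to SparkSubmit is left as a comment because it needs Spark on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spark: assembles spark-submit style
// arguments, including --num-executors, for a job placed on YARN.
final class SubmitArgs {
    private SubmitArgs() {}

    static List<String> build(int numExecutors, String mainClass, String appJar) {
        if (numExecutors < 1) {
            throw new IllegalArgumentException("numExecutors must be at least 1");
        }
        List<String> args = new ArrayList<>();
        args.add("--master");
        args.add("yarn");
        args.add("--num-executors");
        args.add(Integer.toString(numExecutors));
        args.add("--class");
        args.add(mainClass);   // placeholder main class of the job
        args.add(appJar);      // placeholder path to the application jar
        return args;
    }

    public static void main(String[] unused) {
        List<String> args = build(4, "com.example.MyJob", "/path/to/my-job.jar");
        System.out.println(String.join(" ", args));
        // With Spark on the classpath, the arguments could then be passed on, e.g.:
        // org.apache.spark.deploy.SparkSubmit.main(args.toArray(new String[0]));
    }
}
```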

UPDATE:

For some jobs I no longer follow the SparkSubmit approach. I cannot use it for applications where the Spark job is only one component of the application (and may even be optional). For these cases I use a spark-defaults.conf file attached to the cluster configuration and set the spark.executor.instances property inside it. This approach is much more universal and lets me balance resources properly depending on the cluster (developer workstation, staging, production).
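As an illustration, a spark-defaults.conf entry for this might look like the fragment below. The values are placeholders chosen for the example, not the ones from my clusters. Each line is a property name followed by whitespace and its value.

```
# spark-defaults.conf (illustrative values only)
spark.executor.instances   4
spark.executor.cores       4
```

Each environment can then ship its own copy of the file with values sized for that cluster.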