Using spark-submit, what is the behavior of the --total-executor-cores option?

multithreading hadoop apache-spark pyspark cpu-cores

The documentation does not seem clear.

From my experience, the most common practice to allocate resources is by indicating the number of executors and the number of cores per executor, for example (taken from here):

$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \--master yarn-cluster \--num-executors 10 \--driver-memory 4g \--executor-memory 2g \--executor-cores 4 \--queue thequeue \lib/spark-examples*.jar \10

However, this approach is limited to YARN, and is not applicable to standalone and mesos based Spark, according to this.

Instead, the parameter --total-executor-cores can be used, which represents the total amount of cores - of all executors - assigned to the Spark job. In your case, having a total of 40 cores, setting the attribute --total-executor-cores 40 would make use of all the available resources.

Unfortunately, I am not aware of how Spark distributes the workload when less resources than the total available are provided. If working with two or more simultaneous jobs, however, it should be transparent to the user, in that Spark (or whatever resource manager) would manage how the resources are managed depending on the user settings.

multithreading hadoop apache-spark pyspark cpu-cores

To make sure how many workers started on each slave, open web browser, type http://master-ip:8080, and see the workers section about how many workers has been started exactly, and also which worker on which slave. (I mention these above because I am not sure what do you mean by saying '4 slaves per node')

By default, spark would start exact 1 worker on each slave unless you specify SPARK_WORKER_INSTANCES=n in conf/spark-env.sh, where n is the number of worker instance you would like to start on each slave.

When you submit a spark job through spark-submit, spark would start an application driver and several executors for your job.

If not specified clearly, spark would start one executor for each worker, i.e. the total executor num equal to the total worker num, and all cores would be available to this job.
--total-executor-cores you specified would limit the total cores that is available to this application.

CodeHunter

Using spark-submit, what is the behavior of the --total-executor-cores option?

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last