How Spark RDD partitions are processed if no. of executors < no. of RDD partitions



Will Spark process one partition at a time on each executor, or, if an executor has enough memory and cores, will it process more than one partition in parallel on each executor?

Spark will process partitions in parallel depending on the total number of cores available to the job you're running.

Let's say your streaming job has 10 executors, each one with 2 cores. This means that you'll be able to process 10 x 2 = 20 partitions concurrently, assuming spark.task.cpus is set to 1.
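The arithmetic above can be sketched as follows (a hypothetical snippet; the executor and core counts are just the illustrative numbers from the example):

```scala
// Hypothetical illustration: how many tasks can run concurrently?
val numExecutors = 10
val coresPerExecutor = 2
val taskCpus = 1 // default value of spark.task.cpus

// Each task occupies taskCpus cores, so the number of concurrent
// task slots is total cores divided by cores per task.
val concurrentTasks = (numExecutors * coresPerExecutor) / taskCpus
println(concurrentTasks) // 20
```

If `spark.task.cpus` were set to 2, the same cluster would only run 10 tasks concurrently.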

If you really want the details, look at how Spark Standalone requests resources from CoarseGrainedSchedulerBackend; specifically, at its makeOffers method:

private def makeOffers() {
  // Filter out executors under killing
  val activeExecutors = executorDataMap.filterKeys(executorIsAlive)
  val workOffers = activeExecutors.map { case (id, executorData) =>
    new WorkerOffer(id, executorData.executorHost, executorData.freeCores)
  }.toIndexedSeq
  launchTasks(scheduler.resourceOffers(workOffers))
}

Key here is executorDataMap, which maps each executor id to an ExecutorData entry tracking, among other things, how many free cores that executor currently has. Based on that, and on the preferred locality of each partition, the scheduler makes an educated guess about which executor each task should run on.
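To make the offer/assignment idea concrete, here is a heavily simplified, hypothetical model (not Spark's actual scheduler, which also handles locality, blacklisting, etc.): each executor offers its free cores, and pending tasks are assigned greedily until the offers are exhausted.

```scala
// Simplified stand-in for Spark's WorkerOffer: an executor advertising free cores.
case class WorkerOffer(executorId: String, host: String, freeCores: Int)

val offers = Seq(
  WorkerOffer("exec-1", "host-a", freeCores = 2),
  WorkerOffer("exec-2", "host-b", freeCores = 1)
)
val tasks = (1 to 5).map(i => s"task-$i")

// Greedy round: fill each offer's free cores with pending tasks.
// Tasks left over (here task-4 and task-5) wait for the next round of offers.
val assignments = offers.foldLeft((tasks, List.empty[(String, String)])) {
  case ((pending, done), offer) =>
    val (taken, rest) = pending.splitAt(offer.freeCores)
    (rest, done ++ taken.map(t => (t, offer.executorId)))
}._2

assignments.foreach { case (t, e) => println(s"$t -> $e") }
// task-1 -> exec-1
// task-2 -> exec-1
// task-3 -> exec-2
```

This shows why the number of free cores, not the number of executors, bounds how many partitions are processed at once.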

Here is an example from a live Spark Streaming app consuming from Kafka:

[Screenshot: Spark UI "Tasks" view]

We have 5 partitions and 3 executors running, where each executor has more than 2 cores, which enables the streaming job to process all partitions concurrently.
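The single-batch behavior in the screenshot can be checked with the same slot arithmetic (hypothetical numbers matching the example; I assume exactly 2 cores per executor as a lower bound):

```scala
// Do all partitions fit in a single scheduling "wave"?
val partitions = 5
val executors = 3
val coresPerExecutor = 2 // lower bound; the live app had more

val slots = executors * coresPerExecutor
val waves = math.ceil(partitions.toDouble / slots).toInt
println(s"$slots slots, $waves wave(s)") // 6 slots, 1 wave(s)
```

With 6 or more slots for 5 partitions, every partition of the batch runs concurrently; with fewer slots the batch would be processed in multiple waves.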