Apache Spark: How partitions are processed in an executor


  1. You have specified one executor with 3 executor cores, so only one executor will run on your machine, and it can run at most 3 tasks at the same time. The executor memory setting controls how much data Spark can cache on that executor. So out of your 100 partitions, at most 3 can be processed in parallel on that one executor (see the first sketch after this list).

  2. We can use the repartition method to change the number of partitions of an RDD in Spark. Also, reduceByKey and some other shuffle methods accept the number of partitions for the output RDD as an argument (see the second sketch below).

  3. I did not fully understand your last question, but the executor cores play the same role as described above: they determine how many tasks can run in parallel on one executor.
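
To make point 1 concrete, here is a minimal sketch of such a configuration in Scala. The app name, the "4g" memory value, and the sample data are my own illustrative assumptions, not from your setup:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative configuration: one executor with 3 cores, as in the question.
val spark = SparkSession.builder()
  .appName("PartitionDemo")                 // hypothetical app name
  .config("spark.executor.instances", "1")  // only one executor runs
  .config("spark.executor.cores", "3")      // at most 3 tasks run concurrently
  .config("spark.executor.memory", "4g")    // bounds how much data Spark can cache
  .getOrCreate()

// 100 partitions -> 100 tasks; the single 3-core executor
// works through them 3 at a time.
val rdd = spark.sparkContext.parallelize(1 to 1000, numSlices = 100)
println(rdd.getNumPartitions) // 100
```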
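
And a minimal sketch of point 2, again with illustrative names and values; running locally with `local[3]` mirrors the 3-task parallelism discussed above:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("RepartitionDemo") // hypothetical app name
  .master("local[3]")
  .getOrCreate()
val sc = spark.sparkContext

val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)), numSlices = 4)

// repartition shuffles the data into the requested number of partitions.
val repartitioned = pairs.repartition(10)

// reduceByKey accepts the output partition count as a second argument.
val reduced = pairs.reduceByKey(_ + _, numPartitions = 8)

println(repartitioned.getNumPartitions) // 10
println(reduced.getNumPartitions)       // 8
```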