Spark job just hangs with large data


Dynamic allocation and maximizeResourceAllocation are different settings; one is disabled when the other is active. With maximizeResourceAllocation on EMR, one executor is launched per node, and it is allocated all of that node's cores and memory.

I would recommend taking a different route. You have a pretty big cluster with 51 nodes; I am not sure it is even required. However, start with this rule of thumb, and you will get the hang of tuning these configurations.

  • Cluster memory - at least 2x the size of the data you are processing.

Now, assuming 51 nodes is what you require, try the following:

  • An r3.4xlarge has 16 vCPUs - use 15 of them, leaving one for the OS and other processes.
  • Set the number of executors to 150 - this allocates 3 executors per node (across 50 worker nodes).
  • Set the number of cores per executor to 5 (15 usable cores / 3 executors per node).
  • Set the executor memory to roughly one third of the total host memory, minus overhead: about 35G.
  • Control the parallelism (default partitions): set it to roughly the total number of cores you have, ~800.
  • Adjust the shuffle partitions: make this twice the number of cores, 1600.
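The arithmetic behind the numbers above can be sketched as follows. This assumes r3.4xlarge specs of 16 vCPUs and ~122 GB RAM per node, 50 worker nodes (treating one of the 51 as the master), and a ~12% memory haircut for OS daemons and YARN overhead; those assumptions are mine, not from the question.

```python
# Sketch of the executor-sizing rule of thumb used above.
# Assumed specs: r3.4xlarge = 16 vCPUs / 122 GB RAM, 50 worker nodes.

NODES = 50
VCPUS_PER_NODE = 16
RAM_GB_PER_NODE = 122

usable_cores = VCPUS_PER_NODE - 1                         # leave 1 core for the OS
cores_per_executor = 5                                    # common rule of thumb
executors_per_node = usable_cores // cores_per_executor   # 15 // 5 = 3
num_executors = executors_per_node * NODES                # 3 * 50 = 150

# Split the node's RAM across its executors, then shave ~12%
# for memoryOverhead and OS; this lands near the 35G figure.
executor_memory_gb = int(RAM_GB_PER_NODE / executors_per_node * 0.88)

total_cores = num_executors * cores_per_executor          # 750, rounded up to ~800
shuffle_partitions = 2 * total_cores                      # ~1600 after rounding

print(num_executors, executors_per_node, executor_memory_gb,
      total_cores, shuffle_partitions)
```

Note that the exact memory figure depends on how much you reserve for `spark.executor.memoryOverhead`; the point is to size executors so that cores and memory are exhausted together.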

The configuration above has worked like a charm for me. You can monitor resource utilization in the Spark UI.

Also, in your YARN config file /etc/hadoop/conf/capacity-scheduler.xml, set yarn.scheduler.capacity.resource-calculator to org.apache.hadoop.yarn.util.resource.DominantResourceCalculator, which makes YARN schedule on CPU as well as memory and lets Spark really go full throttle with those CPUs. Restart the YARN service after the change.
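For reference, the property in capacity-scheduler.xml would look something like this (a sketch; check it against your distribution's existing entry for that property):

```xml
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
```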


You should increase the executor memory and the number of executors. If the data is huge, also try increasing the driver memory.

My suggestion is to disable dynamic resource allocation, let the job run, and see whether it still hangs. (Please note that a Spark job can consume the entire cluster's resources and starve other applications of resources, so try this approach when no other jobs are running.) If it does not hang, that means you should play with the resource allocation: start hardcoding the resources and keep increasing them until you find the best allocation you can use.
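As a starting point for hardcoding resources, a spark-submit invocation with dynamic allocation off and explicit allocations might look like the sketch below. Every number is a starting point to tune, not a prescription, and `my_job.py` is a placeholder for your application.

```shell
# Dynamic allocation disabled, resources pinned explicitly.
# All values here are illustrative starting points.
spark-submit \
  --conf spark.dynamicAllocation.enabled=false \
  --num-executors 150 \
  --executor-cores 5 \
  --executor-memory 35g \
  --driver-memory 8g \
  --conf spark.default.parallelism=800 \
  --conf spark.sql.shuffle.partitions=1600 \
  my_job.py
```

Raise these values step by step between runs, watching the Spark UI, until the hang disappears or the cluster is saturated.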

The links below can help you understand resource allocation and how to optimize it:

http://site.clairvoyantsoft.com/understanding-resource-allocation-configurations-spark-application/

https://community.hortonworks.com/articles/42803/spark-on-yarn-executor-resource-allocation-optimiz.html