Apache Spark Effects of Driver Memory, Executor Memory, Driver Memory Overhead and Executor Memory Overhead on success of job runs


All your cases use

--executor-cores 1

It is best practice to go above 1, but don't go above 5; this comes from our experience and from the Spark developers' recommendation.

E.g. http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/:

A rough guess is that at most five tasks per executor can achieve full write throughput, so it’s good to keep the number of cores per executor below that number

I can't find the reference now where it was recommended to go above 1 core per executor, but the idea is that running multiple tasks in the same executor lets them share some common memory regions, so it actually saves memory.

Start with --executor-cores 2 and double --executor-memory (because --executor-cores also determines how many tasks one executor will run concurrently), and see what it does for you. Your environment is compact in terms of available memory, so going to 3 or 4 cores will give you even better memory utilization (see the sketch below).
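
If it helps, here is a minimal sketch of the same idea expressed through SparkConf instead of spark-submit flags (my illustration, not your original setup): spark.executor.cores and spark.executor.memory correspond to --executor-cores and --executor-memory, and "8g" is just a placeholder for whatever your doubled memory figure works out to.

import org.apache.spark.{SparkConf, SparkContext}

// Equivalent of: --executor-cores 2 --executor-memory 8g
// ("8g" is a placeholder: double whatever you were using with --executor-cores 1)
val conf = new SparkConf()
  .setAppName("executor-tuning-sketch")
  .set("spark.executor.cores", "2")    // also the number of concurrent tasks per executor
  .set("spark.executor.memory", "8g")  // scale memory up together with cores
val sc = new SparkContext(conf)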

We use Spark 1.5 and stopped using --executor-cores 1 quite some time ago, as it was giving GC problems; it also looks like a Spark bug, because just giving more memory wasn't helping as much as switching to more tasks per container. I guess tasks in the same executor peak their memory consumption at different times, so you don't waste memory or have to overprovision it just to make things work.
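
If you want to check whether you're hitting the same GC pressure, one option (my addition, not something from your question) is to turn on GC logging for the executors through the standard spark.executor.extraJavaOptions setting; the output shows up in each executor's stderr log.

import org.apache.spark.SparkConf

// Diagnostic only: standard JVM GC-logging flags passed to every executor
val conf = new SparkConf()
  .set("spark.executor.extraJavaOptions",
    "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps")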

Another benefit is that Spark's shared variables (accumulators and broadcast variables) will have just one copy per executor, not one per task, so switching to multiple tasks per executor is a direct memory saving right there. Even if you don't use Spark's shared variables explicitly, Spark very likely creates them internally anyway. For example, if you join two tables through Spark SQL, Spark's CBO may decide to broadcast the smaller table (or the smaller dataframe) to make the join run faster.

http://spark.apache.org/docs/latest/programming-guide.html#shared-variables
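
To make the "one copy per executor" point concrete, here is a minimal sketch of both kinds of shared variables using the Spark 1.x Scala API; the lookup map and the "bad records" counter are made-up placeholders, not anything from your actual job.

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("shared-variables-sketch"))

// Broadcast variable: shipped once per executor and shared by all of its tasks,
// instead of being serialized into every task closure.
val lookup = sc.broadcast(Map("a" -> 1, "b" -> 2))

// Accumulator (1.x API; replaced by sc.longAccumulator in 2.x): per-task
// updates are merged back on the driver.
val badRecords = sc.accumulator(0, "bad records")

val data = sc.parallelize(Seq("a", "b", "c", "d"))

// Read the broadcast inside a transformation.
val mapped = data.map(k => lookup.value.getOrElse(k, 0)).collect()

// Update the accumulator inside an action (accumulator updates are only
// guaranteed to be applied once when used in actions).
data.foreach(k => if (!lookup.value.contains(k)) badRecords += 1)
println(badRecords.value)

The same broadcast mechanism is what Spark SQL relies on when it decides to broadcast the smaller side of a join, as mentioned above.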