
Ever increasing physical memory for a Spark application in YARN


Finally, I was able to get rid of the issue. The problem was that the compressors created in Spark SQL's Parquet write path were not being recycled, so my executors were creating a brand-new compressor (allocated from native memory) for every Parquet file they wrote, and thus kept growing past the container's physical memory limit.
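
For context, the leak pattern is easy to illustrate outside Spark and Parquet. The sketch below uses Hadoop's plain compression API (it is not the actual Parquet code or the patch): the first method allocates a fresh Compressor, and with it native buffers, for every file and never releases them, while the second borrows a Compressor from Hadoop's CodecPool and returns it after the write so the same native allocation is reused across files.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CodecPool;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.Compressor;
import org.apache.hadoop.io.compress.DefaultCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class CompressorRecyclingSketch {

    // Leaky pattern: a brand-new Compressor (and its native buffers) is
    // allocated for every file and never handed back, so the process's
    // physical memory keeps growing with each file written.
    static void writeWithoutRecycling(CompressionCodec codec, OutputStream rawOut,
                                      byte[] data) throws IOException {
        Compressor compressor = codec.createCompressor();       // fresh native allocation
        try (OutputStream out = codec.createOutputStream(rawOut, compressor)) {
            out.write(data);
        }
        // the compressor is simply dropped here; its native memory is never reused
    }

    // Recycling pattern: borrow a Compressor from CodecPool and return it once
    // the file is done, so later writes reuse the same native buffers.
    static void writeWithRecycling(CompressionCodec codec, OutputStream rawOut,
                                   byte[] data) throws IOException {
        Compressor compressor = CodecPool.getCompressor(codec); // reused if one is pooled
        try (OutputStream out = codec.createOutputStream(rawOut, compressor)) {
            out.write(data);
        } finally {
            CodecPool.returnCompressor(compressor);              // hand it back to the pool
        }
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        CompressionCodec codec = ReflectionUtils.newInstance(DefaultCodec.class, conf);
        byte[] payload = "some row group bytes".getBytes(StandardCharsets.UTF_8);
        writeWithRecycling(codec, new ByteArrayOutputStream(), payload);
    }
}
```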

I have opened the following bug in the Parquet JIRA and raised a PR for it:

https://issues.apache.org/jira/browse/PARQUET-353

This fixed the memory issue at my end.

P.S. - You will see this problem only in a Parquet-write-intensive application.