java.lang.OutOfMemoryError: Unable to acquire 100 bytes of memory, got 0

python hadoop memory apache-spark pyspark

I believe the cause of this problem is coalesce(): although it avoids a full shuffle (which repartition() would perform), it still has to squeeze all the data into the requested number of partitions.

Here, you are requesting all the data to fit into one partition, thus one task (and only one task) has to work with all the data, which may cause its container to suffer from memory limitations.

So, either ask for more partitions than 1, or avoid coalesce() in this case.
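To illustrate the first option, here is a minimal sketch that coalesces to several partitions instead of one. The helper name `write_with_fewer_partitions`, the CSV output format, the output path, and the default of 8 partitions are all assumptions made for illustration; pick whatever fits your job.

```python
def write_with_fewer_partitions(df, path, num_partitions=8):
    """Write df as CSV after narrowing it to num_partitions partitions.

    coalesce() merges existing partitions without a full shuffle, so asking
    for several partitions keeps the work spread over several tasks instead
    of forcing a single task to hold all the data.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    # CSV with a header row is an assumed output format, not taken from the question.
    df.coalesce(num_partitions).write.mode("overwrite").csv(path, header=True)


def main():
    # Imported here so the helper above can be read and reused on its own.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("coalesce-sketch").getOrCreate()
    df = spark.range(1_000_000)  # stand-in data with a single "id" column
    write_with_fewer_partitions(df, "out_csv", num_partitions=8)
    spark.stop()
```

Calling `main()` on a machine with PySpark installed should produce a directory with up to 8 part files rather than one.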

Otherwise, you could try increasing your memory configurations.

The problem for me was indeed coalesce(). Instead of exporting the file with coalesce(), I first wrote the data as Parquet using df.write.parquet("testP"). I then read that file back and exported it with coalesce(1).
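Roughly, the two steps look like the sketch below. Only `df.write.parquet("testP")` and `coalesce(1)` come from what I actually ran; the helper name `export_single_file_via_parquet`, the CSV format of the final export, and the `final_csv` path are assumptions for illustration.

```python
def export_single_file_via_parquet(spark, df, staging_path, final_path):
    """Stage df as Parquet first, then export the staged data as one file."""
    # Step 1: write with the DataFrame's existing partitioning, so the
    # expensive computation behind df runs in parallel across tasks.
    df.write.mode("overwrite").parquet(staging_path)

    # Step 2: read the already-materialized data back and only now collapse
    # it to one partition. The single task just copies rows at this point.
    staged = spark.read.parquet(staging_path)
    # CSV is an assumed final format; use whichever writer you need.
    staged.coalesce(1).write.mode("overwrite").csv(final_path, header=True)


def main():
    # Imported here so the helper above can be read and reused on its own.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("parquet-staging-sketch").getOrCreate()
    df = spark.range(1_000_000)  # stand-in for the real DataFrame
    export_single_file_via_parquet(spark, df, "testP", "final_csv")
    spark.stop()
```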

Hopefully it works for you as well.

In my case, replacing coalesce(1) with repartition(1) worked.
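A likely reason, as far as I understand it: repartition(1) adds a shuffle step, so the upstream stages still run in parallel on their original partitions and only the final write happens in a single task, whereas coalesce(1) can pull the whole computation into that one task. A minimal sketch of the swap follows; the helper name `write_single_file`, the CSV format, and the output path are assumptions for illustration.

```python
def write_single_file(df, path):
    """Write df as a single CSV part file using repartition(1)."""
    # repartition(1) shuffles all rows into one partition. The shuffle costs
    # extra I/O, but the stages before it keep their original parallelism.
    # CSV with a header row is an assumed output format.
    df.repartition(1).write.mode("overwrite").csv(path, header=True)


def main():
    # Imported here so the helper above can be read and reused on its own.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("repartition-sketch").getOrCreate()
    df = spark.range(1_000_000)  # stand-in for the real DataFrame
    write_single_file(df, "single_csv")
    spark.stop()
```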
