R multicore mcfork(): Unable to fork: Cannot allocate memory


The issue might be exactly what the error message suggests: there isn't enough memory to fork and create parallel processes.

R essentially needs to create a copy of everything that's in memory for each individual process (to my knowledge it doesn't utilize shared memory). If you are already using 51% of your RAM with a single process, then you don't have enough memory to create a second process, since that would require 102% of your RAM in total.

Try:

  1. Using fewer cores - If you were trying to use 4 cores, you may have enough RAM to support 3 parallel processes, but not 4. registerDoMC(2), for example, will set the number of parallel workers to 2 (if you are using the doMC parallel backend).
  2. Using less memory - without seeing the rest of your code, it's hard to suggest specifics. One thing that might help is figuring out which R objects are taking up all the memory (Determining memory usage of objects?) and then removing any objects you don't need (rm(my_big_object)).
  3. Adding more RAM - if all else fails, throw hardware at it so you have more capacity.
  4. Sticking to single-core processing - parallel processing in R trades memory for CPU. It sounds like in this case you may not have enough memory to support the CPU power you have, so the best course of action might be to stick to a single core.


The R function mcfork is only a wrapper around the fork syscall (by the way, the man page notes that on Linux fork is itself implemented via clone).

I created a simple C++ program to test fork's behaviour:

#include <stdio.h>
#include <unistd.h>
#include <vector>

int main(int argc, char **argv)
{
    printf("--beginning of program\n");
    std::vector<std::vector<int> > l(50000, std::vector<int>(50000, 0));
//    while (true) {}
    int counter = 0;
    pid_t pid = fork();
    pid = fork();
    pid = fork();
    if (pid == 0)
    {
        // child process
        int i = 0;
        for (; i < 5; ++i)
        {
            printf("child process: counter=%d\n", ++counter);
        }
    }
    else if (pid > 0)
    {
        // parent process
        int j = 0;
        for (; j < 5; ++j)
        {
            printf("parent process: counter=%d\n", ++counter);
        }
    }
    else
    {
        // fork failed
        printf("fork() failed!\n");
        return 1;
    }
    printf("--end of program--\n");
    while (true) {}
    return 0;
}

First, the program allocates about 10 GB of data on the heap (50,000 × 50,000 ints). Then it spawns 2·2·2 = 8 processes via three fork calls and enters an infinite loop, so it is easy to spot in a task manager and waits to be killed by the user.

Here are my observations:

  1. For the fork to succeed, you need to have at least 51% free memory on my system, but this includes swap. You can change this behaviour by editing the /proc/sys/vm/overcommit_* proc files.
  2. As expected, none of the children take more memory, so this 51% free memory remains free throughout the course of the program, and all subsequent forks also don't fail.
  3. The memory is shared between the forks, so it gets reclaimed only after you kill the last child.

Memory fragmentation issue

You should not be concerned about any layer of memory fragmentation with respect to fork. R's memory fragmentation doesn't apply here, because fork operates on virtual memory. You shouldn't worry about fragmentation of physical memory either, because virtually all modern operating systems use virtual memory (which consequently enables them to use swap). The only fragmentation that might be of issue is fragmentation of the virtual address space, but AFAIK on Linux the virtual address space is 2^47 bytes, which is enormous, and for many decades you should not have any problems finding a contiguous region of any practical size.

Summary:

Make sure you have more swap than physical memory, and as long as your computations don't actually need more memory than you have in RAM, you can mcfork them as much as you want.

Or, if you are willing to risk the stability of the whole system (memory starvation), try echo 1 > /proc/sys/vm/overcommit_memory as root on Linux.

Or better yet (safer):

echo 2 > /proc/sys/vm/overcommit_memory
echo 100 > /proc/sys/vm/overcommit_ratio

You can read more about overcommitting here: https://www.win.tue.nl/~aeb/linux/lk/lk-9.html


A note for those who want to use a GUI such as RStudio.
If you want to take advantage of parallel processing, it is advised not to use a GUI, because the forked processes end up sharing it with your main session. Here is an excerpt from the registerDoMC package help manual in R:

The multicore functionality, originally written by Simon Urbanek and subsumed in the parallel package in R 2.14.0, provides functions for parallel execution of R code on machines with multiple cores or processors, using the system fork call to spawn copies of the current process.

The multicore functionality, and therefore registerDoMC, should not be used in a GUI environment, because multiple processes then share the same GUI.

I solved a similar error to the one the OP experienced by removing registerDoMC(cores = n) when running my program in RStudio. Multiprocessing works best with base R. Hope this helps.