
Why does process creation using `clone` result in an out-of-memory failure?


The Problem

When a child process is created by the Rust call, several things happen at a C/C++ level. This is a simplification, but it will help explain the dilemma.

  1. The standard streams are duplicated (with dup2 or a similar call)
  2. The parent process is forked (with the fork or clone system call)
  3. The forked process executes the child (with a call from the exec family, such as execvp)

The parent and child are now concurrent processes. The Rust call you are currently using appears to be a clone call that behaves much like a pure fork, so you're 20G x 2 - 32G = 8G short, without considering the space needed by the operating system and anything else that might be running. The clone call returns a negative value and errno is set to ENOMEM.
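For illustration, here is a minimal sketch of how that failure tends to surface on the Rust side; the program name `worker` is a hypothetical stand-in for whatever the parent launches.

```rust
use std::process::Command;

fn main() {
    // Spawning a child clones the (large) parent before exec'ing the new
    // program, so the failure surfaces on the spawn itself.
    match Command::new("worker").spawn() {
        Ok(mut child) => {
            let status = child.wait().expect("child was not running");
            println!("child exited with {}", status);
        }
        Err(e) => {
            // With a 20G parent on a 32G machine, this arm can be hit with
            // ENOMEM (raw os error 12) before the child program ever runs.
            eprintln!("spawn failed: {} (raw os error {:?})", e, e.raw_os_error());
        }
    }
}
```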

If the architectural solutions of adding physical memory, compressing the data, or streaming it through a process that never needs the entire data set in memory at any one time are not options, then the classic solution is reasonably simple.

Recommendation

Design the parent process to be lean. Then spawn two worker children, one that handles your 20 GB need and the other that handles your 1 GB need [1]. These children can be connected to one another via pipes, files, shared memory, sockets, semaphores, signals, and/or other communication mechanisms, just as a parent and child can be.
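As a sketch of that shape, assuming two hypothetical worker binaries named `big-worker` and `small-worker` connected by a pipe, the lean parent might look something like this:

```rust
use std::process::{Command, Stdio};

fn main() -> std::io::Result<()> {
    // The parent allocates almost nothing itself; it only wires the two
    // workers together, so cloning it is cheap.
    let mut big = Command::new("big-worker")      // the 20 GB workload
        .stdout(Stdio::piped())
        .spawn()?;

    let big_out = big.stdout.take().expect("stdout was requested as a pipe");

    let mut small = Command::new("small-worker")  // the 1 GB workload
        .stdin(Stdio::from(big_out))              // connect the two with a pipe
        .spawn()?;

    // The lean parent simply waits; the heavy lifting happens in the children.
    big.wait()?;
    small.wait()?;
    Ok(())
}
```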

Many mature software packages, from Apache httpd to embedded cell tower routing daemons, use this design pattern. It is reliable, maintainable, extensible, and portable.

The 32G would then likely suffice for the 20G and 1G processing needs, along with the OS and the lean parent process.

Although this solution will surely solve your problem, if the code is to be reused or extended later, there may be value in considering process design changes that stream the data in frames or multidimensional slices, reducing the memory requirement.
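If streaming is worth exploring, one illustration of the idea is a filter that handles its input in fixed-size chunks rather than holding everything at once; the 64 MiB buffer size here is an arbitrary assumption.

```rust
use std::io::{self, Read, Write};

// Streaming sketch: transform data in fixed-size chunks from stdin to
// stdout so that only one working buffer is resident at a time, rather
// than holding the full data set in RAM.
fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let mut stdin = stdin.lock();
    let stdout = io::stdout();
    let mut stdout = stdout.lock();
    let mut buf = vec![0u8; 64 * 1024 * 1024]; // 64 MiB working buffer

    loop {
        let n = stdin.read(&mut buf)?;
        if n == 0 {
            break; // end of input
        }
        // A per-chunk transformation of buf[..n] would go here.
        stdout.write_all(&buf[..n])?;
    }
    Ok(())
}
```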

Always Overcommitting Memory

Setting overcommit_memory to 1 eliminates the clone error condition referenced in the question, because the Rust call ultimately invokes the Linux clone call, whose memory accounting respects that setting. There are several caveats with this solution, however, that point back to the above recommendation as superior, primarily that a value of 1 is dangerous, especially in production environments.
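For reference, a small sketch that only reads the current policy from /proc; changing it is a root-only sysctl operation, noted in the comment.

```rust
use std::fs;

// Inspect the current policy: 0 = heuristic (the default), 1 = always
// overcommit, 2 = strict accounting. Changing it requires root, e.g. with
// `sysctl vm.overcommit_memory=1`, and carries the caveats discussed below.
fn main() -> std::io::Result<()> {
    let mode = fs::read_to_string("/proc/sys/vm/overcommit_memory")?;
    println!("vm.overcommit_memory = {}", mode.trim());
    Ok(())
}
```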

Background

Kernel discussions about OpenBSD rfork and the clone call took place in the late 1990s and early 2000s. The features stemming from those discussions permit forking that is less extreme than a full process copy, which is symmetric to providing more extensive independence between pthreads. Some of these discussions have produced extensions to traditional process spawning that have entered POSIX standardization.

In the early 2000s, Linus Torvalds suggested a flag structure to determine which components of the execution model are shared and which are copied when execution forks, blurring the distinction between processes and threads. From this, the clone call emerged.

Over-committing memory is not discussed much, if at all, in those threads. The design goal was MORE control over the results of a fork rather than delegating memory usage optimization to an operating system heuristic, which is what the default setting of overcommit_memory = 0 does.

Caveats

Memory overcommit goes beyond these extensions, adding the complexity of trade-offs between its modes [2], design trend caveats [3], practical run-time limitations [4], and performance impacts [5].

Portability and Longevity

Additionally, without standardization, code that relies on memory overcommit may not be portable, and the question of longevity is pertinent, especially when a setting controls the behavior of a function. There is no guarantee of backward compatibility, or even a warning of deprecation, if the setting system changes.

Danger

The linuxdevcenter documentation [2] says, "1 always overcommits. Perhaps you now realize the danger of this mode," and there are other indications of the danger of ALWAYS overcommitting [6], [7].

The implementers of overcommit on Linux, Windows, and VMware may guarantee reliability, but it is a statistical game that, combined with the many other complexities of process control, may lead to unstable characteristics under certain conditions. Even the name overcommit tells us something about its true character as a practice.

A non-default overcommit_memory mode, for which several warnings have been issued, may work for the immediate trial of the immediate case but later lead to intermittent reliability problems.

Predictability and Its Impact on System Reliability and Response Time Consistency

The idea of a process in a UNIX-like operating system, from its Bell Labs beginnings, is that a process makes a concrete request to its container, the operating system. The result is both predictable and binary: either the request is denied or it is granted. Once granted, the process is given complete control of and direct access to the resources until it relinquishes them.

The swap space aspect of virtual memory is a breach of this principle that appears as gross deceleration of activity on workstations when RAM is heavily consumed. For instance, there are times during development when one presses a key and has to wait ten seconds to see the character on the display.

Conclusion

There are many ways to get the most out of physical memory, but doing so by hoping that allocated memory will be used only sparsely is likely to introduce negative impacts. Performance hits from swapping when overcommit is overused are the well-documented example. If you are keeping 20G of data in RAM, this may particularly be the case.

Allocating only what is needed, forking in intelligent ways, using threads, and freeing memory that is surely no longer needed lead to memory thrift without impacting reliability or creating spikes in swap disk usage, and they can operate without caveat up to the limits of system resources.

The position of the designer of the Command::new call may be based on this perspective. In this case, how soon after the fork the exec is called is not a determining factor in how much memory is requested during the spawn.

Notes and References

[1] Spawning worker children may require some code refactoring and appear to be too much trouble on a superficial level, but the refactoring may be surprisingly straightforward and significantly beneficial.

[2] http://www.linuxdevcenter.com/pub/a/linux/2006/11/30/linux-out-of-memory.html?page=2

[3] https://www.etalabs.net/overcommit.html

[4] http://www.gabesvirtualworld.com/memory-overcommit-in-production-yes-yes-yes/

[5] https://labs.vmware.com/vmtj/memory-overcommitment-in-the-esx-server

[6] https://github.com/kubernetes/kubernetes/issues/14452

[7] http://linuxtoolkit.blogspot.com/2011_08_01_archive.html