How smart is mmap?


The presence or absence of the contents of a file in memory is much less coupled to mmap system calls than you think. When you mmap a file, it doesn't necessarily load it into memory. When you munmap it (or if the process exits), it doesn't necessarily discard the pages.

There are many different things that could trigger the contents of a file to be loaded into memory: mapping it, reading it normally, executing it, or attempting to access memory that is mapped to the file. Similarly, there are different things that could cause the file's contents to be removed from memory, mostly related to the OS deciding it wants the memory for something more important.
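To see how loosely residency is tied to the mmap call itself, you can ask the kernel which pages of a mapping are currently in memory. The following is a minimal sketch, assuming Linux and glibc, where mincore(2) is available; the helper name resident_pages is made up for illustration. On recent kernels, mincore may report conservative results for files the caller cannot write to.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes mincore() on glibc */
#endif
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns how many pages of `path` are resident in
 * memory right now, or -1 on error. *total_pages receives the page count. */
long resident_pages(const char *path, long *total_pages)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    size_t len = (size_t)st.st_size;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + page - 1) / page;

    /* Mapping alone does not read the file; it only sets up the address range. */
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);   /* the mapping stays valid after the descriptor is closed */
    if (map == MAP_FAILED)
        return -1;

    unsigned char *vec = malloc(npages);
    long resident = -1;
    if (vec != NULL && mincore(map, len, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1;   /* low bit set: page is in memory */
    }

    free(vec);
    munmap(map, len);
    *total_pages = (long)npages;
    return resident;
}
```

Calling this on a file that was recently read by any process will often report most pages as resident even though this process has only just mapped it.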

In the two scenarios from your question, consider inserting a step between steps 1 and 2:

  • 1.5. another process allocates and uses a large amount of memory -> the mmapped file is evicted from memory to make room.

In this case the file's contents will probably have to get reloaded into memory if they are mapped again and used again in step 2.

versus:

  • 1.5. nothing happens -> the contents of the mmapped file hang around in memory.

In this case the file's contents don't need to be reloaded in step 2.

In terms of what happens to the contents of your file, your two scenarios aren't much different. What makes the real difference is whether something like step 1.5 happens in between.

As for a background process that constantly accesses the file to keep it in memory (for example, by scanning the file and then sleeping for a short time in a loop), this would of course tend to keep the file in memory, but you're probably better off just letting the OS make its own decision about when to evict the file.
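For illustration only, one pass of such a scan could look like the sketch below. It assumes a POSIX system; the helper name touch_pages is hypothetical. It reads one byte from each page through a mapping so that the kernel sees every page as recently used.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: read one byte from every page of `path`.
 * Returns the number of pages touched, or -1 on error. */
long touch_pages(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    size_t len = (size_t)st.st_size;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (map == MAP_FAILED)
        return -1;

    /* volatile keeps the compiler from optimizing the reads away */
    const volatile unsigned char *bytes = map;
    long touched = 0;
    for (size_t off = 0; off < len; off += page) {
        (void)bytes[off];   /* faults the page in if it is not resident */
        touched++;
    }

    munmap(map, len);
    return touched;
}
```

A keeper process would call this in a loop with a sleep between passes. Nothing stops the kernel from evicting pages between two passes, which is one more reason to leave the decision to the OS.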


The second process will likely find the data from the first process in the buffer cache, so in most cases the data will not be loaded from disk again. But since the buffer cache is a cache, there is no guarantee that the pages won't be evicted in between.

You could start a third process and use mmap(2) and mlock(2) to pin the pages in RAM. But this will probably cause more trouble than it is worth.
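A minimal sketch of what that third process would do, assuming a POSIX system; the helper name map_and_lock is made up for illustration. Note that mlock can fail when the file is larger than the process's RLIMIT_MEMLOCK limit, which is part of the trouble mentioned above.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map `path` read-only and lock its pages in RAM.
 * Returns the mapping and stores its length in *len_out, or NULL on failure. */
void *map_and_lock(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    size_t len = (size_t)st.st_size;
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (map == MAP_FAILED)
        return NULL;

    /* mlock faults in every page of the range and keeps it resident
     * until munlock, munmap, or process exit. */
    if (mlock(map, len) != 0) {
        munmap(map, len);
        return NULL;
    }

    *len_out = len;
    return map;
}
```

The lock only lasts as long as the locking process keeps the mapping, so that process has to stay alive (for example, blocked in pause()) for as long as the pages should stay pinned.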

Linux replaced the UNIX buffer cache with a page cache, but the principle is still the same. The Mac OS X equivalent is called the Unified Buffer Cache (UBC).