How to do proper file locking on NFS?


I don't see how the combination of file locks and os.replace() can make sense. When the file is replaced (that is, the directory entry is replaced), all the existing file locks (probably including file locks waiting for the locking to succeed, I'm not sure of the semantics here) and file descriptors will be against the old file, not the new one. I suspect this is the reason behind the race conditions causing you to lose some of the records in your tests.

os.replace() is a good technique to ensure that a reader doesn't read a partial update. But it doesn't work robustly in the face of multiple updaters (unless losing some of the updates is ok).

Another issue is that fcntl is a really, really stupid API. In particular, the locks are bound to the process, not the file descriptor, which means that e.g. a close() on ANY file descriptor the process has open on the file will release the lock.
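To make the close() problem concrete, here is a rough sketch (Linux/POSIX only, and checked against a local file rather than NFS). The function name lock_survives and its close_other parameter are made up for this illustration. It takes a lock with fcntl.lockf(), optionally opens and closes a second descriptor on the same file, and then forks a child to see whether the lock is still there.

```python
import fcntl
import os


def lock_survives(path, close_other):
    """Return True if our lockf() lock is still held at the end."""
    fd1 = os.open(path, os.O_RDWR)
    fcntl.lockf(fd1, fcntl.LOCK_EX)  # POSIX record lock, owned by the process

    if close_other:
        # Stand-in for unrelated code (e.g. a library) in the same process
        # that opens the same file and closes it again.
        fd2 = os.open(path, os.O_RDONLY)
        os.close(fd2)

    # fcntl locks are not inherited across fork(), so the child can probe
    # whether the parent still holds the lock.
    pid = os.fork()
    if pid == 0:
        code = 1
        try:
            fd = os.open(path, os.O_RDWR)
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            code = 0  # lock was free: the parent has lost it
        except OSError:
            code = 1  # lock is busy: the parent still holds it
        os._exit(code)

    _, status = os.waitpid(pid, 0)
    os.close(fd1)
    return os.WEXITSTATUS(status) == 1
```

With close_other=False the child should find the lock busy. With close_other=True the lock should already be gone, even though fd1 was never closed.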

One way would be to use a "lock file", e.g. taking advantage of the atomicity of link(). From http://man7.org/linux/man-pages/man2/open.2.html:

Portable programs that want to perform atomic file locking using a lockfile, and need to avoid reliance on NFS support for O_EXCL, can create a unique file on the same filesystem (e.g., incorporating hostname and PID), and use link(2) to make a link to the lockfile. If link(2) returns 0, the lock is successful. Otherwise, use stat(2) on the unique file to check if its link count has increased to 2, in which case the lock is also successful.
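A minimal Python sketch of the procedure quoted above might look like the following. The names try_lock and unlock and the exact format of the unique file name are my own choices, not anything the man page prescribes, and this makes a single attempt with no retry loop.

```python
import os
import socket
import uuid


def try_lock(lock_path):
    """Try once to take the lock at lock_path; return True on success."""
    # Unique file on the same filesystem as the lock file. Hostname and PID
    # follow the man page's suggestion; the random suffix is an extra guard.
    unique = "%s.%s.%d.%s" % (
        lock_path, socket.gethostname(), os.getpid(), uuid.uuid4().hex)
    fd = os.open(unique, os.O_CREAT | os.O_WRONLY, 0o644)
    os.close(fd)
    try:
        try:
            os.link(unique, lock_path)  # atomic: fails if lock_path exists
            return True
        except OSError:
            # link() can report an error even though it succeeded on the
            # server, so the link count is the final word.
            return os.stat(unique).st_nlink == 2
    finally:
        os.unlink(unique)  # lock_path itself stays behind if we got the lock


def unlock(lock_path):
    """Release a lock previously taken with try_lock()."""
    os.unlink(lock_path)
```

A caller would loop on try_lock() with a short sleep between attempts and call unlock() in a finally block.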

If it's OK to read slightly stale data, then you can use this link() dance only for a temp file that you use when updating the file, and then os.replace() the "main" file you use for reading (reading can then be lockless). If not, then you need to do the link() trick for the "main" file and forget about shared/exclusive locking; all locks are then exclusive.
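The first variant (locked updaters, lockless readers) could be sketched roughly like this. Everything here is illustrative: link_lock and append_record are hypothetical names, the ".lock" and ".tmp" suffixes are arbitrary, the lock spins forever with no timeout, and the unique name assumes only one thread per process takes the lock.

```python
import contextlib
import os
import socket
import time


@contextlib.contextmanager
def link_lock(lock_path, poll=0.1):
    """Exclusive lock built on link(); waits until it is acquired."""
    unique = "%s.%s.%d" % (lock_path, socket.gethostname(), os.getpid())
    with open(unique, "w"):
        pass
    try:
        while True:
            try:
                os.link(unique, lock_path)
                break
            except OSError:
                # Trust the link count over link()'s return status.
                if os.stat(unique).st_nlink == 2:
                    break
                time.sleep(poll)
        try:
            yield
        finally:
            os.unlink(lock_path)
    finally:
        os.unlink(unique)


def append_record(path, record):
    """Read-modify-write under the lock, then publish with os.replace()."""
    with link_lock(path + ".lock"):
        try:
            with open(path, "r") as f:
                data = f.read()
        except FileNotFoundError:
            data = ""
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            f.write(data + record + "\n")
            f.flush()
            os.fsync(f.fileno())
        # Readers opening `path` see either the old file or the new one,
        # never a half-written one.
        os.replace(tmp, path)
```

The point of the design is that the lock serializes updaters so no update is lost, while os.replace() keeps readers from ever seeing a partial write.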

Addendum: One tricky thing to deal with when using lock files is what to do when a process dies for whatever reason, and leaves the lock file around. If this is to run unattended, you might want to incorporate some kind of timeout and removal of lock files (e.g. check the stat() timestamps).
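As a sketch of the timestamp idea, something like the following could run before each lock attempt. The name break_stale_lock and the max_age_seconds parameter are hypothetical, and there are two caveats: the check-then-unlink is itself racy (another process may have removed the stale lock and taken a fresh one in between), and the comparison assumes the client and server clocks roughly agree.

```python
import os
import time


def break_stale_lock(lock_path, max_age_seconds):
    """Remove lock_path if its mtime is older than max_age_seconds.

    Returns True if a stale lock file was removed, False otherwise.
    """
    try:
        st = os.stat(lock_path)
    except FileNotFoundError:
        return False  # nobody holds the lock
    if time.time() - st.st_mtime <= max_age_seconds:
        return False  # lock is recent enough to be considered live
    try:
        os.unlink(lock_path)
    except FileNotFoundError:
        return False  # someone else cleaned it up first
    return True
```

Pick max_age_seconds comfortably larger than the longest time a healthy process would ever hold the lock.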


Using randomly named hard links and the link counts on those files as lock files is a common strategy (e.g. this), and arguably better than using lockd. For far more information about the limits of all sorts of locks over NFS, read this: http://0pointer.de/blog/projects/locking.html

You'll also find that this is a long-standing, standard problem for MTA software using mbox files over NFS. Probably the best answer there was to use Maildir instead of mbox, but if you look for examples in the source code of something like Postfix, it'll be close to best practice. And if they simply don't solve that problem, that might also be your answer.


NFS is great for file sharing. It sucks as a "transmission" medium.

I've been down the NFS-for-data-transmission road multiple times. In every instance, the solution involved moving away from NFS.

Getting reliable locking is one part of the problem. The other part is the update of the file on the server and expecting the clients to receive that data at some specific point-in-time (such as before they can grab the lock).

NFS isn't designed to be a data transmission solution. There are caches and timing involved, not to mention paging of the file content and of file metadata (e.g. the atime attribute). Client operating systems also keep track of state locally (such as "where" to append the client's data when writing to the end of the file).

For a distributed, synchronized store, I recommend looking at a tool that does just that. Such as Cassandra, or even a general-purpose database.

If I'm reading the use-case correctly, you could also go with a simple server-based solution. Have a server listen for TCP connections, read messages from the connections, and then write each to the file, serializing the writes within the server itself. There's some added complexity in having your own protocol (to know where a message starts and stops), but otherwise it's fairly straightforward.
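A bare-bones version of such a server, using Python's socketserver module, might look like this. The protocol is an assumption made up for the sketch: one message per newline-terminated line, and the server answers "OK" after each message has been written. LogServer and LineHandler are hypothetical names.

```python
import socketserver
import threading


class LineHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # One message per newline-terminated line.
        for line in self.rfile:
            if not line.endswith(b"\n"):
                break  # connection dropped mid-message; discard the fragment
            # Only one thread at a time appends to the file.
            with self.server.write_lock:
                with open(self.server.log_path, "ab") as f:
                    f.write(line)
            self.wfile.write(b"OK\n")  # ack once the message is written


class LogServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True

    def __init__(self, address, log_path):
        super().__init__(address, LineHandler)
        self.log_path = log_path
        self.write_lock = threading.Lock()
```

Running it is then LogServer(("0.0.0.0", port), path).serve_forever(), with a port and path of your choosing. Since only the server process touches the file, it can live on a local disk and the NFS locking question goes away.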