How to close a file?


With respect to POSIX, R..'s answer to a related question is very clear and concise: close() is a non-restartable special case, and no loop should be used.

This was surprising to me, so I decided to describe my findings, followed by my conclusions and chosen solution at the end.

This is not really an answer. Consider this more like the opinion of a fellow programmer, including the reasoning behind that opinion.


POSIX.1-2001 and POSIX.1-2008 describe three possible errno values that may occur: EBADF, EINTR, and EIO. The descriptor state after EINTR and EIO is "unspecified", which means it may or may not have been closed. EBADF indicates fd is not a valid descriptor. In other words, POSIX.1 clearly recommends using

    if (close(fd) == -1) {
        /* An error occurred, see 'errno'. */
    }

without any retry looping to close file descriptors.

(Even Austin Group defect #519, which R.. mentioned, does not help with recovering from close() errors: it leaves it unspecified whether any I/O is possible after an EINTR error, even if the descriptor itself is left open.)


For Linux, the close() syscall is defined in fs/open.c, with __close_fd() in fs/file.c managing the descriptor table locking, and filp_close() back in fs/open.c taking care of the details.

In summary, the descriptor entry is removed from the table unconditionally first, followed by filesystem-specific flushing (f_op->flush()), followed by notification (the dnotify/fsnotify hook), and finally by the removal of any record or file locks. (Most local filesystems like ext2, ext3, ext4, xfs, bfs, tmpfs, and so on, do not have ->flush(), so given a valid descriptor, close() cannot fail. Only ecryptfs, exofs, fuse, cifs, and nfs have ->flush() handlers in Linux-3.13.6, as far as I can tell.)

This does mean that in Linux, if a write error occurs in the filesystem-specific ->flush() handler during close(), there is no way to retry; the file descriptor is always closed, just like Torvalds said.
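To make the "always closed" behaviour concrete, here is a small stand-alone demonstration (my own illustration, not from any spec): the second close() fails with EBADF precisely because the first call already released the descriptor, and in a multithreaded program that descriptor slot could already have been reused by another thread, which is why blind retry loops are dangerous.

    #include <unistd.h>
    #include <fcntl.h>
    #include <errno.h>
    #include <string.h>
    #include <stdio.h>

    int main(void)
    {
        int fd = open("/dev/null", O_WRONLY);
        if (fd == -1) {
            fprintf(stderr, "open: %s.\n", strerror(errno));
            return 1;
        }

        if (close(fd) == -1)
            fprintf(stderr, "First close: %s.\n", strerror(errno));
        else
            printf("First close: success; the descriptor is gone.\n");

        /* Retrying close() here would be wrong: the slot is already
         * free, and another thread may have reused it. This second
         * call merely fails with EBADF: */
        if (close(fd) == -1)
            fprintf(stderr, "Second close: %s.\n", strerror(errno));

        return 0;
    }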

The FreeBSD close() man page describes the exact same behaviour.

Neither the OpenBSD nor the Mac OS X close() man pages describe whether the descriptor is closed in case of errors, but I believe they share the FreeBSD behaviour.


It seems clear to me that no loop is necessary to close a file descriptor safely. However, close() may still return an error.

errno == EBADF indicates the file descriptor was already closed. If my code encounters this unexpectedly, to me it indicates there is a significant fault in the code logic, and the process should gracefully exit; I'd rather my processes die than produce garbage.

Any other errno value indicates an error in finalizing the file state. In Linux, it is definitely an error related to flushing any remaining data to the actual storage. In particular, I can imagine ENOMEM in case there is no room to buffer the data, EIO if the data could not be sent or written to the actual device or media, EPIPE if the connection to the storage was lost, ENOSPC if the storage is already full with no room reserved for the unflushed data, and so on. If the file is a log file, I'd have the process report the failure and exit gracefully. If the file contents are still available in memory, I would remove (unlink) the entire file, and retry. Otherwise I'd report the failure to the user.
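To sketch that policy in code (the function name and the recovery choices are mine, purely illustrative), given the errno code from a failed close():

    #include <unistd.h>
    #include <errno.h>
    #include <string.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Illustrative policy: react to the errno code from a failed close().
     * 'path' may be NULL; 'in_memory' tells whether the contents can
     * still be rewritten from memory. */
    static void handle_close_error(const int err, const char *const path,
                                   const int in_memory)
    {
        if (err == 0)
            return;                 /* Closed cleanly; nothing to do. */

        if (err == EBADF) {
            /* Double close or stale descriptor: a logic fault. */
            fprintf(stderr, "Bug: closed an invalid descriptor!\n");
            exit(EXIT_FAILURE);
        }

        /* EIO, ENOSPC, EPIPE, ENOMEM, ...: finalizing the file failed. */
        if (path && in_memory) {
            fprintf(stderr, "%s: %s; removing and rewriting.\n",
                    path, strerror(err));
            unlink(path);
            /* ... re-create the file and write the contents again ... */
        } else
            fprintf(stderr, "Write error: %s.\n", strerror(err));
    }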

(Remember that in Linux and FreeBSD, you do not "leak" file descriptors in the error case; they are guaranteed to be closed even if an error occurs. I am assuming all other operating systems I might use behave the same way.)

The helper function I'll use from now on will be something like

    #include <unistd.h>
    #include <errno.h>

    /**
     * closefd - close file descriptor and return error (errno) code
     *
     * @descriptor: file descriptor to close
     *
     * Actual errno will stay unmodified.
     */
    static int closefd(const int descriptor)
    {
        int saved_errno, result;

        if (descriptor == -1)
            return EBADF;

        saved_errno = errno;

        result = close(descriptor);
        if (result == -1)
            result = errno;

        errno = saved_errno;
        return result;
    }

I know the above is safe on Linux and FreeBSD, and I assume it is safe on all other POSIX-y systems. If I encounter one that is not, I can simply replace the above with a custom version, wrapping it in a suitable #ifdef for that OS. The reason this maintains errno unchanged is just a quirk of my coding style; it makes short-circuiting error paths shorter (less repeated code).
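As a usage example (save_file() is hypothetical; the point is that the error paths need no errno bookkeeping, because closefd() preserves it):

    #include <unistd.h>
    #include <fcntl.h>
    #include <errno.h>

    /* Hypothetical: save a buffer to a file. Returns 0 on success,
     * or -1 with errno describing the failure. */
    static int save_file(const char *const path,
                         const void *const data, const size_t len)
    {
        const int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
        if (fd == -1)
            return -1;              /* errno set by open() */

        const ssize_t n = write(fd, data, len);
        if (n == -1) {
            closefd(fd);            /* errno from write() stays intact */
            return -1;
        }
        if ((size_t)n != len) {
            closefd(fd);
            errno = EIO;            /* short write; illustrative choice */
            return -1;
        }

        const int err = closefd(fd);
        if (err) {
            errno = err;
            return -1;
        }
        return 0;
    }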

If I am closing a file that contains important user information, I will do a fsync() or fdatasync() on it prior to closing. This ensures the data hits the storage, but also causes a delay compared to normal operation; therefore I won't do it for ordinary data files.
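In code, that boils down to something like the wrapper below, using closefd() and the includes from above (my own sketch; depending on the platform and requirements, fdatasync() may need replacing with fsync(), or with fcntl(fd, F_FULLFSYNC) on Mac OS X):

    /* Flush file contents to storage before closing.
     * Returns 0 if successful, an errno error code otherwise. */
    static int closefd_sync(const int descriptor)
    {
        if (descriptor == -1)
            return EBADF;

        if (fdatasync(descriptor) == -1) {
            const int err = errno;
            closefd(descriptor);    /* descriptor is released regardless */
            return err;             /* report the sync failure */
        }

        return closefd(descriptor);
    }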

Unless I will be unlink()ing the closed file, I will check the closefd() return value and act accordingly. If I can easily retry the whole operation, I will, but at most once or twice. For log files and generated/streamed files, I only warn the user.
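With the hypothetical save_file() from above, that retry policy is just a short loop (a fragment, for illustration only):

    /* Illustrative only: retry the whole operation (not close()!)
     * at most twice more, then report the failure to the user. */
    int tries;
    for (tries = 0; tries < 3; tries++)
        if (save_file(path, data, len) == 0)
            break;
    if (tries >= 3)
        fprintf(stderr, "%s: %s.\n", path, strerror(errno));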

I want to remind anyone reading this far that we cannot make anything completely reliable; it is just not possible. What we can do, and in my opinion should do, is to detect when an error occurs, as reliably as we can. If we can easily and with negligible resource use retry, we should. In all cases, we should make sure the notification (about the error) is propagated to the actual human user. Let the human worry about whether some other action, possibly complex, needs to be done before the operation is retried. After all, a lot of tools are used only as a part of a larger task, and the best course of action usually depends on that larger task.