End of nonblocking file


At least on POSIX (including Linux), the obvious answer is that nonblocking regular files don't exist. Regular files ALWAYS block, and O_NONBLOCK is silently ignored.

Similarly, poll()/select() et al. will always tell you that a fd pointing to a regular file is ready for I/O, regardless of whether the data is ready in the page cache or still on disk (mostly relevant for reading).

EDIT And, since O_NONBLOCK is a no-op for regular files, a read() on a regular file will never set errno to EAGAIN, contrary to what another answer to this question claims.
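A quick way to check this from Python is sketched below. It is an illustration rather than a proof: it assumes Python 3 on a POSIX system where `os.O_NONBLOCK` is available, and the names `read_all_nonblocking` and `demo` are made up for the example. If O_NONBLOCK had any effect on regular files, `os.read()` could raise `BlockingIOError` (EAGAIN); the expectation is that it simply returns data, then an empty result at end of file.

```python
import os
import tempfile


def read_all_nonblocking(path, chunk_size=4096):
    """Read a regular file through a descriptor opened with O_NONBLOCK.

    Hypothetical helper for illustration only. An empty result from
    os.read() is treated as end of file.
    """
    fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    chunks = []
    try:
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:  # b"" means end of file, not "try again later"
                break
            chunks.append(chunk)
    finally:
        os.close(fd)
    return b"".join(chunks)


def demo():
    # Write a small temporary file, then read it back in non-blocking mode.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 10000)
        path = tmp.name
    try:
        return read_all_nonblocking(path)
    finally:
        os.unlink(path)
```

Calling `demo()` should return the full file contents without any EAGAIN-style exception being raised along the way.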

EDIT2 References:

From the POSIX (p)select() specification: "File descriptors associated with regular files shall always select true for ready to read, ready to write, and error conditions."

From the POSIX poll() specification: "Regular files shall always poll TRUE for reading and writing."

The above suffices to imply that, while perhaps not strictly prohibited, non-blocking regular files don't make sense, as there would be no way to poll them except busy-waiting.

Beyond the above, there is at least some circumstantial evidence:
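The select() behaviour quoted above can be observed from Python with a small sketch like the following. It assumes Python 3 on a POSIX system (on Windows, `select.select()` only accepts sockets), and the helper names are hypothetical. The descriptor is deliberately moved to end of file, where nothing is left to read, to show that select() still reports it as ready.

```python
import os
import select
import tempfile


def regular_file_readiness(path):
    """Return (readable, writable) as reported by select() at end of file.

    Hypothetical helper: the descriptor is positioned at EOF, where no
    more data can be read, yet select() is expected to report it ready.
    """
    fd = os.open(path, os.O_RDWR | os.O_NONBLOCK)
    try:
        os.lseek(fd, 0, os.SEEK_END)  # nothing left to read from here
        readable, writable, _ = select.select([fd], [fd], [], 0)
        return fd in readable, fd in writable
    finally:
        os.close(fd)


def demo_select():
    # Create a throwaway regular file and query its readiness.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"some data")
        path = tmp.name
    try:
        return regular_file_readiness(path)
    finally:
        os.unlink(path)
```

Since readiness carries no information here, a loop built on select() for a regular file degenerates into busy-waiting, which is the point made above.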

From the POSIX open() specification: The behavior for file descriptors referring to pipes, block special files, and character special files is defined. "Otherwise, the behavior of O_NONBLOCK is unspecified."

Some related links:

http://tinyclouds.org/iocp-links.html

http://www.remlab.net/op/nonblock.shtml

http://davmac.org/davpage/linux/async-io.html

And even here on Stack Overflow:

Can regular file reading benefited from nonblocking-IO?

As the answer by R. points out, due to how page caching works, non-blocking for regular files is not very easily defined. E.g. what if by some mechanism you find out that data is ready for reading in the page cache, and then before you read it the kernel decides to kick that page out of cache due to memory pressure? It's different for things like sockets and pipes, because correctness requires that data is not discarded just like that.

Also, how would you select/poll for a seekable file descriptor? You'd need some new API that supported specifying which byte range in the file you're interested in. And the kernel implementation of that API would tie in to the VM system, as it would need to prevent the pages you're interested in from being kicked out. That would imply that those pages count against the process's locked-pages limit (see ulimit -l) in order to prevent a DoS. And when would those pages be unlocked? And so on.


This is a really good question. With non-blocking sockets, recv() returns an empty string at end of stream and raises a socket.error when there is simply no data available yet. For files, though, there doesn't seem to be any direct indicator available to Python.
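For comparison, here is a minimal sketch of how the two cases can be told apart on a non-blocking socket. It assumes Python 3, where the empty string is `b""` and the "no data yet" error surfaces as `BlockingIOError` (a subclass of `OSError`, of which `socket.error` is an alias). The helper names `recv_status` and `demo_socket` are made up for the example.

```python
import socket


def recv_status(sock, bufsize=4096):
    """Classify one non-blocking recv() call (hypothetical helper).

    Returns "no data yet" when recv() raises BlockingIOError (EAGAIN),
    "eof" when it returns b"", and "data" otherwise.
    """
    try:
        chunk = sock.recv(bufsize)
    except BlockingIOError:
        return "no data yet"
    return "eof" if chunk == b"" else "data"


def demo_socket():
    a, b = socket.socketpair()
    a.setblocking(False)
    statuses = []
    try:
        statuses.append(recv_status(a))  # nothing sent so far
        b.sendall(b"hello")
        statuses.append(recv_status(a))  # data is waiting
        b.close()
        statuses.append(recv_status(a))  # peer closed its end
    finally:
        a.close()
        b.close()
    return statuses
```

A regular file offers no counterpart to the "no data yet" state, which is why the distinction is harder to draw there.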

The only mechanism I can think of for detecting EOF is to compare the current position of the file to the overall file size after receiving an empty string:

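    import os

    def read_nonblock(fd):
        t = os.read(fd, 4096)
        if not t:  # empty result ('' on Python 2, b'' on Python 3)
            # Only treat it as EOF if the position has reached the file size.
            if os.fstat(fd).st_size == os.lseek(fd, 0, os.SEEK_CUR):
                raise Exception("EOF reached")
        return t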

This, of course, assumes that regular files in non-blocking mode will actually return immediately rather than wait for data to be read from the disk. I'm not sure if that's true on Windows or Linux. It'd be worth testing but I wouldn't be surprised if reading of regular files even in non-blocking mode only returns an empty string when the actual EOF is encountered.


A nice trick that works well in C++ (YMMV): if the amount of data returned is less than the size of the buffer (i.e. the buffer is not full), you can safely assume that the transaction has completed. There is then roughly a 1/buffersize probability that the last part of the file exactly fills the buffer, so with a large buffer you can be reasonably sure that the transaction will end with a non-filled buffer. So if you compare the amount of data returned against the buffer size and they are not equal, you know that either an error occurred or the transaction is complete. I'm not sure whether this translates to Python, but that is my method for spotting EOFs.
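A rough Python version of this heuristic might look like the sketch below. It assumes Python 3 and a regular file, where a short read normally does mean end of file (on pipes and sockets a short read can happen at any time, so the trick would not carry over). The names `read_until_short` and `demo_short_read` are hypothetical.

```python
import os
import tempfile


def read_until_short(fd, buf_size=4096):
    """Read until a chunk comes back shorter than buf_size.

    Sketch of the short-read heuristic. A chunk shorter than the buffer
    (including an empty one) is treated as the end of the transfer.
    Returns (data, number_of_read_calls).
    """
    chunks = []
    calls = 0
    while True:
        chunk = os.read(fd, buf_size)
        calls += 1
        chunks.append(chunk)
        if len(chunk) < buf_size:  # short or empty read: stop here
            break
    return b"".join(chunks), calls


def demo_short_read(size, buf_size=4096):
    # Build a temporary file of the given size and read it back.
    with tempfile.TemporaryFile() as tmp:
        tmp.write(b"a" * size)
        tmp.flush()
        tmp.seek(0)
        data, calls = read_until_short(tmp.fileno(), buf_size)
    return len(data), calls
```

When the file size is an exact multiple of the buffer size, the last full chunk is not recognised as the end, so the loop needs one extra read that returns an empty result. That is the case the 1/buffersize estimate above refers to.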