Python wait until data is in sys.stdin


The following should just work.

    import sys

    for line in sys.stdin:
        pass  # whatever

Rationale:

The code will iterate over lines in stdin as they come in. If the stream is still open but there isn't a complete line yet, the loop will block until either a newline character is encountered (and the whole line is returned) or the stream is closed (and whatever is left in the buffer is returned).
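To make the two cases concrete, here is a minimal sketch that uses an `os.pipe()` as a stand-in for stdin. The helper name `lines_from_pipe` is made up for illustration. It only shows how lines are assembled and what happens to a trailing fragment once the writer closes; it does not show the blocking itself, since everything is written before reading starts.

```python
import os

def lines_from_pipe(chunks):
    """Write chunks to a pipe, close the write end, and collect the lines read."""
    read_fd, write_fd = os.pipe()
    with os.fdopen(write_fd, "w") as writer:
        for chunk in chunks:
            writer.write(chunk)
    # Leaving the `with` block closes the write end, so the reader sees end of stream.
    with os.fdopen(read_fd) as reader:
        # Same iteration protocol as `for line in sys.stdin`.
        return [line for line in reader]

# "partial" and " line\n" arrive separately but come back as one line;
# "rest" has no newline and is only returned because the pipe was closed.
print(lines_from_pipe(["partial", " line\n", "rest"]))
# → ['partial line\n', 'rest']
```

The chunks here are small enough to fit in the pipe's buffer, which is why writing everything before reading does not block.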

Once the stream has been closed, no more data can be written to or read from stdin. Period.

The reason your code was overloading your CPU is that once stdin has been closed, any subsequent attempt to iterate over it returns immediately without doing anything. In essence, your code was equivalent to the following.

    for line in sys.stdin:
        pass  # do something
    while 1:
        pass  # infinite loop, very CPU intensive

Maybe it would be useful if you posted how you were writing data to stdin.

EDIT:

Python will, for the purposes of for loops, iterators and readlines(), consider a stream closed when it encounters an EOF. You can ask Python to read more data after this, but you cannot use any of the previous methods. The Python man page recommends using

    import sys

    while True:
        line = sys.stdin.readline()
        # do something with line

When an EOF is encountered, readline will return an empty string. The next call to readline will function as normal if the stream is still open. You can test this yourself by running the program in a terminal. Pressing Ctrl+D on an empty line makes the terminal signal EOF on stdin. This will cause the first program in this post to terminate, but the last program will continue to read data until the stream is actually closed. The last program should not peg your CPU at 100%, as readline will wait until there is data to return rather than returning an empty string.

I only get a busy loop when I call readline on an actual file. When reading from stdin, readline happily blocks.
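For the case where readline() does return an empty string right away (a regular file, as described above), one common workaround is to sleep briefly before retrying, in the style of `tail -f`. The sketch below is an illustration, not something taken from the answers here: the name `follow` and the parameters `poll_interval` and `max_idle_polls` are made up, and `max_idle_polls` exists only so the demo terminates.

```python
import io
import time

def follow(stream, poll_interval=0.1, max_idle_polls=None):
    """Yield lines from stream, sleeping instead of spinning when readline() returns ''."""
    idle_polls = 0
    while max_idle_polls is None or idle_polls < max_idle_polls:
        line = stream.readline()
        if line:
            idle_polls = 0
            yield line
        else:
            # Empty string: nothing to read right now, so back off briefly.
            idle_polls += 1
            time.sleep(poll_interval)

# Demo on an in-memory stream; with a real file you would pass the open file object.
demo = io.StringIO("first\nsecond\n")
for line in follow(demo, poll_interval=0.01, max_idle_polls=3):
    print("got:", repr(line))
```

With `max_idle_polls=None` the generator keeps polling forever, which is the usual choice for following a log file. Note that readline() may return a fragment without a trailing newline if the writer is in the middle of a line.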


This actually works flawlessly (i.e. no runaway CPU) when you call the script from the shell, like so:

tail -f input-file | yourscript.py

Obviously, that is not ideal, since you then have to write all relevant stdout to that file, but it works without a lot of overhead! Namely because of using readline(), I think:

    while 1:
        line = sys.stdin.readline()

It will actually stop and wait at that line until it gets more input.
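If you want to wait for input with a timeout instead of blocking indefinitely, `select.select()` is a different technique from the readline() loop above and works on pipes and terminals on Unix-like systems. This is a minimal sketch; `wait_for_data` is a made-up helper name, and the pipe stands in for stdin (fd 0).

```python
import os
import select

def wait_for_data(fd, timeout):
    """Return True if fd becomes readable within timeout seconds."""
    readable, _, _ = select.select([fd], [], [], timeout)
    return bool(readable)

read_fd, write_fd = os.pipe()
print(wait_for_data(read_fd, 0.1))  # nothing written yet → False
os.write(write_fd, b"hello\n")
print(wait_for_data(read_fd, 0.1))  # data is waiting → True
os.close(write_fd)
os.close(read_fd)
```

Two caveats: select() also reports a descriptor as readable once the writer has closed it (the following read then returns nothing), and it knows nothing about data already sitting in Python's own buffer for `sys.stdin`, so mixing it with buffered reads needs care.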

Hope this helps someone!


I've come back to this problem after a long time. The issue appears to be that Apache treats a CustomLog like a file: something it can open, write to, close, and then reopen at a later date. This causes the receiving process to be told that its input stream has been closed. However, that doesn't mean the process's input stream cannot be written to again, just that whichever process was writing to the input stream will not be writing to it again.
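The "closed, then written to again" behaviour can be reproduced with a named pipe. The sketch below is an illustration only, assuming a Unix-like system with `os.mkfifo`; the function name `fifo_reopen_demo` and the file name `demo.fifo` are made up. One reader stays open while two successive writers open the FIFO, write, and close.

```python
import os
import tempfile

def fifo_reopen_demo():
    """Read from one FIFO descriptor across two separate writer sessions."""
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.fifo")
        os.mkfifo(path)
        # Non-blocking open so the reader does not wait for a writer to appear.
        read_fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
        try:
            for payload in (b"first\n", b"second\n"):
                write_fd = os.open(path, os.O_WRONLY)
                os.write(write_fd, payload)
                os.close(write_fd)
                results.append(os.read(read_fd, 1024))  # the payload
                results.append(os.read(read_fd, 1024))  # b"": no writer left, so EOF
        finally:
            os.close(read_fd)
    return results

print(fifo_reopen_demo())
# → [b'first\n', b'', b'second\n', b'']
```

The reader sees EOF after the first writer closes, yet the same descriptor delivers data again once a second writer shows up.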

The best way to deal with this is to set up a handler and let the OS know to invoke it whenever input is written to standard input. Normally you should avoid relying heavily on OS signal handling, as signals are relatively expensive. However, copying a megabyte of text to the following program only produced two SIGIO events, so it's okay in this case.

fancyecho.py

    import sys
    import os
    import signal
    import fcntl
    import threading

    io_event = threading.Event()

    # Event handlers should generally be as compact as possible.
    # Here all we do is notify the main thread that input has been received.
    def handle_io(signal, frame):
        io_event.set()

    # invoke handle_io on a SIGIO event
    signal.signal(signal.SIGIO, handle_io)
    # send io events on stdin (fd 0) to our process
    assert fcntl.fcntl(0, fcntl.F_SETOWN, os.getpid()) == 0
    # tell the os to produce SIGIO events when data is written to stdin
    assert fcntl.fcntl(0, fcntl.F_SETFL, os.O_ASYNC) == 0

    print("pid is:", os.getpid())
    while True:
        data = sys.stdin.read()
        io_event.clear()
        print("got:", repr(data))
        io_event.wait()

How you might use this toy program. Output has been cleaned up due to interleaving of input and output.

    $ echo test | python3 fancyecho.py &
    [1] 25487
    pid is: 25487
    got: 'test\n'
    $ echo data > /proc/25487/fd/0
    got: 'data\n'
    $