python: reading subprocess output in threads
Your problem has nothing to do with the `subprocess` module, or threads (problematic as they are), or even mixing subprocesses and threads (a very bad idea, even worse than using threads to start with, unless you're using the backport of Python 3.2's `subprocess` module that you can get from code.google.com/p/python-subprocess32), or accessing the same things from multiple threads (as your `print` statements do).
What happens is that your `shuffleline.py` program buffers. Not its output, but its input. Although it isn't very obvious, when you iterate over a file object, Python reads in blocks, usually 8k bytes. Since `sys.stdin` is a file object, your `for` loop will buffer until EOF or a full block:
```python
import sys

for line in sys.stdin:  # Python 2: waits for EOF or a full read-ahead block
    line = line.strip()
    ...
```
If you don't want this, either use a `while` loop that calls `sys.stdin.readline()` (which returns `''` at EOF):
```python
import sys

while True:
    line = sys.stdin.readline()
    if not line:  # '' means EOF; a blank line comes back as '\n'
        break
    line = line.strip()
    ...
```
or use the two-argument form of `iter()`, which creates an iterator that calls the first argument until the second argument (the "sentinel") is returned:
```python
import sys

for line in iter(sys.stdin.readline, ''):  # stops when readline() returns ''
    line = line.strip()
    ...
```
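To show why line-at-a-time reading matters, here is a minimal sketch of the parent side of a request/reply exchange: it writes one line, then waits for the child's answer before sending the next. The child program and the `talk` helper are hypothetical; the child reverses each line instead of shuffling it so the replies are predictable, and it uses the sentinel loop from above (the `while` version behaves the same). Note that both sides have to flush, and that a blank line arrives as `'\n'`, so it does not end the loop; only closing the pipe does. The block read-ahead described above is Python 2's file-iteration behavior; the sentinel form works on both Python 2 and 3.

```python
import subprocess
import sys

# Hypothetical stand-in for shuffleline.py: it reverses each line instead of
# shuffling, so the replies are predictable, and flushes after every reply.
CHILD = r"""
import sys
for line in iter(sys.stdin.readline, ''):
    sys.stdout.write(line.strip()[::-1] + '\n')
    sys.stdout.flush()
"""


def talk(lines):
    """Send lines one at a time, reading each reply before sending the next."""
    replies = []
    with subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # text mode
    ) as proc:
        for line in lines:
            proc.stdin.write(line + "\n")
            proc.stdin.flush()  # the parent has to flush its writes as well
            # If the child read ahead in blocks, this readline could wait forever.
            replies.append(proc.stdout.readline().rstrip("\n"))
        proc.stdin.close()  # EOF: the child's readline() returns '' and it exits
    return replies


print(talk(["abc", "", "hello"]))  # ['cba', '', 'olleh']
```

The empty string in the middle round-trips as an empty reply rather than stopping the child, because only EOF makes `readline()` return `''`.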
I would also be remiss if I did not suggest avoiding threads for this and using non-blocking I/O on the subprocess's pipes instead, or even something like Twisted's `reactor.spawnProcess` (from `twisted.internet`), which has lots of ways of hooking processes and other things together as consumers and producers.
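As a thread-free alternative that needs nothing outside the standard library, the sketch below swaps in `asyncio` rather than Twisted: it reads a child's output line by line on an event loop, so a single thread can wait on several pipes. This is not Twisted's API; the child program and the `read_lines` name are hypothetical, and it assumes Python 3.8 or newer (`asyncio.run` plus subprocess support on every platform's default loop).

```python
import asyncio
import sys

# Hypothetical child: prints three lines with short pauses, flushing each one.
CHILD = r"""
import time
for i in range(3):
    print('tick', i, flush=True)
    time.sleep(0.1)
"""


async def read_lines(cmd):
    """Collect a child's stdout line by line on the event loop, with no threads."""
    proc = await asyncio.create_subprocess_exec(
        *cmd, stdout=asyncio.subprocess.PIPE
    )
    lines = []
    while True:
        raw = await proc.stdout.readline()  # suspends this coroutine, not the loop
        if not raw:  # b'' means the child closed its stdout (EOF)
            break
        lines.append(raw.decode().rstrip("\r\n"))
    await proc.wait()
    return lines


print(asyncio.run(read_lines([sys.executable, "-c", CHILD])))
# ['tick 0', 'tick 1', 'tick 2']
```

To follow several children at once on the same thread, wrap multiple `read_lines(...)` calls in `asyncio.gather(...)`.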