python: reading subprocess output in threads


Your problem has nothing to do with the subprocess module, or with threads (problematic as they are), or even with mixing subprocesses and threads (a very bad idea, even worse than using threads to start with, unless you're using the backport of Python 3.2's subprocess module that you can get from code.google.com/p/python-subprocess32). Nor is it caused by accessing the same things from multiple threads (as your print statements do).

What happens is that your shuffleline.py program buffers: not its output, but its input. Although it isn't very obvious, when you iterate over a file object, Python reads ahead in blocks, usually 8 KB. Since sys.stdin is a file object, your for loop will buffer until EOF or a full block:

    for line in sys.stdin:
        line = line.strip()
        ....

To avoid this, either use a while loop that calls sys.stdin.readline() (which returns '' at EOF):

    while True:
        line = sys.stdin.readline()
        if not line:
            break
        line = line.strip()
        ...

or use the two-argument form of iter(), which creates an iterator that calls the first argument until the second argument (the "sentinel") is returned:

    for line in iter(sys.stdin.readline, ''):
        line = line.strip()
        ...
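
As a rough sketch of how the readline-based loop might look inside a complete filter, something along these lines reads and forwards one line at a time. The function name echo_lines and the flush after every write are my own assumptions, not part of your shuffleline.py, and the code is written for Python 3:

```python
import io
import sys


def echo_lines(instream, outstream):
    """Copy stripped lines from instream to outstream, one line at a time.

    Hypothetical helper: it stands in for whatever per-line work the
    real program does.
    """
    count = 0
    # Two-argument iter(): call instream.readline() repeatedly until it
    # returns the sentinel '' (which readline() only does at EOF).
    for line in iter(instream.readline, ''):
        outstream.write(line.strip() + '\n')
        # Assumption: flush so a parent reading our pipe sees each line
        # as soon as it is produced.
        outstream.flush()
        count += 1
    return count


# Small in-memory demonstration; a real filter would instead call
# echo_lines(sys.stdin, sys.stdout).
source = io.StringIO("first \n  second\n")
sink = io.StringIO()
lines_copied = echo_lines(source, sink)
```

Passing the streams in as arguments keeps the loop easy to try out with in-memory files before wiring it to a real pipe.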

I would also be remiss if I did not suggest dropping threads for this altogether and using non-blocking I/O on the subprocess's pipes instead, or even something like Twisted's reactor.spawnProcess (from twisted.internet), which has lots of ways of hooking processes and other things together as consumers and producers.
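
For illustration, here is a minimal thread-free sketch using the standard selectors module. It assumes Python 3.4 or later and a POSIX system, since pipes cannot be selected on Windows. It waits until a pipe is readable and only then calls os.read, so the reads never block; strictly speaking that is readiness-based multiplexing rather than setting O_NONBLOCK on the descriptors. The helper name collect_output is hypothetical:

```python
import os
import selectors
import subprocess
import sys


def collect_output(commands):
    """Run every command and gather its stdout without using threads.

    Returns a dict mapping each command's index to its decoded output.
    """
    sel = selectors.DefaultSelector()
    procs = []
    chunks = {}
    for index, cmd in enumerate(commands):
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
        procs.append(proc)
        chunks[index] = []
        # Attach the command index so we know which child a read belongs to.
        sel.register(proc.stdout, selectors.EVENT_READ, index)

    # Keep going until every pipe has hit EOF and been unregistered.
    while sel.get_map():
        for key, _events in sel.select():
            # The pipe is readable, so this returns at once with whatever
            # is available (or b'' at EOF) instead of blocking.
            data = os.read(key.fileobj.fileno(), 4096)
            if data:
                chunks[key.data].append(data)
            else:
                sel.unregister(key.fileobj)
                key.fileobj.close()

    for proc in procs:
        proc.wait()
    sel.close()
    return {i: b''.join(parts).decode() for i, parts in chunks.items()}


# Example: two short-lived children, read concurrently in one thread.
results = collect_output([
    [sys.executable, '-c', 'print("one")'],
    [sys.executable, '-c', 'print("two")'],
])
```

Here the chunks are simply collected; to react to output as it arrives, you would process data inside the loop instead of appending it.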