Python: multiprocessing.map: If one process raises an exception, why aren't other processes' finally blocks called?
Short answer: SIGTERM trumps finally.
Long answer: Turn on logging with mp.log_to_stderr():

import random
import multiprocessing as mp
import time
import logging

logger = mp.log_to_stderr(logging.DEBUG)

def Process(x):
    try:
        logger.info(x)
        time.sleep(random.random())
        raise Exception('Exception: ' + x)
    finally:
        logger.info('Finally: ' + x)

result = mp.Pool(3).map(Process, ['1', '2', '3'])

The logging output includes:
[DEBUG/MainProcess] terminating workers
Which corresponds to this code in multiprocessing.pool._terminate_pool:

if pool and hasattr(pool[0], 'terminate'):
    debug('terminating workers')
    for p in pool:
        p.terminate()

Each p in pool is a multiprocessing.Process, and calling terminate (at least on non-Windows machines) sends SIGTERM to that process. From multiprocessing/forking.py:

class Popen(object):
    def terminate(self):
        ...
        try:
            os.kill(self.pid, signal.SIGTERM)
        except OSError, e:
            if self.wait(timeout=0.1) is None:
                raise

So it comes down to what happens when a Python process in a try suite is sent a SIGTERM.
Consider the following example (test.py):

import time

def worker():
    try:
        time.sleep(100)
    finally:
        print('enter finally')
        time.sleep(2)
        print('exit finally')

worker()

If you run it, then send it a SIGTERM, then the process ends immediately, without entering the finally suite, as evidenced by no output, and no delay.
In one terminal:
% test.py
In second terminal:
% pkill -TERM -f "test.py"
Result in first terminal:
Terminated
Compare that with what happens when the process is sent a SIGINT (C-c):
In second terminal:
% pkill -INT -f "test.py"
Result in first terminal:
enter finally
exit finally
Traceback (most recent call last):
  File "/home/unutbu/pybin/test.py", line 14, in <module>
    worker()
  File "/home/unutbu/pybin/test.py", line 8, in worker
    time.sleep(100)
KeyboardInterrupt
Conclusion: SIGTERM trumps finally.
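As a side note to the conclusion above, SIGTERM only skips finally while the default handler is in place. A minimal sketch, assuming a POSIX system and that the code runs in the main thread: installing a handler that raises SystemExit turns the signal into an ordinary exception, so the finally suite runs. The names raise_on_sigterm and run_with_cleanup are made up for illustration, and the process signals itself only to keep the demo self-contained.

```python
import os
import signal
import time


def raise_on_sigterm(signum, frame):
    # Hypothetical handler: convert SIGTERM into a normal Python exception
    # so that try/finally blocks unwind as usual.
    raise SystemExit(128 + signum)


def run_with_cleanup():
    """Send SIGTERM to this process and record which blocks ran (POSIX only)."""
    events = []
    previous = signal.signal(signal.SIGTERM, raise_on_sigterm)
    try:
        try:
            events.append('working')
            os.kill(os.getpid(), signal.SIGTERM)
            time.sleep(5)  # interrupted by the handler if os.kill did not already raise
            events.append('not reached')
        finally:
            events.append('finally ran')
    except SystemExit:
        events.append('exited')
    finally:
        # Restore whatever handler was installed before the demo.
        signal.signal(signal.SIGTERM, previous)
    return events


if __name__ == '__main__':
    print(run_with_cleanup())
```

Inside a Pool worker the same handler would have to be installed in each child, for example through the initializer argument of Pool; whether that is appropriate depends on how much cleanup time the parent allows before it exits.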
The answer from unutbu definitely explains why you get the behavior you observe. However, it should be emphasized that SIGTERM is sent only because of how multiprocessing.pool._terminate_pool is implemented. If you can avoid using Pool, then you can get the behavior you desire. Here is a borrowed example:

from multiprocessing import Process
from time import sleep
import random

def f(x):
    try:
        sleep(random.random() * 10)
        raise Exception
    except:  # bare on purpose: also catches the KeyboardInterrupt from C-c
        print("Caught exception in process:", x)
        # Make this last longer than the except clause in main.
        sleep(3)
    finally:
        print("Cleaning up process:", x)

if __name__ == '__main__':
    processes = []
    for i in range(4):
        p = Process(target=f, args=(i,))
        p.start()
        processes.append(p)
    try:
        for process in processes:
            process.join()
    except:
        print("Caught exception in main.")
    finally:
        print("Cleaning up main.")

After sending a SIGINT, example output is:

Caught exception in process: 0
^CCleaning up process: 0
Caught exception in main.
Cleaning up main.
Caught exception in process: 1
Caught exception in process: 2
Caught exception in process: 3
Cleaning up process: 1
Cleaning up process: 2
Cleaning up process: 3

Note that the finally clause is run for all processes. If you need shared memory, consider using Queue, Pipe, Manager, or some external store like redis or sqlite3.
finally re-raises the original exception unless you return from it. The exception is then raised by Pool.map and kills your entire application. The subprocesses are terminated and you see no other exceptions.
You can add a return to swallow the exception:

import random
from time import sleep

def Process(x):
    try:
        print(x)
        sleep(random.random())
        raise Exception('Exception: ' + x)
    finally:
        print('Finally: ' + x)
        return  # returning from finally discards the pending exception

Then you should have None in your map result when an exception occurred.
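A variation on the same idea, offered only as a sketch: instead of returning from finally, which silently hides every error, the worker can catch the exception and return it as data. Pool.map then never sees an exception, so it has no reason to terminate the other workers, and the caller can still tell which inputs failed. The names risky and safe and the ('ok', ...) / ('error', ...) tuple convention are assumptions made up for this example.

```python
import multiprocessing as mp
import random
import time


def risky(x):
    # Stand-in for the real work; it always fails, like Process in the question.
    time.sleep(random.random() * 0.1)
    raise Exception('Exception: ' + x)


def safe(x):
    """Run risky(x) and report failure as a value instead of raising."""
    try:
        return ('ok', risky(x))
    except Exception as exc:
        # Return a plain string so the result is trivially picklable.
        return ('error', str(exc))
    finally:
        # Per-task cleanup goes here; it runs on both paths above.
        pass


if __name__ == '__main__':
    with mp.Pool(3) as pool:
        results = pool.map(safe, ['1', '2', '3'])
    print(results)
```

Because every task returns normally, each worker finishes its own try/finally before the pool is torn down.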