Why is it important to protect the main loop when using joblib.Parallel? Why is it important to protect the main loop when using joblib.Parallel? python python

Why is it important to protect the main loop when using joblib.Parallel?


This is necessary because Windows doesn't have fork(). Because of this limitation, Windows needs to re-import your __main__ module in all the child processes it spawns, in order to re-create the parent's state in the child. This means that if you have the code that spawns the new process at the module-level, it's going to be recursively executed in all the child processes. The if __name__ == "__main__" guard is used to prevent code at the module scope from being re-executed in the child processes.

This isn't necessary on Linux because it does have fork(), which allows it to fork a child process that maintains the same state of the parent, without re-importing the __main__ module.


In case someone stumbles across this in 2021:Due to the new backend "loky" used by joblib>0.12 protecting the main for loop is no longer required. See https://joblib.readthedocs.io/en/latest/parallel.html