
Parallel jobs don't finish in scikit-learn's GridSearchCV


This might be an issue with the multiprocessing that GridSearchCV uses when n_jobs > 1. Instead of multiprocessing, you can try multithreading and see whether that works.

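```python
# scikit-learn 0.19.1 (used below) exposed joblib as sklearn.externals.joblib;
# that alias was removed in scikit-learn 0.23, so import from the
# standalone joblib package instead.
from joblib import parallel_backend
from sklearn.model_selection import GridSearchCV

clf = GridSearchCV(...)  # your estimator, parameter grid, and n_jobs > 1

# Run the parallel cross-validation fits in threads instead of processes.
with parallel_backend('threading'):
    clf.fit(x_train, y_train)
```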
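For a self-contained version of the workaround that runs as-is, here is a minimal sketch. The iris dataset, the SVC estimator, and the parameter grid are illustrative assumptions, not the setup from this answer; it also assumes a scikit-learn version where `parallel_backend` is imported from the standalone joblib package.

```python
from joblib import parallel_backend
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Illustrative data and grid (assumptions); substitute your own.
X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

clf = GridSearchCV(SVC(), param_grid, n_jobs=4, cv=3)

# Inside this block joblib dispatches the CV fits to a thread pool,
# so no worker processes are started.
with parallel_backend("threading"):
    clf.fit(X, y)

print(clf.best_params_, round(clf.best_score_, 3))
```

`n_jobs=4` still sets how many threads run at once; only the kind of worker changes.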

I was having the same issue with my estimator in GridSearchCV with n_jobs > 1, and this workaround works for every n_jobs value I tried.

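PS: I am not sure whether "threading" gives the same speedup as "multiprocessing" for every estimator. In theory, "threading" is a poor choice when the estimator's work is limited by the GIL (pure Python code), but when the heavy lifting happens in Cython or NumPy code that releases the GIL, threads can run in parallel and may be faster than "multiprocessing", since they avoid starting worker processes and copying data to them.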
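Since the answer is unsure which backend is faster for a given estimator, one way to find out is to time the same search under each backend. This is a minimal sketch; the `time_backends` helper, the `make_search` factory, the synthetic data, and the random forest are hypothetical choices for illustration. It compares "threading" with "loky", joblib's default process-based backend in recent versions.

```python
import time

from joblib import parallel_backend
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV


def time_backends(make_search, X, y, backends=("threading", "loky")):
    """Fit a fresh search under each joblib backend; return wall times in seconds."""
    timings = {}
    for backend in backends:
        search = make_search()  # new, unfitted search for each backend
        start = time.perf_counter()
        with parallel_backend(backend):
            search.fit(X, y)
        timings[backend] = time.perf_counter() - start
    return timings


def make_search():
    # Hypothetical search; replace with your own estimator and grid.
    return GridSearchCV(
        RandomForestClassifier(n_estimators=20, random_state=0),
        {"max_depth": [3, 6]},
        n_jobs=2,
        cv=3,
    )


X, y = make_classification(n_samples=500, n_features=20, random_state=0)
timings = time_backends(make_search, X, y)
print(timings)
```

On a toy problem like this, process start-up cost dominates, so run it on your real data before drawing conclusions.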

System tried on:

- macOS 10.12.6
- Python 3.6
- numpy==1.13.3
- pandas==0.21.0
- scikit-learn==0.19.1
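To compare your own setup with the one listed above, a small sketch like the following collects the same information. The `environment_summary` helper is hypothetical and assumes Python 3.8+ for `importlib.metadata` (newer than the Python 3.6 above). On scikit-learn 0.20+ you can also call `sklearn.show_versions()`.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def environment_summary(packages=("numpy", "pandas", "scikit-learn", "joblib")):
    """Collect OS, Python, and package versions for comparing environments."""
    summary = {
        "os": platform.platform(),
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            summary[name] = version(name)  # installed distribution version
        except PackageNotFoundError:
            summary[name] = "not installed"
    return summary


for key, value in environment_summary().items():
    print(f"{key}: {value}")
```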


I believe I had a similar issue, and the culprit was a sudden spike in memory usage: the process would try to allocate memory and immediately die because not enough was available.
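To check whether memory is the problem, you can measure the peak memory of a single fit and multiply by the number of parallel jobs as a rough estimate. This is a minimal sketch; the `peak_fit_memory` helper, the synthetic data, and the random forest are hypothetical. `tracemalloc` only sees allocations made through Python's allocators (NumPy arrays are included), so memory that C or Cython code allocates directly is missed and the result is a lower bound.

```python
import tracemalloc

from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def peak_fit_memory(estimator, X, y):
    """Peak bytes traced by tracemalloc while fitting a fresh clone once."""
    est = clone(estimator)  # unfitted copy, so the original is untouched
    tracemalloc.start()
    try:
        est.fit(X, y)
        _, peak = tracemalloc.get_traced_memory()  # (current, peak)
    finally:
        tracemalloc.stop()
    return peak


X, y = make_classification(n_samples=2000, n_features=50, random_state=0)
peak = peak_fit_memory(RandomForestClassifier(n_estimators=50, random_state=0), X, y)

n_jobs = 4
print(f"one fit: {peak / 1e6:.1f} MB; rough total for n_jobs={n_jobs}: "
      f"{n_jobs * peak / 1e6:.1f} MB")
```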

If you have access to a machine with much more memory available (for example 128-256 GB), it is worth testing there with the same or a lower number of jobs (e.g. n_jobs=4). That is how I resolved it, anyway: I just moved my script to a large server.
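The trade-off between memory and number of jobs is simple arithmetic: with a per-fit peak of P bytes and M bytes available, at most about M / P fits can run at once. This sketch assumes each parallel fit needs roughly the same peak memory; the `choose_n_jobs` helper, its `headroom` safety fraction, and the 6 GB per-fit figure are all hypothetical.

```python
import os


def choose_n_jobs(per_fit_bytes, available_bytes, cpu_count=None, headroom=0.8):
    """Largest n_jobs whose combined peak fits in a fraction of available memory."""
    if per_fit_bytes <= 0:
        raise ValueError("per_fit_bytes must be positive")
    cpus = cpu_count or os.cpu_count() or 1
    by_memory = int(available_bytes * headroom // per_fit_bytes)
    return max(1, min(cpus, by_memory))  # always at least one job


GB = 1024 ** 3
# Hypothetical workload: each fit peaks at about 6 GB.
print(choose_n_jobs(6 * GB, available_bytes=16 * GB, cpu_count=8))   # → 2
print(choose_n_jobs(6 * GB, available_bytes=128 * GB, cpu_count=8))  # → 8
```

On the smaller machine, memory limits the search to 2 jobs. On the larger one, the CPU count becomes the limit instead.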