How to save a scikit-learn pipeline with a Keras regressor inside to disk?


I struggled with the same problem, since there is no direct way to do this. Here is a hack that worked for me: I saved the pipeline into two files. The first file stores the pickled sklearn pipeline and the second one stores the Keras model:

    ...
    from keras.models import load_model
    import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
    ...

    pipeline = Pipeline([
        ('scaler', StandardScaler()),
        ('estimator', KerasRegressor(build_model))
    ])
    pipeline.fit(X_train, y_train)

    # Save the Keras model first:
    pipeline.named_steps['estimator'].model.save('keras_model.h5')

    # This hack allows us to save the sklearn pipeline:
    pipeline.named_steps['estimator'].model = None

    # Finally, save the pipeline:
    joblib.dump(pipeline, 'sklearn_pipeline.pkl')

    del pipeline

And here is how the model could be loaded back:

    # Load the pipeline first:
    pipeline = joblib.load('sklearn_pipeline.pkl')

    # Then, load the Keras model:
    pipeline.named_steps['estimator'].model = load_model('keras_model.h5')

    y_pred = pipeline.predict(X_test)
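The save and load steps above could be wrapped into a pair of helpers. This is only a sketch: `save_split` and `load_split` are hypothetical names, plain `pickle` stands in for `joblib` (either should work here), and `load_model` is passed in as an argument so the sketch itself does not import Keras. One difference from the hack above is that the model is reattached after saving, so the in-memory pipeline stays usable instead of having to be deleted.

```python
import pickle


def save_split(pipeline, step, pipeline_path, model_path):
    """Save the Keras model and the rest of the pipeline to two files.

    Assumes pipeline.named_steps[step] is a wrapper whose fitted Keras
    model lives in its `model` attribute, as in the answer above.
    """
    estimator = pipeline.named_steps[step]
    model = estimator.model
    model.save(model_path)       # Keras writes the model in its own format
    estimator.model = None       # detach it so the pipeline can be pickled
    try:
        with open(pipeline_path, "wb") as f:
            pickle.dump(pipeline, f)
    finally:
        estimator.model = model  # reattach, even if pickling failed


def load_split(pipeline_path, model_path, step, load_model):
    """Load the pickled pipeline, then plug the Keras model back in.

    `load_model` is whatever callable restores the model from disk,
    e.g. keras.models.load_model.
    """
    with open(pipeline_path, "rb") as f:
        pipeline = pickle.load(f)
    pipeline.named_steps[step].model = load_model(model_path)
    return pipeline
```

With the pipeline from the answer, this would be called as `save_split(pipeline, 'estimator', 'sklearn_pipeline.pkl', 'keras_model.h5')` and later `pipeline = load_split('sklearn_pipeline.pkl', 'keras_model.h5', 'estimator', load_model)`.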


Keras models are not compatible with pickle out of the box. You can fix this if you are willing to monkey-patch Keras: https://github.com/tensorflow/tensorflow/pull/39609#issuecomment-683370566.

You can also use the SciKeras library, which does this for you and is a drop-in replacement for KerasClassifier and KerasRegressor: https://github.com/adriangb/scikeras

Disclosure: I am the author of SciKeras as well as that PR.