Save Python random forest model to file
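...
import cPickle  # Python 2 only; on Python 3 use the built-in pickle module
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor()
rf.fit(X, y)

with open('path/to/file', 'wb') as f:
    cPickle.dump(rf, f)

# in your prediction file
with open('path/to/file', 'rb') as f:
    rf = cPickle.load(f)

preds = rf.predict(new_X)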
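On Python 3, cPickle no longer exists as a separate module; the standard pickle module picks up the C implementation on its own. Here is a minimal sketch of the same round trip, assuming scikit-learn and NumPy are installed. The synthetic data, the temporary file path, and the variable names are illustrative only and not from the original answer:

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic training data (illustrative only).
rng = np.random.RandomState(0)
X = rng.rand(100, 4)
y = X[:, 0] + 2 * X[:, 1]

rf = RandomForestRegressor(n_estimators=10, random_state=0)
rf.fit(X, y)

# Hypothetical location; replace with your own path.
model_path = os.path.join(tempfile.mkdtemp(), "rf.pkl")

# Save the fitted model.
with open(model_path, "wb") as f:
    pickle.dump(rf, f)

# Later, e.g. in a separate prediction script: load and predict.
with open(model_path, "rb") as f:
    loaded_rf = pickle.load(f)

new_X = rng.rand(5, 4)
preds = loaded_rf.predict(new_X)
```

Only unpickle files you trust, since loading a pickle can execute arbitrary code. It is also safest to load a model with the same scikit-learn version that saved it.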
You can use joblib to save and load a Random Forest from scikit-learn (in fact, any model from scikit-learn).
An example:
import joblib
from sklearn.ensemble import RandomForestClassifier

# create RF
rf = RandomForestClassifier()

# fit on some data
rf.fit(X, y)

# save
joblib.dump(rf, "my_random_forest.joblib")

# load
loaded_rf = joblib.load("my_random_forest.joblib")
What is more, joblib.dump has a compress argument, so the model can be compressed. I ran a very simple test on the iris dataset, and compress=3 reduced the size of the file by a factor of about 5.6.
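To check the effect of compress on your own model, a comparison along these lines may help. This is a sketch, assuming scikit-learn and joblib are installed; the forest settings and file names are illustrative, and the ratio you see will depend on the model and library versions, so it will not necessarily match the figure above:

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Fit a small forest on iris (hyperparameters are illustrative).
X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Hypothetical output locations in a temporary directory.
out_dir = tempfile.mkdtemp()
raw_path = os.path.join(out_dir, "rf_raw.joblib")
small_path = os.path.join(out_dir, "rf_compressed.joblib")

# Save once without compression and once with compress=3.
joblib.dump(rf, raw_path)
joblib.dump(rf, small_path, compress=3)

raw_size = os.path.getsize(raw_path)
small_size = os.path.getsize(small_path)
print(f"uncompressed: {raw_size} bytes, compress=3: {small_size} bytes")
print(f"ratio: {raw_size / small_size:.1f}x")

# A compressed file loads the same way as an uncompressed one.
loaded_rf = joblib.load(small_path)
```

Higher compress levels usually give smaller files at the cost of slower saving and loading.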
I use dill. It stores all the data and, I think, possibly module information as well, though I'm not sure. I remember trying to use pickle for storing these really complicated objects and it didn't work for me. cPickle probably does the same job as dill, but I've never tried cPickle; it looks like it works in exactly the same way. I use the "obj" extension, but that's by no means conventional. It just made sense to me since I was storing an object.
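import dill
from sklearn.ensemble import RandomForestRegressor

wd = "/whatever/you/want/your/working/directory/to/be/"

# compute_importances=True was dropped here: newer scikit-learn releases no longer accept it
rf = RandomForestRegressor(n_estimators=250, max_features=9)
rf.fit(Predx, Predy)

dill.dump(rf, open(wd + "filename.obj", "wb"))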
By the way, I'm not sure if you use IPython, but sometimes writing a file that way doesn't work (probably because the file handle is never explicitly closed), so you have to do this instead:
with open(wd + "filename.obj", "wb") as f:
    dill.dump(rf, f)
To load the object again:
with open(wd + "filename.obj", "rb") as f:
    model = dill.load(f)
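The reason the with form is more reliable is that it closes the file as soon as the block ends, so everything is flushed to disk before anything tries to read it back. A small sketch of that behaviour follows. It uses the standard pickle module so that it runs without extra packages; dill.dump and dill.load take the same arguments. The dictionary is only a stand-in for a fitted model, and the temporary path is hypothetical:

```python
import os
import pickle
import tempfile

# Stand-in for a fitted model (illustrative only).
obj = {"n_estimators": 250, "max_features": 9}

# Hypothetical location in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "filename.obj")

with open(path, "wb") as f:
    pickle.dump(obj, f)
closed_after_write = f.closed  # True: the with block closed the file

# Reading back works because the write was flushed when the file closed.
with open(path, "rb") as f:
    restored = pickle.load(f)
```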