GridSearchCV - XGBoost - Early Stopping
When using early_stopping_rounds you also have to pass eval_metric and eval_set as input parameters to the fit method. Early stopping works by calculating the error on an evaluation set. The error has to decrease at least once every early_stopping_rounds rounds; otherwise the generation of additional trees is stopped early.
See the documentation of XGBoost's fit method for details.
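To make the stopping rule concrete, here is a minimal pure-Python sketch of the idea. The function early_stopping_round and the error values are hypothetical and only illustrate the rule described above; this is not xgboost's actual implementation.

```python
def early_stopping_round(errors, early_stopping_rounds):
    """Return (stop_round, best_round) for a list of per-round evaluation errors.

    Hypothetical helper for illustration only; not part of xgboost.
    Training stops once the error has not improved for
    `early_stopping_rounds` consecutive rounds.
    """
    best_error = float("inf")
    best_round = -1
    for round_idx, error in enumerate(errors):
        if error < best_error:
            # New best score: remember it and keep training.
            best_error = error
            best_round = round_idx
        elif round_idx - best_round >= early_stopping_rounds:
            # No improvement for too many rounds: stop early.
            return round_idx, best_round
    # Ran through all rounds without triggering early stopping.
    return len(errors) - 1, best_round


# Made-up evaluation errors, one value per boosting round.
errors = [5.0, 4.0, 3.5, 3.6, 3.7, 3.8, 3.4]
print(early_stopping_round(errors, early_stopping_rounds=3))  # (5, 2)
```

With a patience of 3, training stops at round 5 because the best error was seen at round 2. With a larger patience such as 5, the late improvement in round 6 is still reached.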
Here you see a minimal fully working example:
import xgboost as xgb
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import TimeSeriesSplit

cv = 2

trainX = [[1], [2], [3], [4], [5]]
trainY = [1, 2, 3, 4, 5]

# these are the evaluation sets
testX = trainX
testY = trainY

paramGrid = {"subsample": [0.5, 0.8]}

# parameters forwarded to the fit method of XGBRegressor for early stopping
fit_params = {"early_stopping_rounds": 42,
              "eval_metric": "mae",
              "eval_set": [[testX, testY]]}

model = xgb.XGBRegressor()
gridsearch = GridSearchCV(model, paramGrid, verbose=1,
                          fit_params=fit_params,
                          cv=TimeSeriesSplit(n_splits=cv).get_n_splits([trainX, trainY]))
gridsearch.fit(trainX, trainY)
An update to @glao's answer and a response to @Vasim's comment/question, as of sklearn 0.21.3 (note that fit_params has been moved out of the instantiation of GridSearchCV and into the fit() method; also, the import specifically pulls in the sklearn wrapper module from xgboost):
import xgboost.sklearn as xgb
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import TimeSeriesSplit

cv = 2

trainX = [[1], [2], [3], [4], [5]]
trainY = [1, 2, 3, 4, 5]

# these are the evaluation sets
testX = trainX
testY = trainY

paramGrid = {"subsample": [0.5, 0.8]}

fit_params = {"early_stopping_rounds": 42,
              "eval_metric": "mae",
              "eval_set": [[testX, testY]]}

model = xgb.XGBRegressor()
gridsearch = GridSearchCV(model, paramGrid, verbose=1,
                          cv=TimeSeriesSplit(n_splits=cv).get_n_splits([trainX, trainY]))
gridsearch.fit(trainX, trainY, **fit_params)