
Using GridSearchCV with AdaBoost and DecisionTreeClassifier


There are several things wrong in the code you posted:

  1. The keys of the param_grid dictionary need to be strings. You should be getting a NameError.
  2. The key "abc__n_estimators" should just be "n_estimators": you are probably mixing this with the pipeline syntax. Here nothing tells Python that the string "abc" represents your AdaBoostClassifier.
  3. None (and not none) is not a valid value for n_estimators. The default value (probably what you meant) is 50.

Here's the code with these fixes. To set the parameters of your tree estimator you can use the "__" syntax, which gives access to nested parameters.

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV  # sklearn.grid_search was removed; use model_selection

param_grid = {"base_estimator__criterion": ["gini", "entropy"],
              "base_estimator__splitter": ["best", "random"],
              "n_estimators": [1, 2]}

# "sqrt" and "balanced" replace the old "auto" values, which have since been removed
DTC = DecisionTreeClassifier(random_state=11, max_features="sqrt",
                             class_weight="balanced", max_depth=None)

# Note: in scikit-learn >= 1.2 the argument is named `estimator`,
# and the grid keys become "estimator__criterion" etc.
ABC = AdaBoostClassifier(base_estimator=DTC)

# run grid search
grid_search_ABC = GridSearchCV(ABC, param_grid=param_grid, scoring='roc_auc')

Also, using only 1 or 2 estimators does not really make sense for AdaBoost. But I'm guessing this is not the actual code you're running.
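For completeness, here is a minimal sketch of how the grid search above could be fitted and inspected. The synthetic data from make_classification, the cv=5 setting and the wider n_estimators range are illustrative assumptions, not part of the original question.

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Illustrative data; replace with your own X and y
X, y = make_classification(n_samples=500, n_features=20, random_state=11)

DTC = DecisionTreeClassifier(random_state=11, max_depth=None)
# base_estimator / base_estimator__ become estimator / estimator__ in scikit-learn >= 1.2
ABC = AdaBoostClassifier(base_estimator=DTC)

# A more realistic n_estimators range than [1, 2]
param_grid = {"base_estimator__criterion": ["gini", "entropy"],
              "base_estimator__splitter": ["best", "random"],
              "n_estimators": [50, 100, 200]}

grid_search_ABC = GridSearchCV(ABC, param_grid=param_grid, scoring='roc_auc', cv=5)
grid_search_ABC.fit(X, y)

print(grid_search_ABC.best_params_)  # best parameter combination found
print(grid_search_ABC.best_score_)   # mean cross-validated ROC AUC for that combination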

Hope this helps.


Trying to provide a shorter (and hopefully generic) answer.


If you want to grid search over the parameters of the base estimator of an AdaBoostClassifier, e.g. varying the max_depth or min_samples_leaf of a DecisionTreeClassifier, then you have to use a special syntax in the parameter grid.

from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

abc = AdaBoostClassifier(base_estimator=DecisionTreeClassifier())

parameters = {'base_estimator__max_depth': [i for i in range(2, 11, 2)],
              'base_estimator__min_samples_leaf': [5, 10],
              'n_estimators': [10, 50, 250, 1000],
              'learning_rate': [0.01, 0.1]}

clf = GridSearchCV(abc, parameters, verbose=3, scoring='f1', n_jobs=-1)
clf.fit(X_train, y_train)

So, note the 'base_estimator__max_depth' and 'base_estimator__min_samples_leaf' keys in the parameters dictionary. That is how you access the hyperparameters of the base estimator of an ensemble algorithm like AdaBoostClassifier when doing a grid search; note the __ double-underscore notation in particular. The other two keys are regular AdaBoostClassifier parameters.
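As a side note on the "abc__n_estimators" confusion from the first answer: that kind of prefix only makes sense when the estimator is a named step inside a Pipeline. Here is a minimal sketch assuming a pipeline step arbitrarily named 'abc' (the step name is an illustrative choice, not from the original question):

from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Inside a Pipeline, every parameter key is prefixed with the step name,
# so "abc__..." is only valid because the step below is literally named "abc".
pipe = Pipeline([('abc', AdaBoostClassifier(base_estimator=DecisionTreeClassifier()))])

parameters = {'abc__n_estimators': [50, 100],
              'abc__base_estimator__max_depth': [2, 4, 6]}

clf = GridSearchCV(pipe, parameters, scoring='f1', n_jobs=-1)
# clf.fit(X_train, y_train)  # same X_train / y_train as in the snippet above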