Invalid parameter for sklearn estimator pipeline
There should be two underscores between the estimator name and its parameters in a Pipeline, e.g. logisticregression__C. Do the same for tfidfvectorizer.
See the example at http://scikit-learn.org/stable/auto_examples/plot_compare_reduction.html#sphx-glr-auto-examples-plot-compare-reduction-py
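As a minimal sketch of the naming rule above, the snippet below builds a pipeline with make_pipeline and lists the parameter names it accepts. The two estimators are the ones named in this answer; the values passed to set_params are arbitrary and only there for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# make_pipeline names each step after its lowercased class name.
pipe = make_pipeline(TfidfVectorizer(), LogisticRegression())
step_names = [name for name, _ in pipe.steps]

# get_params() lists every key that set_params and GridSearchCV will accept.
valid_keys = sorted(pipe.get_params().keys())

# Two underscores separate the step name from the parameter name.
# (The values here are arbitrary illustration values.)
pipe.set_params(logisticregression__C=10, tfidfvectorizer__ngram_range=(1, 2))
```

Printing valid_keys is a quick way to check the exact spelling of a parameter name before putting it in a grid.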
For a more general answer to using Pipeline in a GridSearchCV, the parameter grid for the model should start with whatever name you gave when defining the pipeline. For example:
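    import numpy as np
    from sklearn.linear_model import Lasso
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline

    # preprocess, X_train and y_train are assumed to be defined elsewhere.
    # Pay attention to the name of the second step, i.e. 'model'
    pipeline = Pipeline(steps=[
        ('preprocess', preprocess),
        ('model', Lasso())
    ])

    # Define the parameter grid to be used in GridSearch
    param_grid = {'model__alpha': np.arange(0, 1, 0.05)}

    search = GridSearchCV(pipeline, param_grid)
    search.fit(X_train, y_train)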
In the pipeline, we used the name model for the estimator step. So, in the grid search, any hyperparameter for Lasso regression should be given with the prefix model__. The parameters in the grid depend on what name you gave in the pipeline. In plain-old GridSearchCV without a pipeline, the grid would be given like this:
    param_grid = {'alpha': np.arange(0, 1, 0.05)}

    search = GridSearchCV(Lasso(), param_grid)
You can find out more about GridSearch from this post.
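To make the prefix rule concrete, here is a self-contained sketch on synthetic data. The StandardScaler step, the make_regression data, the alpha values and cv=3 are all assumptions chosen only so the snippet runs on its own; they are not part of the original answer. It also shows what happens when the prefix is left out.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, assumed here purely for illustration.
X, y = make_regression(n_samples=60, n_features=5, noise=0.1, random_state=0)

pipeline = Pipeline(steps=[
    ('preprocess', StandardScaler()),  # stand-in preprocessing step
    ('model', Lasso()),
])

# Keys are prefixed with the step name 'model' plus two underscores.
param_grid = {'model__alpha': [0.01, 0.1, 1.0]}
search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X, y)
best_keys = set(search.best_params_)

# Without the prefix, the pipeline does not recognise the parameter.
try:
    pipeline.set_params(alpha=0.1)
    unprefixed_rejected = False
except ValueError:
    unprefixed_rejected = True
```

The keys in search.best_params_ carry the same prefix as the grid, so the tuned value is read back as search.best_params_['model__alpha'].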
Note that if you are using a pipeline with a voting classifier and a column selector, you will need multiple layers of names:
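    # ColumnSelector is assumed to be the one from mlxtend.
    from mlxtend.feature_selection import ColumnSelector
    from sklearn.ensemble import VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import SVC

    pipe1 = make_pipeline(ColumnSelector(cols=(0, 1)), LogisticRegression())
    pipe2 = make_pipeline(ColumnSelector(cols=(1, 2, 3)), SVC())

    votingClassifier = VotingClassifier(estimators=[
        ('p1', pipe1),
        ('p2', pipe2)
    ])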
You will need a param grid that looks like the following:
    param_grid = {
        'p2__svc__kernel': ['rbf', 'poly'],
        'p2__svc__gamma': ['scale', 'auto'],
    }
p2 is the name of the pipe and svc is the default name of the classifier you create in that pipe. The third element is the parameter you want to modify.
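The nested names can be checked without running a grid search. The sketch below swaps the column selector for StandardScaler (an assumption, made only to keep the example free of extra dependencies) and lists the keys the voting classifier exposes for the SVC inside p2.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# StandardScaler is a stand-in for the column selector used above.
pipe1 = make_pipeline(StandardScaler(), LogisticRegression())
pipe2 = make_pipeline(StandardScaler(), SVC())
voting = VotingClassifier(estimators=[('p1', pipe1), ('p2', pipe2)])

# Every SVC parameter is reachable as <pipe name>__<step name>__<parameter>.
svc_keys = sorted(k for k in voting.get_params() if k.startswith('p2__svc__'))

# The same three-part names work with set_params.
voting.set_params(p2__svc__kernel='poly', p2__svc__gamma='auto')
```

Any key in svc_keys can be used as-is in the param_grid passed to GridSearchCV.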