Using explicit (predefined) validation set for grid search with sklearn

python validation scikit-learn cross-validation

Use PredefinedSplit:

from sklearn.model_selection import PredefinedSplit
ps = PredefinedSplit(test_fold=your_test_fold)

Then set cv=ps in GridSearchCV.

test_fold : array-like, shape (n_samples,)
test_fold[i] gives the test set fold of sample i. A value of -1 indicates that the corresponding sample is not part of any test set folds, but will instead always be put into the training fold.
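To make the meaning of test_fold concrete, here is a small sketch that prints the splits produced by a made-up label array. The six labels are purely illustrative: samples marked -1 stay in training for every split, and each distinct non-negative value becomes one test fold.

```python
import numpy as np
from sklearn.model_selection import PredefinedSplit

# Hypothetical fold labels for six samples:
# -1 = always in the training set, 0 and 1 = test-fold ids.
test_fold = np.array([-1, 0, 0, 1, -1, 1])
ps = PredefinedSplit(test_fold=test_fold)

# One split is generated per distinct non-negative fold id.
n_splits = ps.get_n_splits()
splits = [(train_idx.tolist(), test_idx.tolist())
          for train_idx, test_idx in ps.split()]

for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

With a single validation set there is only one fold id (0), so exactly one split is produced.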


When using a validation set, set test_fold to 0 for all samples that are part of the validation set, and to -1 for all other samples.
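A minimal sketch of that recipe, assuming the training and validation sets already exist as separate NumPy arrays. The synthetic data, the SVC estimator, and the parameter grid are stand-ins chosen for illustration, not part of the original answer.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, PredefinedSplit
from sklearn.svm import SVC

# Synthetic stand-in for an existing train/validation split (assumption).
X_all, y_all = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, y_train = X_all[:150], y_all[:150]
X_val, y_val = X_all[150:], y_all[150:]

# Stack both sets; GridSearchCV needs a single X and y.
X = np.concatenate([X_train, X_val])
y = np.concatenate([y_train, y_val])

# -1 for training rows (never used for scoring), 0 for validation rows.
test_fold = np.concatenate([
    np.full(len(X_train), -1),
    np.zeros(len(X_val), dtype=int),
])

ps = PredefinedSplit(test_fold=test_fold)
search = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=ps)
search.fit(X, y)
print(search.best_params_)
```

Note that with the default refit=True, the best estimator is refit on all of X, which includes the validation rows. Pass refit=False if the final model should be trained on the training rows only.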


Consider using the hypopt Python package (pip install hypopt), of which I am an author. It's a professional package created specifically for parameter optimization with a validation set. It works with any scikit-learn model out of the box and can be used with Tensorflow, PyTorch, Caffe2, etc. as well.

# Code from https://github.com/cgnorthcutt/hypopt
# Assuming you already have train, test, val sets and a model.
from hypopt import GridSearch
from sklearn.svm import SVR

param_grid = [
  {'C': [1, 10, 100], 'kernel': ['linear']},
  {'C': [1, 10, 100], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
]

# Grid-search all parameter combinations using a validation set.
opt = GridSearch(model=SVR(), param_grid=param_grid)
opt.fit(X_train, y_train, X_val, y_val)
print('Test Score for Optimized Parameters:', opt.score(X_test, y_test))

EDIT: I (think I) received -1's on this response because I'm suggesting a package that I authored. This is unfortunate, given that the package was created specifically to solve this type of problem.


# Import libraries
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.model_selection import PredefinedSplit

# Split data into train and validation sets
# (X is assumed to be a pandas DataFrame, since its index is used below)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, train_size=0.8, stratify=y, random_state=2020)

# Create a list where train data indices are -1 and validation data indices are 0
split_index = [-1 if x in X_train.index else 0 for x in X.index]

# Use the list to create PredefinedSplit
pds = PredefinedSplit(test_fold=split_index)

# Use PredefinedSplit in GridSearchCV
# (estimator and param_grid are assumed to be defined already)
clf = GridSearchCV(estimator=estimator, cv=pds, param_grid=param_grid)

# Fit with all data
clf.fit(X, y)
