How to set a threshold for a sklearn classifier based on ROC results?



This is what I have done:

from sklearn.metrics import confusion_matrix, roc_curve

model = SomeSklearnModel()
model.fit(X_train, y_train)
predict = model.predict(X_test)
# predict_proba returns one column per class; keep the positive-class column,
# otherwise roc_curve will reject the 2D array
predict_probabilities = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, predict_probabilities)

However, I am annoyed that predict uses a cutoff corresponding to a true positive rate of only 0.4% (with zero false positives). The ROC curve shows a threshold I like better for my problem, where the true positive rate is approximately 20% (and the false positive rate around 4%). I then scan the thresholds returned by roc_curve to find the probability cutoff that corresponds to my favourite ROC point; in my case this probability is 0.21 (a programmatic version of this scan is sketched after the confusion matrix below). Then I create my own predict array:

import numpy as np

predict_mine = np.where(predict_probabilities > 0.21, 1, 0)

and there you go:

confusion_matrix(y_test, predict_mine)

returns what I wanted:

array([[6927,  309],
       [ 621,  121]])
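
For reference, here is a minimal sketch of how the threshold scan described above can be done programmatically, using the thresholds array that roc_curve returns rather than reading the value off the curve by hand. The target rates (TPR ≈ 0.20, FPR ≈ 0.04) are the ones quoted above, and model, X_test and y_test are assumed to be defined as in the question:

import numpy as np
from sklearn.metrics import confusion_matrix, roc_curve

# Positive-class probabilities, as in the question
predict_probabilities = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, predict_probabilities)

# Pick the threshold whose (FPR, TPR) point lies closest to the desired one
target_fpr, target_tpr = 0.04, 0.20
distances = np.sqrt((fpr - target_fpr) ** 2 + (tpr - target_tpr) ** 2)
best_threshold = thresholds[np.argmin(distances)]

# Apply the chosen threshold instead of the default 0.5 cutoff
predict_mine = (predict_probabilities > best_threshold).astype(int)
print(best_threshold)
print(confusion_matrix(y_test, predict_mine))

If you can use scikit-learn 1.5 or later, FixedThresholdClassifier and TunedThresholdClassifierCV in sklearn.model_selection wrap this pattern directly.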


It's difficult to give an exact answer without specific code. If you're already doing cross validation, you might consider specifying the AUC as the metric to optimize:

from sklearn.model_selection import KFold, cross_val_score

# In modern scikit-learn (0.18+), KFold lives in model_selection and no longer
# takes the sample count; use n_splits instead of n_folds
shuffle = KFold(n_splits=10, shuffle=True)
scores = cross_val_score(classifier, X_train, y_train, cv=shuffle, scoring='roc_auc')
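
Since AUC measures ranking quality independently of any single threshold, the same scoring choice can also drive hyperparameter selection. Below is a minimal sketch using GridSearchCV with scoring='roc_auc'; the RandomForestClassifier and the parameter grid are placeholders I've assumed for illustration, not part of the original post:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, KFold

# Hypothetical estimator and grid; substitute your own classifier and parameters
param_grid = {'n_estimators': [100, 300], 'max_depth': [None, 10]}
shuffle = KFold(n_splits=10, shuffle=True, random_state=0)

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring='roc_auc',  # optimize threshold-independent ranking quality
    cv=shuffle,
)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)

You can then apply the thresholding step from the question to the best estimator found by the search.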