How to plot precision and recall of multiclass classifier?
From scikit-learn documentation:
Precision-recall curves are typically used in binary classification to study the output of a classifier. In order to extend the precision-recall curve and average precision to multi-class or multi-label classification, it is necessary to binarize the output. One curve can be drawn per label, but one can also draw a precision-recall curve by considering each element of the label indicator matrix as a binary prediction (micro-averaging).
ROC curves are typically used in binary classification to study the output of a classifier. In order to extend ROC curve and ROC area to multi-class or multi-label classification, it is necessary to binarize the output. One ROC curve can be drawn per label, but one can also draw a ROC curve by considering each element of the label indicator matrix as a binary prediction (micro-averaging).
Therefore, you should binarize the output and plot precision-recall and ROC curves for each class. You will also need predict_proba to get class probabilities, because both curves are built from scores rather than from hard class predictions.
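To make "binarize the output" concrete, here is a minimal sketch on a made-up label vector (the four labels are an assumption for illustration, not part of the MNIST example). Each column of the result is the binary target for one class:

```python
import numpy as np
from sklearn.preprocessing import label_binarize

# Toy labels for a 3-class problem (illustrative only)
labels = np.array([0, 2, 1, 2])

# One indicator column per class: row i has a 1 in the column of its class
Y = label_binarize(labels, classes=[0, 1, 2])
print(Y)
# [[1 0 0]
#  [0 0 1]
#  [0 1 0]
#  [0 0 1]]
```

Column i of Y is what gets paired with column i of the predict_proba output when drawing the curve for class i.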
I divide the code into three parts:
- general settings, learning and prediction
- precision-recall curve
- ROC curve
1. general settings, learning and prediction
    from sklearn.datasets import fetch_openml
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.metrics import precision_recall_curve, roc_curve
    from sklearn.preprocessing import label_binarize
    import matplotlib.pyplot as plt
    #%matplotlib inline

    # fetch_mldata has been removed from scikit-learn; fetch_openml replaces it
    mnist = fetch_openml("mnist_784", version=1, as_frame=False)
    target = mnist.target.astype(int)  # OpenML returns the digit labels as strings
    n_classes = len(set(target))

    # one indicator column per class
    Y = label_binarize(target, classes=[*range(n_classes)])

    X_train, X_test, y_train, y_test = train_test_split(mnist.data, Y, random_state=42)

    clf = OneVsRestClassifier(RandomForestClassifier(n_estimators=50, max_depth=3, random_state=0))
    clf.fit(X_train, y_train)

    # class probabilities, shape (n_samples, n_classes)
    y_score = clf.predict_proba(X_test)
2. precision-recall curve
    # precision recall curve
    precision = dict()
    recall = dict()
    for i in range(n_classes):
        # treat class i as the positive class and all others as negative
        precision[i], recall[i], _ = precision_recall_curve(y_test[:, i], y_score[:, i])
        plt.plot(recall[i], precision[i], lw=2, label='class {}'.format(i))

    plt.xlabel("recall")
    plt.ylabel("precision")
    plt.legend(loc="best")
    plt.title("precision vs. recall curve")
    plt.show()
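The documentation quote also mentions micro-averaging, which the per-class loop does not cover. A minimal sketch of it follows. To keep it quick to run, it uses a small synthetic 3-class dataset from make_classification instead of MNIST (that dataset and the variable names ending in _micro are assumptions for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import label_binarize

# Small synthetic stand-in for MNIST (illustrative only)
X, labels = make_classification(n_samples=600, n_features=20, n_informative=6,
                                n_classes=3, random_state=0)
Y = label_binarize(labels, classes=[0, 1, 2])
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=42)

clf = OneVsRestClassifier(RandomForestClassifier(n_estimators=50, max_depth=3, random_state=0))
clf.fit(X_train, y_train)
y_score = clf.predict_proba(X_test)

# Micro-averaging: flatten the indicator matrix and the score matrix so that
# every (sample, class) cell counts as one binary prediction
precision_micro, recall_micro, _ = precision_recall_curve(y_test.ravel(), y_score.ravel())

# Single-number summary of the micro-averaged curve
ap_micro = average_precision_score(y_test, y_score, average="micro")
print("micro-averaged average precision: {:.3f}".format(ap_micro))
```

The resulting precision_micro and recall_micro arrays can be passed to plt.plot in the same way as the per-class arrays.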
3. ROC curve
    # roc curve
    fpr = dict()
    tpr = dict()
    for i in range(n_classes):
        # treat class i as the positive class and all others as negative
        fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
        plt.plot(fpr[i], tpr[i], lw=2, label='class {}'.format(i))

    plt.xlabel("false positive rate")
    plt.ylabel("true positive rate")
    plt.legend(loc="best")
    plt.title("ROC curve")
    plt.show()
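If you also want the "ROC area" mentioned in the documentation quote, something like the sketch below should work. It again uses a small synthetic 3-class dataset rather than MNIST so that it runs quickly (the dataset and the names auc_per_class and auc_micro are assumptions for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import label_binarize

# Small synthetic stand-in for MNIST (illustrative only)
X, labels = make_classification(n_samples=600, n_features=20, n_informative=6,
                                n_classes=3, random_state=0)
Y = label_binarize(labels, classes=[0, 1, 2])
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=42)

clf = OneVsRestClassifier(RandomForestClassifier(n_estimators=50, max_depth=3, random_state=0))
clf.fit(X_train, y_train)
y_score = clf.predict_proba(X_test)

# One AUC per class: average=None returns an array with one entry per column
auc_per_class = roc_auc_score(y_test, y_score, average=None)

# Micro-averaged ROC curve: every (sample, class) cell is one binary prediction
fpr_micro, tpr_micro, _ = roc_curve(y_test.ravel(), y_score.ravel())
auc_micro = auc(fpr_micro, tpr_micro)

for i, value in enumerate(auc_per_class):
    print("class {}: AUC = {:.3f}".format(i, value))
print("micro-average: AUC = {:.3f}".format(auc_micro))
```

The per-class values can be added to the legend labels of the ROC plot, for example with label='class {} (AUC = {:.2f})'.format(i, auc_per_class[i]).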