Using randomForest package in R, how to get probabilities from classification model?



model$predicted is NOT the same thing that predict() returns when you pass it data through newdata. If you want the probability of the TRUE or FALSE class, you must either call predict() with type="prob", or pass x, y, xtest and ytest to randomForest(), like

randomForest(x, y, xtest=x, ytest=y)

where x=out.data[, feature.cols] and y=out.data[, response.col].
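A minimal sketch of the predict() route, assuming the randomForest package is installed. The built-in iris data stands in for the question's out.data, so x and y here are illustrative only, and ntree = 500 is just the package default written out:

```r
library(randomForest)

set.seed(42)
# Stand-in data: iris replaces the question's out.data
x <- iris[, 1:4]   # feature columns
y <- iris$Species  # response; a factor, so randomForest fits a classifier

model <- randomForest(x, y, ntree = 500)

# One row per record, one column per class; each entry is the
# fraction of all 500 trees that voted for that class
prob <- predict(model, newdata = x, type = "prob")
head(prob)
```

If the response is a TRUE/FALSE logical rather than a factor, convert it with factor() first: randomForest() only fits a classification forest when y is a factor, and otherwise treats the problem as regression, where there are no class probabilities to return.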

model$predicted gives, for each record, the class with the largest value in model$votes. As @joran pointed out, votes is the proportion of OOB (out-of-bag) 'votes' from the random forest: a tree only votes on a record if that record was left out of the tree's bootstrap sample, i.e. the record was out of bag for that tree. On the other hand, predict() with newdata returns, for each class, the proportion of votes cast by all the trees in the forest. (Called without newdata, predict() simply returns the stored OOB results.)
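The sketch below checks these relationships on the same iris stand-in (again assuming randomForest is installed): apart from exact ties, model$predicted is the column of model$votes with the largest OOB vote fraction, predict() without newdata hands back the stored OOB predictions, and predict() with newdata uses every tree.

```r
library(randomForest)

set.seed(1)
x <- iris[, 1:4]   # stand-in features
y <- iris$Species  # stand-in factor response
model <- randomForest(x, y, ntree = 500)

oob_votes <- model$votes  # OOB vote fractions, one column per class

# Class with the largest OOB vote fraction for each record
argmax_class <- colnames(oob_votes)[max.col(oob_votes, ties.method = "first")]

# randomForest breaks ties at random, so only compare rows without a tie at the top
sorted <- t(apply(oob_votes, 1, sort, decreasing = TRUE))
no_tie <- sorted[, 1] > sorted[, 2]
all(argmax_class[no_tie] == as.character(model$predicted[no_tie]))

# Without newdata, predict() returns the stored OOB predictions ...
identical(predict(model), model$predicted)

# ... with newdata it uses every tree, so in-sample accuracy is usually higher
mean(predict(model, newdata = x) == y)  # all trees
mean(model$predicted == y)              # OOB votes only
```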

Using randomForest(x,y,xtest=x,ytest=y) works a little differently from passing a formula or simply calling randomForest(x,y). randomForest(x,y,xtest=x,ytest=y) WILL return the probability for each class of the test set; this may sound a little odd, but it is found under model$test$votes, and the predicted class is under model$test$predicted, which simply picks the class with the largest value in model$test$votes. Also, when using randomForest(x,y,xtest=x,ytest=y), model$predicted and model$votes keep the same (OOB) definitions as above.
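A sketch of the xtest/ytest route under the same assumptions (iris as a stand-in, randomForest installed). Passing the training data as the test set mirrors randomForest(x,y,xtest=x,ytest=y) above:

```r
library(randomForest)

set.seed(7)
x <- iris[, 1:4]   # stand-in features
y <- iris$Species  # stand-in factor response

# Passing xtest/ytest makes randomForest score the test rows with every tree
model <- randomForest(x, y, xtest = x, ytest = y, ntree = 500)

test_prob  <- model$test$votes      # per-class vote fractions for each xtest row
test_class <- model$test$predicted  # predicted class for each xtest row

# test_class is the column with the largest test vote fraction
# (rows tied at the top are skipped because ties are broken at random)
sorted <- t(apply(test_prob, 1, sort, decreasing = TRUE))
no_tie <- sorted[, 1] > sorted[, 2]
argmax_class <- colnames(test_prob)[max.col(test_prob, ties.method = "first")]
all(argmax_class[no_tie] == as.character(test_class[no_tie]))

# model$votes and model$predicted are still the OOB results
head(model$votes)

# keep.forest defaults to FALSE when xtest is given, so no forest is stored
is.null(model$forest)
```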

Finally, note that if randomForest(x,y,xtest=x,ytest=y) is used, the keep.forest flag must be set to TRUE in order to use the predict() function afterwards, because keep.forest defaults to FALSE whenever xtest is given:

```r
model <- randomForest(x, y, xtest = x, ytest = y, keep.forest = TRUE)
prob <- predict(model, x, type = "prob")
```

prob WILL be equivalent to model$test$votes, since in both cases the test data are x.
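A quick way to check this equivalence, again a sketch on the iris stand-in with randomForest assumed installed. unclass() and check.attributes = FALSE are only there so the comparison looks at the numbers rather than at the class or dimnames attributes of the two matrices:

```r
library(randomForest)

set.seed(7)
x <- iris[, 1:4]   # stand-in features
y <- iris$Species  # stand-in factor response

# keep.forest = TRUE is required so that predict() has a stored forest to use
model <- randomForest(x, y, xtest = x, ytest = y, keep.forest = TRUE)

prob <- predict(model, x, type = "prob")

# Both are all-tree vote fractions for the same rows, so the values should match
all.equal(unclass(prob), unclass(model$test$votes), check.attributes = FALSE)
```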