How to tune parameters in Random Forest, using Scikit Learn?

python parameters machine-learning scikit-learn random-forest

From my experience, there are three parameters worth exploring with the sklearn RandomForestClassifier, in order of importance:

n_estimators
max_features
criterion

n_estimators is not really worth optimizing. The more estimators you give it, the better it will do, although the gains level off and training takes longer. 500 or 1000 is usually sufficient.
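One way to see where extra trees stop helping is to watch the out-of-bag (OOB) accuracy as the forest grows. The following is only a sketch: the data comes from make_classification as a stand-in (it is not the asker's data set), and the tree counts are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data, purely for illustration.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)

# warm_start=True keeps the trees already grown and only adds new ones
# on each call to fit; oob_score=True evaluates on out-of-bag samples.
clf = RandomForestClassifier(warm_start=True, oob_score=True, random_state=0)

oob_scores = {}
for n in (50, 100, 200, 400):
    clf.set_params(n_estimators=n)
    clf.fit(X, y)
    oob_scores[n] = clf.oob_score_

for n, score in oob_scores.items():
    print(f"n_estimators={n:4d}  OOB accuracy={score:.3f}")
```

If the OOB accuracy has flattened out by a given size, adding more trees mostly costs time.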

max_features is worth exploring for many different values. It may have a large impact on the behavior of the RF because it decides how many features each tree in the RF considers at each split.

criterion may have a small impact, but usually the default is fine. If you have the time, try it out.

Make sure to use sklearn's grid search (preferably GridSearchCV, although your data set is too small for cross-validation) when trying out these parameters.
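A minimal sketch of such a search with GridSearchCV, assuming a data set large enough for 5-fold cross-validation. The synthetic data and the particular grid values are assumptions chosen for illustration, not recommendations from the answer.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data, purely for illustration.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)

# Candidate values for the two parameters worth searching;
# n_estimators is fixed rather than tuned.
param_grid = {
    "max_features": ["sqrt", "log2", 0.5, None],
    "criterion": ["gini", "entropy"],
}

search = GridSearchCV(
    RandomForestClassifier(n_estimators=100, random_state=0),
    param_grid,
    cv=5,                # 5-fold cross-validation
    scoring="accuracy",
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print(f"best CV accuracy: {search.best_score_:.3f}")
```

After fitting, search.best_estimator_ holds a forest refit on all of the data with the best parameter combination.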

If I understand your question correctly, though, you only have 9 samples and 3 classes? Presumably 3 samples per class? It's very, very likely that your RF is going to overfit with that little data, unless the samples are good, representative records.


The crucial parts are usually three elements:

number of estimators - usually the bigger the forest the better; there is only a small chance of overfitting here
max depth of each tree (default None, leading to a full tree) - reducing the maximum depth helps fight overfitting
max features per split (default sqrt(d)) - you might want to play around with this a bit, as it significantly alters the behaviour of the whole tree. The sqrt heuristic is usually a good starting point, but the actual sweet spot might be somewhere else
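The effect of the last two elements can be compared with a small cross-validated sweep. This is a sketch under assumptions: the data is synthetic, and the depth and feature values are arbitrary picks for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data, purely for illustration.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)

results = {}
for max_depth in (None, 3, 10):          # None grows each tree fully
    for max_features in ("sqrt", 0.5):   # sqrt(d) vs. half of the features
        clf = RandomForestClassifier(
            n_estimators=100,
            max_depth=max_depth,
            max_features=max_features,
            random_state=0,
        )
        # Mean accuracy over 5 cross-validation folds.
        scores = cross_val_score(clf, X, y, cv=5)
        results[(max_depth, max_features)] = scores.mean()

for (max_depth, max_features), mean_score in results.items():
    print(f"max_depth={max_depth}, max_features={max_features}: {mean_score:.3f}")
```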


This wonderful article has a detailed explanation of tunable parameters, how to track performance vs speed trade-off, some practical tips, and how to perform grid-search.
