
"ValueError: max_features must be in (0, n_features] " in scikit when using random forest


So I managed to solve the problem! :) The scikit-learn documentation says:

*If float, then max_features is a percentage and int(max_features * n_features) features are considered at each split.*
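A minimal sketch of the rule quoted above, in plain Python. `features_per_split` is a hypothetical helper name, not part of scikit-learn; it only mirrors the documented arithmetic. Recent versions of the documentation also clamp the result to at least 1 (`max(1, int(max_features * n_features))`), and that clamp is included here as an assumption about newer releases.

```python
def features_per_split(max_features: float, n_features: int) -> int:
    """Hypothetical helper: features considered per split for a float max_features."""
    # int() truncates toward zero, so a fractional result is rounded down.
    # max(1, ...) follows the wording of recent scikit-learn docs (assumption for older versions).
    return max(1, int(max_features * n_features))


print(features_per_split(0.25, 20))  # 5
print(features_per_split(0.1, 20))   # 2
print(features_per_split(0.33, 20))  # 6, because 6.6 is truncated
```

Because of the truncation, a float that is slightly too small gives one feature fewer than intended, so it is worth checking the product rather than trusting a rounded-looking decimal.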

My value:


n_features = 20. This is an int: the number of features in my dataset.

max_features: the number of features I want to consider at each split. I had these as ints, so I had to turn them into floats.

To turn it into a float, I used the formula from the scikit-learn docs. Suppose I want to use only 2 of the 20 features, and let x be the fraction as a float:

`int(max_features * n_features)` becomes `int(x * 20) = 2`, so `x = 2 / 20 = 0.1`.
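The calculation above can be turned around into a small helper that picks the float for a desired number of features. This is a sketch under assumptions: `fraction_for` is a hypothetical name, and it guards against a floating-point pitfall (for example `1/49 * 49` evaluates to `0.9999999999999999`, which `int()` truncates to 0). It uses `math.nextafter`, which needs Python 3.9 or later.

```python
import math


def fraction_for(k: int, n_features: int) -> float:
    """Hypothetical helper: return a float x with int(x * n_features) == k."""
    if not 0 < k <= n_features:
        raise ValueError("k must be in (0, n_features]")
    x = k / n_features
    # Rounding can leave x * n_features just below k, and int() would then
    # truncate to k - 1. Nudge x up one representable float at a time.
    while int(x * n_features) < k:
        x = math.nextafter(x, math.inf)  # Python 3.9+
    return x


print(fraction_for(2, 20))  # 0.1
print(fraction_for(5, 20))  # 0.25
```

For 20 features the plain division `k / 20` already works for every value in the table, so the loop never runs; it only matters for feature counts like 49.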

I changed the value of max_features from an int to a float, like this:

| max_features (int) | max_features (float) |
|---|---|
| 20 | 1.0 |
| 15 | 0.75 |
| 10 | 0.5 |
| 5 | 0.25 |
| 2 | 0.1 |

EXAMPLE

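```python
from sklearn import preprocessing
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

# Instead of:
# clf = make_pipeline(preprocessing.RobustScaler(), RandomForestClassifier(n_estimators=100, max_features=5, n_jobs=-1))
# I did (0.25 * 20 features = 5 features per split):
clf = make_pipeline(preprocessing.RobustScaler(),
                    RandomForestClassifier(n_estimators=100, max_features=0.25, n_jobs=-1))
```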
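To check what the float actually resolves to, the pipeline can be fitted on data of the same shape. This sketch assumes synthetic data from `make_classification` as a stand-in for the real 20-feature dataset, uses fewer trees and a fixed `random_state` only to keep it fast and repeatable, and relies on each fitted tree exposing the resolved integer as `max_features_`.

```python
from sklearn import preprocessing
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the real dataset: 20 features, as in the question.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

clf = make_pipeline(preprocessing.RobustScaler(),
                    RandomForestClassifier(n_estimators=10, max_features=0.25,
                                           random_state=0))
clf.fit(X, y)

forest = clf[-1]  # last pipeline step, the RandomForestClassifier
# Each fitted tree stores the integer that max_features=0.25 resolved to.
print(forest.estimators_[0].max_features_)  # 5, i.e. int(0.25 * 20)
```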