"ValueError: max_features must be in (0, n_features] " in scikit when using random forest
So I managed to solve the problem! :) The scikit-learn documentation says:
*If float, then max_features is a percentage and int(max_features * n_features) features are considered at each split.*
My values:
- n_features = 20. This is an int: the number of features in my dataset.
- max_features: the number of features I want considered at each split. I had passed it as an int, so I converted it to a float.
To turn it into a float, I used the formula from the scikit-learn documentation. Say I want to use only 2 of the 20 features, and x is that percentage expressed as a float (a fraction between 0 and 1):

    int(max_features * n_features)
    int(x * 20) = 2
    x = 0.1
I changed the value of max_features from an int to a float, like this:

max_features:

    (int)    (float)
    20    =  1.0
    15    =  0.75
    10    =  0.5
     5    =  0.25
     2    =  0.1
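
As a quick sanity check on the conversions above, here is a minimal sketch in plain Python (no scikit-learn needed). `features_per_split` is a hypothetical helper of mine, not a scikit-learn function. It only mirrors the rule quoted from the documentation, int(max_features * n_features) for a float, and assumes an int is taken as an absolute number of features.

```python
def features_per_split(max_features, n_features):
    """Hypothetical helper mirroring the documented rule (not part of scikit-learn)."""
    if isinstance(max_features, float):
        # A float is treated as a fraction of the features.
        return int(max_features * n_features)
    # An int is assumed to be an absolute number of features.
    return max_features


n_features = 20  # number of features in the dataset, as in this answer

for count in (20, 15, 10, 5, 2):
    fraction = count / n_features  # the float to pass as max_features
    # Check that the fraction maps back to the intended number of features.
    assert features_per_split(fraction, n_features) == count
    print(f"{count:>2} features -> max_features={fraction}")
```

Because int(...) truncates, it is worth checking that a fraction really maps back to the count you intended, as the assert does here.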
Example:

    from sklearn import preprocessing
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.pipeline import make_pipeline

    # Instead of:
    clf = make_pipeline(preprocessing.RobustScaler(),
                        RandomForestClassifier(n_estimators=100, max_features=5, n_jobs=-1))

    # I did:
    clf = make_pipeline(preprocessing.RobustScaler(),
                        RandomForestClassifier(n_estimators=100, max_features=0.25, n_jobs=-1))