RandomForestClassifier.fit(): ValueError: could not convert string to float


You have to encode your data before calling fit(). As already mentioned, fit() does not accept strings, but you can work around this.

There are several classes that can be used:

  • LabelEncoder: maps each distinct string to an integer (0, 1, 2, ...)
  • OneHotEncoder: uses one-of-K encoding to turn each distinct string into its own binary (0/1) column
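To make the difference between the two encoders concrete, here is a minimal sketch. The `colors` array is made-up sample data, not something from the question.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Made-up sample column of string values
colors = np.array(["red", "green", "blue", "green"])

# LabelEncoder: each distinct string becomes an integer (classes are sorted)
le = LabelEncoder()
labels = le.fit_transform(colors)
print(le.classes_)  # ['blue' 'green' 'red']
print(labels)       # [2 1 0 1]

# OneHotEncoder expects a 2D array: one row per sample, one column per feature
ohe = OneHotEncoder()
one_hot = ohe.fit_transform(colors.reshape(-1, 1)).toarray()
print(one_hot.shape)  # (4, 3): one binary column per distinct string
```

Note that the scikit-learn documentation describes LabelEncoder as meant for target labels (y). It works on a single feature column as shown, but the integers imply an ordering that the original strings may not have.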

I posted almost the same question on StackOverflow some time ago. I wanted a scalable solution but didn't get any answers. I ended up choosing OneHotEncoder, which binarizes all the strings. It is quite effective, but if you have many distinct strings the matrix grows very quickly and needs a lot of memory.
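A rough illustration of that growth, assuming a hypothetical column with 1000 distinct strings: the encoded matrix gets one column per distinct value. By default OneHotEncoder returns a SciPy sparse matrix, which stores only the non-zero entries, so the memory cost mostly shows up once you convert it to a dense array.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical column with 1000 distinct string values, one per row
values = np.array([f"item_{i}" for i in range(1000)]).reshape(-1, 1)

# The default output is a SciPy sparse matrix
encoded = OneHotEncoder().fit_transform(values)

n_rows, n_cols = encoded.shape
print(n_rows, n_cols)   # 1000 1000: one column per distinct string
print(encoded.nnz)      # 1000 stored entries, one per row
print(n_rows * n_cols)  # 1000000 cells if converted to a dense array
```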


LabelEncoder worked for me. Basically, you have to encode your data feature by feature (myData is a 2D array of strings):

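    import numpy as np
    from sklearn import preprocessing

    # Read every column as a string; filecsv is the path to your CSV file
    myData = np.genfromtxt(filecsv, delimiter=",", dtype="U20", skip_header=1)

    # Encode each feature (column) separately. The codes go into a new numeric
    # array, because writing them back into myData would store them as strings.
    encoded = np.empty(myData.shape, dtype=np.float32)
    for i in range(myData.shape[1]):  # number of features
        le = preprocessing.LabelEncoder()
        encoded[:, i] = le.fit_transform(myData[:, i])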


You can't pass str values to your model's fit() method. As the documentation says about X:

The training input samples. Internally, it will be converted to dtype=np.float32 and if a sparse matrix is provided to a sparse csc_matrix.

Try converting your data to float, and give LabelEncoder a try.
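Putting the advice together, here is a minimal end-to-end sketch. The toy data, the choice of OneHotEncoder over LabelEncoder, and the classifier settings are all assumptions for illustration, not taken from the question.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder

# Made-up toy data: two string features per sample
X_raw = np.array([
    ["red", "small"],
    ["blue", "large"],
    ["red", "large"],
    ["green", "small"],
])
y = np.array([0, 1, 1, 0])

# Passing X_raw to fit() directly would raise the ValueError from the question,
# so encode the strings as numbers first
encoder = OneHotEncoder(handle_unknown="ignore")
X = encoder.fit_transform(X_raw)

clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X, y)

# New samples must go through the same fitted encoder before predict()
new_sample = np.array([["blue", "small"]])
predictions = clf.predict(encoder.transform(new_sample))
print(predictions)
```

The important part is reusing the fitted encoder at prediction time, so that new samples get exactly the same columns the model was trained on.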