ValueError: feature_names mismatch: in xgboost in the predict() function
This happens when the order of the columns used to build the model differs from the order of the columns used to score it.
I used the following steps to overcome this error.
First, load the pickled model:
import pickle
with open("saved_model_file", "rb") as f:
    model = pickle.load(f)
Extract the column names in the order in which they were used when the model was built:
cols_when_model_builds = model.get_booster().feature_names
Reorder the pandas DataFrame to match that order:
pd_dataframe = pd_dataframe[cols_when_model_builds]
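The reordering step can be wrapped in a small helper that also reports missing columns, which is another common cause of this error. This is only a sketch: the function name align_to_training_columns and the sample data are made up for illustration, and training_cols stands in for whatever model.get_booster().feature_names returns for your model.

```python
import pandas as pd

def align_to_training_columns(df, training_cols):
    """Return df restricted to the training columns, in training order."""
    missing = [c for c in training_cols if c not in df.columns]
    if missing:
        raise KeyError(f"columns missing from scoring data: {missing}")
    # Selecting with a list both reorders the columns and drops any extras.
    return df[list(training_cols)]

# Hypothetical data: same features as at training time, but in a different order.
training_cols = ["age", "income", "score"]
scoring_df = pd.DataFrame(
    {"score": [0.5, 0.7], "age": [31, 45], "income": [52000, 61000]}
)

aligned = align_to_training_columns(scoring_df, training_cols)
print(list(aligned.columns))  # ['age', 'income', 'score']
```

The aligned frame can then be passed to predict() in place of the original one.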
Try converting the data to an ndarray before passing it to fit/predict. For example, if your training data is train_df and your test data is test_df, use the code below:
train_x = train_df.values
test_x = test_df.values
Now fit the model:
xgb.fit(train_x,train_y)
Finally, predict:
pred = xgb.predict(test_x)
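One caveat with this approach: a plain ndarray carries no column names, so the name check can no longer fire and the columns are matched purely by position. If the two DataFrames have their columns in different orders, it is probably safer to reorder before converting. A minimal sketch with made-up data (the frames and column names here are hypothetical):

```python
import pandas as pd

train_df = pd.DataFrame({"a": [1, 2], "b": [10, 20]})
# Same features as train_df, but in a different column order.
test_df = pd.DataFrame({"b": [30, 40], "a": [3, 4]})

# Naive conversion keeps test_df's own column order (b, a).
naive_test_x = test_df.values

# Reordering to the training columns first keeps positions consistent (a, b).
train_x = train_df.values
test_x = test_df[train_df.columns].values

print(naive_test_x[0])  # first test row in test_df order
print(test_x[0])        # first test row in train_df order
```

With naive_test_x the model would read column b where it expects a, without raising any error.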
Hope this helps!
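I also had this problem when I used a pandas DataFrame (non-sparse representation).
I converted the training and testing data into numpy ndarrays. I originally used as_matrix() for this, but it was removed in pandas 1.0; to_numpy() is the current equivalent.
`X_train = X_train.to_numpy()`
`X_test = X_test.to_numpy()`
This is how I got rid of the error!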