ValueError: feature_names mismatch: in xgboost in the predict() function

python pandas machine-learning regression xgboost

This error occurs when the order of the column names at model-building time differs from their order at scoring time.

I used the following steps to fix this error:

First, load the pickle file:

with open("saved_model_file", "rb") as f:
    model = pickle.load(f)

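Extract the feature names in the order in which they were used during training (`get_booster()` is available on the scikit-learn wrappers such as `XGBRegressor`):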

cols_when_model_builds = model.get_booster().feature_names

Reorder the pandas DataFrame's columns to match:

pd_dataframe = pd_dataframe[cols_when_model_builds]
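The reordering step can be wrapped in a small helper that also reports which training-time columns are missing, instead of failing with a bare `KeyError`. This is a pandas-only sketch: `align_to_model_columns` is a hypothetical name, and in real use `feature_names` would be the `model.get_booster().feature_names` list from the step above (which may not hold the original column names if the model was trained on a plain ndarray).

```python
from typing import Sequence

import pandas as pd


def align_to_model_columns(df: pd.DataFrame, feature_names: Sequence[str]) -> pd.DataFrame:
    """Return df with exactly the columns in feature_names, in that order.

    Hypothetical helper: raises ValueError listing any columns the model
    was trained on that are absent from df. Extra columns are dropped.
    """
    names = list(feature_names)  # a tuple key would be treated as a single label
    missing = [name for name in names if name not in df.columns]
    if missing:
        raise ValueError(f"columns missing at scoring time: {missing}")
    # Selecting with a list both reorders the columns and drops extras.
    return df[names]


# Hypothetical data: training used ["a", "b", "c"]; scoring data arrives shuffled
# and carries an extra column the model never saw.
scoring_df = pd.DataFrame({"c": [3, 6], "a": [1, 4], "b": [2, 5], "extra": [0, 0]})
aligned = align_to_model_columns(scoring_df, ["a", "b", "c"])
print(list(aligned.columns))  # ['a', 'b', 'c']
```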


Try converting the data into ndarrays before passing it to fit/predict. For example, if your training data is train_df and your test data is test_df, use the code below:

train_x = train_df.values
test_x = test_df.values

Now fit the model:

xgb.fit(train_x,train_y)

Finally, predict:

pred = xgb.predict(test_x)
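This works because, when both the training and test data are plain arrays, neither side carries column names for xgboost to compare. The flip side is that the safety check disappears: if the test columns arrive in a different order, the model silently scores the wrong features. A minimal pandas-only sketch, using hypothetical toy DataFrames, that reorders the test columns to the training order before converting:

```python
import pandas as pd

# Hypothetical data: same features, but the test columns are in a different order.
train_df = pd.DataFrame({"a": [1.0, 2.0], "b": [10.0, 20.0]})
test_df = pd.DataFrame({"b": [30.0], "a": [3.0]})

# Reorder the test columns to the training order BEFORE dropping the names;
# once converted, the arrays no longer carry any column names.
train_x = train_df.to_numpy()
test_x = test_df[train_df.columns].to_numpy()

print(train_x.shape, test_x.tolist())  # (2, 2) [[3.0, 30.0]]
```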

Hope this helps!


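I also had this problem when I used a pandas DataFrame (non-sparse representation).

I converted the training and testing data into NumPy ndarrays. Older pandas versions offered `as_matrix()` for this, but it was removed in pandas 1.0; `to_numpy()` replaces it: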

`X_train = X_train.to_numpy()`
`X_test = X_test.to_numpy()`

This is how I got rid of that error!

CodeHunter