How to get feature importance in xgboost?

In your code you can get feature importance for each feature in dict form:

bst.get_score(importance_type='gain')

>> {'ftr_col1': 77.21064539577829, 'ftr_col2': 10.28690566363971, 'ftr_col3': 24.225014841466294, 'ftr_col4': 11.234086283060112}
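If you want a ranked view of that dict, plain Python is enough. A minimal sketch that reuses the scores shown above; the names `scores`, `ranked` and `shares` are only illustrative:

```python
# Scores copied from the get_score() output above.
scores = {
    'ftr_col1': 77.21064539577829,
    'ftr_col2': 10.28690566363971,
    'ftr_col3': 24.225014841466294,
    'ftr_col4': 11.234086283060112,
}

# Sort features from most to least important.
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Normalise so the scores sum to 1, which makes them easier to compare.
total = sum(scores.values())
shares = {name: value / total for name, value in ranked}

for name, share in shares.items():
    print(f"{name}: {share:.3f}")
```

The normalisation is just a convenience for reading the numbers; it is not something get_score() does for you.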

Explanation: get_score() is a method of the Booster object that xgb.train() returns (bst above). Its signature is:

get_score(fmap='', importance_type='weight')

  • fmap (str (optional)) – The name of feature map file.
  • importance_type
    • ‘weight’ - the number of times a feature is used to split the data across all trees.
    • ‘gain’ - the average gain across all splits the feature is used in.
    • ‘cover’ - the average coverage across all splits the feature is used in.
    • ‘total_gain’ - the total gain across all splits the feature is used in.
    • ‘total_cover’ - the total coverage across all splits the feature is used in.

https://xgboost.readthedocs.io/en/latest/python/python_api.html


Using sklearn API and XGBoost >= 0.81:

clf.get_booster().get_score(importance_type="gain")

or

regr.get_booster().get_score(importance_type="gain")

To get your own column names as the keys, X must be a pandas.DataFrame when you call regr.fit (or clf.fit). If X is a NumPy array, the keys are the generic names f0, f1, ...


Get the table containing scores and feature names, and then plot it.

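import pandas as pd

# Feature scores as a dict: {feature_name: score}
feature_important = model.get_booster().get_score(importance_type='weight')
keys = list(feature_important.keys())
values = list(feature_important.values())

# Build a one-column table sorted by score, then plot it as horizontal bars
data = pd.DataFrame(data=values, index=keys, columns=["score"]).sort_values(by="score", ascending=False)
data.plot(kind='barh')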
