How to get feature importance in xgboost?
You can get the feature importance for each feature as a dict:

```python
bst.get_score(importance_type='gain')
>> {'ftr_col1': 77.21064539577829,
    'ftr_col2': 10.28690566363971,
    'ftr_col3': 24.225014841466294,
    'ftr_col4': 11.234086283060112}
```
Explanation: get_score(), a method of the Booster object returned by the train() API, is defined as:

```python
get_score(fmap='', importance_type='weight')
```
- fmap (str (optional)) – The name of the feature map file.
- importance_type
- ‘weight’ - the number of times a feature is used to split the data across all trees.
- ‘gain’ - the average gain across all splits the feature is used in.
- ‘cover’ - the average coverage across all splits the feature is used in.
- ‘total_gain’ - the total gain across all splits the feature is used in.
- ‘total_cover’ - the total coverage across all splits the feature is used in.
https://xgboost.readthedocs.io/en/latest/python/python_api.html
Get the table containing scores and feature names, and then plot it.
```python
import pandas as pd

feature_important = model.get_booster().get_score(importance_type='weight')
keys = list(feature_important.keys())
values = list(feature_important.values())

data = pd.DataFrame(data=values, index=keys, columns=["score"]).sort_values(by="score", ascending=False)
data.plot(kind='barh')
```
For example: