How to find the importance of the features for a logistic regression model?



One of the simplest ways to get a feeling for the "influence" of a given parameter in a linear classification model (logistic regression being one of them) is to consider the magnitude of its coefficient times the standard deviation of the corresponding parameter in the data.

Consider this example:

import numpy as np
from sklearn.linear_model import LogisticRegression

x1 = np.random.randn(100)
x2 = 4*np.random.randn(100)
x3 = 0.5*np.random.randn(100)
y = (3 + x1 + x2 + x3 + 0.2*np.random.randn(100)) > 0

X = np.column_stack([x1, x2, x3])

m = LogisticRegression()
m.fit(X, y)

# The estimated coefficients will all be around 1:
print(m.coef_)

# Those values, however, will show that the second parameter
# is more influential
print(np.std(X, 0)*m.coef_)

An alternative way to get a similar result is to examine the coefficients of the model fit on standardized parameters:

m.fit(X / np.std(X, 0), y)
print(m.coef_)
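For completeness, here is a sketch of the same idea using scikit-learn's StandardScaler in a pipeline (my addition, not part of the snippet above; note that StandardScaler also subtracts the mean, which changes the intercept but should leave the coefficients on a comparable scale):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Standardize the features, then fit the same model on the scaled data
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X, y)

# Coefficients of the logistic regression step, now on a common scale
print(pipe.named_steps['logisticregression'].coef_)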

Note that this is the most basic approach; a number of other techniques for estimating feature importance or parameter influence exist (p-values, bootstrap scores, various "discriminative indices", etc.).
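As a rough illustration of the bootstrap idea mentioned above, here is a minimal sketch that reuses the X and y arrays from the example (the number of resamples is arbitrary, and a resample could in principle contain a single class and fail to fit): refit the model on resampled data and look at the spread of the scaled coefficients.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
boot_coefs = []
for _ in range(200):
    # Resample the rows of the data with replacement
    idx = rng.integers(0, len(y), len(y))
    mb = LogisticRegression().fit(X[idx], y[idx])
    # Scale each coefficient by the standard deviation of its feature
    boot_coefs.append(np.std(X[idx], 0) * mb.coef_.ravel())

boot_coefs = np.array(boot_coefs)
print(boot_coefs.mean(0))  # average scaled coefficient per feature
print(boot_coefs.std(0))   # variability of that estimate across resamples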

I am pretty sure you would get more interesting answers at https://stats.stackexchange.com/.