Logistic regression on One-hot encoding
Consider the following approach:
first, let's encode all non-numeric columns with LabelEncoder (note: this produces ordinal integer codes rather than a true one-hot encoding; for actual one-hot columns use pd.get_dummies or sklearn's OneHotEncoder):
In [220]: from sklearn.preprocessing import LabelEncoder

In [221]: x = df.select_dtypes(exclude=['number']) \
     ...:       .apply(LabelEncoder().fit_transform) \
     ...:       .join(df.select_dtypes(include=['number']))

In [228]: x
Out[228]:
        status  country  city      datetime  amount
601766       0        0     1  1.453916e+09     4.5
669244       0        1     0  1.454109e+09     6.9
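The transcript above assumes an existing df; here is a self-contained sketch of the same encoding step on a small, made-up DataFrame (column names mirror the example, the values are hypothetical):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# hypothetical data shaped like the DataFrame in the answer
df = pd.DataFrame({
    'status':   ['ok', 'failed'],
    'country':  ['US', 'DE'],
    'city':     ['NYC', 'Berlin'],
    'datetime': [1.453916e9, 1.454109e9],
    'amount':   [4.5, 6.9],
})

# integer-encode every non-numeric column, then re-attach the numeric ones
x = (df.select_dtypes(exclude=['number'])
       .apply(LabelEncoder().fit_transform)
       .join(df.select_dtypes(include=['number'])))

print(x)
```

Note that LabelEncoder sorts the classes alphabetically, so the integer codes reflect lexical order, not order of appearance.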
now we can fit a model. Despite the question's title, the example below uses LinearRegression, which is a regressor; for actual classification, LogisticRegression exposes the same fit interface:

In [222]: from sklearn.linear_model import LinearRegression

In [223]: classifier = LinearRegression()

In [230]: classifier.fit(x.drop('status', axis=1), x['status'])
Out[230]: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
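To match the title, the same pipeline can be run end-to-end with LogisticRegression instead; a minimal sketch on made-up data (the DataFrame and values here are hypothetical):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

# hypothetical toy data; 'status' is the target column
df = pd.DataFrame({
    'status':  ['ok', 'failed', 'ok', 'failed'],
    'country': ['US', 'DE', 'US', 'DE'],
    'amount':  [4.5, 6.9, 3.1, 7.2],
})

# same encoding step as above
x = (df.select_dtypes(exclude=['number'])
       .apply(LabelEncoder().fit_transform)
       .join(df.select_dtypes(include=['number'])))

# fit a real classifier on the encoded features
clf = LogisticRegression()
clf.fit(x.drop('status', axis=1), x['status'])
print(clf.predict(x.drop('status', axis=1)))
```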
To do a one-hot encoding in a scikit-learn project, you may find it cleaner to use the scikit-learn-contrib package category_encoders (https://github.com/scikit-learn-contrib/categorical-encoding), which implements many common categorical-variable encodings, including one-hot.
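For comparison, a true one-hot encoding (one binary column per category, rather than one integer column) can be produced with plain pandas; a minimal sketch on hypothetical data:

```python
import pandas as pd

# hypothetical data with one categorical and one numeric column
df = pd.DataFrame({
    'country': ['US', 'DE', 'US'],
    'amount':  [4.5, 6.9, 1.2],
})

# get_dummies expands 'country' into one indicator column per value
x = pd.get_dummies(df, columns=['country'])
print(x)
```

Inside a scikit-learn Pipeline, sklearn's OneHotEncoder (or category_encoders' equivalent) plays the same role while also handling unseen categories at transform time.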