Logistic regression on One-hot encoding

Consider the following approach:

first, let's encode all non-numeric columns (strictly speaking, LabelEncoder produces one integer code per category rather than true one-hot columns, but it gives us a fully numeric frame):

In [220]: from sklearn.preprocessing import LabelEncoder

In [221]: x = df.select_dtypes(exclude=['number']) \
                .apply(LabelEncoder().fit_transform) \
                .join(df.select_dtypes(include=['number']))

In [228]: x
Out[228]:
        status  country  city      datetime  amount
601766       0        0     1  1.453916e+09     4.5
669244       0        1     0  1.454109e+09     6.9
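To see the difference between label encoding and true one-hot encoding, here is a minimal sketch using pandas alone; the toy DataFrame below is hypothetical, since the original `df` is not shown in full:

```python
import pandas as pd

# Hypothetical stand-in for the question's df
df = pd.DataFrame({
    'status':  ['ok', 'ok', 'failed'],
    'country': ['US', 'DE', 'US'],
    'amount':  [4.5, 6.9, 1.2],
})

# Label encoding (what LabelEncoder does): one integer code per category
codes = df['country'].astype('category').cat.codes  # DE -> 0, US -> 1

# True one-hot encoding: one 0/1 column per category
onehot = pd.get_dummies(df, columns=['country'])
# columns: status, amount, country_DE, country_US
```

For linear models, the one-hot form is usually preferable: integer codes impose an arbitrary ordering on categories that the model will treat as meaningful.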

now we can fit a LogisticRegression classifier (the original snippet fitted LinearRegression, but for a categorical target like status, logistic regression is the appropriate model; note also that recent pandas requires drop's axis as a keyword argument):

In [229]: from sklearn.linear_model import LogisticRegression

In [230]: classifier = LogisticRegression()

In [231]: classifier.fit(x.drop('status', axis=1), x['status'])
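Putting the two steps together, here is a self-contained sketch of the fit-and-predict flow; the frame `x` below uses made-up values standing in for the encoded data above:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical encoded frame, mimicking the structure of x above
x = pd.DataFrame({
    'status':  [0, 1, 0, 1],
    'country': [0, 1, 0, 1],
    'city':    [1, 0, 1, 0],
    'amount':  [4.5, 6.9, 3.2, 7.1],
})

clf = LogisticRegression()
clf.fit(x.drop('status', axis=1), x['status'])

# Predict the status class for each row
preds = clf.predict(x.drop('status', axis=1))
```

In practice you would of course predict on held-out rows rather than the training frame itself.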


To do one-hot encoding in a scikit-learn project, you may find it cleaner to use the scikit-learn-contrib project category_encoders (https://github.com/scikit-learn-contrib/categorical-encoding), which includes many common categorical-variable encoding methods, one-hot among them.
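If you'd rather stay within scikit-learn itself, its built-in OneHotEncoder covers the same case; a minimal sketch on a hypothetical frame (column names are illustrative):

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical columns
df = pd.DataFrame({'country': ['US', 'DE', 'US'],
                   'city':    ['NY', 'BER', 'SF']})

# handle_unknown='ignore' makes transform robust to unseen categories
enc = OneHotEncoder(handle_unknown='ignore')
dense = enc.fit_transform(df).toarray()
# one 0/1 column per (column, category) pair:
# 2 countries + 3 cities -> 5 columns, exactly two 1s per row
```

Unlike pd.get_dummies, a fitted OneHotEncoder remembers the category set, so train and test frames always map to the same columns.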