Logistic regression on One-hot encoding
Consider the following approach:
first, let's encode all non-numeric columns with LabelEncoder (note: this produces ordinal integer codes rather than a true one-hot encoding; for actual one-hot columns use pd.get_dummies or sklearn's OneHotEncoder):
In [220]: from sklearn.preprocessing import LabelEncoder

In [221]: x = df.select_dtypes(exclude=['number']) \
     ...:       .apply(LabelEncoder().fit_transform) \
     ...:       .join(df.select_dtypes(include=['number']))

In [228]: x
Out[228]:
        status  country  city      datetime  amount
601766       0        0     1  1.453916e+09     4.5
669244       0        1     0  1.454109e+09     6.9
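The transcript above assumes an existing df; here is a self-contained sketch of the same encoding step on a small, made-up DataFrame (column names mirror the example, the values are hypothetical):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# hypothetical data shaped like the DataFrame in the answer
df = pd.DataFrame({
    'status':   ['ok', 'failed'],
    'country':  ['US', 'DE'],
    'city':     ['NYC', 'Berlin'],
    'datetime': [1.453916e9, 1.454109e9],
    'amount':   [4.5, 6.9],
})

# integer-encode every non-numeric column, then re-attach the numeric ones
x = (df.select_dtypes(exclude=['number'])
       .apply(LabelEncoder().fit_transform)
       .join(df.select_dtypes(include=['number'])))

print(x)
```

Note that LabelEncoder sorts the classes alphabetically, so the integer codes reflect lexical order, not order of appearance.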
now we can fit a model. Despite the question's title, the example below uses LinearRegression, which is a regressor; for actual classification, LogisticRegression exposes the same fit interface:

In [222]: from sklearn.linear_model import LinearRegression

In [223]: classifier = LinearRegression()

In [230]: classifier.fit(x.drop('status', axis=1), x['status'])
Out[230]: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
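To match the title, the same pipeline can be run end-to-end with LogisticRegression instead; a minimal sketch on made-up data (the DataFrame and values here are hypothetical):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

# hypothetical toy data; 'status' is the target column
df = pd.DataFrame({
    'status':  ['ok', 'failed', 'ok', 'failed'],
    'country': ['US', 'DE', 'US', 'DE'],
    'amount':  [4.5, 6.9, 3.1, 7.2],
})

# same encoding step as above
x = (df.select_dtypes(exclude=['number'])
       .apply(LabelEncoder().fit_transform)
       .join(df.select_dtypes(include=['number'])))

# fit a real classifier on the encoded features
clf = LogisticRegression()
clf.fit(x.drop('status', axis=1), x['status'])
print(clf.predict(x.drop('status', axis=1)))
```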
To do a one-hot encoding in a scikit-learn project, you may find it cleaner to use the scikit-learn-contrib package category_encoders (https://github.com/scikit-learn-contrib/categorical-encoding), which implements many common categorical-variable encodings, including one-hot.
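For comparison, a true one-hot encoding (one binary column per category, rather than one integer column) can be produced with plain pandas; a minimal sketch on hypothetical data:

```python
import pandas as pd

# hypothetical data with one categorical and one numeric column
df = pd.DataFrame({
    'country': ['US', 'DE', 'US'],
    'amount':  [4.5, 6.9, 1.2],
})

# get_dummies expands 'country' into one indicator column per value
x = pd.get_dummies(df, columns=['country'])
print(x)
```

Inside a scikit-learn Pipeline, sklearn's OneHotEncoder (or category_encoders' equivalent) plays the same role while also handling unseen categories at transform time.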