Issue with Spark MLlib that causes probability and prediction to be the same for everything
TL;DR Ten iterations is way too low for any real-life application. On large and non-trivial datasets it can take a thousand or more iterations (as well as tuning of the remaining parameters) to converge.
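To see the effect of the iteration budget concretely, here is a minimal toy sketch in plain Python (not Spark's actual L-BFGS solver; the data, learning rate, and iteration counts are made up for illustration): with only a handful of gradient steps the fitted weights stay close to zero, so every row gets a similar probability, while more iterations let the probabilities separate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(xs, ys, iters, lr=0.1):
    # Plain batch gradient descent for 1-D logistic regression.
    w, b = 0.0, 0.0
    for _ in range(iters):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            gw += err * x
            gb += err
        w -= lr * gw / len(xs)
        b -= lr * gb / len(xs)
    return w, b

# Toy, linearly separable data (illustrative only).
xs = [-2.0, -1.5, -1.0, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 1, 1]

for iters in (10, 1000):
    w, b = train(xs, ys, iters)
    probs = [sigmoid(w * x + b) for x in xs]
    # With few iterations the probabilities cluster near 0.5;
    # with more, they spread out towards 0 and 1.
    print(iters, [round(p, 2) for p in probs])
```

The same qualitative behavior applies to Spark's solver: too small a `maxIter` stops training before the coefficients have moved far from their initialization.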
A binomial `LogisticRegressionModel` has a `summary` attribute, which gives you access to a `LogisticRegressionSummary` object. Among other useful metrics it contains `objectiveHistory`, which can be used to debug the training process:
```python
import matplotlib.pyplot as plt

lrm = LogisticRegression(..., family="binomial").fit(df)

plt.plot(lrm.summary.objectiveHistory)
plt.show()
```
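Beyond eyeballing the plot, you can check programmatically whether the objective has plateaued. The helper below is a hypothetical sketch, not part of Spark's API: it treats training as converged when the last relative improvement falls below a tolerance (the `tol` name is borrowed from `LogisticRegression`'s parameter for illustration). If the history is still falling at its end, `maxIter` was probably too low.

```python
def has_plateaued(history, tol=1e-6):
    # `history` is a plain list of objective values, e.g.
    # list(lrm.summary.objectiveHistory).
    if len(history) < 2:
        return False
    prev, last = history[-2], history[-1]
    # Relative improvement of the final step, guarded against
    # division by very small objective values.
    return abs(prev - last) <= tol * max(abs(prev), 1.0)

# A still-falling objective: training was cut off too early.
print(has_plateaued([0.69, 0.52, 0.41, 0.33]))  # False -> raise maxIter
```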