
Issue with Spark MLLib that causes probability and prediction to be the same for everything


TL;DR Ten iterations is way too low for any real-life application. On large and non-trivial datasets it can take a thousand or more iterations (as well as tuning of the remaining parameters) to converge.

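For example, a minimal sketch of raising the iteration budget, assuming a DataFrame df with the standard features / label columns; the specific maxIter, regParam and tol values below are only placeholders to tune, not recommendations:

from pyspark.ml.classification import LogisticRegression

# Placeholder values - tune maxIter, regParam and tol for your data.
lr = LogisticRegression(
    featuresCol="features",
    labelCol="label",
    family="binomial",
    maxIter=1000,   # default is 100; the question used only 10
    regParam=0.01,  # regularization strength
    tol=1e-6,       # convergence tolerance
)
model = lr.fit(df)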
A binomial LogisticRegressionModel has a summary attribute, which gives you access to a LogisticRegressionSummary object. Among other useful metrics it contains objectiveHistory, which can be used to debug the training process:

from pyspark.ml.classification import LogisticRegression
import matplotlib.pyplot as plt

# Fit a binomial model (fill in the remaining parameters for your setup)
lrm = LogisticRegression(..., family="binomial").fit(df)

# Plot the objective (loss) per iteration; a curve that is still falling
# at the end means training stopped before converging
plt.plot(lrm.summary.objectiveHistory)
plt.show()
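As a rough programmatic check (a sketch, not part of the original snippet), you can inspect the tail of objectiveHistory: if the objective is still dropping noticeably at the last iteration, training most likely hit maxIter rather than converging, and the iteration limit should be raised. The 1e-4 threshold here is arbitrary:

hist = lrm.summary.objectiveHistory

# If the last step still yields a sizable relative improvement, the run
# probably stopped on maxIter rather than on convergence.
if len(hist) >= 2 and (hist[-2] - hist[-1]) / max(abs(hist[-2]), 1e-12) > 1e-4:
    print("Objective still decreasing - consider raising maxIter")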