Why doesn't my custom made linear regression model match sklearn?
I think you are missing the 1/m term (where m is the size of y) in the gradient descent. After including the 1/m term, I seem to get a predicted value similar to your sklearn code.
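For reference, this is where the 1/m factor comes from. The block below is a sketch that assumes the usual mean-squared-error cost convention; the symbols \theta, \alpha and h_\theta are my notation for your `weights`, `alpha` and `h`, not something taken from your code.

```latex
% Assumed cost: mean squared error over the m training examples
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2

% Its gradient carries the 1/m factor
\frac{\partial J}{\partial \theta_j}
  = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}

% Gradient descent update for each weight
\theta_j \leftarrow \theta_j
  - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
```

Your loop applies each term of the sum one sample at a time instead of summing first, so without the 1/m factor every step is roughly m times larger than this update intends.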
See below:

    # ...
    weights = np.ones((3,))
    m = y.size
    for boom in range(100):
        currentCost = cost(normalizedX, weights, y)
        if boom % 1 == 0:
            print(boom, 'iteration', weights[0], weights[1], weights[2])
            print('Cost', currentCost)
        for i in range(47):
            errorDiff = h(normalizedX[i], weights) - y[i]
            # Each update is now scaled by 1/m
            weights[0] = weights[0] - alpha * (1/m) * errorDiff * normalizedX[i][0]
            weights[1] = weights[1] - alpha * (1/m) * errorDiff * normalizedX[i][1]
            weights[2] = weights[2] - alpha * (1/m) * errorDiff * normalizedX[i][2]
    # ...

This gives a first prediction of 355242.
This agrees well with sklearn's LinearRegression model, even though LinearRegression does not use gradient descent.
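To illustrate why the two approaches should agree, here is a minimal sketch that compares batch gradient descent (with the 1/m factor) against a direct least-squares solve. It uses synthetic data, not your Housing.csv, and the names `batch_gradient_descent`, `true_w`, `w_gd` and `w_ls` are made up for this example.

```python
import numpy as np

# Synthetic, roughly standardized data (hypothetical; NOT the Housing.csv data).
rng = np.random.default_rng(0)
m = 47
features = rng.standard_normal((m, 2))
X = np.column_stack([np.ones(m), features])  # bias column + 2 features
true_w = np.array([3.0, 1.5, -2.0])          # arbitrary weights for the demo
y = X @ true_w + 0.1 * rng.standard_normal(m)


def batch_gradient_descent(X, y, alpha=0.1, iterations=1000):
    """Plain batch gradient descent on the mean squared error."""
    m = y.size
    w = np.ones(X.shape[1])
    for _ in range(iterations):
        error = X @ w - y                        # h(x) - y for every sample
        w = w - alpha * (1 / m) * (X.T @ error)  # note the 1/m factor
    return w


w_gd = batch_gradient_descent(X, y)
# Direct least-squares solve: no gradient descent involved.
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

print('gradient descent:', w_gd)
print('least squares:   ', w_ls)
```

Both methods minimize the same squared-error cost, so once gradient descent has converged the weights (and therefore the predictions) should match closely.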
I also tried SGDRegressor (which uses stochastic gradient descent) in sklearn, and it too seems to give a value close to both the LinearRegression model and your model. See the code below:

    import numpy
    import pandas
    from sklearn import preprocessing
    from sklearn.linear_model import SGDRegressor

    dataset = pandas.read_csv('Housing.csv', header=None)
    x = dataset.iloc[:, :-1].values
    y = dataset.iloc[:, 2].values

    # penalty=None turns off regularization
    # (older scikit-learn releases also accepted the string 'none')
    sgdRegressor = SGDRegressor(penalty=None, learning_rate='constant',
                                eta0=0.1, max_iter=1000, tol=1E-6)

    # Standardize the features and keep the mean/std to scale the query point
    xnorm = preprocessing.scale(x)
    scaleCoef = preprocessing.StandardScaler().fit(x)
    mean = scaleCoef.mean_
    std = numpy.sqrt(scaleCoef.var_)
    print('std')
    print(std)

    yPrediction = []
    predictedX = [[(2100 - mean[0]) / std[0], (3 - mean[1]) / std[1]]]
    print('predictedX', predictedX)

    # SGD is stochastic, so refit several times and average the predictions
    for trials in range(10):
        sgdRegressor.fit(xnorm, y)
        yPrediction.extend(sgdRegressor.predict(predictedX))
    print('predict', numpy.mean(yPrediction))

This results in:
predict 355533.10119985335