Why doesn't my custom made linear regression model match sklearn?


I think you are missing the 1/m term (where m = y.size, the number of training examples) in the gradient-descent update. After including the 1/m term, I get a predicted value similar to the one from your sklearn code.
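For reference, this is where the 1/m comes from, assuming the usual squared-error cost (your `cost` function isn't shown, so that form is an assumption). Differentiating the cost gives a gradient that already carries the factor:

```latex
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2
\qquad
\frac{\partial J}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}

\theta_j := \theta_j - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
```

In a per-example loop, the factor only rescales the step: `alpha * (1/m)` behaves like a learning rate of α/m, so with 47 examples each individual update is 47 times smaller than without it, and one pass over the data roughly approximates a single batch step.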

See below:

```python
...
weights = np.ones((3,))
m = y.size  # number of training examples
for boom in range(100):
    currentCost = cost(normalizedX, weights, y)
    if boom % 1 == 0:
        print(boom, 'iteration', weights[0], weights[1], weights[2])
        print('Cost', currentCost)
    for i in range(47):  # one update per training example
        errorDiff = h(normalizedX[i], weights) - y[i]
        # each update now includes the 1/m factor
        weights[0] = weights[0] - alpha * (1/m) * errorDiff * normalizedX[i][0]
        weights[1] = weights[1] - alpha * (1/m) * errorDiff * normalizedX[i][1]
        weights[2] = weights[2] - alpha * (1/m) * errorDiff * normalizedX[i][2]
...
```

This gives 355242 for the first prediction.

This agrees well with sklearn's LinearRegression model, even though LinearRegression does not use gradient descent (it solves the least-squares problem directly).
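To see why a gradient-descent fit and LinearRegression should land on the same answer, here is a minimal sketch on made-up data (an assumption: Housing.csv and your `cost`/`h` functions aren't used). It runs vectorized batch gradient descent with the 1/m factor and compares the result with `np.linalg.lstsq`, which stands in here for the direct least-squares solve that LinearRegression does. The names `batch_gradient_descent`, `features`, and the synthetic coefficients are hypothetical.

```python
import numpy as np

# Synthetic stand-in for the housing data (hypothetical values, not Housing.csv)
rng = np.random.default_rng(0)
m = 47
features = rng.normal(size=(m, 2))
y = 3.0 + features @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=m)

# Standardize the features, then prepend a column of ones for the intercept,
# mirroring the 3-column normalizedX in the question
normalized = (features - features.mean(axis=0)) / features.std(axis=0)
X = np.column_stack([np.ones(m), normalized])


def batch_gradient_descent(X, y, alpha=0.1, iterations=5000):
    """Batch gradient descent on J(w) = 1/(2m) * sum((X @ w - y) ** 2)."""
    m = y.size
    weights = np.ones(X.shape[1])
    for _ in range(iterations):
        error = X @ weights - y
        weights -= alpha * (1 / m) * (X.T @ error)  # gradient carries the 1/m factor
    return weights


gd_weights = batch_gradient_descent(X, y)

# Direct least-squares solution: no iterations, no learning rate
lstsq_weights, *_ = np.linalg.lstsq(X, y, rcond=None)

print('gradient descent:', gd_weights)
print('least squares:   ', lstsq_weights)
```

Because the least-squares problem has a single minimum when X has full column rank, any gradient-descent run that converges must reach the same weights the direct solver finds; differences between the two only mean the descent hasn't converged yet (step too large, too small, or too few iterations).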

I also tried sklearn's SGDRegressor (which uses stochastic gradient descent), and it too gives a value close to both the LinearRegression model and your model. See the code below:

```python
import numpy as np
import pandas
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler, scale

dataset = pandas.read_csv('Housing.csv', header=None)
x = dataset.iloc[:, :-1].values  # every column except the last is a feature
y = dataset.iloc[:, 2].values    # third column is the target

# penalty=None: no regularization; constant learning rate, like plain SGD
sgdRegressor = SGDRegressor(penalty=None, learning_rate='constant', eta0=0.1,
                            max_iter=1000, tol=1E-6)

xnorm = scale(x)                     # standardize each feature column
scaleCoef = StandardScaler().fit(x)  # same statistics, kept to scale new inputs
mean = scaleCoef.mean_
std = np.sqrt(scaleCoef.var_)
print('std')
print(std)

# Scale the query point (2100, 3) with the training mean and std
predictedX = [[(2100 - mean[0]) / std[0], (3 - mean[1]) / std[1]]]
print('predictedX', predictedX)

# SGD is stochastic, so average the prediction over several fits
yPrediction = []
for trials in range(10):
    sgdRegressor.fit(xnorm, y)
    yPrediction.extend(sgdRegressor.predict(predictedX))
print('predict', np.mean(yPrediction))
```

This prints:

predict 355533.10119985335
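The SGDRegressor run above relies on two things: a constant learning rate, and scaling the query point with the training mean and std. Both can be sketched in plain NumPy. This is a hand-written illustration, not SGDRegressor's actual implementation; the data is synthetic (an assumption, since Housing.csv isn't reproduced here), and `sgd_fit`, `eta`, and `epochs` are hypothetical names and settings.

```python
import numpy as np

# Synthetic stand-in for Housing.csv (hypothetical): two features and a target
rng = np.random.default_rng(1)
m = 47
x = np.column_stack([rng.uniform(800, 4500, m), rng.integers(1, 6, m)])
y = 50_000 + 140 * x[:, 0] + 8_000 * x[:, 1] + rng.normal(scale=20_000, size=m)

# Standardize with the population std (ddof=0), which is what StandardScaler uses
mean = x.mean(axis=0)
std = x.std(axis=0)
xnorm = (x - mean) / std

# The query point must be scaled with the *training* statistics
query = (np.array([2100, 3]) - mean) / std


def sgd_fit(X, y, eta=0.01, epochs=200, seed=None):
    """Plain SGD with a constant learning rate: one update per example per epoch."""
    shuffler = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for i in shuffler.permutation(len(y)):  # visit examples in random order
            err = X[i] @ w + b - y[i]
            w -= eta * err * X[i]
            b -= eta * err
    return w, b


# Reference: direct least-squares prediction for the same query
X = np.column_stack([np.ones(m), xnorm])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
exact = coef[0] + query @ coef[1:]

# With a constant step the final weights keep jittering, so average several runs
predictions = []
for trial in range(10):
    w, b = sgd_fit(xnorm, y, seed=trial)
    predictions.append(query @ w + b)

print('SGD (mean of 10 runs):', np.mean(predictions))
print('least squares:        ', exact)
```

With a constant learning rate, SGD never settles exactly on the minimum; it hovers around it, which is why averaging a few runs (as the SGDRegressor code above does) gets close to the least-squares answer. If you keep sklearn, `scaleCoef.transform([[2100, 3]])` computes the same scaled query point as the manual `(value - mean) / std` formula.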