Normal equation and Numpy 'least-squares', 'solve' methods difference in regression?


Don't calculate matrix inverse to solve linear systems

Professional-grade algorithms don't compute the matrix inverse to solve a linear system. Explicit inversion is slower and introduces unnecessary numerical error. It's not a disaster for small systems, but why do something suboptimal?

Basically, any time you see the math written as:

    x = A^-1 * b

you instead want:

    x = np.linalg.solve(A, b)
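
To see the difference concretely, here is a minimal sketch comparing the residuals of the two routes (the ill-conditioned Hilbert matrix below is my own choice of example, not from the question):

    import numpy as np

    n = 10
    i = np.arange(n)
    A = 1.0 / (i[:, None] + i[None, :] + 1.0)  # Hilbert matrix: notoriously ill-conditioned
    b = A.dot(np.ones(n))                      # right-hand side with known solution x = ones

    x_inv = np.linalg.inv(A).dot(b)   # explicit inverse, then multiply
    x_slv = np.linalg.solve(A, b)     # LU-based direct solve

    print(np.linalg.norm(A.dot(x_inv) - b))   # typically the larger residual
    print(np.linalg.norm(A.dot(x_slv) - b))   # typically the smaller residual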

In your case, you want something like:

    import numpy as np

    XtX_lamb = X.T.dot(X) + lamb * np.eye(X.shape[1])  # regularised normal matrix
    XtY = X.T.dot(Y)
    x = np.linalg.solve(XtX_lamb, XtY)


As @Matthew Gunn mentioned, it's bad practice to compute the explicit inverse of your coefficient matrix as a means to solve linear systems of equations. It's faster and more accurate to obtain the solution directly (see here).

The reason why you see differences between np.linalg.solve and np.linalg.lstsq is because these functions make different assumptions about the system you are trying to solve, and use different numerical methods.

  • Under the hood, solve calls the DGESV LAPACK routine, which uses LU factorization, followed by forward and backward substitution to find an exact solution to Ax = b. It requires that the system is exactly determined, i.e. that A is square and of full rank.

  • lstsq instead calls DGELSD, which uses the singular value decomposition of A in order to find a least-squares solution. This also works in overdetermined and underdetermined cases (both behaviours are illustrated in the sketch after this list).
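
A small sketch of those different assumptions (the matrices below are made-up examples):

    import numpy as np

    # Square, full-rank system: solve and lstsq agree.
    A_sq = np.array([[3.0, 1.0],
                     [1.0, 2.0]])
    b_sq = np.array([9.0, 8.0])
    print(np.linalg.solve(A_sq, b_sq))                  # exact solution via LU
    print(np.linalg.lstsq(A_sq, b_sq, rcond=None)[0])   # same solution via SVD

    # Overdetermined system (3 equations, 2 unknowns): only lstsq applies.
    A_ov = np.array([[1.0, 1.0],
                     [1.0, 2.0],
                     [1.0, 3.0]])
    b_ov = np.array([1.0, 2.0, 2.0])
    print(np.linalg.lstsq(A_ov, b_ov, rcond=None)[0])   # least-squares fit
    # np.linalg.solve(A_ov, b_ov) would raise LinAlgError: a must be square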

If your system is exactly determined then you should use solve, since it requires fewer floating point operations and will therefore be faster and more precise. In your case, XtX_lamb is guaranteed to be full rank because of the regularisation step.
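
As a quick check of that last claim, here is a sketch (with made-up, deliberately collinear data) showing how the regularisation term restores full rank:

    import numpy as np

    X = np.array([[1.0, 2.0, 4.0],
                  [1.0, 3.0, 6.0],
                  [1.0, 4.0, 8.0],
                  [1.0, 5.0, 10.0]])  # third column = 2 * second -> X has rank 2
    lamb = 0.1

    XtX = X.T.dot(X)
    print(np.linalg.matrix_rank(XtX))                    # 2: singular
    print(np.linalg.matrix_rank(XtX + lamb * np.eye(3))) # 3: full rank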


The other answers explain why, in theory, one calculation method is better than the other. However, they don't give a way to test which solution actually gives better results. Here it is:

    def test(a, x, b):
        # .as_matrix() converts the pandas objects to NumPy arrays
        res = a.dot(x).as_matrix() - b.as_matrix()
        print(np.linalg.norm(res))

    test(XtX_lamb, x, XtY)
    test(XtX_lamb, th, XtY)
    test(XtX_lamb, theta, XtY)

This calculates the L2 norm of the residual vector of the linear system. The results are:

    np.linalg.solve - 0.000488340357871
    np.linalg.lstsq - 1.75520748498
    normal equation - 16.1628614202

Thus np.linalg.solve indeed gives the most accurate result.