How do I calculate r-squared using Python and Numpy?


A very late reply, but just in case someone needs a ready function for this:

scipy.stats.linregress

i.e.

slope, intercept, r_value, p_value, std_err = scipy.stats.linregress(x, y)

as in @Adam Marples's answer.
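
For completeness, a minimal sketch of how you might use it (the x and y arrays below are made-up example data; r_value is the correlation coefficient, so squaring it gives R-squared):

import numpy as np
from scipy import stats

# made-up example data, not from the question
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)

# square the correlation coefficient to get R-squared
print("R-squared:", r_value**2)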


From the numpy.polyfit documentation, it is fitting a linear regression. Specifically, numpy.polyfit with degree 'd' fits a linear regression with the mean function

E(y|x) = p_d * x**d + p_{d-1} * x**(d-1) + ... + p_1 * x + p_0

So you just need to calculate the R-squared for that fit. The Wikipedia page on linear regression gives full details. You are interested in R^2, which you can calculate in a couple of ways, the easiest probably being

SST = Sum(i=1..n) (y_i - y_bar)^2
SSReg = Sum(i=1..n) (y_ihat - y_bar)^2
Rsquared = SSReg / SST

where 'y_bar' is the mean of the y's, and 'y_ihat' is the fitted value for each point.

I'm not terribly familiar with numpy (I usually work in R), so there is probably a tidier way to calculate your R-squared, but the following should be correct

import numpy

# Polynomial Regression
def polyfit(x, y, degree):
    results = {}

    coeffs = numpy.polyfit(x, y, degree)

    # Polynomial Coefficients
    results['polynomial'] = coeffs.tolist()

    # r-squared
    p = numpy.poly1d(coeffs)
    # fit values, and mean
    yhat = p(x)                         # or [p(z) for z in x]
    ybar = numpy.sum(y)/len(y)          # or sum(y)/len(y)
    ssreg = numpy.sum((yhat-ybar)**2)   # or sum([ (yihat - ybar)**2 for yihat in yhat])
    sstot = numpy.sum((y - ybar)**2)    # or sum([ (yi - ybar)**2 for yi in y])
    results['determination'] = ssreg / sstot

    return results
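
For example, a quick sketch of calling it (the data here is made up):

x = numpy.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = numpy.array([1.1, 1.9, 3.2, 3.8, 5.1])

results = polyfit(x, y, 1)        # straight-line fit
print(results['polynomial'])      # [slope, intercept]
print(results['determination'])   # R-squared of the fit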


From yanl (yet-another-library), sklearn.metrics has an r2_score function:

from sklearn.metrics import r2_score
coefficient_of_determination = r2_score(y, p(x))
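
Putting it together, a small self-contained sketch (the data is made up; p is the numpy.poly1d object from the fit, as in the answer above):

import numpy
from sklearn.metrics import r2_score

x = numpy.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = numpy.array([1.1, 1.9, 3.2, 3.8, 5.1])

p = numpy.poly1d(numpy.polyfit(x, y, 1))   # degree-1 polynomial fit
coefficient_of_determination = r2_score(y, p(x))
print(coefficient_of_determination)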