Statsmodels: Calculate fitted values and R squared
If you do not include an intercept (a constant explanatory variable) in your model, statsmodels computes R-squared based on the un-centred total sum of squares, i.e.
tss = (ys ** 2).sum() # un-centred total sum of squares
as opposed to
tss = ((ys - ys.mean())**2).sum() # centred total sum of squares
As a result, R-squared is usually much higher, since the un-centred sum is never smaller than the centred one.
This is mathematically correct, because R-squared should indicate how much of the variation is explained by the full model compared to the reduced model. If you define your model as:
ys = beta1 . xs + beta0 + noise
then the reduced model can be: ys = beta0 + noise, where the estimate for beta0 is the sample average, thus we have: noise = ys - ys.mean(). That is where de-meaning comes from in a model with an intercept.
But from a model like:
ys = beta . xs + noise
you may only reduce to: ys = noise. Since noise is assumed to be zero-mean, you may not de-mean ys. Therefore, the unexplained variation in the reduced model is the un-centred total sum of squares.
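To make the difference concrete, here is a minimal NumPy-only sketch that fits a no-intercept model by least squares and computes R-squared against both totals. The data, the seed, and the names `r2_uncentred` and `r2_centred` are made up for illustration; they are not part of statsmodels.

```python
import numpy as np

# Synthetic data (an assumption for illustration): ys is roughly 3 * xs
rng = np.random.default_rng(0)
xs = rng.uniform(1.0, 10.0, size=50)
ys = 3.0 * xs + rng.normal(0.0, 1.0, size=50)

# Least-squares fit of ys = beta * xs, with no intercept column
beta, *_ = np.linalg.lstsq(xs[:, None], ys, rcond=None)
resid = ys - xs * beta[0]
ssr = (resid ** 2).sum()  # residual sum of squares

tss_uncentred = (ys ** 2).sum()              # reduced model: ys = noise
tss_centred = ((ys - ys.mean()) ** 2).sum()  # reduced model: ys = beta0 + noise

r2_uncentred = 1.0 - ssr / tss_uncentred
r2_centred = 1.0 - ssr / tss_centred

print("un-centred R-squared:", r2_uncentred)
print("centred R-squared:   ", r2_centred)
```

Because the un-centred total equals the centred total plus n times the squared mean of ys, the un-centred R-squared can never come out lower than the centred one for the same residuals.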
This is documented here under the rsquared item. Set yBar equal to zero, and I would expect you to get the same number.
If your model is:
a = <yourmodel>.fit()
Then, to compute fitted values:
a.fittedvalues
and to compute R squared:
a.rsquared