Confidence and prediction intervals with StatsModels


For test data you can use the following:

    predictions = result.get_prediction(out_of_sample_df)
    predictions.summary_frame(alpha=0.05)

I found the summary_frame() method buried in the statsmodels documentation, and the get_prediction() method is documented there as well. You can change the significance level of the confidence interval and prediction interval by modifying the "alpha" parameter.
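For reference, here is a minimal end-to-end sketch (the data and variable names are illustrative, not from the original post). For OLS, summary_frame() returns the columns mean, mean_se, mean_ci_lower/mean_ci_upper (confidence interval for the mean) and obs_ci_lower/obs_ci_upper (prediction interval):

    import numpy as np
    import statsmodels.api as sm

    # Illustrative data: a noisy line.
    x = np.linspace(0, 10, 50)
    y = 2 * x + 1 + np.random.normal(scale=2, size=x.size)
    result = sm.OLS(y, sm.add_constant(x)).fit()

    # New points to predict, with the same constant column as the fit.
    out_of_sample = sm.add_constant(np.linspace(0, 10, 5))
    predictions = result.get_prediction(out_of_sample)
    print(predictions.summary_frame(alpha=0.05))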

I am posting this here because this was the first post that comes up when searching for a solution for confidence and prediction intervals, even though this answer concerns test data rather than in-sample data.

Here's a function to take a model, new data, and an arbitrary quantile, using this approach:

    def ols_quantile(m, X, q):
        # m: OLS model.
        # X: X matrix.
        # q: Quantile.
        #
        # Set alpha based on q.
        a = q * 2
        if q > 0.5:
            a = 2 * (1 - q)
        predictions = m.get_prediction(X)
        frame = predictions.summary_frame(alpha=a)
        if q > 0.5:
            return frame.obs_ci_upper
        return frame.obs_ci_lower
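Hypothetical usage, assuming a fitted OLS results object fit and a matrix of new data X_new (both names are illustrative):

    # 90% prediction interval via the 5th and 95th quantiles.
    lower = ols_quantile(fit, X_new, 0.05)
    upper = ols_quantile(fit, X_new, 0.95)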


Update: see the second answer, which is more recent. Some of the model and results classes now have a get_prediction method that provides additional information, including prediction intervals and/or confidence intervals for the predicted mean.

Old answer:

iv_l and iv_u give you the limits of the prediction interval for each point.

A prediction interval is an interval for an individual observation; it includes the estimated error variance, which makes it wider than the confidence interval for the mean prediction.
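For context, iv_l and iv_u above come from the sandbox helper wls_prediction_std; a minimal sketch, assuming a fitted results object named results:

    from statsmodels.sandbox.regression.predstd import wls_prediction_std

    # Prediction standard error and lower/upper prediction-interval
    # bounds for each in-sample point (alpha=0.05 gives a 95% interval).
    prstd, iv_l, iv_u = wls_prediction_std(results, alpha=0.05)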

I think the confidence interval for the mean prediction is not yet available in statsmodels. (Actually, the confidence interval for the fitted values is hiding inside the summary_table of outliers_influence, but I need to verify this.)

Proper prediction methods for statsmodels are on the TODO list.

Addition

Confidence intervals are available for OLS, but the access is a bit clumsy.

To be included after running your script:

    from statsmodels.stats.outliers_influence import summary_table

    st, data, ss2 = summary_table(re, alpha=0.05)
    fittedvalues = data[:, 2]
    predict_mean_se = data[:, 3]
    predict_mean_ci_low, predict_mean_ci_upp = data[:, 4:6].T
    predict_ci_low, predict_ci_upp = data[:, 6:8].T

    # Check we got the right things
    print(np.max(np.abs(re.fittedvalues - fittedvalues)))
    print(np.max(np.abs(iv_l - predict_ci_low)))
    print(np.max(np.abs(iv_u - predict_ci_upp)))

    plt.plot(x, y, 'o')
    plt.plot(x, fittedvalues, '-', lw=2)
    plt.plot(x, predict_ci_low, 'r--', lw=2)
    plt.plot(x, predict_ci_upp, 'r--', lw=2)
    plt.plot(x, predict_mean_ci_low, 'r--', lw=2)
    plt.plot(x, predict_mean_ci_upp, 'r--', lw=2)
    plt.show()

[Plot: data points, fitted line, and the prediction-interval and mean-confidence bands from the code above]

This should give the same results as SAS; see http://jpktd.blogspot.ca/2012/01/nice-thing-about-seeing-zeros.html


summary_frame and summary_table work well when you need exact results for a single quantile, but don't vectorize well. This will provide a normal approximation of the prediction interval (not confidence interval) and works for a vector of quantiles:

    import numpy as np
    from scipy.stats import norm

    def ols_quantile(m, X, q):
        # m: Statsmodels OLS model.
        # X: X matrix of data to predict.
        # q: Quantile.
        mean_pred = m.predict(X)
        se = np.sqrt(m.scale)
        return mean_pred + norm.ppf(q) * se
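Since norm.ppf accepts arrays, a single new observation can be evaluated at a vector of quantiles in one call; a hypothetical usage sketch (fit and x_new are illustrative names):

    import numpy as np

    # With one row of X, the (1,)-shaped mean prediction broadcasts
    # against the vector of quantiles, giving one endpoint per quantile.
    quantiles = np.array([0.025, 0.5, 0.975])
    endpoints = ols_quantile(fit, x_new.reshape(1, -1), quantiles)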