Multiple linear regression in Python


sklearn.linear_model.LinearRegression will do it:

from sklearn import linear_model
clf = linear_model.LinearRegression()
clf.fit([[getattr(t, 'x%d' % i) for i in range(1, 8)] for t in texts],
        [t.y for t in texts])

Then clf.coef_ will have the regression coefficients.
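As a minimal, self-contained sketch of the same call, the snippet below fits made-up data (the arrays X and y are hypothetical and do not come from the question) and reads back both coef_ and intercept_. The data are generated exactly as y = 1 + 2*x1 + 3*x2, so the fitted values are easy to check by eye.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: 6 observations, 2 predictors,
# generated exactly as y = 1 + 2*x1 + 3*x2.
X = np.array([[0.0, 1.0],
              [1.0, 0.0],
              [1.0, 1.0],
              [2.0, 1.0],
              [2.0, 3.0],
              [3.0, 2.0]])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

model = LinearRegression()  # fits an intercept by default
model.fit(X, y)

print(model.coef_)       # slopes, one per column of X
print(model.intercept_)  # constant term
```

Note that the constant term is stored separately in intercept_ and is not part of coef_.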

sklearn.linear_model also offers similar interfaces for regularized regression, such as Ridge and Lasso.
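To illustrate how close the regularized interface is to the plain one, here is a hedged sketch comparing LinearRegression with Ridge on synthetic data. The data, the seed, and the penalty strength alpha=10.0 are arbitrary choices made for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data: 50 observations, 3 predictors, small Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)  # alpha sets the L2 penalty strength

print(ols.coef_)    # unpenalized coefficients
print(ridge.coef_)  # coefficients shrunk toward zero by the penalty
```

Lasso and ElasticNet in the same module follow the same fit / coef_ pattern.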


Here is a small workaround that I wrote. I checked the results against R and they match.

import numpy as np
import statsmodels.api as sm

y = [1,2,3,4,3,4,5,4,5,5,4,5,4,5,4,5,6,5,4,5,4,3,4]

x = [
     [4,2,3,4,5,4,5,6,7,4,8,9,8,8,6,6,5,5,5,5,5,5,5],
     [4,1,2,3,4,5,6,7,5,8,7,8,7,8,7,8,7,7,7,7,7,6,5],
     [4,1,2,5,6,7,8,9,7,8,7,8,7,7,7,7,7,7,6,6,4,4,4]
     ]

def reg_m(y, x):
    # Start with the first predictor and a column of ones (the intercept).
    ones = np.ones(len(x[0]))
    X = sm.add_constant(np.column_stack((x[0], ones)))
    # Prepend each remaining predictor, so the predictor columns end up
    # in reverse order of x, followed by the constant.
    for ele in x[1:]:
        X = sm.add_constant(np.column_stack((ele, X)))
    results = sm.OLS(y, X).fit()
    return results

Result:

print(reg_m(y, x).summary())

Output:

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.535
Model:                            OLS   Adj. R-squared:                  0.461
Method:                 Least Squares   F-statistic:                     7.281
Date:                Tue, 19 Feb 2013   Prob (F-statistic):            0.00191
Time:                        21:51:28   Log-Likelihood:                -26.025
No. Observations:                  23   AIC:                             60.05
Df Residuals:                      19   BIC:                             64.59
Df Model:                           3                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1             0.2424      0.139      1.739      0.098        -0.049     0.534
x2             0.2360      0.149      1.587      0.129        -0.075     0.547
x3            -0.0618      0.145     -0.427      0.674        -0.365     0.241
const          1.5704      0.633      2.481      0.023         0.245     2.895
==============================================================================
Omnibus:                        6.904   Durbin-Watson:                   1.905
Prob(Omnibus):                  0.032   Jarque-Bera (JB):                4.708
Skew:                          -0.849   Prob(JB):                       0.0950
Kurtosis:                       4.426   Cond. No.                         38.6

pandas provides a convenient way to run OLS as given in this answer:

Run an OLS regression with Pandas Data Frame
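The linked answer is not reproduced here. One common way to fit OLS directly from a DataFrame is the statsmodels formula interface; the sketch below assumes that approach and uses a made-up data frame whose column names (x1, x2, y) are purely illustrative.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data frame, generated exactly as y = 1 + 2*x1 + 3*x2.
df = pd.DataFrame({
    "x1": [0.0, 1.0, 1.0, 2.0, 2.0, 3.0],
    "x2": [1.0, 0.0, 1.0, 1.0, 3.0, 2.0],
})
df["y"] = 1.0 + 2.0 * df["x1"] + 3.0 * df["x2"]

# The formula names the response on the left and the predictors on the
# right; an intercept is included automatically.
results = smf.ols("y ~ x1 + x2", data=df).fit()
print(results.params)  # indexed by Intercept, x1, x2
```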


Just to clarify, the example you gave is multiple linear regression, not multivariate linear regression (reference). The difference:

The very simplest case of a single scalar predictor variable x and a single scalar response variable y is known as simple linear regression. The extension to multiple and/or vector-valued predictor variables (denoted with a capital X) is known as multiple linear regression, also known as multivariable linear regression. Nearly all real-world regression models involve multiple predictors, and basic descriptions of linear regression are often phrased in terms of the multiple regression model. Note, however, that in these cases the response variable y is still a scalar. Another term, multivariate linear regression, refers to cases where y is a vector, i.e., the same as general linear regression. The difference between multivariate linear regression and multivariable linear regression should be emphasized as it causes much confusion and misunderstanding in the literature.

In short:

  • multiple linear regression: the response y is a scalar.
  • multivariate linear regression: the response y is a vector.

(Another source.)
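The distinction shows up directly in the shapes involved. The following sketch uses only NumPy's least-squares solver on synthetic, noise-free data; all values, including the "true" coefficients, are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40
# Design matrix: an intercept column plus 2 predictors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])

# Multiple regression: one scalar response per observation.
beta_true = np.array([1.0, 2.0, -1.0])
y = X @ beta_true
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat.shape)  # one coefficient per column of X

# Multivariate regression: each observation has a 2-component response.
B_true = np.array([[1.0, 0.5],
                   [2.0, -1.0],
                   [-1.0, 3.0]])
Y = X @ B_true
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(B_hat.shape)     # one column of coefficients per response component
```

In both cases there are several predictors; what changes is whether the response is a vector of scalars (y) or a matrix with one column per response component (Y).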