Multiple linear regression in Python
sklearn.linear_model.LinearRegression
will do it:
from sklearn import linear_model

clf = linear_model.LinearRegression()
clf.fit([[getattr(t, 'x%d' % i) for i in range(1, 8)] for t in texts],
        [t.y for t in texts])
Then clf.coef_
will have the regression coefficients.
sklearn.linear_model
also has similar interfaces to do various kinds of regularizations on the regression.
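The snippet above assumes a collection `texts` of objects with attributes `x1` through `x7` and `y`. A minimal self-contained sketch of the same call with made-up NumPy data (the array names and values are illustrative only):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))            # 50 samples, 3 predictors
true_coef = np.array([2.0, -1.0, 0.5])
y = X @ true_coef + 4.0                 # exactly linear response

clf = LinearRegression()
clf.fit(X, y)
print(clf.coef_)       # close to [2.0, -1.0, 0.5]
print(clf.intercept_)  # close to 4.0
```

Because the fabricated response is exactly linear in the predictors, the fitted coefficients recover `true_coef` up to numerical precision.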
Here is a little workaround that I created. I checked it against R and it gives the same results.
import numpy as np
import statsmodels.api as sm

y = [1, 2, 3, 4, 3, 4, 5, 4, 5, 5, 4, 5, 4, 5, 4, 5, 6, 5, 4, 5, 4, 3, 4]
x = [
    [4, 2, 3, 4, 5, 4, 5, 6, 7, 4, 8, 9, 8, 8, 6, 6, 5, 5, 5, 5, 5, 5, 5],
    [4, 1, 2, 3, 4, 5, 6, 7, 5, 8, 7, 8, 7, 8, 7, 8, 7, 7, 7, 7, 7, 6, 5],
    [4, 1, 2, 5, 6, 7, 8, 9, 7, 8, 7, 8, 7, 7, 7, 7, 7, 7, 6, 6, 4, 4, 4],
]

def reg_m(y, x):
    ones = np.ones(len(x[0]))
    X = sm.add_constant(np.column_stack((x[0], ones)))
    for ele in x[1:]:
        X = sm.add_constant(np.column_stack((ele, X)))
    results = sm.OLS(y, X).fit()
    return results
Result:
print(reg_m(y, x).summary())
Output:
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.535
Model:                            OLS   Adj. R-squared:                  0.461
Method:                 Least Squares   F-statistic:                     7.281
Date:                Tue, 19 Feb 2013   Prob (F-statistic):            0.00191
Time:                        21:51:28   Log-Likelihood:                -26.025
No. Observations:                  23   AIC:                             60.05
Df Residuals:                      19   BIC:                             64.59
Df Model:                           3
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1             0.2424      0.139      1.739      0.098        -0.049     0.534
x2             0.2360      0.149      1.587      0.129        -0.075     0.547
x3            -0.0618      0.145     -0.427      0.674        -0.365     0.241
const          1.5704      0.633      2.481      0.023         0.245     2.895
==============================================================================
Omnibus:                        6.904   Durbin-Watson:                   1.905
Prob(Omnibus):                  0.032   Jarque-Bera (JB):                4.708
Skew:                          -0.849   Prob(JB):                       0.0950
Kurtosis:                       4.426   Cond. No.                         38.6
pandas
provides a convenient way to run OLS, as given in this answer.
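The old `pandas.stats.ols` interface referenced in many answers from that era has since been removed from pandas; the usual route today is a pandas DataFrame fed to statsmodels' formula API. A sketch (the DataFrame contents and column names here are purely illustrative):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data; column names chosen for illustration.
df = pd.DataFrame({
    "y":  [1, 2, 3, 4, 3, 4, 5, 4],
    "x1": [4, 2, 3, 4, 5, 4, 5, 6],
    "x2": [4, 1, 2, 3, 4, 5, 6, 7],
})

# R-style formula: regress y on x1 and x2, with an intercept added automatically.
model = smf.ols("y ~ x1 + x2", data=df).fit()
print(model.params)  # Intercept, x1, x2
```

The formula interface labels coefficients by column name, which makes the summary easier to read than the positional `x1`/`x2`/... labels of the array-based API.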
Just to clarify: the example you gave is multiple linear regression, not multivariate linear regression. The difference:
The very simplest case of a single scalar predictor variable x and a single scalar response variable y is known as simple linear regression. The extension to multiple and/or vector-valued predictor variables (denoted with a capital X) is known as multiple linear regression, also known as multivariable linear regression. Nearly all real-world regression models involve multiple predictors, and basic descriptions of linear regression are often phrased in terms of the multiple regression model. Note, however, that in these cases the response variable y is still a scalar. Another term multivariate linear regression refers to cases where y is a vector, i.e., the same as general linear regression. The difference between multivariate linear regression and multivariable linear regression should be emphasized as it causes much confusion and misunderstanding in the literature.
In short:
- multiple linear regression: the response y is a scalar.
- multivariate linear regression: the response y is a vector.
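To make the distinction concrete: scikit-learn's `LinearRegression` also handles the multivariate case, since `fit` accepts a 2-D response array. A sketch with synthetic data (names and values are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))           # 3 predictors
B = np.array([[1.0, 0.0],              # one coefficient column per response
              [2.0, -1.0],
              [0.5, 3.0]])
Y = X @ B                              # 2-D response: multivariate regression

clf = LinearRegression().fit(X, Y)
print(clf.coef_.shape)  # (2, 3): one row of coefficients per response
```

Each column of `Y` is effectively fitted as its own multiple regression, which is why `coef_` gains one row per response variable.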