
Difference(s) between scipy.stats.linregress, numpy.polynomial.polynomial.polyfit and statsmodels.api.OLS


The three are very different, but they overlap in parameter estimation for the simplest case: a regression on a single explanatory variable.

In order of increasing generality:

scipy.stats.linregress handles only the case of a single explanatory variable, using specialized code, and calculates a few extra statistics (the correlation coefficient, a p-value for the slope, and the standard error of the slope).
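As a rough illustration of the single-variable case, here is a minimal sketch. The data is synthetic and made up for the example (true intercept 2.0, true slope 3.0, fixed seed), so the numbers mean nothing beyond showing which fields come back.

```python
import numpy as np
from scipy import stats

# Synthetic data (an assumption for illustration): y = 2 + 3*x + noise
rng = np.random.default_rng(0)
x = rng.random(200)
y = 2.0 + 3.0 * x + rng.normal(scale=0.1, size=x.size)

# linregress takes exactly one explanatory variable
res = stats.linregress(x, y)

# Parameter estimates
print("slope:", res.slope, "intercept:", res.intercept)

# The "few extra statistics": correlation coefficient, p-value for
# the slope being zero, and standard error of the slope
print("r:", res.rvalue, "p:", res.pvalue, "stderr:", res.stderr)
```

There is no way to pass a second explanatory variable here; for that you need one of the more general tools.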

numpy.polynomial.polynomial.polyfit estimates the regression for a polynomial of a single variable, but doesn't return much in terms of extra statistics.
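A small sketch of the polynomial fit, again on made-up synthetic data (true intercept 2.0, true slope 3.0, fixed seed). One thing worth knowing: this polyfit returns coefficients in ascending order of degree, so the intercept comes first, unlike the older np.polyfit, which returns the highest degree first.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Synthetic data (an assumption for illustration): y = 2 + 3*x + noise
rng = np.random.default_rng(0)
x = rng.random(200)
y = 2.0 + 3.0 * x + rng.normal(scale=0.1, size=x.size)

# Degree 1: coefficients are ordered [intercept, slope]
coef = P.polyfit(x, y, 1)
intercept, slope = coef
print("intercept:", intercept, "slope:", slope)

# Higher degrees work the same way; degree 2 gives three coefficients
coef2 = P.polyfit(x, y, 2)
print("quadratic coefficients:", coef2)

# Evaluate the fitted line at the observed x values
y_hat = P.polyval(x, coef)
```

You get the coefficients back and not much else; standard errors or p-values would have to be computed by hand.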

statsmodels OLS is a generic linear model (OLS) estimation class. It doesn't prespecify what the explanatory variables are and can handle any multivariate array of explanatory variables, as well as formulas and pandas DataFrames. It returns not only the estimated parameters but also a large set of result statistics and methods for statistical inference and prediction.

To complete the list of options for estimating linear models in Python (outside of Bayesian analysis), we should also consider scikit-learn's LinearRegression and similar linear models. These are useful for selecting among a large number of explanatory variables, but they do not provide the large number of result statistics that statsmodels does.
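For comparison, a minimal scikit-learn sketch on the same kind of made-up data (true intercept 2.0, true slope 3.0, fixed seed). The estimator expects a 2-D feature array, hence the reshape.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration): y = 2 + 3*x + noise
rng = np.random.default_rng(0)
x = rng.random(200)
y = 2.0 + 3.0 * x + rng.normal(scale=0.1, size=x.size)

# scikit-learn wants features with shape (n_samples, n_features)
X = x.reshape(-1, 1)

model = LinearRegression().fit(X, y)

print("slope:", model.coef_[0], "intercept:", model.intercept_)

# R-squared on the training data; no standard errors or p-values here
r2 = model.score(X, y)
print("R^2:", r2)
```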


scipy.stats.linregress seems quite a bit faster than polyfit, which is actually the opposite of what I would have expected, by the way!

    import numpy as np
    import scipy.stats

    # Two independent random samples of 100,000 points each
    x = np.random.random(100000)
    y = np.random.random(100000)

    %timeit np.polynomial.polynomial.polyfit(x, y, 1)
    100 loops, best of 3: 8.89 ms per loop

    %timeit scipy.stats.linregress(x, y)
    100 loops, best of 3: 1.67 ms per loop