When to use NumPy vs statistics modules


Why does NumPy duplicate features of SciPy?

From the SciPy FAQ entry "What is the difference between NumPy and SciPy?":

In an ideal world, NumPy would contain nothing but the array data type and the most basic operations: indexing, sorting, reshaping, basic elementwise functions, etc. All numerical code would reside in SciPy. However, one of NumPy’s important goals is compatibility, so NumPy tries to retain all features supported by either of its predecessors.

The FAQ goes on to recommend SciPy over NumPy for linear algebra and other numerical algorithms, while suggesting you install both:

In any case, SciPy contains more fully-featured versions of the linear algebra modules, as well as many other numerical algorithms. If you are doing scientific computing with Python, you should probably install both NumPy and SciPy. Most new features belong in SciPy rather than NumPy.
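To make the overlap concrete, here is a minimal sketch (the matrix and right-hand side are made-up values for illustration). Both libraries can solve the same linear system, but scipy.linalg also offers routines such as an LU decomposition that, as far as I know, numpy.linalg does not expose.

```python
import numpy as np
import scipy.linalg

# Illustrative system: 3x + y = 9, x + 2y = 8
a = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])

# The overlapping feature: both packages provide solve()
x_np = np.linalg.solve(a, b)
x_sp = scipy.linalg.solve(a, b)
same_solution = np.allclose(x_np, x_sp)

# A SciPy-only extra: LU decomposition with partial pivoting, a = p @ l @ u
p, l, u = scipy.linalg.lu(a)
lu_reconstructs = np.allclose(p @ l @ u, a)

print(x_np, same_solution, lu_reconstructs)
```

For the basics either call works; the choice mostly matters once you need one of the routines that only SciPy ships.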

When should I use the statistics library?

From the statistics library documentation:

The module is not intended to be a competitor to third-party libraries such as NumPy, SciPy, or proprietary full-featured statistics packages aimed at professional statisticians such as Minitab, SAS and Matlab. It is aimed at the level of graphing and scientific calculators.

Thus I would not use it for serious (i.e., resource-intensive) computation.
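For small, calculator-style jobs the two give the same answers, as this rough sketch shows (the data values are arbitrary). One thing to watch for: statistics.stdev computes the sample standard deviation, whereas NumPy's std defaults to the population form, so ddof=1 is needed to match.

```python
import statistics
import numpy as np

# Arbitrary illustrative data
data = [2.5, 3.25, 5.5, 11.25, 11.75]

# Standard library: works directly on plain Python sequences
mean_std = statistics.mean(data)
stdev_std = statistics.stdev(data)  # sample standard deviation (divides by n - 1)

# NumPy: convert to an array first, then use vectorized methods
arr = np.array(data)
mean_np = arr.mean()
stdev_np = arr.std(ddof=1)  # ddof=1 matches statistics.stdev; the default ddof=0 matches statistics.pstdev

print(mean_std, mean_np)
print(stdev_std, stdev_np)
```

The statistics module needs no third-party install, which makes it convenient for scripts with a handful of values; NumPy's array operations are the better fit once the data gets large.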

What is the difference between statsmodels and SciPy?

From the statsmodels about page:

The models module of scipy.stats was originally written by Jonathan Taylor. For some time it was part of scipy but was later removed. During the Google Summer of Code 2009, statsmodels was corrected, tested, improved and released as a new package. Since then, the statsmodels development team has continued to add new models, plotting tools, and statistical methods.

Thus you may have a requirement that SciPy cannot fulfill, or that is better fulfilled by a dedicated library. For example, the SciPy documentation for scipy.stats.probplot notes:

Statsmodels has more extensive functionality of this type, see statsmodels.api.ProbPlot.

Thus in cases like these you will need to turn to statistical libraries beyond SciPy.
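As a hedged sketch of the probplot example: the snippet below computes probability-plot quantiles with scipy.stats.probplot and, only if statsmodels happens to be installed, builds the corresponding statsmodels.api.ProbPlot object. The sample is synthetic (a seeded normal draw chosen purely for illustration), and no plotting is done.

```python
import numpy as np
from scipy import stats

# Synthetic sample, assumed only for illustration
rng = np.random.default_rng(0)
sample = rng.normal(loc=10.0, scale=2.0, size=200)

# SciPy: theoretical quantiles (osm), ordered sample values (osr),
# and a least-squares line fitted through them
(osm, osr), (slope, intercept, r) = stats.probplot(sample, dist="norm")
print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.4f}")

# statsmodels is optional here; skip this part if it is not installed
try:
    import statsmodels.api as sm
except ImportError:
    sm = None

if sm is not None:
    pp = sm.ProbPlot(sample, dist=stats.norm)
    # The ProbPlot object exposes the quantiles and has methods for
    # Q-Q, P-P and probability plots
    print(pp.theoretical_quantiles[:3], pp.sample_quantiles[:3])
```

If all you need is the quantiles and the fitted line, SciPy alone is enough; the statsmodels object becomes worthwhile when you want its additional plot types.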