How to get faster code than numpy.dot for matrix multiplication?

np.dot dispatches to BLAS when

  • NumPy has been compiled to use BLAS,
  • a BLAS implementation is available at run-time,
  • your data has one of the dtypes float32, float64, complex64, or complex128, and
  • the data is suitably aligned in memory.

Otherwise, it defaults to using its own, slow, matrix multiplication routine.
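
To see the dtype condition in action, here's a quick, machine-dependent sketch: the same multiplication with an object-dtype array (a contrived way to force the fallback) is dramatically slower than with float64.

import timeit
import numpy as np

n = 200
A = np.random.randn(n, n)    # float64: eligible for BLAS dispatch
B = A.astype(object)         # object dtype: forces the slow fallback routine

print(timeit.timeit(lambda: A.dot(A), number=10))  # BLAS-backed, fast
print(timeit.timeit(lambda: B.dot(B), number=1))   # generic loop, much slower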

To check your BLAS linkage, see whether there's a file _dotblas.so or similar in your NumPy installation. If there is, check which BLAS library it's linked against: the reference BLAS is slow, ATLAS is fast, and OpenBLAS and vendor-specific libraries such as Intel MKL are faster still. Be careful with multithreaded BLAS implementations, though, as they don't play nicely with Python's multiprocessing.
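
On reasonably recent NumPy versions, np.show_config() prints the build configuration directly, which is often the quickest check:

import numpy as np

# Prints the BLAS/LAPACK libraries NumPy was compiled against
# (e.g. MKL, OpenBLAS, ATLAS, or the reference implementation).
np.show_config()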

Next, check your data alignment by inspecting the flags of your arrays. In versions of NumPy before 1.7.2, both arguments to np.dot should be C-ordered. In NumPy >= 1.7.2, this matters less, since special cases for Fortran-ordered arrays have been introduced.

>>> X = np.random.randn(10, 4)
>>> Y = np.random.randn(7, 4).T
>>> X.flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
>>> Y.flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
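
If an array turns out not to be C-ordered (common after slicing or transposing), you can make a C-ordered copy before multiplying; np.ascontiguousarray is the standard tool. Whether the copy pays for itself depends on your array sizes and NumPy version, so treat this as a sketch to benchmark, not a blanket rule:

import numpy as np

Y = np.random.randn(7, 4).T     # F-contiguous because of the transpose
Yc = np.ascontiguousarray(Y)    # C-ordered copy (no-op if already C-contiguous)

assert Yc.flags['C_CONTIGUOUS']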

If your NumPy is not linked against BLAS, either (easy) re-install it, or (hard) use the BLAS gemm (general matrix multiply) function from SciPy:

>>> from scipy.linalg import get_blas_funcs
>>> gemm = get_blas_funcs("gemm", [X, Y])
>>> np.all(gemm(1, X, Y) == np.dot(X, Y))
True

This looks easy, but it does hardly any error checking, so you must really know what you're doing.
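
What gemm buys you in exchange is direct access to the BLAS parameters: trans_a/trans_b multiply against a transpose without materializing it, and alpha/beta fuse a scaled accumulation (C = alpha*A*B + beta*C) into one call. A sketch, assuming SciPy's usual keyword names for the gemm wrapper (double-check them against your SciPy version):

import numpy as np
from scipy.linalg import get_blas_funcs

# Fortran order avoids internal copies when handing arrays to BLAS.
A = np.asfortranarray(np.random.randn(10, 4))
B = np.asfortranarray(np.random.randn(7, 4))
gemm = get_blas_funcs("gemm", [A, B])

# C = 1.0 * A.dot(B.T), without explicitly forming B.T
C = gemm(1.0, A, B, trans_b=True)
assert np.allclose(C, np.dot(A, B.T))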