Difference between numpy dot() and Python 3.5+ matrix multiplication @
The @
operator calls the array's __matmul__
method, not dot
. This method is also present in the API as the function np.matmul
.
>>> a = np.random.rand(8,13,13)>>> b = np.random.rand(8,13,13)>>> np.matmul(a, b).shape(8, 13, 13)
From the documentation:
matmul
differs fromdot
in two important ways.
- Multiplication by scalars is not allowed.
- Stacks of matrices are broadcast together as if the matrices were elements.
The last point makes it clear that dot
and matmul
methods behave differently when passed 3D (or higher dimensional) arrays. Quoting from the documentation some more:
For matmul
:
If either argument is N-D, N > 2, it is treated as a stack of matrices residing in the last two indexes and broadcast accordingly.
For np.dot
:
For 2-D arrays it is equivalent to matrix multiplication, and for 1-D arrays to inner product of vectors (without complex conjugation). For N dimensions it is a sum product over the last axis of a and the second-to-last of b
The answer by @ajcr explains how the dot
and matmul
(invoked by the @
symbol) differ. By looking at a simple example, one clearly sees how the two behave differently when operating on 'stacks of matricies' or tensors.
To clarify the differences take a 4x4 array and return the dot
product and matmul
product with a 3x4x2 'stack of matricies' or tensor.
import numpy as npfourbyfour = np.array([ [1,2,3,4], [3,2,1,4], [5,4,6,7], [11,12,13,14] ])threebyfourbytwo = np.array([ [[2,3],[11,9],[32,21],[28,17]], [[2,3],[1,9],[3,21],[28,7]], [[2,3],[1,9],[3,21],[28,7]], ])print('4x4*3x4x2 dot:\n {}\n'.format(np.dot(fourbyfour,threebyfourbytwo)))print('4x4*3x4x2 matmul:\n {}\n'.format(np.matmul(fourbyfour,threebyfourbytwo)))
The products of each operation appear below. Notice how the dot product is,
...a sum product over the last axis of a and the second-to-last of b
and how the matrix product is formed by broadcasting the matrix together.
4x4*3x4x2 dot: [[[232 152] [125 112] [125 112]] [[172 116] [123 76] [123 76]] [[442 296] [228 226] [228 226]] [[962 652] [465 512] [465 512]]]4x4*3x4x2 matmul: [[[232 152] [172 116] [442 296] [962 652]] [[125 112] [123 76] [228 226] [465 512]] [[125 112] [123 76] [228 226] [465 512]]]
Just FYI, @
and its numpy equivalents dot
and matmul
are all equally fast. (Plot created with perfplot, a project of mine.)
Code to reproduce the plot:
import perfplotimport numpydef setup(n): A = numpy.random.rand(n, n) x = numpy.random.rand(n) return A, xdef at(data): A, x = data return A @ xdef numpy_dot(data): A, x = data return numpy.dot(A, x)def numpy_matmul(data): A, x = data return numpy.matmul(A, x)perfplot.show( setup=setup, kernels=[at, numpy_dot, numpy_matmul], n_range=[2 ** k for k in range(15)],)