Vectorized way of calculating row-wise dot product of two matrices with Scipy



A straightforward way to do that is:

import numpy as np

a = np.array([[1, 2, 3], [3, 4, 5]])
b = np.array([[1, 2, 3], [1, 2, 3]])
np.sum(a * b, axis=1)

which avoids the Python loop and is faster in cases like:

def npsumdot(x, y):
    return np.sum(x * y, axis=1)

def loopdot(x, y):
    result = np.empty((x.shape[0]))
    for i in range(x.shape[0]):
        result[i] = np.dot(x[i], y[i])
    return result

timeit npsumdot(np.random.rand(500000, 50), np.random.rand(500000, 50))
# 1 loops, best of 3: 861 ms per loop
timeit loopdot(np.random.rand(500000, 50), np.random.rand(500000, 50))
# 1 loops, best of 3: 1.58 s per loop
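If you want to sanity-check that the vectorized version really matches the loop, a quick comparison with `np.allclose` could look like this (array size is an arbitrary choice):

```python
import numpy as np

def npsumdot(x, y):
    return np.sum(x * y, axis=1)

def loopdot(x, y):
    result = np.empty((x.shape[0]))
    for i in range(x.shape[0]):
        result[i] = np.dot(x[i], y[i])
    return result

# Arbitrary test data; any matching shapes work.
x = np.random.rand(1000, 50)
y = np.random.rand(1000, 50)

# The two implementations should agree up to floating-point rounding.
assert np.allclose(npsumdot(x, y), loopdot(x, y))
```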


Check out numpy.einsum for another method:

In [52]: a
Out[52]:
array([[1, 2, 3],
       [3, 4, 5]])

In [53]: b
Out[53]:
array([[1, 2, 3],
       [1, 2, 3]])

In [54]: einsum('ij,ij->i', a, b)
Out[54]: array([14, 26])

Looks like einsum is a bit faster than inner1d (from numpy.core.umath_tests):

In [94]: %timeit inner1d(a, b)
1000000 loops, best of 3: 1.8 us per loop

In [95]: %timeit einsum('ij,ij->i', a, b)
1000000 loops, best of 3: 1.6 us per loop

In [96]: a = random.randn(10, 100)

In [97]: b = random.randn(10, 100)

In [98]: %timeit inner1d(a, b)
100000 loops, best of 3: 2.89 us per loop

In [99]: %timeit einsum('ij,ij->i', a, b)
100000 loops, best of 3: 2.03 us per loop

Note: NumPy is constantly evolving and improving; the relative performance of the functions shown above has probably changed over the years. If performance is important to you, run your own tests with the version of NumPy that you will be using.
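For example, a minimal benchmarking sketch with the standard-library timeit module (the array size here is an arbitrary choice, not taken from the timings above):

```python
import timeit

import numpy as np

# Arbitrary test size -- adjust to match your actual workload.
a = np.random.rand(100000, 50)
b = np.random.rand(100000, 50)

# Time the two vectorized variants on the installed NumPy version.
for name, stmt in [
    ("sum", "np.sum(a * b, axis=1)"),
    ("einsum", "np.einsum('ij,ij->i', a, b)"),
]:
    t = timeit.timeit(stmt, globals={"np": np, "a": a, "b": b}, number=10)
    print(f"{name}: {t / 10 * 1e3:.2f} ms per call")
```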


I played around with this and found inner1d the fastest. That function, however, is internal, so a more robust approach is to use

numpy.einsum("ij,ij->i", a, b)
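As a quick check (a sketch, reusing the same small arrays as above), the einsum result matches the row-wise dot products:

```python
import numpy as np

a = np.array([[1, 2, 3], [3, 4, 5]])
b = np.array([[1, 2, 3], [1, 2, 3]])

# Row-wise dot products via einsum; equals np.sum(a * b, axis=1).
r = np.einsum("ij,ij->i", a, b)
print(r)  # → [14 26]
```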

Even better is to align your memory such that the summation happens in the first dimension, e.g.,

a = numpy.random.rand(3, n)
b = numpy.random.rand(3, n)
numpy.einsum("ij,ij->j", a, b)

For 10 ** 3 <= n <= 10 ** 6, this is the fastest method, and up to twice as fast as its untransposed equivalent. The maximum advantage occurs around n = 2 * 10 ** 4, when the level-2 cache is exhausted.

Note also that the transposed summation (np.sum over axis=0) is much faster than its untransposed equivalent.
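To see that the transposed variant computes the same values, a small sketch (the value of n is an arbitrary choice):

```python
import numpy as np

n = 1000
a = np.random.rand(n, 3)
b = np.random.rand(n, 3)

# Contiguous transposed copies, so the reduction runs over the first axis.
aT = np.ascontiguousarray(a.T)
bT = np.ascontiguousarray(b.T)

row_wise = np.einsum("ij,ij->i", a, b)
col_wise = np.einsum("ij,ij->j", aT, bT)

# Same numbers, different memory layout.
assert np.allclose(row_wise, col_wise)
```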

[Plot: runtimes of sum, einsum, sum.T, einsum.T, and inner1d vs. len(a)]

[Plot: the same runtimes relative to einsum.T]

The plots were created with perfplot (a small project of mine):

import numpy
from numpy.core.umath_tests import inner1d
import perfplot

def setup(n):
    a = numpy.random.rand(n, 3)
    b = numpy.random.rand(n, 3)
    aT = numpy.ascontiguousarray(a.T)
    bT = numpy.ascontiguousarray(b.T)
    return (a, b), (aT, bT)

b = perfplot.bench(
    setup=setup,
    n_range=[2 ** k for k in range(1, 25)],
    kernels=[
        lambda data: numpy.sum(data[0][0] * data[0][1], axis=1),
        lambda data: numpy.einsum("ij, ij->i", data[0][0], data[0][1]),
        lambda data: numpy.sum(data[1][0] * data[1][1], axis=0),
        lambda data: numpy.einsum("ij, ij->j", data[1][0], data[1][1]),
        lambda data: inner1d(data[0][0], data[0][1]),
    ],
    labels=["sum", "einsum", "sum.T", "einsum.T", "inner1d"],
    xlabel="len(a), len(b)",
)
b.save("out1.png")
b.save("out2.png", relative_to=3)