Numpy element-wise dot product Numpy element-wise dot product numpy numpy

Numpy element-wise dot product


Approach #1

Use np.einsum -

np.einsum('ijkl,ilm->ijkm',m0,m1)

Steps involved :

  • Keep the first axes from the inputs aligned.

  • Lose the last axis from m0 against second one from m1 in sum-reduction.

  • Let remaining axes from m0 and m1 spread-out/expand with elementwise multiplications in an outer-product fashion.


Approach #2

If you are looking for performance and with the axis of sum-reduction having a smaller length, you are better off with one-loop and using matrix-multiplication with np.tensordot, like so -

s0,s1,s2,s3 = m0.shapes4 = m1.shape[-1]r = np.empty((s0,s1,s2,s4))for i in range(s0):    r[i] = np.tensordot(m0[i],m1[i],axes=([2],[0]))

Approach #3

Now, np.dot could be efficiently used on 2D inputs for some further performance boost. So, with it, the modified version, though a bit longer one, but hopefully the most performant one would be -

s0,s1,s2,s3 = m0.shapes4 = m1.shape[-1]m0.shape = s0,s1*s2,s3   # Get m0 as 3D for temporary usager = np.empty((s0,s1*s2,s4))for i in range(s0):    r[i] = m0[i].dot(m1[i])r.shape = s0,s1,s2,s4m0.shape = s0,s1,s2,s3  # Put m0 back to 4D

Runtime test

Function definitions -

def original_app(m0, m1):    s0,s1,s2,s3 = m0.shape    s4 = m1.shape[-1]    r = np.empty((s0,s1,s2,s4))    for i in range(s0):        for j in range(s1):            r[i, j] = np.dot(m0[i, j], m1[i])    return rdef einsum_app(m0, m1):    return np.einsum('ijkl,ilm->ijkm',m0,m1)def tensordot_app(m0, m1):    s0,s1,s2,s3 = m0.shape    s4 = m1.shape[-1]    r = np.empty((s0,s1,s2,s4))    for i in range(s0):        r[i] = np.tensordot(m0[i],m1[i],axes=([2],[0]))    return r        def dot_app(m0, m1):    s0,s1,s2,s3 = m0.shape    s4 = m1.shape[-1]    m0.shape = s0,s1*s2,s3   # Get m0 as 3D for temporary usage    r = np.empty((s0,s1*s2,s4))    for i in range(s0):        r[i] = m0[i].dot(m1[i])    r.shape = s0,s1,s2,s4    m0.shape = s0,s1,s2,s3  # Put m0 back to 4D    return r

Timings and verification -

In [291]: # Inputs     ...: m0 = np.random.rand(50,30,20,20)     ...: m1 = np.random.rand(50,20,20)     ...: In [292]: out1 = original_app(m0, m1)     ...: out2 = einsum_app(m0, m1)     ...: out3 = tensordot_app(m0, m1)     ...: out4 = dot_app(m0, m1)     ...:      ...: print np.allclose(out1, out2)     ...: print np.allclose(out1, out3)     ...: print np.allclose(out1, out4)     ...: TrueTrueTrueIn [293]: %timeit original_app(m0, m1)     ...: %timeit einsum_app(m0, m1)     ...: %timeit tensordot_app(m0, m1)     ...: %timeit dot_app(m0, m1)     ...: 100 loops, best of 3: 10.3 ms per loop10 loops, best of 3: 31.3 ms per loop100 loops, best of 3: 5.12 ms per loop100 loops, best of 3: 4.06 ms per loop


I think numpy.inner() is what you really want?