Numpy sum of operator results without allocating an unnecessary array


On my machine this is faster:

(a == b).sum()
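To check the timing on your own machine, something like the rough sketch below should do. The array size, the repeat count, and the helper names `sum_eq` and `xor_eq` are my own choices for illustration, and the numbers will vary with hardware and NumPy version.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((1000, 1000)) < 0.5
b = rng.random((1000, 1000)) < 0.5

def sum_eq(a, b):
    # Compare elementwise, then count the True entries.
    return (a == b).sum()

def xor_eq(a, b):
    # XOR marks mismatches; inverting it marks matches.
    return np.sum(~(a ^ b))

# Both variants must agree before the timings mean anything.
assert sum_eq(a, b) == xor_eq(a, b)

for fn in (sum_eq, xor_eq):
    seconds = timeit.timeit(lambda: fn(a, b), number=20)
    print(f"{fn.__name__}: {seconds / 20 * 1e3:.3f} ms per call")
```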

If you don't want to use any extra storage, then I would suggest using numba. I'm not too familiar with it, but this seems to work well. I ran into some trouble getting Cython to take a boolean NumPy array.

import numpy as np
from numba import njit  # autojit has since been removed from numba; njit replaces it

def pysumeq(a, b):
    tot = 0
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            if a[i, j] == b[i, j]:
                tot += 1
    return tot

# make numba version
nbsumeq = njit(pysumeq)

A = np.random.rand(10, 10) < .5
B = np.random.rand(10, 10) < .5

# do a simple dry run to get it to compile
# for this specific use case
nbsumeq(A, B)

If you don't have numba, I would suggest using the answer by @user2357112.

Edit: Just got a Cython version working, here's the .pyx file. I'd go with this.

from numpy cimport ndarray as ar
cimport numpy as np
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def cysumeq(ar[np.uint8_t, ndim=2, cast=True] a, ar[np.uint8_t, ndim=2, cast=True] b):
    # cast=True lets the boolean arrays be read as uint8
    cdef int i, j, h=a.shape[0], w=a.shape[1], tot=0
    for i in range(h):
        for j in range(w):
            if a[i, j] == b[i, j]:
                tot += 1
    return tot


To start with, you can skip the A*B step:

>>> a
array([ True, False,  True, False,  True], dtype=bool)
>>> b
array([False,  True,  True, False,  True], dtype=bool)
>>> np.sum(~(a^b))
3

If you do not mind destroying array a or b, I am not sure you will get faster than this:

>>> a^=b   #In place xor operator
>>> np.sum(~a)
3
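Note that `~a` still builds a temporary array before the sum. A possible variation, sketched here as my own suggestion rather than something I have benchmarked, is to count the mismatches left by the in-place XOR and subtract them from the total number of elements:

```python
import numpy as np

a = np.array([True, False, True, False, True])
b = np.array([False, True, True, False, True])

# In-place XOR: a now marks the positions where the inputs differed.
np.logical_xor(a, b, out=a)

# Count mismatches directly instead of inverting a first,
# then subtract from the total to get the number of matches.
num_eq = a.size - np.count_nonzero(a)
print(num_eq)  # 3
```

As with the snippet above, this overwrites `a`, so it only makes sense when the original contents are no longer needed.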


If the problem is allocation and deallocation, maintain a single output array and tell numpy to put the results there every time:

out = np.empty_like(a) # Allocate this outside a loop and use it every iteration
num_eq = np.equal(a, b, out).sum()

This'll only work if the inputs are always the same dimensions, though. You may be able to make one big array and slice out a part that's the size you need for each call if the inputs have varying sizes, but I'm not sure how much that slows you down.
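Here is a minimal sketch of the one-big-array idea. The names `MAX_ELEMS`, `scratch`, and `count_equal` are made up for illustration, and it assumes you know an upper bound on the input size. It uses a flat buffer so that the slice stays contiguous and can be reshaped as a view rather than a copy.

```python
import numpy as np

MAX_ELEMS = 1_000_000  # hypothetical upper bound on the number of elements
scratch = np.empty(MAX_ELEMS, dtype=bool)  # allocated once, reused on every call

def count_equal(a, b, scratch=scratch):
    # a and b are assumed to have the same shape.
    # Take a contiguous prefix of the flat buffer and view it with a's shape;
    # reshape raises ValueError if a has more elements than the buffer holds.
    out = scratch[:a.size].reshape(a.shape)
    np.equal(a, b, out=out)  # comparison results are written into the buffer
    return int(out.sum())

x = np.array([[1, 2], [3, 4]])
y = np.array([[1, 0], [3, 0]])
print(count_equal(x, y))  # 2

u = np.arange(5)
v = np.array([0, 1, 9, 3, 9])
print(count_equal(u, v))  # 3
```

I haven't measured whether the slicing and reshaping overhead outweighs the saved allocation, so it is worth timing against the plain version for your array sizes.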