Numpy sum of operator results without allocating an unnecessary array
On my machine this is faster:
(a == b).sum()
If you don't want to use any extra storage, then I would suggest using numba. I'm not too familiar with it, but this seems to work well. I ran into some trouble getting Cython to take a boolean NumPy array.
import numpy as np
from numba import njit  # originally autojit, which newer numba releases no longer provide

def pysumeq(a, b):
    tot = 0
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            if a[i, j] == b[i, j]:
                tot += 1
    return tot

# make numba version
nbsumeq = njit(pysumeq)

A = (np.random.rand(10, 10) < .5)
B = (np.random.rand(10, 10) < .5)

# do a simple dry run to get it to compile
# for this specific use case
nbsumeq(A, B)
If you don't have numba, I would suggest using the answer by @user2357112.
Edit: Just got a Cython version working; here's the .pyx file. I'd go with this.
from numpy cimport ndarray as ar
cimport numpy as np
cimport cython

# cast=True lets the uint8 buffer type accept boolean NumPy arrays
@cython.boundscheck(False)
@cython.wraparound(False)
def cysumeq(ar[np.uint8_t, ndim=2, cast=True] a, ar[np.uint8_t, ndim=2, cast=True] b):
    cdef int i, j, h=a.shape[0], w=a.shape[1], tot=0
    for i in range(h):  # range instead of xrange, so it also builds for Python 3
        for j in range(w):
            if a[i, j] == b[i, j]:
                tot += 1
    return tot
To start with, you can skip the A*B step:
>>> a
array([ True, False,  True, False,  True], dtype=bool)
>>> b
array([False,  True,  True, False,  True], dtype=bool)
>>> np.sum(~(a^b))
3
If you do not mind destroying array a or b, I am not sure you will get faster than this:
>>> a ^= b  # In-place xor operator
>>> np.sum(~a)
3
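For boolean arrays, `~(a ^ b)` is elementwise the same as `a == b`, which is why the XOR count matches the equality count. Below is a quick sanity check of that, as a sketch: the random inputs, the seed and the variable names are made up for illustration. It also shows one possible way to avoid the temporary that `~a` still creates after the in-place XOR, by counting the `False` entries instead.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, illustration only
a = rng.random(1000) < 0.5      # made-up boolean test inputs
b = rng.random(1000) < 0.5

# Reference result: allocates a temporary boolean array for a == b.
expected = int((a == b).sum())

# XOR form: a ^ b is True where the inputs differ, so ~(a ^ b) marks equal entries.
xor_count = int(np.sum(~(a ^ b)))

# In-place form: overwrite a working copy (in real use this would be a itself).
a_work = a.copy()
a_work ^= b
inplace_count = int(np.sum(~a_work))  # ~a_work still builds one temporary

# Equal entries are the False ones after the XOR, so count them without inverting.
no_temp_count = a_work.size - int(np.count_nonzero(a_work))

print(expected, xor_count, inplace_count, no_temp_count)
```

All four counts should agree. Only the last form avoids both the `a == b` temporary and the `~a` temporary, at the cost of overwriting one input.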
If the problem is allocation and deallocation, maintain a single output array and tell numpy to put the results there every time:
out = np.empty_like(a)  # Allocate this outside a loop and use it every iteration
num_eq = np.equal(a, b, out).sum()
This'll only work if the inputs are always the same dimensions, though. You may be able to make one big array and slice out a part that's the size you need for each call if the inputs have varying sizes, but I'm not sure how much that slows you down.