Increasing speed of a pure Numpy/Scipy convolutional neural network implementation
Accelerating convolution
Building on mplf's suggestion, I've found it is possible to remove both of the for loops and the call to `convolve2d`:
```python
d = x[:, :-1, :-1].swapaxes(0, 1)
c = x[:, :-1, 1:].swapaxes(0, 1)
b = x[:, 1:, :-1].swapaxes(0, 1)
a = x[:, 1:, 1:].swapaxes(0, 1)
x = (W[:, :, 0, 0].dot(a) + W[:, :, 0, 1].dot(b)
     + W[:, :, 1, 0].dot(c) + W[:, :, 1, 1].dot(d)
     + biases.reshape(-1, 1, 1))
```
This is 10 times faster than the original code.
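As a sanity check, the slicing version can be compared against a reference built from `scipy.signal.convolve2d` (one call per input/output channel pair, summed over input channels). The shapes here are assumptions for illustration: `x` as `(in_channels, h, w)` and `W` as `(out_channels, in_channels, 2, 2)`. Note the slice labels (`d` for the top-left corner, `a` for the bottom-right) already account for the kernel flip that `convolve2d` performs:

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
cin, cout, h, w = 3, 4, 6, 6          # hypothetical sizes for the check
x = rng.standard_normal((cin, h, w))
W = rng.standard_normal((cout, cin, 2, 2))
biases = rng.standard_normal(cout)

# Vectorized 2x2 "valid" convolution via shifted views.
# swapaxes(0, 1) moves the channel axis to position 1 so that
# 2-D dot 3-D contracts over the input channels.
d = x[:, :-1, :-1].swapaxes(0, 1)
c = x[:, :-1, 1:].swapaxes(0, 1)
b = x[:, 1:, :-1].swapaxes(0, 1)
a = x[:, 1:, 1:].swapaxes(0, 1)
fast = (W[:, :, 0, 0].dot(a) + W[:, :, 0, 1].dot(b)
        + W[:, :, 1, 0].dot(c) + W[:, :, 1, 1].dot(d)
        + biases.reshape(-1, 1, 1))

# Reference: per-channel convolve2d, summed over input channels.
ref = np.stack([
    sum(convolve2d(x[ci], W[co, ci], mode="valid") for ci in range(cin))
    + biases[co]
    for co in range(cout)
])

print(np.allclose(fast, ref))  # prints True
```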
Accelerating max pool
With this new code, the max pool stage now takes 50% of the time. This can also be sped up by using:
```python
def max_pool(x):
    """Return the maximum over non-overlapping 2x2 blocks of an (N, h, w) stack."""
    N, h, w = x.shape
    # // rather than / so the reshape also works under Python 3
    x = x.reshape(N, h // 2, 2, w // 2, 2).swapaxes(2, 3).reshape(N, h // 2, w // 2, 4)
    return np.amax(x, axis=3)
```
This speeds up the max_pool step by a factor of 10, so overall the program doubles in speed again.
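To see what the reshape/swapaxes trick is doing, here is a quick check on a small `(1, 4, 4)` array (assuming, as the function does, that `h` and `w` are even): each 2x2 block is gathered into the last axis, so `amax` over that axis gives the pooled value.

```python
import numpy as np

def max_pool(x):
    """Return the maximum over non-overlapping 2x2 blocks of an (N, h, w) stack."""
    N, h, w = x.shape
    x = x.reshape(N, h // 2, 2, w // 2, 2).swapaxes(2, 3).reshape(N, h // 2, w // 2, 4)
    return np.amax(x, axis=3)

x = np.arange(16, dtype=float).reshape(1, 4, 4)
# Blocks: [[0,1],[4,5]] -> 5, [[2,3],[6,7]] -> 7,
#         [[8,9],[12,13]] -> 13, [[10,11],[14,15]] -> 15
print(max_pool(x))  # prints [[[ 5.  7.] [13. 15.]]]
```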
Looking around, it seems that SciPy's `convolve2d` function is unoptimized and rather inefficient. There has been an open issue about this since January 2014 (https://github.com/scipy/scipy/issues/3184), and the question Improving Numpy Performance seems to be related.
I would suggest first trying the solution posted by Theran and seeing whether it gives better performance.