
Increasing speed of a pure Numpy/Scipy convolutional neural network implementation


Accelerating convolution

Building on mplf's suggestion, I've found it is possible to remove both for loops and the call to convolve2d:

    d = x[:,:-1,:-1].swapaxes(0,1)
    c = x[:,:-1,1:].swapaxes(0,1)
    b = x[:,1:,:-1].swapaxes(0,1)
    a = x[:,1:,1:].swapaxes(0,1)
    x = W[:,:,0,0].dot(a) + W[:,:,0,1].dot(b) + W[:,:,1,0].dot(c) + W[:,:,1,1].dot(d) + biases.reshape(-1,1,1)

This is 10 times faster than the original code.
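To see that the unrolled version matches the original convolve2d-based loop, here is a minimal, self-contained check. The shapes are my assumptions about the layout (x is channels x height x width, W is filters x channels x 2 x 2, one bias per filter), and the pairing of kernel taps with slices follows convolve2d's convention of flipping the kernel:

    import numpy as np
    from scipy.signal import convolve2d

    rng = np.random.default_rng(0)
    C, H, Wd, F = 3, 8, 8, 4          # channels, height, width, filters
    x = rng.standard_normal((C, H, Wd))
    W = rng.standard_normal((F, C, 2, 2))
    biases = rng.standard_normal(F)

    # Reference: per-filter sum of per-channel 'valid' convolutions
    ref = np.empty((F, H - 1, Wd - 1))
    for f in range(F):
        ref[f] = sum(convolve2d(x[ch], W[f, ch], mode='valid')
                     for ch in range(C)) + biases[f]

    # Unrolled version: one shifted slice per kernel tap, one dot per tap.
    # swapaxes moves the channel axis to second-to-last so .dot contracts over it.
    d = x[:, :-1, :-1].swapaxes(0, 1)
    c = x[:, :-1, 1:].swapaxes(0, 1)
    b = x[:, 1:, :-1].swapaxes(0, 1)
    a = x[:, 1:, 1:].swapaxes(0, 1)
    out = (W[:, :, 0, 0].dot(a) + W[:, :, 0, 1].dot(b)
           + W[:, :, 1, 0].dot(c) + W[:, :, 1, 1].dot(d)
           + biases.reshape(-1, 1, 1))

    print(np.allclose(out, ref))      # True

The speedup comes from replacing the Python-level loops with four matrix products, which NumPy dispatches to optimized BLAS routines.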

Accelerating max pool

With this new convolution code, the max pool stage now accounts for 50% of the total runtime. It can also be sped up by using:

    def max_pool(x):
        """Return maximum in groups of 2x2 for a N,h,w image"""
        N, h, w = x.shape
        # use // so the reshape gets integer sizes (h/2 only works on Python 2)
        x = x.reshape(N, h//2, 2, w//2, 2).swapaxes(2, 3).reshape(N, h//2, w//2, 4)
        return np.amax(x, axis=3)

This speeds up the max_pool step by a factor of 10, so overall the program doubles in speed again.
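As a quick sanity check (not part of the original code), the sketch below compares max_pool against an explicit loop over 2x2 windows; it assumes h and w are even, which the reshape requires in any case:

    import numpy as np

    def max_pool(x):
        """Return maximum in groups of 2x2 for a N,h,w image (as above)."""
        N, h, w = x.shape
        x = x.reshape(N, h//2, 2, w//2, 2).swapaxes(2, 3).reshape(N, h//2, w//2, 4)
        return np.amax(x, axis=3)

    rng = np.random.default_rng(1)
    x = rng.standard_normal((5, 6, 8))

    # Naive reference: maximum over each explicit 2x2 window
    ref = np.empty((5, 3, 4))
    for i in range(3):
        for j in range(4):
            ref[:, i, j] = x[:, 2*i:2*i+2, 2*j:2*j+2].max(axis=(1, 2))

    print(np.allclose(max_pool(x), ref))   # True

The reshape/swapaxes trick gathers each 2x2 window into the last axis, so a single amax call reduces every window at once.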


Looking around, it seems that scipy's convolve2d function is unoptimized and rather inefficient. There has been an open issue about this since January 2014 (https://github.com/scipy/scipy/issues/3184), and the question Improving Numpy Performance seems related.

I would suggest first trying the solution posted by Theran and seeing whether it improves performance.
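Theran's code isn't reproduced here; as a rough way to gauge convolve2d's overhead on your own machine, this small timing sketch compares it against scipy.signal.fftconvolve, one commonly suggested alternative. For a kernel this small the direct method may still win, so it is worth measuring rather than assuming:

    import time
    import numpy as np
    from scipy.signal import convolve2d, fftconvolve

    rng = np.random.default_rng(2)
    img = rng.standard_normal((256, 256))
    kernel = rng.standard_normal((2, 2))

    # Time 100 'valid' convolutions with each implementation
    for conv in (convolve2d, fftconvolve):
        t0 = time.perf_counter()
        for _ in range(100):
            conv(img, kernel, mode='valid')
        print(conv.__name__, time.perf_counter() - t0)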