How to implement the ReLU function in NumPy
There are a couple of ways.
    >>> x = np.random.random((3, 2)) - 0.5
    >>> x
    array([[-0.00590765,  0.18932873],
           [-0.32396051,  0.25586596],
           [ 0.22358098,  0.02217555]])
    >>> np.maximum(x, 0)
    array([[ 0.        ,  0.18932873],
           [ 0.        ,  0.25586596],
           [ 0.22358098,  0.02217555]])
    >>> x * (x > 0)
    array([[-0.        ,  0.18932873],
           [-0.        ,  0.25586596],
           [ 0.22358098,  0.02217555]])
    >>> (abs(x) + x) / 2
    array([[ 0.        ,  0.18932873],
           [ 0.        ,  0.25586596],
           [ 0.22358098,  0.02217555]])
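The three expressions agree on finite input like the sample above, but they are not interchangeable in every case. The following is a minimal sketch, not part of the original answer, that checks the agreement and then looks at one edge case; the fixed seed and the `edge` array are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
x = rng.random((3, 2)) - 0.5

relu_max = np.maximum(x, 0)
relu_mul = x * (x > 0)
relu_abs = (np.abs(x) + x) / 2

# -0.0 compares equal to 0.0, so the signed zeros from the
# multiplication method do not affect the comparison.
same_on_finite = (np.array_equal(relu_max, relu_mul)
                  and np.allclose(relu_max, relu_abs))

# Non-finite input is where the formulas diverge.
edge = np.array([-np.inf, -1.0, 0.0, 2.0])
with np.errstate(invalid="ignore"):        # silence the expected warnings
    edge_max = np.maximum(edge, 0)         # -inf is clipped to 0
    edge_mul = edge * (edge > 0)           # -inf * 0 gives nan
    edge_abs = (np.abs(edge) + edge) / 2   # inf - inf gives nan

print(same_on_finite)
print(edge_max, edge_mul, edge_abs)
```

If the input can contain `-inf`, `np.maximum` is the only one of the three that returns 0 for it.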
Timing the three methods with the following code:

    import numpy as np

    x = np.random.random((5000, 5000)) - 0.5
    print("max method:")
    %timeit -n10 np.maximum(x, 0)

    print("multiplication method:")
    %timeit -n10 x * (x > 0)

    print("abs method:")
    %timeit -n10 (abs(x) + x) / 2
We get:
    max method:
    10 loops, best of 3: 239 ms per loop
    multiplication method:
    10 loops, best of 3: 145 ms per loop
    abs method:
    10 loops, best of 3: 288 ms per loop
So the multiplication method seems to be the fastest.
I'm completely revising my original answer because of points raised in the other answers and comments. Here is the new benchmark script:
    import time
    import numpy as np

    def fancy_index_relu(m):
        # Modifies m in place and returns None
        m[m < 0] = 0

    relus = {
        "max": lambda x: np.maximum(x, 0),
        "in-place max": lambda x: np.maximum(x, 0, x),
        "mul": lambda x: x * (x > 0),
        "abs": lambda x: (abs(x) + x) / 2,
        "fancy index": fancy_index_relu,
    }

    for name, relu in relus.items():
        n_iter = 20
        # Fresh random data for every implementation; each iteration
        # below works on its own 5000 x 5000 slice.
        x = np.random.random((n_iter, 5000, 5000)) - 0.5

        t1 = time.time()
        for i in range(n_iter):
            relu(x[i])
        t2 = time.time()

        # Average time per call in milliseconds
        print("{:>12s} {:3.0f} ms".format(name, (t2 - t1) / n_iter * 1000))
It takes care to use a different ndarray for each implementation and iteration. Here are the results:
             max 126 ms
    in-place max 107 ms
             mul 136 ms
             abs  86 ms
     fancy index 132 ms
EDIT: As jirassimok mentioned below, my function changes the data in place, so after the first run there are no negative values left to replace and it runs a lot faster in timeit. That is what causes the good results, so the comparison is not a fair one. Sorry for the inconvenience.
I found a faster method for ReLU with NumPy. You can also use NumPy's fancy indexing feature.
fancy index:
20.3 ms ± 272 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
    >>> x = np.random.random((5,5)) - 0.5
    >>> x
    array([[-0.21444316, -0.05676216,  0.43956365, -0.30788116, -0.19952038],
           [-0.43062223,  0.12144647, -0.05698369, -0.32187085,  0.24901568],
           [ 0.06785385, -0.43476031, -0.0735933 ,  0.3736868 ,  0.24832288],
           [ 0.47085262, -0.06379623,  0.46904916, -0.29421609, -0.15091168],
           [ 0.08381359, -0.25068492, -0.25733763, -0.1852205 , -0.42816953]])
    >>> x[x<0]=0
    >>> x
    array([[ 0.        ,  0.        ,  0.43956365,  0.        ,  0.        ],
           [ 0.        ,  0.12144647,  0.        ,  0.        ,  0.24901568],
           [ 0.06785385,  0.        ,  0.        ,  0.3736868 ,  0.24832288],
           [ 0.47085262,  0.        ,  0.46904916,  0.        ,  0.        ],
           [ 0.08381359,  0.        ,  0.        ,  0.        ,  0.        ]])
Here is my benchmark:
    import numpy as np

    x = np.random.random((5000, 5000)) - 0.5
    print("max method:")
    %timeit -n10 np.maximum(x, 0)
    print("max inplace method:")
    %timeit -n10 np.maximum(x, 0,x)
    print("multiplication method:")
    %timeit -n10 x * (x > 0)
    print("abs method:")
    %timeit -n10 (abs(x) + x) / 2
    print("fancy index:")
    %timeit -n10 x[x<0] =0

    max method:
    241 ms ± 3.53 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    max inplace method:
    38.5 ms ± 4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    multiplication method:
    162 ms ± 3.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    abs method:
    181 ms ± 4.18 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    fancy index:
    20.3 ms ± 272 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
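Since `x[x<0] = 0` and `np.maximum(x, 0, x)` overwrite `x`, one way to time them on equal terms is to hand every run a fresh copy of the data and keep the copy outside the timed region. The sketch below is one possible way to do that and is not taken from the answers above; `time_on_fresh_copies` is a hypothetical helper, and the array size, seed, and repeat count are arbitrary assumptions.

```python
import time
import numpy as np

def fancy_index_relu(m):
    """In-place ReLU: overwrites the negative entries of m with 0."""
    m[m < 0] = 0

def time_on_fresh_copies(fn, x, repeats=5):
    """Hypothetical helper: best time of fn over several runs,
    each on a new copy of x. The copy itself is not timed."""
    best = float("inf")
    for _ in range(repeats):
        work = x.copy()
        start = time.perf_counter()
        fn(work)
        best = min(best, time.perf_counter() - start)
    return best

rng = np.random.default_rng(0)  # fixed seed, for reproducibility only
x = rng.random((1000, 1000)) - 0.5

negatives_before = int((x < 0).sum())
best_seconds = time_on_fresh_copies(fancy_index_relu, x)
negatives_after_timing = int((x < 0).sum())  # unchanged: only copies were modified

fancy_index_relu(x)                          # a direct call does modify x
negatives_after_call = int((x < 0).sum())

print(f"fancy index on fresh copies: {best_seconds * 1000:.2f} ms")
print(negatives_before, negatives_after_timing, negatives_after_call)
```

After the direct call no negative entries remain in `x`, which is the state every `%timeit` repetition after the first one sees in the benchmark above.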