sine calculation orders of magnitude slower than cosine

I don't think numpy has anything to do with this: I think you're tripping across a performance bug in the C math library on your system, one which affects sin near large multiples of pi. (I'm using "bug" in a pretty broad sense here -- for all I know, since the sine of large floats is poorly defined, the "bug" is actually the library behaving correctly to handle corner cases!)
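You can sketch a minimal reproduction with the standard `timeit` module; the timings are illustrative and will vary (or vanish entirely) depending on your libm build:

```python
import math
import timeit

# An argument that lands very close to a large multiple of pi, and the
# same argument nudged 0.12 away from it.
on_multiple = 6e7 * math.pi
off_multiple = on_multiple + 0.12

t_on = timeit.timeit(lambda: math.sin(on_multiple), number=10_000)
t_off = timeit.timeit(lambda: math.sin(off_multiple), number=10_000)

# On affected libm builds the first timing is orders of magnitude larger;
# on others (some macOS builds, for instance) the two are comparable.
print(f"near multiple of pi: {t_on:.6f}s  shifted by 0.12: {t_off:.6f}s")

# Note the result itself is tiny: the argument sits a hair off an exact
# multiple of pi, so sin() returns a value around 1e-8.
print(math.sin(on_multiple))
```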

On linux, I get:

>>> %timeit -n 10000 math.sin(6e7*math.pi)
10000 loops, best of 3: 191 µs per loop
>>> %timeit -n 10000 math.sin(6e7*math.pi+0.12)
10000 loops, best of 3: 428 ns per loop

and other Linux-using types from the Python chatroom report

10000 loops, best of 3: 49.4 µs per loop
10000 loops, best of 3: 206 ns per loop

and

In [3]: %timeit -n 10000 math.sin(6e7*math.pi)
10000 loops, best of 3: 116 µs per loop
In [4]: %timeit -n 10000 math.sin(6e7*math.pi+0.12)
10000 loops, best of 3: 428 ns per loop

but a Mac user reported

In [3]: timeit -n 10000 math.sin(6e7*math.pi)
10000 loops, best of 3: 300 ns per loop
In [4]: %timeit -n 10000 math.sin(6e7*math.pi+0.12)
10000 loops, best of 3: 361 ns per loop

with no order-of-magnitude difference at all. As a workaround, you might try taking the argument mod 2*pi first:

>>> new = np.sin(omega_t2[-1000:] % (2*np.pi))
>>> old = np.sin(omega_t2[-1000:])
>>> abs(new - old).max()
7.83773902468434e-09

which has better performance:

>>> %timeit -n 1000 new = np.sin(omega_t2[-1000:] % (2*np.pi))
1000 loops, best of 3: 63.8 µs per loop
>>> %timeit -n 1000 old = np.sin(omega_t2[-1000:])
1000 loops, best of 3: 6.82 ms per loop
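A self-contained version of the workaround might look like the following; `omega_t2` here is a stand-in array of growing phases (the real one is defined in the question), so treat the exact error bound as illustrative:

```python
import numpy as np

# Stand-in for the question's omega_t2: phases that grow through many
# multiples of pi, ending near 6e7 * pi.
omega_t2 = np.linspace(0.0, 6e7 * np.pi, 1_000_000)

# Reduce the argument mod 2*pi before calling sin, so libm never sees
# values sitting near huge multiples of pi.
reduced = np.sin(omega_t2[-1000:] % (2 * np.pi))
direct = np.sin(omega_t2[-1000:])

# The two answers agree to roughly 1e-8 absolute error at this magnitude:
# the reduction uses the double-precision 2*pi, so the residual argument
# drifts by about k * (pi_true - pi_double).
print(abs(reduced - direct).max())
```

Whether that error is acceptable depends on your application; the reduction trades a few low bits of accuracy for avoiding the slow path entirely.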

Note that as expected, a similar effect happens for cos, just shifted:

>>> %timeit -n 1000 np.cos(6e7*np.pi + np.pi/2)
1000 loops, best of 3: 37.6 µs per loop
>>> %timeit -n 1000 np.cos(6e7*np.pi + np.pi/2 + 0.12)
1000 loops, best of 3: 2.46 µs per loop
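The shift is no coincidence: cos(x + pi/2) equals -sin(x), so the shifted cosine argument lands in exactly the region where sine is slow. A quick sanity check of the identity near the slow region (agreement is only to ~1e-8 here, since both values are tiny and the argument rounds):

```python
import math

x = 6e7 * math.pi  # near a large multiple of pi, i.e. sine's slow region

# cos(x + pi/2) == -sin(x), so timing cos at this offset exercises the
# same argument range as timing sin at x.
print(math.cos(x + math.pi / 2), -math.sin(x))
assert math.isclose(math.cos(x + math.pi / 2), -math.sin(x), abs_tol=1e-6)
```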


One possible cause of these huge performance differences is how the math library handles IEEE floating-point underflow (or denormals), which can arise when some of the tiniest mantissa bits differ during the transcendental-function approximation. Your t1 and t2 vectors might differ in exactly those low mantissa bits, and which path gets taken can also depend on the algorithm used to compute the transcendental function in whatever library you linked against, and on how each particular OS handles IEEE denormals or underflow.
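If you want to check whether two "identical" vectors actually differ in their low mantissa bits, one hypothetical way is to compare their raw bit patterns (the construction of `t1` and `t2` below is made up for illustration; substitute however your own vectors were built):

```python
import numpy as np

# Two ways of building the "same" phase vector that can disagree in the
# lowest mantissa bits -- enough to move arguments on or off a slow path.
t1 = np.linspace(0.0, 6e7 * np.pi, 1000)[:5]
t2 = (np.arange(1000) * (6e7 * np.pi / 999))[:5]

# Reinterpret the doubles as 64-bit integers and XOR them: any nonzero
# entry means the two vectors differ at the bit level even though they
# compare nearly equal as floats.
bits1 = t1.view(np.uint64)
bits2 = t2.view(np.uint64)
print(bits1 ^ bits2)
```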