Why are log2 and log1p so much faster than log and log10, in numpy?


I found your question extremely interesting, so I spent a few hours doing a bit of research; I think I have found an explanation for the performance difference as it applies to integers (thank you Matteo Italia for your note). It is unclear how that reasoning extends to floats, though:

Computers work in base 2. According to the articles linked below, computing an integer log2 takes about 4 processor cycles; log10 additionally requires multiplying log2(val) by 1/log2(10), which adds another 5 cycles.

Finding the integer log2 of a value is a matter of finding the index of its most significant set bit (see the MIT video linked below, at about the 23rd minute mark onward).

bit hacks: Find integer log base 10 of an integer

bit hacks: Find the log base 2 of an N-bit integer in O(lg(N))

The integer log base 10 is computed by first using one of the techniques above for finding the log base 2. By the relationship log10(v) = log2(v) / log2(10), we need to multiply it by 1/log2(10), which is approximately 1233/4096, or 1233 followed by a right shift of 12. Adding one is needed because the IntegerLogBase2 rounds down. Finally, since the value t is only an approximation that may be off by one, the exact value is found by subtracting the result of v < PowersOf10[t].

This method takes 6 more operations than IntegerLogBase2. It may be sped up (on machines with fast memory access) by modifying the log base 2 table-lookup method above so that the entries hold what is computed for t (that is, pre-add, -multiply, and -shift). Doing so would require a total of only 9 operations to find the log base 10, assuming 4 tables were used (one for each byte of v).
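To make the quoted recipe concrete, here is a minimal Python sketch of it (the names integer_log2, integer_log10 and POWERS_OF_10 are mine; the 1233/4096 ≈ 1/log2(10) multiply-and-shift and the final comparison-based correction are exactly the steps described above):

    # Precomputed powers of 10; enough for any 32-bit value.
    POWERS_OF_10 = [1, 10, 100, 1000, 10000, 100000, 1000000,
                    10000000, 100000000, 1000000000]

    def integer_log2(v):
        # Index of the most significant set bit; a C implementation
        # would use a De Bruijn lookup or a bit-scan instruction.
        return v.bit_length() - 1

    def integer_log10(v):
        # Multiply by 1233 and shift right by 12, i.e. multiply by
        # 1233/4096 ~ 1/log2(10); the "+ 1" compensates for
        # integer_log2 rounding down.
        t = ((integer_log2(v) + 1) * 1233) >> 12
        # t may overshoot by one, so correct it with one comparison.
        return t - (1 if v < POWERS_OF_10[t] else 0)

    assert all(integer_log10(10**k) == k for k in range(10))
    assert integer_log10(999) == 2 and integer_log10(1000) == 3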

Of note: the use of DeBruijn sequences lookup & bit shifting techniques to calculate log2 in this MIT video: Lec 2 | MIT 6.172 Performance Engineering of Software Systems, Fall 2010 (video at the 36th minute mark onward).
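For illustration, here is a Python transcription of the 32-bit De Bruijn multiply-and-lookup log2 from the Bit Twiddling Hacks page linked above (the lookup table and the constant 0x07C4ACDD are taken from that page; this is a sketch for unsigned 32-bit inputs, not a drop-in replacement for anything in numpy):

    DEBRUIJN_TABLE = [
        0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
        8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31,
    ]

    def log2_debruijn(v):
        # Smear the highest set bit into every lower position,
        # turning v into 2**(k+1) - 1 where k is the MSB index.
        v |= v >> 1
        v |= v >> 2
        v |= v >> 4
        v |= v >> 8
        v |= v >> 16
        # Multiply by the De Bruijn constant; the top 5 bits of the
        # 32-bit product uniquely identify the MSB position.
        return DEBRUIJN_TABLE[((v * 0x07C4ACDD) & 0xFFFFFFFF) >> 27]

    assert all(log2_debruijn(1 << k) == k for k in range(32))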

Of note: this StackOverflow post, which demonstrates a method to make efficient log2 calculations truly cross-platform in C++.

Caveat: I have not checked the numpy source code to verify that it indeed implements similar techniques, but it would be surprising if it did not. In fact, from the comments under the OP's post, Fermion Portal has checked:

Actually numpy uses math.h from glibc, you'll see the same difference in C/C++ if you use math.h/cmath.h. You can find the nicely commented source code for the two functions e.g. ic.unicamp.br/~islene/2s2008-mo806/libc/sysdeps/ieee754/dbl-64/… and ic.unicamp.br/~islene/2s2008-mo806/libc/sysdeps/ieee754/dbl-64/… (Fermion Portal)


This is just a note, but it's too long for a comment. Apparently this has to do with your particular install:

    import numpy as np
    import numexpr as ne
    x = np.random.rand(100000)

I get the same timings with numpy 1.10 from conda and a version compiled with icc:

    %timeit np.log2(x)
    1000 loops, best of 3: 1.24 ms per loop
    %timeit np.log(x)
    1000 loops, best of 3: 1.28 ms per loop

I thought it might have something to do with grabbing the MKL VML package, but it looks like that's a no:

    %timeit ne.evaluate('log(x)')
    1000 loops, best of 3: 218 µs per loop

It looks like your numpy install is grabbing its log and log2 implementations from two different places, which is odd.


Disclaimer: I'm neither a credible nor an official source.

I'm almost certain that any implementation of the base-e log function can be made as fast as the log2 function, because converting one to the other requires only a single division by a constant. This is of course assuming that a single division operation is a tiny fraction of the other computations, which is true in accurate implementations of logarithms.
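As a quick sanity check of that base-change claim (a minimal sketch; nothing here is numpy-specific):

    import math

    x = 123.456
    # ln(x) is recovered from log2(x) with a single constant factor,
    # since ln(x) = log2(x) / log2(e) = log2(x) * ln(2).
    assert math.isclose(math.log(x), math.log2(x) * math.log(2))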

In most cases, numpy uses math.h from glibc; you'll see the same difference in C/C++ if you use math.h/cmath.h. In the comments, some people observe the same speeds for np.log and np.log2; I suspect this may come from different builds/platforms.

You can find the nicely commented source code for the two functions in files e_log.c, e_log2.c, e_logf.c, e_log2f.c in the dbl-64/ and flt-32/ subdirectories of this GitHub repo.

For double precision, glibc's log function implements a completely different algorithm (compared to log2), one from IBM from ~2001 that was included in their libultim library, whereas log2 comes from Sun Microsystems from ~1993. Just looking at the code, one can see that two different approximations are being implemented. In contrast, for single precision, log and log2 are the same apart from a division by ln 2 in the log2 case, hence the same speed.
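A rough numerical illustration of that single-precision relationship (this checks the mathematical identity, not the actual glibc code path; whether your numpy build routes float32 through those kernels is platform-dependent):

    import numpy as np

    x = np.float32(123.456)
    # If log2f is just logf divided by ln(2), the two sides should
    # agree to within float32 rounding.
    assert np.isclose(np.log2(x), np.log(x) / np.log(np.float32(2.0)))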

For even more background on the underlying algorithms, alternatives, and discussion of which to include in glibc in the future, see here.