
Performance of various numpy fancy indexing methods, also with numba


Your summary isn't completely correct: you already ran tests with differently sized arrays, but one thing you didn't do was change the number of elements indexed.

I restricted it to pure indexing and omitted take (which effectively is integer array indexing) and compress and extract (because these are effectively boolean array indexing). The only difference for these is the constant factor. The constant factor for the take and compress methods will be smaller than the overhead for the numpy functions np.take and np.compress, but otherwise the effects will be negligible for reasonably sized arrays.
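For reference, a quick sanity check of that equivalence (the array and indices here are arbitrary small examples, not your data):

```python
import numpy as np

x = np.arange(10, dtype=np.float64)
idx = np.array([1, 5, 7])
mask = np.zeros(x.shape, dtype=bool)
mask[idx] = True

# np.take is integer array indexing plus some function-call overhead
assert np.array_equal(x[idx], np.take(x, idx))
# np.compress is boolean array indexing plus some function-call overhead
assert np.array_equal(x[mask], np.compress(mask, x))
```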

Just let me present it with different numbers:

```python
# ~ every 500th element
x = np.arange(0, 1000000, dtype=np.float64)

idx = np.random.randint(0, 1000000, size=int(1000000/500))  # changed the ratio!
bool_mask = np.zeros(x.shape, dtype=bool)
bool_mask[idx] = True

%timeit x[idx]
# 51.6 µs ± 2.02 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit x[bool_mask]
# 1.03 ms ± 37.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# ~ every 50th element
idx = np.random.randint(0, 1000000, size=int(1000000/50))  # changed the ratio!
bool_mask = np.zeros(x.shape, dtype=bool)
bool_mask[idx] = True

%timeit x[idx]
# 1.46 ms ± 55.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit x[bool_mask]
# 2.69 ms ± 154 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# ~ every 5th element
idx = np.random.randint(0, 1000000, size=int(1000000/5))  # changed the ratio!
bool_mask = np.zeros(x.shape, dtype=bool)
bool_mask[idx] = True

%timeit x[idx]
# 14.9 ms ± 495 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit x[bool_mask]
# 8.31 ms ± 181 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

So what happened here? It's simple: integer array indexing only needs to access as many elements as there are values in the index array. That means it will be quite fast if there are few indices but slow if there are many. Boolean array indexing, however, always has to walk through the whole boolean array and check for True values, so its runtime should be roughly "constant" for a given array.
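A rough pure-Python sketch of what the two indexing kernels have to do (a simplification of NumPy's actual C implementation, just to make the work-per-element difference visible):

```python
import numpy as np

def integer_index(x, idx):
    # one load per index: work is proportional to len(idx)
    out = np.empty(len(idx), dtype=x.dtype)
    for i, j in enumerate(idx):
        out[i] = x[j]
    return out

def boolean_index(x, mask):
    # must inspect every mask element: work is proportional to len(x)
    out = []
    for i in range(len(x)):
        if mask[i]:          # this branch is where prediction matters
            out.append(x[i])
    return np.array(out, dtype=x.dtype)

x = np.arange(10, dtype=np.float64)
idx = np.array([2, 4, 9])
mask = np.zeros(x.shape, dtype=bool)
mask[idx] = True
assert np.array_equal(integer_index(x, idx), x[idx])
assert np.array_equal(boolean_index(x, mask), x[mask])
```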

But wait: it's not really constant for boolean arrays, and why does integer array indexing take longer (last case) than boolean array indexing even though it has to process ~5 times fewer elements?

That's where it gets more complicated. In this case the boolean array had True at random places, which means it will be subject to branch prediction failures. These become more likely when True and False occur roughly equally often but at random places. That's why the boolean array indexing got slower: the ratio of True to False became more equal and thus more "random". The result array will also be larger if there are more True values, which consumes additional time.

As an example of this branch prediction effect, try this (results could differ with a different system/compiler):

```python
bool_mask = np.zeros(x.shape, dtype=bool)
bool_mask[:1000000//2] = True   # first half True, second half False
%timeit x[bool_mask]
# 5.92 ms ± 118 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

bool_mask = np.zeros(x.shape, dtype=bool)
bool_mask[::2] = True   # True and False alternating
%timeit x[bool_mask]
# 16.6 ms ± 361 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

bool_mask = np.zeros(x.shape, dtype=bool)
bool_mask[::2] = True
np.random.shuffle(bool_mask)  # shuffled
%timeit x[bool_mask]
# 18.2 ms ± 325 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

So the distribution of True and False critically affects the runtime with boolean masks, even if they contain the same number of True values! The same effect will be visible for the compress functions.

For integer array indexing (and likewise np.take) another effect becomes visible: cache locality. The indices in your case are randomly distributed, so your computer has to do a lot of "RAM" to "processor cache" loads, because it's very unlikely that two consecutive indices are near each other.

Compare this:

```python
idx = np.random.randint(0, 1000000, size=int(1000000/5))
%timeit x[idx]
# 15.6 ms ± 703 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

idx = np.random.randint(0, 1000000, size=int(1000000/5))
idx = np.sort(idx)  # sort them
%timeit x[idx]
# 4.33 ms ± 366 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

Sorting the indices immensely increases the chance that the next value is already in the cache, and this can lead to huge speedups. That's a very important factor if you know the indices will be sorted (for example, indices created by np.where are sorted, which makes the result of np.where especially efficient for indexing).
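To illustrate that point, np.where on a boolean mask returns the indices already in ascending order, so no extra sort is needed before indexing (the seed and mask density here are arbitrary choices for the demonstration):

```python
import numpy as np

np.random.seed(0)                        # seeded only for reproducibility
x = np.arange(1000000, dtype=np.float64)
mask = np.random.random(x.shape) < 0.2   # ~20% True at random places

idx = np.where(mask)[0]
# np.where yields strictly increasing indices: cache-friendly by construction
assert np.all(np.diff(idx) > 0)
# and indexing with them matches the boolean mask result
assert np.array_equal(x[idx], x[mask])
```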

So it's not that integer array indexing is slower for small arrays and faster for large ones: it depends on many more factors. Both have their use cases, and depending on the circumstances one may be (considerably) faster than the other.


Let me also talk a bit about the numba functions. First some general statements:

  • cache won't make a difference, it just avoids recompiling the function. In interactive environments this is essentially useless. It would be faster if you packaged the functions in a module, though.
  • nogil by itself won't provide any speed boost. It only helps if the function is called from different threads, because each execution can release the GIL and then multiple calls can run in parallel.

Otherwise I don't know how numba effectively implements these functions. However, when you use NumPy features in numba it could be slower or faster, but even if it's faster it won't be much faster (except maybe for small arrays), because if it could be made faster the NumPy developers would implement it too. My rule of thumb is: if you can do it (vectorized) with NumPy, don't bother with numba. Only if you can't do it with vectorized NumPy functions, or NumPy would use too many temporary arrays, will numba shine!
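As a sketch of where numba can shine: a fused "mask and sum" that costs a temporary array in pure NumPy (x[x > 0].sum() first materializes x[x > 0]) can be written as a single loop. The njit decorator with cache/nogil flags is real numba API; the try/except fallback is only so this snippet also runs where numba isn't installed:

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # fallback: plain Python, same semantics, just without compilation
    def njit(**kwargs):
        return lambda func: func

@njit(cache=True, nogil=True)
def masked_sum(x):
    # single pass over x, no temporary array for x[x > 0]
    total = 0.0
    for v in x:
        if v > 0.0:
            total += v
    return total

x = np.array([-1.0, 2.0, -3.0, 4.0])
assert masked_sum(x) == x[x > 0].sum()
```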