numpy - efficiently copy values from matrix to matrix using some precalculated map [python-3.x]



If you have infinite time for precomputing, you can get a slight speedup by switching to flat indexing:

map_f = np.ravel_multi_index((*np.moveaxis(mapping, 2, 0),), A.shape)
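To see what this precomputation produces, here is a tiny sketch (the 2x3 shape and index values are made up for illustration):

```python
import numpy as np

# For a C-ordered 2x3 array, the flat index of (row, col) is row * 3 + col.
rows = np.array([1, 0, 1])
cols = np.array([2, 0, 1])
flat = np.ravel_multi_index((rows, cols), (2, 3))
print(flat)  # [5 0 4]

# Indexing the raveled array with these offsets matches fancy indexing.
A = np.arange(6, dtype=float).reshape(2, 3)
print(np.array_equal(A.ravel()[flat], A[rows, cols]))  # True
```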

Then simply do:

A.ravel()[map_f]

Please note that this speedup is on top of the large speedup we get from fancy indexing. For example:

>>> A = np.random.random((5000, 3000))
>>> mapping = np.random.randint(0, 15000, (5000, 3000, 2)) % [5000, 3000]
>>>
>>> map_f = np.ravel_multi_index((*np.moveaxis(mapping, 2, 0),), A.shape)
>>>
>>> np.all(A.ravel()[map_f] == A[mapping[..., 0], mapping[..., 1]])
True
>>>
>>> timeit('A[mapping[:, :, 0], mapping[:, :, 1]]', globals=globals(), number=10)
4.101239089999581
>>> timeit('A.ravel()[map_f]', globals=globals(), number=10)
2.7831342950012186

If we were to compare to the original loopy code, the speedup would be more like ~40x.

Finally, note that this solution not only avoids the additional dependency and potential installation nightmare that is numba, but is also simpler, shorter and faster:

numba:
precomp:    132.957 ms
main        238.359 ms

flat indexing:
precomp:     76.223 ms
main:       219.910 ms

Code:

import numpy as np
from numba import jit

@jit
def fast(A, B, mapping):
    N, M = B.shape
    for i in range(N):
        for j in range(M):
            B[i, j] = A[mapping[i, j, 0], mapping[i, j, 1]]
    return B

from timeit import timeit

A = np.random.random((5000, 3000))
mapping = np.random.randint(0, 15000, (5000, 3000, 2)) % [5000, 3000]

a = np.random.random((5, 3))
m = np.random.randint(0, 15, (5, 3, 2)) % [5, 3]

print('numba:')
print(f"precomp: {timeit('b = fast(a, np.empty_like(a), m)', globals=globals(), number=1)*1000:10.3f} ms")
print(f"main     {timeit('B = fast(A, np.empty_like(A), mapping)', globals=globals(), number=10)*100:10.3f} ms")
print('\nflat indexing:')
print(f"precomp: {timeit('map_f = np.ravel_multi_index((*np.moveaxis(mapping, 2, 0),), A.shape)', globals=globals(), number=10)*100:10.3f} ms")
map_f = np.ravel_multi_index((*np.moveaxis(mapping, 2, 0),), A.shape)
print(f"main:    {timeit('B = A.ravel()[map_f]', globals=globals(), number=10)*100:10.3f} ms")


One very nice solution to these types of performance-critical problems is to keep it simple and utilize one of the high-performance packages. The easiest might be Numba, which provides the jit decorator that compiles array- and loop-heavy code to optimized LLVM. Below is a full example:

from time import time
import numpy as np
from numba import jit

# Function doing the computation
def normal(A, B, mapping):
    N, M = B.shape
    for i in range(N):
        for j in range(M):
            B[i, j] = A[mapping[i, j, 0], mapping[i, j, 1]]
    return B

# The same exact function, but with the Numba jit decorator
@jit
def fast(A, B, mapping):
    N, M = B.shape
    for i in range(N):
        for j in range(M):
            B[i, j] = A[mapping[i, j, 0], mapping[i, j, 1]]
    return B

# Create sample data
def create_sample_data(I, J, N, M):
    A = np.random.random((I, J))
    B = np.empty((N, M))
    mapping = np.asarray(np.stack((
        np.random.random((N, M))*I,
        np.random.random((N, M))*J,
        ), axis=2), dtype=int)
    return A, B, mapping

A, B, mapping = create_sample_data(500, 600, 700, 800)

# Run normally
t0 = time()
B = normal(A, B, mapping)
t1 = time()
print('normal took', t1 - t0, 'seconds')

# Run using Numba.
# First we should run the function with smaller arrays,
# just to compile the code.
fast(*create_sample_data(5, 6, 7, 8))

# Now, run with real data
t0 = time()
B = fast(A, B, mapping)
t1 = time()
print('fast took', t1 - t0, 'seconds')

This uses your own looping solution, which is inherently slow using standard Python, but as fast as C when using Numba. On my machine the normal function executes in 0.270 seconds, while the fast function executes in 0.00248 seconds. That is, Numba gives us a 109x speedup (!) pretty much for free.

Note that the fast Numba function is called twice, first with small input arrays and only then with the real data. This is a critical step which is often neglected. Without it, you will find that the performance increase is not nearly as good, as the first call is used to compile the code. The types and dimensions of the input arrays should be the same in this initial call, but the size in each dimension is not important.

I created B outside of the function(s) and passed it as an argument (to be "filled with values"). You might just as well allocate B inside of the function; Numba does not care.
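As a sketch of that variant (written in plain Python here so it runs without numba; with numba installed, adding the jit decorator on top is what this answer describes):

```python
import numpy as np

# Variant that allocates B inside the function instead of receiving it.
# With numba installed, decorating this with @jit would compile it as well.
def fast_alloc(A, mapping):
    N, M = mapping.shape[:2]
    B = np.empty((N, M), dtype=A.dtype)
    for i in range(N):
        for j in range(M):
            B[i, j] = A[mapping[i, j, 0], mapping[i, j, 1]]
    return B

# Tiny made-up example
A = np.arange(6, dtype=float).reshape(2, 3)
mapping = np.array([[[1, 2], [0, 0]],
                    [[0, 1], [1, 0]]])
print(fast_alloc(A, mapping))  # [[5. 0.] [1. 3.]]
```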

The easiest way to get Numba is probably via the Anaconda distribution.


One option would be to use numba, which can often provide substantial improvements in this kind of simple algorithmic code.

import numpy as np
from numba import njit

I, J = 5000, 5000
N, M = 3000, 3000

A = np.random.randint(0, 10, [I, J])
B = np.random.randint(0, 10, [N, M])
mapping = np.dstack([np.random.randint(0, I - 1, (N, M)),
                     np.random.randint(0, J - 1, (N, M))])
B0 = B.copy()

def orig(A, B, mapping):
    for i in range(N):
        for j in range(M):
            B[i, j] = A[mapping[i, j, 0], mapping[i, j, 1]]

new = njit(orig)

which gives us matching results:

In [313]: Bold = B0.copy()

In [314]: orig(A, Bold, mapping)

In [315]: Bnew = B0.copy()

In [316]: new(A, Bnew, mapping)

In [317]: (Bold == Bnew).all()
Out[317]: True

and is much faster:

In [320]: %time orig(A, B0.copy(), mapping)
Wall time: 6.11 s

In [321]: %time new(A, B0.copy(), mapping)
Wall time: 257 ms

and faster still on subsequent calls, once the first call has paid for the JIT compilation:

In [322]: %time new(A, B0.copy(), mapping)
Wall time: 171 ms

In [323]: %time new(A, B0.copy(), mapping)
Wall time: 163 ms

That is roughly a 30x improvement from adding two lines of code.