Replace elements in numpy array avoiding loops
SELECTING THE FASTEST METHOD
The answers to this question provide a nice assortment of ways to replace elements in a numpy array. Let's check which one is the quickest.
TL;DR: Numpy indexing is the winner
    def meth1():  # suggested by @Slam
        for old, new in Y:
            Xold[Xold == old] = new

    def meth2():  # suggested by myself, convert y_dict = dict(Y) first
        [y_dict[i] if i in y_dict.keys() else i for i in Xold]

    def meth3():  # suggested by @Eelco Hoogendoom, import numpy_indexed as npi first
        npi.remap(Xold, keys=Y[:, 0], values=Y[:, 1])

    def meth4():  # suggested by @Brad Solomon, import pandas as pd first
        pd.Series(Xold).map(pd.Series(Y[:, 1], index=Y[:, 0])).values

    # suggested by @jdehesa. create Xnew = Xold.copy() and index
    # idx = np.searchsorted(Xold, Y[:, 0]) first
    def meth5():
        Xnew[idx] = Y[:, 1]
Not so surprising results
    In [39]: timeit.timeit(meth1, number=1000000)
    Out[39]: 12.08

    In [40]: timeit.timeit(meth2, number=1000000)
    Out[40]: 2.87

    In [38]: timeit.timeit(meth3, number=1000000)
    Out[38]: 55.39

    In [12]: timeit.timeit(meth4, number=1000000)
    Out[12]: 256.84

    In [50]: timeit.timeit(meth5, number=1000000)
    Out[50]: 1.12
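So the good old list comprehension is the second fastest, and the winning approach is numpy indexing combined with searchsorted(). Keep in mind that meth2 and meth5 are timed with y_dict and idx already built, so that setup cost is not part of their figures.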
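For anyone who wants to repeat the comparison without installing extra packages, here is a minimal sketch that uses numpy only. The function names and the sample arrays are my own assumptions (the data behind the timings above is not shown), and each function returns a new array, so results can be compared before anything is timed.

```python
import timeit

import numpy as np

# Hypothetical sample data; not the arrays behind the timings quoted above.
Xold = np.array([0, 1, 2, 3, 4])
Y = np.array([[0, 0], [1, 100], [3, 300], [4, 400], [2, 200]])


def replace_mask_loop(x, pairs):
    """One boolean mask per (old, new) pair, in the spirit of meth1."""
    out = x.copy()
    for old, new in pairs:
        # Mask against the untouched input so a freshly written value
        # is never replaced again by a later pair.
        out[x == old] = new
    return out


def replace_dict(x, pairs):
    """Dictionary lookup in a list comprehension, in the spirit of meth2."""
    mapping = dict(pairs.tolist())
    return np.array([mapping.get(v, v) for v in x.tolist()])


def replace_searchsorted(x, pairs):
    """Index assignment, in the spirit of meth5.

    Assumes x is sorted, has no repeats, and contains every key in pairs.
    """
    out = x.copy()
    out[np.searchsorted(x, pairs[:, 0])] = pairs[:, 1]
    return out


candidates = (replace_mask_loop, replace_dict, replace_searchsorted)

# Check that the approaches agree before timing them.
results = {f.__name__: f(Xold, Y) for f in candidates}

for f in candidates:
    seconds = timeit.timeit(lambda: f(Xold, Y), number=10_000)
    print(f"{f.__name__}: {seconds:.4f} s for 10,000 calls")
```

Unlike the figures above, these timings include the setup work (building the dict, calling searchsorted, copying the array), so the ranking may come out differently, especially on larger inputs.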
We can use np.searchsorted for the generic case when the data in the first column of Y is not necessarily sorted -
    sidx = Y[:,0].argsort()
    out = Y[sidx[np.searchsorted(Y[:,0], Xold, sorter=sidx)],1]
Sample run -
    In [53]: Xold
    Out[53]: array([14, 10, 12, 13, 11])

    In [54]: Y
    Out[54]:
    array([[ 10,   0],
           [ 11, 100],
           [ 13, 300],
           [ 14, 400],
           [ 12, 200]])

    In [55]: sidx = Y[:,0].argsort()
        ...: out = Y[sidx[np.searchsorted(Y[:,0], Xold, sorter=sidx)],1]

    In [56]: out
    Out[56]: array([400,   0, 200, 300, 100])
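To make the one-liner easier to follow, here is the same computation on the sample data, split into its intermediate steps. The variable names keys and pos are my own; the values in the comments are what each step should give for this input.

```python
import numpy as np

Xold = np.array([14, 10, 12, 13, 11])
Y = np.array([[10, 0], [11, 100], [13, 300], [14, 400], [12, 200]])

keys = Y[:, 0]                      # [10, 11, 13, 14, 12], unsorted

# sidx lists the row numbers of Y in ascending key order.
sidx = keys.argsort()               # [0, 1, 4, 2, 3]

# With sorter=sidx, searchsorted treats keys as if they were sorted and
# returns each Xold value's position in that sorted order.
pos = np.searchsorted(keys, Xold, sorter=sidx)   # [4, 0, 2, 3, 1]

# Map the sorted positions back to actual rows of Y, then read column 1.
rows = sidx[pos]                    # [3, 0, 4, 2, 1]
out = Y[rows, 1]                    # [400, 0, 200, 300, 100]
print(out)
```

The sorter argument avoids building a sorted copy of Y; only the index array sidx is created.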
If not all elements have a corresponding mapping available, we need to do a bit more work, like so -
    sidx = Y[:,0].argsort()
    sorted_indx = np.searchsorted(Y[:,0], Xold, sorter=sidx)
    sorted_indx[sorted_indx==len(sidx)] = len(sidx)-1
    idx_out = sidx[sorted_indx]
    out = Y[idx_out,1]
    out[Y[idx_out,0]!=Xold] = 0 # NA values as 0s
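If unmapped elements should keep their original value rather than become 0 (which is closer to "replace" in the sense of the question), the same idea can be wrapped up as follows. This is a sketch, not part of the answer above: the function name remap_keep_unmapped and the test input are hypothetical, and it assumes Y has at least one row.

```python
import numpy as np


def remap_keep_unmapped(x, pairs):
    """Replace values of x found in pairs[:, 0]; leave the rest unchanged.

    Assumes pairs is a non-empty (n, 2) array with unique keys.
    """
    keys, values = pairs[:, 0], pairs[:, 1]
    sidx = keys.argsort()
    pos = np.searchsorted(keys, x, sorter=sidx)
    # Values larger than every key land one past the end; pull them back in.
    pos[pos == len(sidx)] = len(sidx) - 1
    rows = sidx[pos]
    # A candidate row is a real match only if its key equals the value.
    found = keys[rows] == x
    return np.where(found, values[rows], x)


Y = np.array([[10, 0], [11, 100], [13, 300], [14, 400], [12, 200]])
x = np.array([14, 10, 99, 13, 5])   # 99 and 5 have no mapping in Y

result = remap_keep_unmapped(x, Y)
print(result)   # [400   0  99 300   5]
```

Using np.where instead of assigning 0 means a mapped-to value of 0 (as for key 10 here) cannot be confused with a missing mapping.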
Here is one possibility:
    import numpy as np

    Xold = np.array([0, 1, 2, 3, 4])
    Y = np.array([[0, 0], [1, 100], [3, 300], [4, 400], [2, 200]])
    # Check every X value against every Y first value
    m = Xold == Y[:, 0, np.newaxis]
    # Check which elements in X are among Y first values
    # (so values that are not in Y are not replaced)
    m_X = np.any(m, axis=0)
    # Compute replacement
    # Xold * (1 - m_X) are the non-replaced values
    # np.sum(Y[:, 1, np.newaxis] * m, axis=0) * m_X are the replaced values
    Xnew = Xold * (1 - m_X) + np.sum(Y[:, 1, np.newaxis] * m, axis=0) * m_X
    print(Xnew)
Output:
[ 0 100 200 300 400]
This method works for more or less every case (unsorted arrays, multiple repetitions of values in X, values in X not replaced, values in Y not replacing anything in X), except if you give two replacements for the same value in Y, which would be wrong anyway. However, its time and space complexity is the product of the sizes of X and Y. If your problem has additional constraints (data is sorted, no repetitions, etc.) it might be possible to do something better. For example, if X is sorted with no repeated elements and every value in Y replaces a value in X (like in your example), this would probably be faster:
    import numpy as np

    Xold = np.array([0, 1, 2, 3, 4])
    Y = np.array([[0, 0], [1, 100], [3, 300], [4, 400], [2, 200]])
    idx = np.searchsorted(Xold, Y[:, 0])
    Xnew = Xold.copy()
    Xnew[idx] = Y[:, 1]
    print(Xnew)
    # [ 0 100 200 300 400]
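Coming back to the general broadcasting method, here is a small sketch that exercises the cases it is said to handle: unsorted input, repeated values, and values with no replacement. It is a variant rather than the exact code above: it selects replacements with argmax and np.where instead of the multiply-and-sum arithmetic, and the name replace_broadcast and the test input are my own.

```python
import numpy as np


def replace_broadcast(x, pairs):
    """Replace values of x using an all-pairs comparison.

    Assumes the keys in pairs[:, 0] are unique. The mask has shape
    (len(pairs), len(x)), so memory and time grow with the product.
    """
    keys, values = pairs[:, 0], pairs[:, 1]
    m = x == keys[:, np.newaxis]     # m[i, j] is True when x[j] == keys[i]
    hit = m.any(axis=0)              # which x values have a replacement
    # argmax gives the row of the single True in each matching column;
    # columns without a match get row 0, but np.where discards those.
    return np.where(hit, values[m.argmax(axis=0)], x)


Y = np.array([[0, 0], [1, 100], [3, 300], [4, 400], [2, 200]])
x = np.array([3, 7, 1, 3, 0, 2])    # unsorted, 3 repeated, 7 not in Y

result = replace_broadcast(x, Y)
print(result)   # [300   7 100 300   0 200]
```

The searchsorted shortcut shown just above does not apply to this input, since x is neither sorted nor free of repeats; that is the trade-off between the general method and the constrained one.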