Replace elements in numpy array avoiding loops


SELECTING THE FASTEST METHOD

The answers to this question provided a nice assortment of ways to replace elements in a NumPy array. Let's check which one is the quickest.

TL;DR: Numpy indexing is the winner

    def meth1():  # suggested by @Slam
        for old, new in Y:
            Xold[Xold == old] = new

    def meth2():  # suggested by myself; convert y_dict = dict(Y) first
        [y_dict[i] if i in y_dict.keys() else i for i in Xold]

    def meth3():  # suggested by @Eelco Hoogendoorn; import numpy_indexed as npi first
        npi.remap(Xold, keys=Y[:, 0], values=Y[:, 1])

    def meth4():  # suggested by @Brad Solomon; import pandas as pd first
        pd.Series(Xold).map(pd.Series(Y[:, 1], index=Y[:, 0])).values

    # suggested by @jdehesa; create Xnew = Xold.copy() and the index
    # idx = np.searchsorted(Xold, Y[:, 0]) first
    def meth5():
        Xnew[idx] = Y[:, 1]

Not so surprising results

    In [39]: timeit.timeit(meth1, number=1000000)
    Out[39]: 12.08

    In [40]: timeit.timeit(meth2, number=1000000)
    Out[40]: 2.87

    In [38]: timeit.timeit(meth3, number=1000000)
    Out[38]: 55.39

    In [12]: timeit.timeit(meth4, number=1000000)
    Out[12]: 256.84

    In [50]: timeit.timeit(meth5, number=1000000)
    Out[50]: 1.12

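So the good old list comprehension is the second fastest, and the winning approach is NumPy indexing combined with searchsorted(). Keep in mind that meth2 and meth5 are timed with their setup (y_dict, Xnew and idx) already done, so those figures exclude the dict conversion, the copy and the searchsorted() call.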
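The timings above depend on the sizes of Xold and Y, which are not shown. A minimal sketch for rerunning part of the comparison on your own data follows; the array sizes, the values and the small `number` are assumptions, and the two function names are made up for illustration. Unlike meth5, the searchsorted variant here includes its setup cost.

```python
import timeit
import numpy as np

# Hypothetical test data: sizes and values are assumptions, not those used above.
Xold = np.arange(1000)                    # sorted, no repeats
keys = np.arange(0, 1000, 10)             # every key occurs in Xold
Y = np.column_stack([keys, keys * 100])   # (old, new) pairs

def loop_replace():
    # Boolean-mask replacement, one pass over X per pair (meth1 style).
    # Masks come from the untouched Xold so a new value is never re-matched.
    Xnew = Xold.copy()
    for old, new in Y:
        Xnew[Xold == old] = new
    return Xnew

def searchsorted_replace():
    # Index lookup (meth5 style), with the copy and searchsorted call included.
    idx = np.searchsorted(Xold, Y[:, 0])
    Xnew = Xold.copy()
    Xnew[idx] = Y[:, 1]
    return Xnew

# Check that both variants agree before comparing their speed.
same = np.array_equal(loop_replace(), searchsorted_replace())
t_loop = timeit.timeit(loop_replace, number=20)
t_search = timeit.timeit(searchsorted_replace, number=20)
print(same, t_loop, t_search)
```

Absolute numbers will differ from machine to machine, so only the ratio between the two timings is worth reading.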


We can use np.searchsorted for the generic case, where the data in the first column of Y is not necessarily sorted -

    sidx = Y[:,0].argsort()
    out = Y[sidx[np.searchsorted(Y[:,0], Xold, sorter=sidx)],1]

Sample run -

    In [53]: Xold
    Out[53]: array([14, 10, 12, 13, 11])

    In [54]: Y
    Out[54]:
    array([[ 10,   0],
           [ 11, 100],
           [ 13, 300],
           [ 14, 400],
           [ 12, 200]])

    In [55]: sidx = Y[:,0].argsort()
        ...: out = Y[sidx[np.searchsorted(Y[:,0], Xold, sorter=sidx)],1]

    In [56]: out
    Out[56]: array([400,   0, 200, 300, 100])

If not all elements have a corresponding mapping available, then we need to do a bit more work, like so -

    sidx = Y[:,0].argsort()
    sorted_indx = np.searchsorted(Y[:,0], Xold, sorter=sidx)
    sorted_indx[sorted_indx==len(sidx)] = len(sidx)-1
    idx_out = sidx[sorted_indx]
    out = Y[idx_out,1]
    out[Y[idx_out,0]!=Xold] = 0 # NA values as 0s
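If unmapped elements should keep their original value rather than become 0, the same lookup can be combined with np.where. This is a sketch under assumptions: `remap_keep_unmapped` is a hypothetical helper name, the sample arrays are made up, and Y is assumed to have at least one row and no duplicate keys.

```python
import numpy as np

def remap_keep_unmapped(X, Y):
    """Replace values of X found in Y[:, 0] by Y[:, 1]; leave the rest as-is.

    Hypothetical helper. Assumes Y has at least one row and no duplicate
    keys in its first column.
    """
    sidx = Y[:, 0].argsort()
    pos = np.searchsorted(Y[:, 0], X, sorter=sidx)
    pos[pos == len(sidx)] = len(sidx) - 1   # clip out-of-range positions
    idx_out = sidx[pos]
    found = Y[idx_out, 0] == X              # True where a key really matched
    return np.where(found, Y[idx_out, 1], X)

# Made-up sample: 99 and 5 have no mapping in Y.
Xold = np.array([14, 10, 99, 13, 5])
Y = np.array([[10, 0], [11, 100], [13, 300], [14, 400], [12, 200]])
result = remap_keep_unmapped(Xold, Y)
print(result)  # [400   0  99 300   5]
```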


Here is one possibility:

    import numpy as np

    Xold = np.array([0, 1, 2, 3, 4])
    Y = np.array([[0, 0], [1, 100], [3, 300], [4, 400], [2, 200]])
    # Check every X value against every Y first value
    m = Xold == Y[:, 0, np.newaxis]
    # Check which elements in X are among Y first values
    # (so values that are not in Y are not replaced)
    m_X = np.any(m, axis=0)
    # Compute replacement
    # Xold * (1 - m_X) are the non-replaced values
    # np.sum(Y[:, 1, np.newaxis] * m, axis=0) * m_X are the replaced values
    Xnew = Xold * (1 - m_X) + np.sum(Y[:, 1, np.newaxis] * m, axis=0) * m_X
    print(Xnew)

Output:

[  0 100 200 300 400]
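To see how the broadcast comparison treats repeated and unmapped values, here is a small sketch with made-up inputs that prints the intermediate mask. It swaps the arithmetic blend for np.where, which should give the same result for integer data.

```python
import numpy as np

# Made-up inputs: 7 has no mapping, 2 appears twice, key 5 is unused.
X = np.array([2, 7, 2, 0])
Y = np.array([[0, 10], [2, 20], [5, 50]])

# m[i, j] is True when X[j] equals the i-th key in Y; shape (len(Y), len(X)).
m = X == Y[:, 0, np.newaxis]
m_X = np.any(m, axis=0)   # which positions of X have a mapping

# Each column of m has at most one True, so the sum picks the new value.
replaced = np.sum(Y[:, 1, np.newaxis] * m, axis=0)
Xnew = np.where(m_X, replaced, X)

print(m.astype(int))
print(Xnew)  # [20  7 20 10]
```

The mask has one row per pair in Y and one column per element of X, which is where the product-of-sizes memory cost comes from.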

This method works for more or less every case (unsorted arrays, multiple repetitions of values in X, values in X not replaced, values in Y not replacing anything in X), except if you give two replacements for the same value in Y, which would be wrong anyway. However, its time and space complexity is the product of the sizes of X and Y. If your problem has additional constraints (data is sorted, no repetitions, etc.) it might be possible to do something better. For example, if X is sorted with no repeated elements and every value in Y replaces a value in X (like in your example), this would probably be faster:

    import numpy as np

    Xold = np.array([0, 1, 2, 3, 4])
    Y = np.array([[0, 0], [1, 100], [3, 300], [4, 400], [2, 200]])
    idx = np.searchsorted(Xold, Y[:, 0])
    Xnew = Xold.copy()
    Xnew[idx] = Y[:, 1]
    print(Xnew)
    # [  0 100 200 300 400]
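The fast path silently gives wrong results when its assumptions are violated, so it may be worth checking them first. A minimal sketch, where `replace_sorted_unique` is a hypothetical helper name and raising ValueError is just one possible way to report the problem:

```python
import numpy as np

def replace_sorted_unique(X, Y):
    """Fast-path replacement; hypothetical helper name.

    Raises ValueError when the assumptions of the fast path do not hold.
    """
    if np.any(np.diff(X) <= 0):
        raise ValueError("X must be strictly increasing (sorted, no repeats)")
    idx = np.searchsorted(X, Y[:, 0])
    # An out-of-range index or a mismatch means a key in Y is absent from X.
    if np.any(idx >= len(X)) or np.any(X[np.minimum(idx, len(X) - 1)] != Y[:, 0]):
        raise ValueError("every key in Y[:, 0] must occur in X")
    Xnew = X.copy()
    Xnew[idx] = Y[:, 1]
    return Xnew

Xold = np.array([0, 1, 2, 3, 4])
Y = np.array([[0, 0], [1, 100], [3, 300], [4, 400], [2, 200]])
result = replace_sorted_unique(Xold, Y)
print(result)  # [  0 100 200 300 400]
```

The checks add a pass over X and one over Y, so they can be dropped once the inputs are known to satisfy the constraints.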