Replace elements in numpy array avoiding loops

SELECTING THE FASTEST METHOD

Answers to this question provided a nice assortment of ways to replace elements in a numpy array. Let's check which one is the quickest.

TL;DR: Numpy indexing is the winner

    def meth1():  # suggested by @Slam
        for old, new in Y:
            Xold[Xold == old] = new

    def meth2():  # suggested by myself, convert y_dict = dict(Y) first
        [y_dict[i] if i in y_dict.keys() else i for i in Xold]

    def meth3():  # suggested by @Eelco Hoogendoom, import numpy_indexed as npi first
        npi.remap(Xold, keys=Y[:, 0], values=Y[:, 1])

    def meth4():  # suggested by @Brad Solomon, import pandas as pd first
        pd.Series(Xold).map(pd.Series(Y[:, 1], index=Y[:, 0])).values

    # suggested by @jdehesa. create Xnew = Xold.copy() and index
    # idx = np.searchsorted(Xold, Y[:, 0]) first
    def meth5():
        Xnew[idx] = Y[:, 1]

Not so surprising results

    In [39]: timeit.timeit(meth1, number=1000000)
    Out[39]: 12.08

    In [40]: timeit.timeit(meth2, number=1000000)
    Out[40]: 2.87

    In [38]: timeit.timeit(meth3, number=1000000)
    Out[38]: 55.39

    In [12]: timeit.timeit(meth4, number=1000000)
    Out[12]: 256.84

    In [50]: timeit.timeit(meth5, number=1000000)
    Out[50]: 1.12

So, the good old list comprehension is the second fastest, and the winning approach is numpy indexing combined with searchsorted().
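For anyone who wants to rerun the comparison, here is a minimal self-contained sketch. The sample arrays, the function names and the repeat count are my own assumptions, since the full setup is not shown above. Only the three approaches that need nothing beyond numpy are included. Unlike the timings above, the setup work (building the dict, calling searchsorted, copying the array) is inside each timed call, and the loop version writes into a copy instead of modifying Xold in place, so the numbers are not directly comparable.

```python
import timeit

import numpy as np

# Assumed sample data: Xold is sorted and unique, every key in Y occurs in Xold
Xold = np.array([0, 1, 2, 3, 4])
Y = np.array([[0, 0], [1, 100], [3, 300], [4, 400], [2, 200]])


def replace_loop(x, y):
    # meth1 style: one boolean-mask assignment per (old, new) pair
    out = x.copy()
    for old, new in y:
        out[x == old] = new  # mask comes from the untouched input
    return out


def replace_dict(x, y):
    # meth2 style: dictionary lookup inside a list comprehension
    y_dict = dict(y.tolist())
    return np.array([y_dict.get(i, i) for i in x.tolist()])


def replace_searchsorted(x, y):
    # meth5 style: only valid under the assumptions on the sample data above
    out = x.copy()
    out[np.searchsorted(x, y[:, 0])] = y[:, 1]
    return out


results = {}
for func in (replace_loop, replace_dict, replace_searchsorted):
    results[func.__name__] = func(Xold, Y)
    seconds = timeit.timeit(lambda: func(Xold, Y), number=1000)
    print(f"{func.__name__}: {seconds:.4f} s per 1000 calls")
```

Absolute timings depend heavily on the machine and on the array sizes, so it is worth repeating this with arrays close to your real data before picking a method.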


We can use np.searchsorted for a generic case when the data in the first column of Y is not necessarily sorted -

    sidx = Y[:,0].argsort()
    out = Y[sidx[np.searchsorted(Y[:,0], Xold, sorter=sidx)],1]

Sample run -

    In [53]: Xold
    Out[53]: array([14, 10, 12, 13, 11])

    In [54]: Y
    Out[54]:
    array([[ 10,   0],
           [ 11, 100],
           [ 13, 300],
           [ 14, 400],
           [ 12, 200]])

    In [55]: sidx = Y[:,0].argsort()
        ...: out = Y[sidx[np.searchsorted(Y[:,0], Xold, sorter=sidx)],1]

    In [56]: out
    Out[56]: array([400,   0, 200, 300, 100])

If not all elements have corresponding mappings available, then we need to do a bit more work, like so -

    sidx = Y[:,0].argsort()
    sorted_indx = np.searchsorted(Y[:,0], Xold, sorter=sidx)
    sorted_indx[sorted_indx==len(sidx)] = len(sidx)-1
    idx_out = sidx[sorted_indx]
    out = Y[idx_out,1]
    out[Y[idx_out,0]!=Xold] = 0 # NA values as 0s
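Filling the missing entries with 0 can be ambiguous when 0 is also a legitimate replacement value. As a possible variant (my own sketch, not part of the approach above), the same lookup can leave unmapped values untouched by using np.where. The function name remap_keep_unmapped and the sample values 99 and 5 are made up for illustration.

```python
import numpy as np


def remap_keep_unmapped(x, y):
    """Map x through the (old, new) rows of y; leave unmapped values as-is."""
    keys, values = y[:, 0], y[:, 1]
    sidx = keys.argsort()
    # Position of each x value within the sorted keys
    pos = np.searchsorted(keys, x, sorter=sidx)
    # Values larger than every key land at len(keys); clip them into range
    pos[pos == len(sidx)] = len(sidx) - 1
    idx_out = sidx[pos]
    # A real match means the key found at that position equals the x value
    found = keys[idx_out] == x
    return np.where(found, values[idx_out], x)


Xold = np.array([14, 10, 99, 13, 5])  # 99 and 5 have no mapping in Y
Y = np.array([[10, 0], [11, 100], [13, 300], [14, 400], [12, 200]])
out = remap_keep_unmapped(Xold, Y)
print(out.tolist())  # [400, 0, 99, 300, 5]
```

Here 10 is mapped to 0 because Y says so, while 99 and 5 simply pass through unchanged.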


Here is one possibility:

    import numpy as np

    Xold = np.array([0, 1, 2, 3, 4])
    Y = np.array([[0, 0], [1, 100], [3, 300], [4, 400], [2, 200]])
    # Check every X value against every Y first value
    m = Xold == Y[:, 0, np.newaxis]
    # Check which elements in X are among Y first values
    # (so values that are not in Y are not replaced)
    m_X = np.any(m, axis=0)
    # Compute replacement
    # Xold * (1 - m_X) are the non-replaced values
    # np.sum(Y[:, 1, np.newaxis] * m, axis=0) * m_X are the replaced values
    Xnew = Xold * (1 - m_X) + np.sum(Y[:, 1, np.newaxis] * m, axis=0) * m_X
    print(Xnew)

Output:

[  0 100 200 300 400]
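To see what happens with repeated and unmapped values, the same expression can be run on a slightly messier input. This is only an illustrative sketch; the values in Xold are my own assumption (7 has no entry in Y, and 2 appears twice).

```python
import numpy as np

Xold = np.array([0, 7, 2, 2, 4, 1])  # 7 has no mapping, 2 is repeated
Y = np.array([[0, 0], [1, 100], [3, 300], [4, 400], [2, 200]])

# m[i, j] is True where Y[i, 0] == Xold[j], so m has one row per mapping
# and one column per element of Xold
m = Xold == Y[:, 0, np.newaxis]
# True for every element of Xold that has a mapping in Y
m_X = np.any(m, axis=0)
# Same blend as above: untouched values plus replaced values
Xnew = Xold * (1 - m_X) + np.sum(Y[:, 1, np.newaxis] * m, axis=0) * m_X

print(m.shape)        # (5, 6)
print(Xnew.tolist())  # [0, 7, 200, 200, 400, 100]
```

The shape of m makes the memory cost visible: it always holds len(Y) * len(Xold) booleans.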

This method works for more or less every case (unsorted arrays, multiple repetitions of values in X, values in X not replaced, values in Y not replacing anything in X), except if you give two replacements for the same value in Y, which would be wrong anyway. However, its time and space complexity is the product of the sizes of X and Y. If your problem has additional constraints (data is sorted, no repetitions, etc.) it might be possible to do something better. For example, if X is sorted with no repeated elements and every value in Y replaces a value in X (like in your example), this would probably be faster:

    import numpy as np

    Xold = np.array([0, 1, 2, 3, 4])
    Y = np.array([[0, 0], [1, 100], [3, 300], [4, 400], [2, 200]])
    idx = np.searchsorted(Xold, Y[:, 0])
    Xnew = Xold.copy()
    Xnew[idx] = Y[:, 1]
    print(Xnew)
    # [  0 100 200 300 400]
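Since this shortcut silently gives wrong results when its constraints do not hold, it may be worth guarding it. Below is a rough sketch of such a guard; the function name replace_sorted_unique and the error messages are hypothetical, and it assumes a non-empty integer array.

```python
import numpy as np


def replace_sorted_unique(x, y):
    """searchsorted shortcut with explicit checks (assumes x is non-empty)."""
    keys, values = y[:, 0], y[:, 1]
    # x must be strictly increasing, i.e. sorted with no repeats
    if not np.all(np.diff(x) > 0):
        raise ValueError("x must be sorted with no repeated elements")
    idx = np.searchsorted(x, keys)
    # Every key must occur in x; clip so the check itself stays in bounds
    if not np.array_equal(x[np.clip(idx, 0, len(x) - 1)], keys):
        raise ValueError("every key in y must be present in x")
    out = x.copy()
    out[idx] = values
    return out


Xold = np.array([0, 1, 2, 3, 4])
Y = np.array([[0, 0], [1, 100], [3, 300], [4, 400], [2, 200]])
Xnew = replace_sorted_unique(Xold, Y)
print(Xnew.tolist())  # [0, 100, 200, 300, 400]

try:
    # Unsorted input is rejected instead of being silently mis-mapped
    replace_sorted_unique(np.array([14, 10, 12, 13, 11]), Y)
except ValueError as err:
    error_message = str(err)
    print(error_message)
```

The checks add a linear pass over the data, so for very hot loops you may prefer to validate once up front and call the unguarded version afterwards.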
