Find and replace multiple values in python


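>>> import numpy as np
>>> a = np.array([2, 3, 2, 5, 4, 4, 1, 2])
>>> val_old = np.array([1, 2, 3, 4, 5])
>>> val_new = np.array([2, 3, 4, 5, 1])
>>> arr = np.empty(a.max() + 1, dtype=val_new.dtype)
>>> arr[val_old] = val_new
>>> arr[a]
array([3, 4, 3, 1, 5, 5, 2, 3])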
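The lookup-table idea above can be wrapped in a small helper. This is only a sketch, assuming all values are non-negative integers and a is not empty. The name replace_with_lookup is hypothetical, and starting the table from np.arange instead of np.empty is an added assumption, so that values without a mapping pass through unchanged rather than picking up uninitialized memory.

```python
import numpy as np

def replace_with_lookup(a, val_old, val_new):
    """Map val_old[k] -> val_new[k] in a via a dense lookup table.

    Hypothetical helper; assumes non-negative integers and a non-empty a.
    """
    a = np.asarray(a)
    val_old = np.asarray(val_old)
    val_new = np.asarray(val_new)
    # The table must cover every index we read (a) and write (val_old).
    size = max(a.max(), val_old.max()) + 1
    # Identity table: table[v] == v, so unmapped values stay as they are.
    table = np.arange(size, dtype=np.result_type(a.dtype, val_new.dtype))
    table[val_old] = val_new
    return table[a]

a = np.array([2, 3, 2, 5, 4, 4, 1, 2])
val_old = np.array([1, 2, 3, 4, 5])
val_new = np.array([2, 3, 4, 5, 1])
result = replace_with_lookup(a, val_old, val_new)
print(result)
```

The table has as many entries as the largest value plus one, so this is probably only a good fit when the values are small compared to the array length.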


Assuming that your val_old array is sorted (which is the case here; if it later isn't, don't forget to sort val_new along with it!), you can use numpy.searchsorted and then index val_new with the results.
This does not work if a number has no mapping: searchsorted still returns an insertion index for it, so the lookup gives a wrong value or an out-of-range index. In that case you will have to provide a one-to-one mapping for every value that occurs in a.

In [1]: import numpy as np
In [2]: a = np.array([2, 3, 2, 5, 4, 4, 1, 2])
In [3]: old_val = np.array([1, 2, 3, 4, 5])
In [4]: new_val = np.array([2, 3, 4, 5, 1])
In [5]: a_new = np.array([3, 4, 3, 1, 5, 5, 2, 3])
In [6]: i = np.searchsorted(old_val, a)
In [7]: a_replaced = new_val[i]
In [8]: all(a_replaced == a_new)
Out[8]: True
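The two caveats of the searchsorted approach (an unsorted old_val, and values with no mapping) can be handled roughly as follows. This is a sketch under assumptions, not part of the original answer: the helper name replace_with_searchsorted is hypothetical, old_val is assumed to be non-empty and free of duplicates, and unmapped values are assumed to stay unchanged.

```python
import numpy as np

def replace_with_searchsorted(a, old_val, new_val):
    """Replace old_val[k] with new_val[k] in a; keep all other values.

    Hypothetical helper; old_val may be unsorted but must be non-empty
    and must not contain duplicates.
    """
    a = np.asarray(a)
    old_val = np.asarray(old_val)
    new_val = np.asarray(new_val)
    # Sort old_val and reorder new_val the same way so pairs stay aligned.
    order = np.argsort(old_val)
    old_sorted = old_val[order]
    new_sorted = new_val[order]
    # Insertion indices; clip so values above old_sorted[-1] stay in range.
    i = np.searchsorted(old_sorted, a)
    i = np.clip(i, 0, len(old_sorted) - 1)
    # Replace only where the lookup found an exact match.
    hit = old_sorted[i] == a
    return np.where(hit, new_sorted[i], a)

a = np.array([2, 3, 2, 5, 4, 4, 1, 2])
old_val = np.array([5, 3, 1, 2, 4])   # same pairs as above, shuffled
new_val = np.array([1, 4, 2, 3, 5])
a_replaced = replace_with_searchsorted(a, old_val, new_val)
print(a_replaced)
```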

50k numbers? No problem!

In [22]: import time
In [23]: def timed():
   ....:     t0 = time.time()
   ....:     i = np.searchsorted(old_val, a)
   ....:     a_replaced = new_val[i]
   ....:     t1 = time.time()
   ....:     print('%s Seconds' % (t1 - t0))
   ....:
In [24]: a = np.random.choice(old_val, 50000)
In [25]: timed()
0.00288081169128 Seconds

500k? You won't notice the difference!

In [26]: a = np.random.choice(old_val, 500000)
In [27]: timed()
0.019248008728 Seconds
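A single time.time() measurement like the ones above can be noisy. If steadier numbers are wanted, the standard-library timeit module is one option. The sketch below assumes Python 3 and a reasonably recent NumPy; the replace function, the fixed seed, and the repeat counts are arbitrary choices for illustration, and the timings will differ from machine to machine.

```python
import timeit
import numpy as np

old_val = np.array([1, 2, 3, 4, 5])
new_val = np.array([2, 3, 4, 5, 1])
rng = np.random.default_rng(0)  # fixed seed, arbitrary choice

def replace(a):
    # Same technique as above: positions in old_val index into new_val.
    return new_val[np.searchsorted(old_val, a)]

timings = {}
for n in (50_000, 500_000):
    a = rng.choice(old_val, n)
    # Best of 5 repeats, 10 calls each; store seconds per call.
    best = min(timeit.repeat(lambda: replace(a), number=10, repeat=5)) / 10
    timings[n] = best
    print('%d elements: %.6f seconds per call' % (n, best))
```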


In vanilla Python, without the speed of numpy or pandas, this is one way:

a = [2, 3, 2, 5, 4, 4, 1, 2]
val_old = [1, 2, 3, 4, 5]
val_new = [2, 3, 4, 5, 1]
expected_a_new = [3, 4, 3, 1, 5, 5, 2, 3]
# Map each old value to its replacement.
d = dict(zip(val_old, val_new))
# Values without a mapping are kept as they are.
a_new = [d.get(e, e) for e in a]
print(a_new)  # [3, 4, 3, 1, 5, 5, 2, 3]
print(a_new == expected_a_new)  # True

The average time complexity for this algorithm is O(M + N), where M is the length of your "translation list" and N is the length of list a: building the dict takes O(M), and each of the N lookups is O(1) on average.
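To make the complexity claim concrete, here is a small sketch that puts the dict-based version next to a naive one that searches the translation list for every element. Both function names are hypothetical and only serve the comparison.

```python
def replace_values(a, val_old, val_new):
    # O(M): build the translation dict once.
    d = dict(zip(val_old, val_new))
    # O(N): one average-O(1) lookup per element; unmapped values are kept.
    return [d.get(e, e) for e in a]

def replace_values_naive(a, val_old, val_new):
    # O(M * N): "in" and list.index both scan val_old for every element of a.
    return [val_new[val_old.index(e)] if e in val_old else e for e in a]

a = [2, 3, 2, 5, 4, 4, 1, 2, 9]   # 9 has no mapping
val_old = [1, 2, 3, 4, 5]
val_new = [2, 3, 4, 5, 1]
fast = replace_values(a, val_old, val_new)
slow = replace_values_naive(a, val_old, val_new)
print(fast == slow)
```

Both give the same result; the difference only shows up in running time once the translation list gets long.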