Map a NumPy array of strings to integers


You can use np.unique with the return_inverse argument:

>>> lookupTable, indexed_dataSet = np.unique(dataSet, return_inverse=True)
>>> lookupTable
array(['george', 'greg', 'kevin'],
      dtype='<U21')
>>> indexed_dataSet
array([2, 1, 0, 2])

If you like, you can reconstruct your original array from these two arrays:

>>> lookupTable[indexed_dataSet]
array(['kevin', 'greg', 'george', 'kevin'],
      dtype='<U21')
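
For readers who want to run this outside a REPL, here is a minimal self-contained sketch of the same round trip. The definition of dataSet is an assumption based on the output shown above, and restored is a name introduced only for this example.

```python
import numpy as np

# Sample input, assumed to match the array used in the question.
dataSet = np.array(['kevin', 'greg', 'george', 'kevin'])

# lookupTable holds the sorted unique strings; indexed_dataSet holds,
# for each element of dataSet, its position in lookupTable.
lookupTable, indexed_dataSet = np.unique(dataSet, return_inverse=True)

# Indexing the lookup table with the integer codes rebuilds the input.
restored = lookupTable[indexed_dataSet]

print(indexed_dataSet.tolist())           # [2, 1, 0, 2]
print(np.array_equal(restored, dataSet))  # True
```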

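If you use pandas, indexed_dataSet, lookupTable = pd.factorize(dataSet) achieves much the same thing (and may be more efficient for large arrays). Note that pd.factorize returns the integer codes first and the unique values second, and by default it numbers the values in order of first appearance; pass sort=True to get the same sorted lookup table as np.unique.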
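
A short sketch of the pandas route, assuming the same sample array as above. pd.factorize returns a (codes, uniques) pair; the variable names here are chosen for illustration only.

```python
import numpy as np
import pandas as pd

# Sample input, assumed to match the array used in the question.
dataSet = np.array(['kevin', 'greg', 'george', 'kevin'])

# Default behaviour: codes follow the order of first appearance.
codes, uniques = pd.factorize(dataSet)
print(codes.tolist())         # [0, 1, 2, 0]
print(list(uniques))

# With sort=True the unique values are sorted first, so the codes
# line up with the inverse indices that np.unique would return.
sorted_codes, sorted_uniques = pd.factorize(dataSet, sort=True)
print(sorted_codes.tolist())  # [2, 1, 0, 2]
print(list(sorted_uniques))
```

Either variant reconstructs the original array with uniques[codes]; the choice only affects which integer each string receives.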


np.searchsorted does the trick:

import numpy as np

dataSet = np.array(['kevin', 'greg', 'george', 'kevin'], dtype='U21')
lut = np.unique(dataSet)             # already sorted: ['george', 'greg', 'kevin']
ind = np.searchsorted(lut, dataSet)  # array([2, 1, 0, 2])
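
One caveat with this approach: np.searchsorted returns insertion positions, so a string that is missing from the lookup table still gets an index. The sketch below shows one way to guard against that. The query array, the 'alice' entry, and the use of -1 as a "not found" marker are assumptions made for illustration, not part of the original answer.

```python
import numpy as np

lut = np.array(['george', 'greg', 'kevin'])   # sorted lookup table
query = np.array(['kevin', 'alice', 'greg'])  # 'alice' is hypothetical and not in lut

# Insertion points; these are only valid codes where the value exists in lut.
ind = np.searchsorted(lut, query)

# Clip so an insertion point past the end cannot index out of bounds,
# then keep only positions whose table entry really equals the query.
safe = np.clip(ind, 0, len(lut) - 1)
found = lut[safe] == query

# -1 marks values absent from the table (a convention chosen for this sketch).
codes = np.where(found, safe, -1)
print(codes.tolist())  # [2, -1, 1]
```

When the lookup table is built from the same array being mapped, as in the answer above, every value is present and this check is unnecessary.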