Remap values in pandas column with a dict

You can use .replace. For example:

>>> df = pd.DataFrame({'col2': {0: 'a', 1: 2, 2: np.nan}, 'col1': {0: 'w', 1: 1, 2: 2}})
>>> di = {1: "A", 2: "B"}
>>> df
  col1 col2
0    w    a
1    1    2
2    2  NaN
>>> df.replace({"col1": di})
  col1 col2
0    w    a
1    A    2
2    B  NaN

or directly on the Series, i.e. df["col1"].replace(di, inplace=True).
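As a self-contained illustration of the column-level form, here is a minimal sketch that assigns the result back instead of passing inplace=True. That choice is an assumption on my part: in-place calls on a column selection may not propagate to the parent DataFrame when Copy-on-Write is active, so assignment is the more portable spelling. The data mirrors the example above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": ["w", 1, 2], "col2": ["a", 2, np.nan]})
di = {1: "A", 2: "B"}

# Replace values only within col1 and assign the result back to the column.
# col2 also contains a 2, but it is left untouched.
df["col1"] = df["col1"].replace(di)

print(df["col1"].tolist())  # ['w', 'A', 'B']
```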


`map` can be much faster than `replace`

If your dictionary has more than a couple of keys, using map can be much faster than replace. There are two versions of this approach, depending on whether your dictionary exhaustively maps all possible values (and also whether you want non-matches to keep their values or be converted to NaNs):

Exhaustive Mapping

In this case, the form is very simple:

df['col1'].map(di)   # note: if the dictionary does not exhaustively map all
                     # entries then non-matched entries are changed to NaNs

Although map most commonly takes a function as its argument, it can alternatively take a dictionary or a Series; see the documentation for pandas.Series.map.
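To make the NaN behaviour concrete, here is a small sketch with a dictionary that deliberately omits one of the values present in the Series. The Series `s` and its contents are made up for illustration.

```python
import pandas as pd

s = pd.Series([1, 2, 1, 3])
di = {1: "A", 2: "B"}  # 3 is deliberately missing from the mapping

mapped = s.map(di)

# The entry with no matching key becomes NaN.
print(mapped.tolist())  # ['A', 'B', 'A', nan]
```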

Non-Exhaustive Mapping

If you have a non-exhaustive mapping and wish to retain the existing values for non-matches, you can add fillna:

df['col1'].map(di).fillna(df['col1'])

as in @jpp's answer here: Replace values in a pandas series via dictionary efficiently
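A quick sketch of the map-then-fillna pattern follows. The column of strings is hypothetical; it was chosen so that the mapped and unmapped values share a type, which keeps the result's dtype uniform.

```python
import pandas as pd

df = pd.DataFrame({"col1": ["x", "y", "z"]})
di = {"x": "A", "y": "B"}  # "z" has no entry

# map() turns "z" into NaN; fillna() then restores the original value,
# because fillna aligns the two Series on their shared index.
result = df["col1"].map(di).fillna(df["col1"])

print(result.tolist())  # ['A', 'B', 'z']
```

Note that a genuine NaN in the original column stays NaN under this pattern, since the fill value for that row is itself NaN.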

Benchmarks

Using the following data with pandas version 0.23.1:

di = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E", 6: "F", 7: "G", 8: "H"}
df = pd.DataFrame({'col1': np.random.choice(range(1, 9), 100000)})

and testing with %timeit, it appears that map is approximately 10x faster than replace.

Note that your speedup with map will vary with your data. The largest speedup appears to be with large dictionaries and exhaustive replaces. See @jpp's answer (linked above) for more extensive benchmarks and discussion.
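For anyone who wants to reproduce the comparison outside IPython, the following is a rough sketch that uses the standard-library timeit module in place of %timeit. The seed and the repetition count are arbitrary assumptions, and the timings will differ by machine and pandas version, so no particular ratio is claimed here.

```python
import timeit

import numpy as np
import pandas as pd

di = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E", 6: "F", 7: "G", 8: "H"}
rng = np.random.default_rng(0)  # fixed seed, only for repeatability
df = pd.DataFrame({"col1": rng.integers(1, 9, size=100_000)})

# Sanity check first: with an exhaustive dictionary both approaches
# should produce the same values.
via_map = df["col1"].map(di)
via_replace = df["col1"].replace(di)

# Time five repetitions of each approach.
t_map = timeit.timeit(lambda: df["col1"].map(di), number=5)
t_replace = timeit.timeit(lambda: df["col1"].replace(di), number=5)

print(f"map: {t_map:.4f}s  replace: {t_replace:.4f}s")
```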


There is a bit of ambiguity in your question. There are at least two interpretations, plus a third included for completeness:

the keys in di refer to index values
the keys in di refer to df['col1'] values
the keys in di refer to index locations (not the OP's question, but thrown in for fun.)

Below is a solution for each case.

Case 1: If the keys of di are meant to refer to index values, then you could use the update method:

df['col1'].update(pd.Series(di))

For example,

import pandas as pd
import numpy as np

df = pd.DataFrame({'col1': ['w', 10, 20],
                   'col2': ['a', 30, np.nan]},
                  index=[1, 2, 0])
#   col1 col2
# 1    w    a
# 2   10   30
# 0   20  NaN

di = {0: "A", 2: "B"}

# The value at the 0-index is mapped to 'A', the value at the 2-index is mapped to 'B'
df['col1'].update(pd.Series(di))
print(df)

yields

  col1 col2
1    w    a
2    B   30
0    A  NaN

I've modified the values from your original post so it is clearer what update is doing. Note how the keys in di are associated with index values. The order of the index values -- that is, the index locations -- does not matter.
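If calling update directly on df['col1'] does not change df in your environment (which I would expect when Copy-on-Write is active, though that is an assumption worth checking against your pandas version), one possible variant is to update an explicit copy of the column and assign it back. The name `col` is just an illustrative temporary.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": ["w", 10, 20],
                   "col2": ["a", 30, np.nan]},
                  index=[1, 2, 0])
di = {0: "A", 2: "B"}

# Work on an explicit copy of the column, update it by index label,
# then write the whole column back to the DataFrame.
col = df["col1"].copy()
col.update(pd.Series(di))
df["col1"] = col

print(df["col1"].to_dict())  # {1: 'w', 2: 'B', 0: 'A'}
```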

Case 2: If the keys in di refer to df['col1'] values, then @DanAllan and @DSM show how to achieve this with replace:

import pandas as pd
import numpy as np

df = pd.DataFrame({'col1': ['w', 10, 20],
                   'col2': ['a', 30, np.nan]},
                  index=[1, 2, 0])
print(df)
#   col1 col2
# 1    w    a
# 2   10   30
# 0   20  NaN

di = {10: "A", 20: "B"}

# The values 10 and 20 are replaced by 'A' and 'B'
df['col1'].replace(di, inplace=True)
print(df)

yields

  col1 col2
1    w    a
2    A   30
0    B  NaN

Note how in this case the keys in di were changed to match values in df['col1'].

Case 3: If the keys in di refer to index locations, then you could use

df['col1'].put(di.keys(), di.values())

since

df = pd.DataFrame({'col1': ['w', 10, 20],
                   'col2': ['a', 30, np.nan]},
                  index=[1, 2, 0])
di = {0: "A", 2: "B"}

# The values at the 0 and 2 index locations are replaced by 'A' and 'B'
df['col1'].put(di.keys(), di.values())
print(df)

yields

  col1 col2
1    A    a
2   10   30
0    B  NaN

Here, the first and third rows were altered, because the keys in di are 0 and 2, which with Python's 0-based indexing refer to the first and third locations.
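Series.put appears to be unavailable in recent pandas releases (my understanding is that it was deprecated and later removed, but please verify against your version). As a sketch of a positional alternative, not the method used above, the same result can be obtained with iloc. The names `positions` and `col_pos` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": ["w", 10, 20],
                   "col2": ["a", 30, np.nan]},
                  index=[1, 2, 0])
di = {0: "A", 2: "B"}

# Row positions come from the dict keys; the column is also addressed by
# position so that iloc can be used for both axes.
positions = list(di.keys())
col_pos = df.columns.get_loc("col1")
df.iloc[positions, col_pos] = list(di.values())

print(df["col1"].tolist())  # ['A', 10, 'B']
```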
