Find and replace substrings in a Pandas dataframe ignore case
Same as you'd do with a standard regex, using the i flag, written inline as (?i).
df = df.replace('(?i)Number', 'NewWord', regex=True)
Granted, df.replace is limiting in the sense that flags must be passed as part of the regex string (rather than through a flags argument). If this were using str.replace, you could have used case=False or flags=re.IGNORECASE.
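To make the comparison concrete, here is a minimal sketch of both routes on a small frame. The sample data and the variable names via_inline_flag and via_keyword are made up for illustration; regex=True is passed explicitly to str.replace because its default has changed between pandas versions.

```python
import re

import pandas as pd

# Illustrative data, not from a real dataset.
df = pd.DataFrame({"col": ["this is a Number", "and another NuMBer", "number"]})

# DataFrame.replace: the flag has to live inside the pattern as (?i).
via_inline_flag = df.replace("(?i)number", "NewWord", regex=True)

# Series.str.replace: the flag can be passed as a keyword argument instead.
via_keyword = df["col"].str.replace(
    "number", "NewWord", flags=re.IGNORECASE, regex=True
)

print(via_inline_flag["col"].tolist())
print(via_keyword.tolist())
```

Both calls should produce the same three strings; the only difference is where the case-insensitivity flag is written.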
Simply use case=False in str.replace.
Example:
df = pd.DataFrame({'col':['this is a Number', 'and another NuMBer', 'number']})
>>> df
                  col
0    this is a Number
1  and another NuMBer
2              number
df['col'] = df['col'].str.replace('Number', 'NewWord', case=False)
>>> df
                   col
0    this is a NewWord
1  and another NewWord
2              NewWord
[Edit]: If the substring may appear in several columns, you can select all columns with the object dtype and apply the above solution to each of them. Example:
>>> df
                  col                col2  col3
0    this is a Number  numbernumbernumber     1
1  and another NuMBer                   x     2
2              number                   y     3
str_columns = df.select_dtypes('object').columns
df[str_columns] = (df[str_columns]
                   .apply(lambda x: x.str.replace('Number', 'NewWord', case=False)))
>>> df
                   col                   col2  col3
0    this is a NewWord  NewWordNewWordNewWord     1
1  and another NewWord                      x     2
2              NewWord                      y     3
Brutish. This only works if the whole string is exactly 'Number' or 'NUMBER'. It will not replace occurrences inside a larger string, and of course it is limited to just those two spellings.
df.replace(['Number', 'NUMBER'], 'NewWord')
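A quick sketch of that limitation, using made-up sample data: without regex=True, the list-based call compares each whole cell against the listed values, so a cell that merely contains the word, or spells it in a different case, is left alone.

```python
import pandas as pd

# Illustrative data covering an exact match, an all-caps match,
# a substring occurrence, and an unlisted spelling.
df = pd.DataFrame({"col": ["Number", "NUMBER", "this is a Number", "number"]})

# Whole-cell comparison: only cells equal to a listed value are replaced.
exact = df.replace(["Number", "NUMBER"], "NewWord")

print(exact["col"].tolist())
# ['NewWord', 'NewWord', 'this is a Number', 'number']
```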
More Brute Force
If it wasn't obvious enough, this is far inferior to @coldspeed's answer
import re
df.applymap(lambda x: re.sub('number', 'NewWord', x, flags=re.IGNORECASE))
Or with a cue from @coldspeed's answer
df.applymap(lambda x: re.sub('(?i)number', 'NewWord', x))
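One caveat with the element-wise approach is that re.sub raises a TypeError on non-string cells, such as the integers in a numeric column. The sketch below is one possible way around that, not part of the original answers: the helper replace_if_str and the sample frame are hypothetical, and it applies Series.map column by column rather than applymap, on the assumption that this works the same way across pandas versions.

```python
import re

import pandas as pd

# Illustrative frame with one text column and one integer column.
df = pd.DataFrame({"col": ["this is a Number", "number"], "col3": [1, 2]})

# Compile once so the flag is attached to the pattern itself.
pattern = re.compile("number", flags=re.IGNORECASE)


def replace_if_str(value):
    """Hypothetical helper: substitute in strings, pass other values through."""
    return pattern.sub("NewWord", value) if isinstance(value, str) else value


# Apply the helper to every cell, one column at a time.
result = df.apply(lambda column: column.map(replace_if_str))

print(result)
```

This is still slower than the vectorised str.replace route, so it is mainly useful when the frame mixes types within the same columns.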