Replacing punctuation in a data frame based on punctuation list [duplicate]


Using str.replace with the right regex is easier:

In [41]:
import pandas as pd
pd.set_option('display.notebook_repr_html', False)
df = pd.DataFrame({'text':['test','%hgh&12','abc123!!!','porkyfries']})
df

Out[41]:
         text
0        test
1     %hgh&12
2   abc123!!!
3  porkyfries

[4 rows x 1 columns]

Use the pattern [^\w\s], which matches any character that is neither a word character (letter, digit, or underscore) nor whitespace:

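In [49]:
# regex=True is needed in pandas 2.0+, where str.replace matches literally by default
df['text'] = df['text'].str.replace(r'[^\w\s]', '', regex=True)
df

Out[49]:
         text
0        test
1       hgh12
2      abc123
3  porkyfries

[4 rows x 1 columns]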
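To see exactly what this pattern keeps and drops, it can be tried on plain strings first. The sketch below uses the standard-library re module rather than pandas, on the assumption that str.replace with regex=True behaves like re.sub applied to each element (which should hold for ordinary object-backed string columns). The extra sample 'snake_case!' is made up for illustration.

```python
import re

# Same pattern as above: anything that is not a word character or whitespace.
NON_WORD = re.compile(r'[^\w\s]')

# The first four strings are the sample column; the last one is a made-up extra.
samples = ['test', '%hgh&12', 'abc123!!!', 'porkyfries', 'snake_case!']

cleaned = [NON_WORD.sub('', s) for s in samples]
print(cleaned)
```

Note that the underscore survives, because \w counts it as a word character. If underscores should go too, a pattern such as r'[^\w\s]|_' is one option.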


To remove punctuation from a text column in your dataframe:

In:

import re
import string
import pandas as pd

# Build a character class from every ASCII punctuation character
rem = string.punctuation
pattern = r"[{}]".format(rem)
pattern

Out:

'[!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~]'
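One caveat with a class built by plain string formatting: the backslash in string.punctuation sits right before the ], so the regex engine reads the pair as an escaped bracket and a literal backslash is never matched. A minimal sketch of a variant that passes the characters through re.escape first; the sample string is made up for illustration.

```python
import re
import string

# Class built by plain formatting, as in the answer above.
naive = re.compile(r"[{}]".format(string.punctuation))

# Variant: escape each punctuation character before placing it in the class.
escaped = re.compile("[{}]".format(re.escape(string.punctuation)))

sample = r'a\b, "c"!'  # hypothetical input containing a backslash

naive_out = naive.sub('', sample)      # backslash is left in place
escaped_out = escaped.sub('', sample)  # backslash is removed as well

print(repr(naive_out), repr(escaped_out))
```

For text without backslashes the two patterns should behave the same, so the escaped form mainly matters for file paths or escaped strings.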

In:

df = pd.DataFrame({'text':['book...regh', 'book...', 'boo,', 'book. ', 'ball, ', 'ballnroll"', '"rope"', 'rick % ']})
df

Out:

          text
0  book...regh
1      book...
2         boo,
3       book. 
4       ball, 
5   ballnroll"
6       "rope"
7      rick % 

In:

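# regex=True makes pandas treat the pattern as a regular expression
df['text'] = df['text'].str.replace(pattern, '', regex=True)
df

You can also substitute a character of your choice instead of deleting the match, for example replace(pattern, '$', regex=True).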

Out:

        text
0   bookregh
1       book
2        boo
3      book 
4      ball 
5  ballnroll
6       rope
7     rick  
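The replacement does not have to be the empty string. A minimal sketch, using plain re.sub on a few of the sample strings so that it runs without pandas: each punctuation character becomes a space, and the leftover runs of whitespace are then collapsed. The whitespace clean-up step is an addition for illustration, not part of the answer above.

```python
import re
import string

pattern = r"[{}]".format(string.punctuation)

texts = ['book...regh', 'boo,', '"rope"', 'rick % ']

# Replace each punctuation character with a space instead of deleting it.
spaced = [re.sub(pattern, ' ', t) for t in texts]

# Collapse repeated whitespace and trim the ends.
squeezed = [' '.join(s.split()) for s in spaced]

print(squeezed)
```

Substituting a space keeps 'book...regh' as two words ('book regh') rather than fusing them into 'bookregh', which may matter if the text is tokenized later.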


str.translate is often considered the cleanest and fastest way to remove punctuation (source):

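import string
# Python 2 str only; in Python 3, str.translate takes a single table argument.
# Deletes every punctuation character except the double quote.
text = text.translate(None, string.punctuation.translate(None, '"'))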
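The two-argument translate call above is Python 2 syntax. In Python 3 the same idea can be sketched with str.maketrans, whose third argument lists the characters to delete. The sample sentence and the variable names keep, to_delete, and table are made up for illustration.

```python
import string

# Delete every punctuation character except the double quote,
# mirroring the intent of the Python 2 snippet above.
keep = '"'
to_delete = string.punctuation.replace(keep, '')

# With two empty strings, the third argument lists characters to remove.
table = str.maketrans('', '', to_delete)

text = 'He said, "stop!" (twice).'  # hypothetical input
cleaned = text.translate(table)

print(cleaned)
```

The same table can presumably be passed to a pandas column through df['text'].str.translate(table), which avoids regular expressions entirely.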

You may find that it works better to remove punctuation in 'a' before loading it into pandas.