How can I remove non-ASCII characters but leave periods and spaces? How can I remove non-ASCII characters but leave periods and spaces? python python

How can I remove non-ASCII characters but leave periods and spaces?


You can filter all characters from the string that are not printable using string.printable, like this:

>>> s = "some\x00string. with\x15 funny characters">>> import string>>> printable = set(string.printable)>>> filter(lambda x: x in printable, s)'somestring. with funny characters'

string.printable on my machine contains:

0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c

EDIT: On Python 3, filter will return an iterable. The correct way to obtain a string back would be:

''.join(filter(lambda x: x in printable, s))


An easy way to change to a different codec, is by using encode() or decode(). In your case, you want to convert to ASCII and ignore all symbols that are not supported. For example, the Swedish letter å is not an ASCII character:

    >>>s = u'Good bye in Swedish is Hej d\xe5'    >>>s = s.encode('ascii',errors='ignore')    >>>print s    Good bye in Swedish is Hej d

Edit:

Python3: str -> bytes -> str

>>>"Hej då".encode("ascii", errors="ignore").decode()'hej d'

Python2: unicode -> str -> unicode

>>> u"hej då".encode("ascii", errors="ignore").decode()u'hej d'

Python2: str -> unicode -> str (decode and encode in reverse order)

>>> "hej d\xe5".decode("ascii", errors="ignore").encode()'hej d'


According to @artfulrobot, this should be faster than filter and lambda:

import rere.sub(r'[^\x00-\x7f]',r'', your-non-ascii-string) 

See more examples here Replace non-ASCII characters with a single space