How can I remove non-ASCII characters but leave periods and spaces?
You can filter all characters from the string that are not printable using string.printable, like this:
>>> s = "some\x00string. with\x15 funny characters">>> import string>>> printable = set(string.printable)>>> filter(lambda x: x in printable, s)'somestring. with funny characters'
string.printable on my machine contains:
0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c
EDIT: On Python 3, filter will return an iterable. The correct way to obtain a string back would be:
''.join(filter(lambda x: x in printable, s))
An easy way to change to a different codec, is by using encode() or decode(). In your case, you want to convert to ASCII and ignore all symbols that are not supported. For example, the Swedish letter å is not an ASCII character:
>>>s = u'Good bye in Swedish is Hej d\xe5' >>>s = s.encode('ascii',errors='ignore') >>>print s Good bye in Swedish is Hej d
Edit:
Python3: str -> bytes -> str
>>>"Hej då".encode("ascii", errors="ignore").decode()'hej d'
Python2: unicode -> str -> unicode
>>> u"hej då".encode("ascii", errors="ignore").decode()u'hej d'
Python2: str -> unicode -> str (decode and encode in reverse order)
>>> "hej d\xe5".decode("ascii", errors="ignore").encode()'hej d'