
Create (sane/safe) filename from any (unsafe) string


Python:

"".join([c for c in filename if c.isalpha() or c.isdigit() or c==' ']).rstrip()

This keeps Unicode letters and digits but removes line breaks, punctuation, and other special characters.

Example:

filename = u"ad\nbla'{-+\)(ç?"

gives: adblaç

Edit: str.isalnum() checks for alphanumeric characters in one step (see the comment from queueoverflow below). danodonovan suggested also keeping the dot, so that file extensions survive:

    keepcharacters = (' ','.','_')
    "".join(c for c in filename if c.isalnum() or c in keepcharacters).rstrip()
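For reuse, the one-liner above can be wrapped in a small function. This is only a sketch: the name `safe_filename` and the `keep` parameter are made up for illustration, and it assumes Python 3, where every str is Unicode.

```python
def safe_filename(name, keep=(" ", ".", "_")):
    """Keep alphanumeric characters plus those in `keep`; drop everything else.

    `safe_filename` and `keep` are illustrative names, not part of any library.
    """
    cleaned = "".join(c for c in name if c.isalnum() or c in keep)
    # Trailing spaces are stripped because some filesystems handle them badly.
    return cleaned.rstrip()


print(safe_filename("ad\nbla'{-+\\)(ç?"))        # adblaç
print(safe_filename("my report (final).txt "))   # my report final.txt
```

Characters are dropped rather than replaced, so two different inputs can collapse to the same filename. Check for collisions if that matters in your case.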


My requirements were conservative (the generated filenames needed to be valid on multiple operating systems, including some ancient mobile OSs). I ended up with:

    import re
    "".join([c for c in text if re.match(r'\w', c)])

That whitelists the alphanumeric characters (a-z, A-Z, 0-9) and the underscore. Note that on Python 3, `\w` also matches non-ASCII letters and digits in a str unless the re.ASCII flag is passed. The regular expression can be compiled and cached for efficiency if there are a lot of strings to be matched. In my case, it wouldn't have made any significant difference.
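A precompiled variant might look like the sketch below. It assumes Python 3 and passes re.ASCII so that `\w` really is limited to a-z, A-Z, 0-9 and the underscore. The names `_WORD_CHAR` and `ascii_word_chars_only` are hypothetical.

```python
import re

# Compiled once at import time; re.ASCII restricts \w to [a-zA-Z0-9_].
_WORD_CHAR = re.compile(r"\w", re.ASCII)


def ascii_word_chars_only(text):
    """Return `text` with every character outside [a-zA-Z0-9_] removed."""
    return "".join(c for c in text if _WORD_CHAR.match(c))


print(ascii_word_chars_only("ad\nbla'{-+\\)(ç1?"))  # adbla1
```

The `re` module also keeps its own cache of recently used patterns, so compiling by hand mostly helps readability and tight loops.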


More or less what has already been mentioned here with regular expressions, but in reverse (replace any character NOT listed):

>>> import re
>>> filename = u"ad\nbla'{-+\)(ç1?"
>>> re.sub(r'[^\w\d-]','_',filename)
u'ad_bla__-_____1_'
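The session above is Python 2 output, where `\w` is ASCII-only by default. On Python 3 the same pattern would keep the ç. A sketch that makes the choice explicit (the function name and its `ascii_only` parameter are assumptions for illustration):

```python
import re


def replace_unsafe(name, replacement="_", ascii_only=True):
    """Replace every character that is not a word character or '-'.

    With ascii_only=True, \\w means [a-zA-Z0-9_]; otherwise Unicode
    letters and digits are kept as well.
    """
    flags = re.ASCII if ascii_only else 0
    # \d is already covered by \w, so [^\w-] is equivalent to [^\w\d-].
    return re.sub(r"[^\w-]", replacement, name, flags=flags)


name = "ad\nbla'{-+\\)(ç1?"
print(replace_unsafe(name))                    # ad_bla__-_____1_
print(replace_unsafe(name, ascii_only=False))  # ad_bla__-____ç1_
```

Replacing instead of deleting keeps the length of the name, which makes it easier to see where characters were substituted.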