Turn a string into a valid filename? (python)


You can look at how the Django framework creates a "slug" from arbitrary text. A slug is URL- and filename-friendly.

The Django text utils define a function, slugify(), that's probably the gold standard for this kind of thing. Essentially, their code is the following.

    import unicodedata
    import re

    def slugify(value, allow_unicode=False):
        """
        Taken from https://github.com/django/django/blob/master/django/utils/text.py
        Convert to ASCII if 'allow_unicode' is False. Convert spaces or repeated
        dashes to single dashes. Remove characters that aren't alphanumerics,
        underscores, or hyphens. Convert to lowercase. Also strip leading and
        trailing whitespace, dashes, and underscores.
        """
        value = str(value)
        if allow_unicode:
            value = unicodedata.normalize('NFKC', value)
        else:
            value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode('ascii')
        value = re.sub(r'[^\w\s-]', '', value.lower())
        return re.sub(r'[-\s]+', '-', value).strip('-_')

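And the older version (Python 2, since it relies on the unicode() built-in):

    import re
    import unicodedata

    def slugify(value):
        """
        Normalizes string, converts to lowercase, removes non-alpha characters,
        and converts spaces to hyphens.
        """
        # Decompose accented characters, then drop anything that is not ASCII.
        value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore')
        # Remove everything except word characters, whitespace, and hyphens.
        value = unicode(re.sub(r'[^\w\s-]', '', value).strip().lower())
        # Collapse runs of whitespace and hyphens into a single hyphen.
        value = unicode(re.sub(r'[-\s]+', '-', value))
        # ...
        return value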

There's more, but I left it out, since it deals with escaping rather than slugification.
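
One thing to watch for when using a slug as a filename: slugify() removes dots, so a file extension gets merged into the name. A minimal sketch of one way around that, assuming you want to keep the extension, is to split it off first and slugify both parts separately. The slug_filename() helper and its default argument are hypothetical names for illustration, not part of Django.

```python
import os
import re
import unicodedata


def slugify(value, allow_unicode=False):
    # Same logic as the slugify() quoted above.
    value = str(value)
    if allow_unicode:
        value = unicodedata.normalize("NFKC", value)
    else:
        value = (
            unicodedata.normalize("NFKD", value)
            .encode("ascii", "ignore")
            .decode("ascii")
        )
    value = re.sub(r"[^\w\s-]", "", value.lower())
    return re.sub(r"[-\s]+", "-", value).strip("-_")


def slug_filename(name, default="untitled"):
    # Hypothetical helper: slugify the stem and the extension separately,
    # because slugify() would otherwise delete the dot between them.
    stem, ext = os.path.splitext(name)
    stem = slugify(stem) or default  # fall back if nothing survives
    ext = slugify(ext)               # ".TXT" becomes "txt"
    return f"{stem}.{ext}" if ext else stem


print(slug_filename("My R\u00e9sum\u00e9 (final).TXT"))  # my-resume-final.txt
print(slug_filename("???"))                              # untitled
```

The fallback matters because a string made up only of punctuation slugifies to an empty string, which is not a usable filename.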


You can use a generator expression together with the string methods.

    >>> s
    'foo-bar#baz?qux@127/\\9]'
    >>> "".join(x for x in s if x.isalnum())
    'foobarbazqux1279'
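
Note that str.isalnum() is also true for non-ASCII letters and digits, so accented characters survive this filter. If you want ASCII only, a small sketch is to combine it with str.isascii() (available from Python 3.7). The function name ascii_alnum is just an illustrative choice.

```python
def ascii_alnum(s):
    # Keep only characters that are both ASCII and alphanumeric.
    return "".join(c for c in s if c.isascii() and c.isalnum())


s = "foo-bar#baz?qux@127/\\9]"
print(ascii_alnum(s))               # foobarbazqux1279
print(ascii_alnum("na\u00efve-1"))  # nave1
```

Unlike the slugify() route, this drops accented characters outright instead of reducing them to their base letter.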


This whitelist approach (i.e., allowing only the characters present in valid_chars) works as long as there are no constraints on the filename's format and no combinations of valid characters that are themselves illegal (such as ".."). For example, what you describe would allow a file named " . txt", which I think is not valid on Windows. Since this is the simplest approach, I'd remove whitespace from valid_chars and prepend a known-valid string in case of error. Any other approach has to know what is allowed where in order to cope with Windows file-naming limitations, and will therefore be a lot more complex.

    >>> import string
    >>> valid_chars = "-_.() %s%s" % (string.ascii_letters, string.digits)
    >>> valid_chars
    '-_.() abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
    >>> filename = "This Is a (valid) - filename%$&$ .txt"
    >>> ''.join(c for c in filename if c in valid_chars)
    'This Is a (valid) - filename .txt'
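
The suggestion of dropping whitespace from the whitelist and prepending a known-valid string could be sketched roughly as follows. What counts as an "error" is an assumption here: the sketch treats an empty result, or one that starts with a dot, as the error case. The sanitize() name and the "file_" prefix are hypothetical. It does not try to handle Windows reserved device names such as CON or NUL.

```python
import string

# Same whitelist as above, minus the space character.
VALID_CHARS = "-_.()" + string.ascii_letters + string.digits


def sanitize(filename, prefix="file_"):
    # Drop every character that is not whitelisted.
    cleaned = "".join(c for c in filename if c in VALID_CHARS)
    # Trailing dots are stripped, since Windows does not keep them.
    cleaned = cleaned.rstrip(".")
    # Assumed error cases: nothing left, or a name starting with a dot.
    if not cleaned or cleaned.startswith("."):
        cleaned = prefix + cleaned
    return cleaned


print(sanitize("This Is a (valid) - filename%$&$ .txt"))  # ThisIsa(valid)-filename.txt
print(sanitize(" . txt"))                                 # file_.txt
print(sanitize(".."))                                     # file_
```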