
how to determine the filename of content downloaded with HTTP in Python?


The rfc6266 library appears to do exactly what you need. It can parse raw headers, requests responses, and urllib2 responses. It's on PyPI.

Some examples:

>>> import rfc6266, requests
>>> rfc6266.parse_headers('''Attachment; filename=example.html''').filename_unsafe
'example.html'
>>> rfc6266.parse_headers('''INLINE; FILENAME= "an example.html"''').filename_unsafe
'an example.html'
>>> rfc6266.parse_headers(
...     '''attachment; '''
...     '''filename*= UTF-8''%e2%82%ac%20rates''').filename_unsafe
'€ rates'
>>> rfc6266.parse_headers(
...     '''attachment; '''
...     '''filename="EURO rates"; '''
...     '''filename*=utf-8''%e2%82%ac%20rates''').filename_unsafe
'€ rates'
>>> r = requests.get('http://example.com/€ rates')
>>> rfc6266.parse_requests_response(r).filename_unsafe
'€ rates'

As a note, though: this library does not like nonstandard whitespace in the header.
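If installing a third-party package is not an option, the standard library's email parser can read the same kind of header. The following is only a minimal sketch, not a replacement for rfc6266: the function name `filename_from_header` is made up for illustration, and the result for headers that carry both `filename` and `filename*` may differ from what rfc6266 returns.

```python
from email.message import Message


def filename_from_header(content_disposition):
    """Return the filename from a Content-Disposition value, or None.

    Hypothetical helper: it relies on email.message.Message, which
    understands quoted values and the percent-encoded filename*= form.
    """
    msg = Message()
    msg["Content-Disposition"] = content_disposition
    # get_filename() returns None when no filename parameter is present
    return msg.get_filename()


print(filename_from_header("Attachment; filename=example.html"))
print(filename_from_header('INLINE; FILENAME= "an example.html"'))
print(filename_from_header("attachment; filename*= UTF-8''%e2%82%ac%20rates"))
```

As with `filename_unsafe`, the returned value comes straight from the server and should not be trusted as a path.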


If you don't need the filename decoded to UTF-8, a simple regular expression is enough:

import re

def getFilename(s):
    # Capture whatever follows filename= or filename*= up to the next ';'
    fname = re.findall(r"filename\*?=([^;]+)", s, flags=re.IGNORECASE)
    print(fname[0].strip().strip('"'))

But if UTF-8 is a must:

import re
from urllib.parse import unquote

def getFilename(s):
    # Prefer the extended filename*= parameter, fall back to plain filename=
    fname = re.findall(r"filename\*=([^;]+)", s, flags=re.IGNORECASE)
    if not fname:
        fname = re.findall(r"filename=([^;]+)", s, flags=re.IGNORECASE)
    if "utf-8''" in fname[0].lower():
        # Drop the charset prefix, then percent-decode the rest as UTF-8
        fname = re.sub("utf-8''", '', fname[0], flags=re.IGNORECASE)
        fname = unquote(fname)
    else:
        fname = fname[0]
    # clean space and double quotes
    print(fname.strip().strip('"'))

# example
getFilename('Attachment; filename=example.html')
getFilename('INLINE; FILENAME= "an example.html"')
getFilename("attachment;filename*= UTF-8''%e2%82%ac%20rates")
getFilename("attachment; filename=\"EURO rates\";filename*=utf-8''%e2%82%ac%20rates")
getFilename("attachment;filename=\"_____ _____ ___ __ ____ _____ Hekayt Bent.2017.mp3\";filename*=UTF-8''%D8%A7%D8%BA%D9%86%D9%8A%D9%87%20%D8%AD%D9%83%D8%A7%D9%8A%D8%A9%20%D8%A8%D9%86%D8%AA%20%D9%84%D9%80%20%D9%85%D8%AD%D9%85%D8%AF%20%D8%B4%D8%AD%D8%A7%D8%AA%D8%A9%20Hekayt%20Bent.2017.mp3")

Result:

example.html
an example.html
€ rates
€ rates
اغنيه حكاية بنت لـ محمد شحاتة Hekayt Bent.2017.mp3
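Whichever approach extracts the name, the value is supplied by the server (the rfc6266 attribute is called `filename_unsafe` for a reason), so it is worth cleaning before it is used to write a file. The sketch below is one possible way to do that. `safe_filename` and its `default` argument are hypothetical names, and the rules it applies are assumptions that may be too strict or too lax for a given application.

```python
import os.path


def safe_filename(name, default="download"):
    """Reduce a server-supplied name to a bare file name (illustrative only)."""
    # Treat Windows separators like POSIX ones, then keep the last component
    name = os.path.basename(name.replace("\\", "/"))
    # Drop non-printable characters and surrounding whitespace
    name = "".join(ch for ch in name if ch.isprintable()).strip()
    # Empty names and directory references fall back to the default
    if name in ("", ".", ".."):
        return default
    return name


print(safe_filename("../../etc/passwd"))
print(safe_filename("an example.html"))
```

Keeping only the final path component means a header such as `filename="../../etc/passwd"` cannot steer the download outside the target directory.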