How to get pdf filename with Python requests?


The filename is specified in the Content-Disposition HTTP header. To extract it, you would do:

import re

# r is the Response object returned by requests.get(...)
d = r.headers['content-disposition']
fname = re.findall("filename=(.+)", d)[0]

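The name is extracted from the header value with a regular expression (the re module). Note that if the server quotes the value, as in filename="sample.pdf", this pattern captures the quotes as well.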
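As a possible alternative to the regular expression, the standard library's email package can parse the header and strip any surrounding quotes. This is only a minimal sketch: the helper name filename_from_content_disposition is made up for illustration, and it assumes you pass in the raw header value (for example r.headers.get('content-disposition')).

```python
from email.message import Message


def filename_from_content_disposition(value):
    """Return the filename parameter of a Content-Disposition value, or None.

    Hypothetical helper for illustration; it is not part of requests.
    """
    if not value:
        return None
    msg = Message()
    msg["Content-Disposition"] = value
    # get_filename() unquotes the value and returns None if no filename is given
    return msg.get_filename()


print(filename_from_content_disposition('attachment; filename="sample.pdf"'))
print(filename_from_content_disposition("inline"))
```

The same call should also cope with the RFC 2231 form (filename*=UTF-8''...), though that is worth checking against the servers you actually talk to.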


Building on some of the other answers, here's how I do it. If there isn't a Content-Disposition header, I take the filename from the download URL instead:

import re
import requests
from requests.exceptions import RequestException

url = 'http://www.example.com/downloads/sample.pdf'

try:
    with requests.get(url) as r:
        fname = ''
        if "Content-Disposition" in r.headers:
            # Prefer the name the server suggests
            fname = re.findall("filename=(.+)", r.headers["Content-Disposition"])[0]
        else:
            # Fall back to the last segment of the URL
            fname = url.split("/")[-1]
        print(fname)
except RequestException as e:
    print(e)

There are arguably better ways of parsing the URL string, but for simplicity I didn't want to involve any more libraries.
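For what it's worth, one such way needs only the standard library. url.split("/")[-1] keeps any query string or fragment, whereas urllib.parse separates those out first. A rough sketch, where filename_from_url is a hypothetical helper name and the URLs are just examples:

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url):
    """Return the last path segment of a URL, percent-decoded.

    Hypothetical helper for illustration only.
    """
    path = urlsplit(url).path  # drops the query string and fragment
    return unquote(posixpath.basename(path))


print(filename_from_url("http://www.example.com/downloads/sample.pdf?version=2"))
```

If the URL path ends in a slash the result is an empty string, so you would still need a default name for that case.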


Apparently, for this particular resource it is in:

r.headers['content-disposition']

I don't know if that is always the case, though.