How to get pdf filename with Python requests?
The filename is specified in the Content-Disposition HTTP header. To extract it, you would do:
    import re

    d = r.headers['content-disposition']
    fname = re.findall("filename=(.+)", d)[0]
The name is extracted from the header string with a regular expression (the re module).
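One thing to watch for: if the server quotes the value (for example attachment; filename="sample.pdf"), the regex above keeps the quotes in the result. As a rough alternative, here is a minimal sketch that lets the standard library's email package parse the header instead. The helper name filename_from_content_disposition is my own, not part of requests, and the header strings are made-up examples.

```python
from email.message import Message


def filename_from_content_disposition(header_value):
    """Return the filename parameter of a Content-Disposition value, or None.

    Hypothetical helper for illustration; it is not part of requests.
    """
    msg = Message()
    msg["Content-Disposition"] = header_value
    # get_filename() reads the "filename" parameter and strips surrounding quotes.
    return msg.get_filename()


# Example header value (made up for this sketch).
print(filename_from_content_disposition('attachment; filename="sample.pdf"'))  # sample.pdf
```

With a real response you would pass in r.headers['content-disposition'], after checking that the header is present.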
Building on some of the other answers, here's how I do it. If there isn't a Content-Disposition header, I parse it from the download URL:
    import re
    import requests
    from requests.exceptions import RequestException

    url = 'http://www.example.com/downloads/sample.pdf'

    try:
        with requests.get(url) as r:
            fname = ''
            if "Content-Disposition" in r.headers.keys():
                fname = re.findall("filename=(.+)", r.headers["Content-Disposition"])[0]
            else:
                fname = url.split("/")[-1]
            print(fname)
    except RequestException as e:
        print(e)
There are arguably better ways of parsing the URL string, but for simplicity I didn't want to involve any more libraries.
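For what it's worth, one of those better ways needs nothing outside the standard library. Splitting on "/" keeps any query string or fragment in the result (sample.pdf?token=abc, say), whereas urllib.parse separates the path first. A minimal sketch, where the helper name filename_from_url and the example URL are assumptions of mine:

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url):
    """Return the last path segment of a URL, percent-decoded.

    Hypothetical helper for illustration. Returns an empty string when
    the path ends in a slash.
    """
    path = urlsplit(url).path          # drops the query string and fragment
    return unquote(posixpath.basename(path))


# Example URL (made up); the query string does not leak into the name.
print(filename_from_url('http://www.example.com/downloads/sample.pdf?token=abc'))  # sample.pdf
```

An empty result means the URL has no usable last segment, so you would still want some default name for that case.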
Apparently, for this particular resource it is in:
r.headers['content-disposition']
I don't know if that is always the case, though.