How to download a file using Python in a 'smarter' way?
Download scripts like that tend to push a header telling the user-agent what to name the file:
Content-Disposition: attachment; filename="the filename.ext"
If you can grab that header, you can get the proper filename.
There's another thread that has a little bit of code to offer up for Content-Disposition-grabbing.
    import urllib2

    remotefile = urllib2.urlopen('http://example.com/somefile.zip')
    # The raw header value, e.g. attachment; filename="the filename.ext"
    remotefile.info()['Content-Disposition']
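The snippet above only returns the raw header value; the file name still has to be pulled out of it. Here is a minimal sketch of one way to do that, assuming Python 3 (the code in this thread targets Python 2). The helper name `filename_from_header` is made up for illustration. It leans on the standard library's `email.message.Message`, which already understands `name=value` header parameters and quoting.

```python
from email.message import Message


def filename_from_header(header_value):
    """Return the filename parameter of a Content-Disposition value, or None.

    Hypothetical helper for illustration; not part of any library.
    """
    msg = Message()
    msg['Content-Disposition'] = header_value
    # get_filename() reads the "filename" parameter and strips the quotes;
    # it returns None when the parameter is absent.
    return msg.get_filename()


print(filename_from_header('attachment; filename="the filename.ext"'))
# -> the filename.ext
```

Letting the standard library do the parsing avoids hand-rolled splitting on `filename=` and the quote handling that goes with it.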
Based on comments and @Oli's answer, I made a solution like this:
    import urllib2
    from os.path import basename
    from urlparse import urlsplit

    def url2name(url):
        # Last path component of the URL, e.g. 'somefile.zip'
        return basename(urlsplit(url)[2])

    def download(url, localFileName = None):
        localName = url2name(url)
        req = urllib2.Request(url)
        r = urllib2.urlopen(req)
        if r.info().has_key('Content-Disposition'):
            # If the response has Content-Disposition, we take file name from it
            localName = r.info()['Content-Disposition'].split('filename=')[1]
            if localName[0] == '"' or localName[0] == "'":
                localName = localName[1:-1]
        elif r.url != url:
            # if we were redirected, the real file name we take from the final URL
            localName = url2name(r.url)
        if localFileName:
            # we can force to save the file as specified name
            localName = localFileName
        f = open(localName, 'wb')
        f.write(r.read())
        f.close()
It takes the file name from Content-Disposition; if that header is not present, it uses the file name from the URL (if a redirect happened, the final URL is used instead of the requested one). A name passed as localFileName overrides both.
Combining much of the above, here is a more pythonic solution:
    import urllib2
    import shutil
    import urlparse
    import os

    def download(url, fileName=None):
        def getFileName(url,openUrl):
            if 'Content-Disposition' in openUrl.info():
                # If the response has Content-Disposition, try to get filename from it.
                # Split on the first '=' only, so a value containing '=' stays intact.
                cd = dict(map(
                    lambda x: x.strip().split('=', 1) if '=' in x else (x.strip(),''),
                    openUrl.info()['Content-Disposition'].split(';')))
                if 'filename' in cd:
                    filename = cd['filename'].strip("\"'")
                    if filename: return filename
            # if no filename was found above, parse it out of the final URL.
            return os.path.basename(urlparse.urlsplit(openUrl.url)[2])

        r = urllib2.urlopen(urllib2.Request(url))
        try:
            fileName = fileName or getFileName(url,r)
            with open(fileName, 'wb') as f:
                # Stream the response to disk instead of reading it all into memory
                shutil.copyfileobj(r,f)
        finally:
            r.close()
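For anyone on Python 3, where `urllib2` and `urlparse` became `urllib.request` and `urllib.parse`, the same idea could look roughly like the sketch below. This is my own port, not code from the answers above: the parameter name `file_name` and the returned value are additions of mine. It assumes `response.headers` is an `email.message.Message`, so its `get_filename()` can read the `filename` parameter of Content-Disposition.

```python
import os
import shutil
import urllib.request
from urllib.parse import unquote, urlsplit


def download(url, file_name=None):
    """Save url to disk and return the local file name that was used."""
    with urllib.request.urlopen(url) as response:
        # An explicit name wins; otherwise try the Content-Disposition header.
        name = file_name or response.headers.get_filename()
        if not name:
            # Fall back to the last path segment of the final URL,
            # which already reflects any redirects that were followed.
            name = os.path.basename(unquote(urlsplit(response.url).path))
        with open(name, 'wb') as out:
            # Stream the body to disk in chunks.
            shutil.copyfileobj(response, out)
    return name
```

With this sketch, `download('http://example.com/somefile.zip')` would save to `somefile.zip` in the current directory unless the server sends a file name in Content-Disposition, and `download(url, 'other.zip')` forces the name.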