How to download a file using Python in a 'smarter' way?


Download scripts like that tend to push a header telling the user-agent what to name the file:

Content-Disposition: attachment; filename="the filename.ext"

If you can grab that header, you can get the proper filename.

There's another thread that has a little bit of code to offer up for Content-Disposition-grabbing.

    import urllib2

    remotefile = urllib2.urlopen('http://example.com/somefile.zip')
    remotefile.info()['Content-Disposition']
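Once the header value is in hand, the filename still has to be pulled out of it. A minimal sketch, assuming Python 3 (the thread itself uses Python 2's `urllib2`): the standard library's `email.message.Message` can parse the header's parameters and strip the quotes. The helper name `filename_from_content_disposition` is made up for illustration.

```python
from email.message import Message


def filename_from_content_disposition(header_value):
    """Return the filename parameter of a Content-Disposition value, or None."""
    if not header_value:
        return None
    msg = Message()
    msg["Content-Disposition"] = header_value
    # get_filename() reads the "filename" parameter and removes surrounding quotes.
    return msg.get_filename()


print(filename_from_content_disposition('attachment; filename="the filename.ext"'))
```

For the header shown above this should print `the filename.ext`; a header without a `filename` parameter gives `None`.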


Based on comments and @Oli's answer, I made a solution like this:

    import urllib2
    from os.path import basename
    from urlparse import urlsplit

    def url2name(url):
        return basename(urlsplit(url)[2])

    def download(url, localFileName = None):
        localName = url2name(url)
        req = urllib2.Request(url)
        r = urllib2.urlopen(req)
        if r.info().has_key('Content-Disposition'):
            # If the response has Content-Disposition, we take file name from it
            localName = r.info()['Content-Disposition'].split('filename=')[1]
            if localName[0] == '"' or localName[0] == "'":
                localName = localName[1:-1]
        elif r.url != url:
            # if we were redirected, the real file name we take from the final URL
            localName = url2name(r.url)
        if localFileName:
            # we can force to save the file as specified name
            localName = localFileName
        f = open(localName, 'wb')
        f.write(r.read())
        f.close()

It takes the file name from Content-Disposition; if that header is not present, it uses the file name from the URL (if a redirect happened, the final URL is used). A name passed in explicitly overrides both.
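The order of precedence described here can be isolated into a small pure function, which makes it easy to check without touching the network. This is only a sketch in Python 3; `pick_local_name` and its parameter names are hypothetical, and `header_name` is assumed to be a filename already extracted from Content-Disposition (or `None`).

```python
from os.path import basename
from urllib.parse import urlsplit


def pick_local_name(final_url, header_name=None, forced_name=None):
    """Choose the local file name for a download.

    Precedence: name forced by the caller, then the name from
    Content-Disposition, then the last path segment of the final URL.
    """
    if forced_name:
        return forced_name
    if header_name:
        return header_name
    # urlsplit(...).path drops the query string, so "?id=1" never ends up in the name.
    return basename(urlsplit(final_url).path)
```

Passing the final (post-redirect) URL covers both the redirected and the non-redirected case, since the two URLs are identical when no redirect happened.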


Combining much of the above, here is a more pythonic solution:

    import urllib2
    import shutil
    import urlparse
    import os

    def download(url, fileName=None):
        def getFileName(url,openUrl):
            if 'Content-Disposition' in openUrl.info():
                # If the response has Content-Disposition, try to get filename from it
                # (split on the first '=' only, so a '=' inside the value is kept)
                cd = dict(map(
                    lambda x: x.strip().split('=', 1) if '=' in x else (x.strip(),''),
                    openUrl.info()['Content-Disposition'].split(';')))
                if 'filename' in cd:
                    filename = cd['filename'].strip("\"'")
                    if filename: return filename
            # if no filename was found above, parse it out of the final URL.
            return os.path.basename(urlparse.urlsplit(openUrl.url)[2])

        r = urllib2.urlopen(urllib2.Request(url))
        try:
            fileName = fileName or getFileName(url,r)
            with open(fileName, 'wb') as f:
                shutil.copyfileobj(r,f)
        finally:
            r.close()
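For readers on Python 3, where `urllib2` and `urlparse` were folded into `urllib.request` and `urllib.parse`, the same idea might look roughly like the sketch below. It is an adaptation, not the answer's original code: the snake_case names are my own, and it relies on the response headers being an `email.message.Message`, whose `get_filename()` parses Content-Disposition.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlsplit


def download(url, file_name=None):
    """Save the body of `url` to disk and return the local file name used."""
    with urllib.request.urlopen(url) as response:
        # get_filename() returns None when the headers carry no filename.
        header_name = response.headers.get_filename()
        # response.url is the final URL, after any redirects.
        url_name = os.path.basename(urlsplit(response.url).path)
        local_name = file_name or header_name or url_name
        with open(local_name, "wb") as f:
            # Copy in chunks instead of reading the whole body into memory.
            shutil.copyfileobj(response, f)
    return local_name
```

A call such as `download('http://example.com/somefile.zip')` would then save under whatever name the server suggests, falling back to `somefile.zip`. As in the versions above, a server-supplied name is used as-is, so it is worth checking before trusting it with untrusted servers.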