How to download a file using python in a 'smarter' way?

Download scripts like that tend to push a header telling the user-agent what to name the file:

Content-Disposition: attachment; filename="the filename.ext"

If you can grab that header, you can get the proper filename.

There's another thread that has a little bit of code to offer up for Content-Disposition-grabbing.

    import urllib2

    remotefile = urllib2.urlopen('http://example.com/somefile.zip')
    remotefile.info()['Content-Disposition']
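The raw header value still carries the `attachment;` prefix and the quotes around the name. Here is a minimal sketch of pulling out just the filename, assuming Python 3 (where `urllib2` no longer exists) and leaning on the standard library's `email.message.Message` parser. The helper name `filename_from_content_disposition` is made up for illustration:

```python
from email.message import Message


def filename_from_content_disposition(header_value):
    """Return the filename parameter of a Content-Disposition value, or None."""
    msg = Message()
    # Store the raw header so the email parser can split its parameters.
    msg['Content-Disposition'] = header_value
    # get_filename() reads the "filename" parameter and strips the quotes;
    # it returns None when the parameter is missing.
    return msg.get_filename()


print(filename_from_content_disposition('attachment; filename="the filename.ext"'))
```

This avoids hand-splitting on `filename=`, which tends to break when the header carries extra parameters after the name.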

python http download

Based on the comments and @Oli's answer, I came up with this solution:

    import urllib2
    from os.path import basename
    from urlparse import urlsplit

    def url2name(url):
        return basename(urlsplit(url)[2])

    def download(url, localFileName = None):
        localName = url2name(url)
        req = urllib2.Request(url)
        r = urllib2.urlopen(req)
        if 'Content-Disposition' in r.info():
            # If the response has Content-Disposition, we take file name from it
            localName = r.info()['Content-Disposition'].split('filename=')[1]
            if localName[0] == '"' or localName[0] == "'":
                localName = localName[1:-1]
        elif r.url != url:
            # if we were redirected, the real file name we take from the final URL
            localName = url2name(r.url)
        if localFileName:
            # we can force to save the file as specified name
            localName = localFileName
        f = open(localName, 'wb')
        f.write(r.read())
        f.close()

It takes the file name from Content-Disposition; if that header is not present, it uses the file name from the URL (if a redirect happened, the final URL is used). A name passed in explicitly overrides both.


Combining much of the above, here is a more pythonic solution:

    import urllib2
    import shutil
    import urlparse
    import os

    def download(url, fileName=None):
        def getFileName(url, openUrl):
            if 'Content-Disposition' in openUrl.info():
                # If the response has Content-Disposition, try to get filename from it
                # (split on the first '=' only, so names containing '=' survive)
                cd = dict(map(
                    lambda x: x.strip().split('=', 1) if '=' in x else (x.strip(), ''),
                    openUrl.info()['Content-Disposition'].split(';')))
                if 'filename' in cd:
                    filename = cd['filename'].strip("\"'")
                    if filename: return filename
            # if no filename was found above, parse it out of the final URL.
            return os.path.basename(urlparse.urlsplit(openUrl.url)[2])

        r = urllib2.urlopen(urllib2.Request(url))
        try:
            fileName = fileName or getFileName(url, r)
            with open(fileName, 'wb') as f:
                shutil.copyfileobj(r, f)
        finally:
            r.close()
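`urllib2` and `urlparse` only exist in Python 2. For anyone on Python 3, the same idea might look roughly like the sketch below. This is my own adaptation, not code from the answers above; the `directory` parameter, the `"download"` fallback name, and the `os.path.basename` call (so a server-supplied name cannot point outside the target directory) are additions I am assuming are wanted.

```python
import os
import shutil
from urllib.parse import unquote, urlsplit
from urllib.request import urlopen


def download(url, file_name=None, directory="."):
    """Save url into directory and return the path that was written."""
    with urlopen(url) as response:
        # response.headers is an email.message.Message; get_filename()
        # reads the filename parameter of Content-Disposition, if any.
        header_name = response.headers.get_filename()
        # response.url is the final URL, after any redirects.
        url_name = unquote(urlsplit(response.url).path).rsplit("/", 1)[-1]
        # Priority: explicit name, then header, then URL, then a fixed default.
        # basename() drops any directory part the name might carry.
        name = os.path.basename(file_name or header_name or url_name) or "download"
        path = os.path.join(directory, name)
        with open(path, "wb") as out:
            # Stream the body to disk instead of reading it all into memory.
            shutil.copyfileobj(response, out)
    return path
```

Calling `download('http://example.com/somefile.zip')` would save into the current directory under whichever name wins the priority order, and return that path.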
