Download large file in python with requests
With the following streaming code, Python's memory usage stays bounded regardless of the size of the downloaded file:
import requests

def download_file(url):
    local_filename = url.split('/')[-1]
    # NOTE the stream=True parameter below
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                # If you have a chunk-encoded response, uncomment the `if`
                # below and set chunk_size to None.
                #if chunk:
                f.write(chunk)
    return local_filename
Note that the number of bytes returned by iter_content is not necessarily equal to the chunk_size; it can differ from one iteration to the next, and the final chunk is usually shorter.
See body-content-workflow and Response.iter_content for further reference.
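The varying chunk size is easy to see with a purely local sketch of the same read loop (io.BytesIO stands in for the response body here; no network involved, and iter_chunks is just an illustrative stand-in for iter_content):

```python
import io

def iter_chunks(fobj, chunk_size=8192):
    """Yield successive chunks from a file-like object, like iter_content."""
    while True:
        chunk = fobj.read(chunk_size)
        if not chunk:  # empty bytes => end of stream
            break
        yield chunk

# 20,000 bytes of fake payload: two full chunks plus a short final one.
payload = io.BytesIO(b"x" * 20_000)
sizes = [len(c) for c in iter_chunks(payload)]
print(sizes)  # the last chunk is smaller than chunk_size
```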
It's much easier if you use Response.raw and shutil.copyfileobj():
import requests
import shutil

def download_file(url):
    local_filename = url.split('/')[-1]
    with requests.get(url, stream=True) as r:
        with open(local_filename, 'wb') as f:
            shutil.copyfileobj(r.raw, f)
    return local_filename
This streams the file to disk without using excessive memory, and the code is simple.
Note: According to the documentation, Response.raw will not decode the gzip and deflate transfer-encodings, so you will need to do this manually.
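One way to do that decoding by hand is a zlib.decompressobj set up for the gzip wrapper and fed chunk by chunk. This is a local sketch: gzip.compress stands in for a gzip-encoded response body, and the chunk size is arbitrary.

```python
import gzip
import zlib

# Some gzip-compressed bytes standing in for a raw response body.
compressed = gzip.compress(b"hello " * 1000)

# 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
out = bytearray()
for i in range(0, len(compressed), 1024):  # feed the stream in small chunks
    out += decomp.decompress(compressed[i:i + 1024])
out += decomp.flush()
```

The same loop works when the chunks come from r.raw.read() instead of a local buffer.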
Not exactly what OP was asking, but... it's ridiculously easy to do that with urllib:
from urllib.request import urlretrieve

url = 'http://mirror.pnl.gov/releases/16.04.2/ubuntu-16.04.2-desktop-amd64.iso'
dst = 'ubuntu-16.04.2-desktop-amd64.iso'
urlretrieve(url, dst)
Or this way, if you want to save it to a temporary file:
from urllib.request import urlopen
from shutil import copyfileobj
from tempfile import NamedTemporaryFile

url = 'http://mirror.pnl.gov/releases/16.04.2/ubuntu-16.04.2-desktop-amd64.iso'
with urlopen(url) as fsrc, NamedTemporaryFile(delete=False) as fdst:
    copyfileobj(fsrc, fdst)
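Both urllib variants also accept file:// URLs, which makes them easy to try offline. A small sketch (the temp-file names are generated, nothing here is a real mirror):

```python
from pathlib import Path
from tempfile import NamedTemporaryFile
from urllib.request import urlretrieve

# Create a small local "remote" file, then download it via a file:// URL.
src = NamedTemporaryFile(delete=False, suffix=".bin")
src.write(b"A" * 4096)
src.close()

dst = src.name + ".copy"
urlretrieve(Path(src.name).as_uri(), dst)  # same call as with http://
```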
I watched the process:
watch 'ps -p 18647 -o pid,ppid,pmem,rsz,vsz,comm,args; ls -al *.iso'
And I saw the file growing, but memory usage stayed at 17 MB. Am I missing something?
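Instead of watching with ps from outside, the peak resident set size can also be sampled from inside the process with the POSIX-only resource module (a sketch; note ru_maxrss is reported in kilobytes on Linux but bytes on macOS):

```python
import resource

def peak_rss():
    # ru_maxrss: peak resident set size of this process so far.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = peak_rss()
# ... stream a download here ...
after = peak_rss()
print(after - before)  # stays small if the copy really streams
```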