Download large file in python with requests

With the following streaming code, Python's memory usage stays bounded regardless of the size of the downloaded file:

import requests

def download_file(url):
    local_filename = url.split('/')[-1]
    # NOTE the stream=True parameter below
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                # If you have a chunk-encoded response, uncomment the if
                # statement below and set the chunk_size parameter to None.
                #if chunk:
                    f.write(chunk)
    return local_filename
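Only one chunk is held in memory at a time, which is why memory stays bounded. The same write loop can be tried offline with a minimal stdlib-only sketch: write_chunks is a hypothetical helper, and the list of byte strings is an assumption standing in for r.iter_content(...). It applies the empty-chunk filter that the commented-out if hints at.

```python
from typing import Iterable


def write_chunks(chunks: Iterable[bytes], path: str) -> int:
    """Write an iterable of byte chunks to `path`, one chunk at a time.

    Hypothetical helper: `chunks` stands in for r.iter_content(...).
    Returns the number of bytes written.
    """
    written = 0
    with open(path, 'wb') as f:
        for chunk in chunks:
            # Skip empty chunks rather than issuing zero-length writes.
            if not chunk:
                continue
            f.write(chunk)
            written += len(chunk)
    return written
```

For example, write_chunks([b'abc', b'', b'defgh'], path) returns 8 and leaves abcdefgh in the file.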

Note that the number of bytes returned by each iteration of iter_content is not exactly chunk_size; it can vary from one iteration to the next and is often much larger.
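Because chunk sizes vary, any progress reporting should add up len(chunk) rather than multiply the iteration count by chunk_size. A minimal sketch, assuming a hypothetical with_progress generator that wraps any iterable of byte chunks (for example r.iter_content(chunk_size=8192)) and a total size that, in a real download, might come from a Content-Length header if the server sends one.

```python
from typing import Callable, Iterable, Iterator, Optional


def with_progress(
    chunks: Iterable[bytes],
    total: Optional[int],
    report: Callable[[int, Optional[int]], None],
) -> Iterator[bytes]:
    """Yield chunks unchanged, reporting cumulative bytes after each one.

    Hypothetical helper: progress is based on len(chunk), never on
    chunk_size, since the chunk sizes are not guaranteed.
    """
    done = 0
    for chunk in chunks:
        done += len(chunk)
        report(done, total)
        yield chunk


if __name__ == '__main__':
    seen = []
    # Uneven chunk sizes, as iter_content may produce.
    data = [b'x' * 10, b'y' * 8192, b'z' * 3]
    body = b''.join(with_progress(data, 8205, lambda d, t: seen.append(d)))
    print(seen)  # [10, 8202, 8205]
```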

See the Body Content Workflow section of the requests documentation and Response.iter_content for further reference.

It's much easier if you use Response.raw and shutil.copyfileobj():

import requests
import shutil

def download_file(url):
    local_filename = url.split('/')[-1]
    with requests.get(url, stream=True) as r:
        with open(local_filename, 'wb') as f:
            shutil.copyfileobj(r.raw, f)
    return local_filename

This streams the file to disk without using excessive memory, and the code is simple.
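shutil.copyfileobj reads the source in blocks of at most length bytes and writes each block before reading the next, so it works with any object that has a read(n) method, not just r.raw. A small offline sketch to observe this: ReadLogger is a hypothetical wrapper that records how many bytes each read returns, and an io.BytesIO stands in for the response stream.

```python
import io
import shutil


class ReadLogger:
    """Hypothetical wrapper that records how many bytes each read() returns."""

    def __init__(self, raw):
        self.raw = raw
        self.sizes = []

    def read(self, n=-1):
        data = self.raw.read(n)
        self.sizes.append(len(data))
        return data


if __name__ == '__main__':
    payload = b'0123456789' * 1000          # 10,000 bytes
    src = ReadLogger(io.BytesIO(payload))   # stand-in for r.raw
    dst = io.BytesIO()                      # stand-in for the output file
    shutil.copyfileobj(src, dst, length=4096)
    print(src.sizes)  # [4096, 4096, 1808, 0]
```

No single read exceeds the length argument, which is what keeps memory use flat regardless of the file size.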

Note: According to the documentation, Response.raw does not decode gzip- or deflate-encoded response bodies, so you will need to do this manually.
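If the server sends Content-Encoding: gzip, the bytes read from r.raw are still compressed. One way to decode them manually while still streaming is to wrap the raw stream in gzip.GzipFile. This is a minimal offline sketch in which an io.BytesIO of gzip data stands in for r.raw, and copy_gunzipped is a hypothetical helper name. With requests specifically, setting r.raw.decode_content = True before copying is a commonly used alternative that asks the underlying urllib3 response to decode for you. Deflate-encoded bodies would need zlib instead and are not handled here.

```python
import gzip
import io
import shutil


def copy_gunzipped(raw, dst, bufsize: int = 64 * 1024) -> None:
    """Stream-decompress gzip data from `raw` into `dst`.

    `raw` stands in for r.raw of a response sent with
    Content-Encoding: gzip. GzipFile decompresses incrementally,
    so memory use stays roughly bounded by the buffer size.
    """
    with gzip.GzipFile(fileobj=raw, mode='rb') as gz:
        shutil.copyfileobj(gz, dst, length=bufsize)


if __name__ == '__main__':
    payload = b'hello world\n' * 1000
    raw = io.BytesIO(gzip.compress(payload))  # stand-in for r.raw
    out = io.BytesIO()
    copy_gunzipped(raw, out)
    print(out.getvalue() == payload)  # True
```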

Not exactly what the OP was asking, but... it's ridiculously easy to do this with urllib:

from urllib.request import urlretrieve

url = ''
dst = 'ubuntu-16.04.2-desktop-amd64.iso'
urlretrieve(url, dst)
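urlretrieve also accepts an optional reporthook callback, called with (block_number, block_size, total_size), where total_size is -1 when the server does not report a length. A minimal sketch of a progress hook: print_progress is a hypothetical name, and the usage copies a local file through a file:// URL so it runs without network access.

```python
import os
import tempfile
from pathlib import Path
from urllib.request import urlretrieve


def print_progress(block_num: int, block_size: int, total_size: int) -> None:
    """reporthook for urlretrieve: print bytes downloaded so far."""
    done = block_num * block_size
    if total_size > 0:
        # The last block may be partial, so clamp to the reported size.
        done = min(done, total_size)
        print(f'\r{done}/{total_size} bytes', end='')
    else:
        # No length reported; only a running count is possible.
        print(f'\r{done} bytes', end='')


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as d:
        src = Path(d, 'src.bin')
        src.write_bytes(os.urandom(100_000))
        dst = os.path.join(d, 'dst.bin')
        urlretrieve(src.as_uri(), dst, reporthook=print_progress)
        print()
        print(Path(dst).read_bytes() == src.read_bytes())  # True
```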

Or this way, if you want to save it to a temporary file:

from urllib.request import urlopen
from shutil import copyfileobj
from tempfile import NamedTemporaryFile

url = ''
with urlopen(url) as fsrc, NamedTemporaryFile(delete=False) as fdst:
    copyfileobj(fsrc, fdst)
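With delete=False the temporary file survives the with block, so you need to keep its name and remove it yourself when you are done. A minimal sketch that wraps the snippet in a hypothetical download_to_tempfile helper returning the path; the usage reads a local file:// URL so it can run without a network.

```python
import os
import tempfile
from pathlib import Path
from shutil import copyfileobj
from urllib.request import urlopen


def download_to_tempfile(url: str, suffix: str = '') -> str:
    """Stream `url` into a NamedTemporaryFile and return its path.

    The file is created with delete=False, so the caller must remove it.
    """
    with urlopen(url) as fsrc, tempfile.NamedTemporaryFile(
        delete=False, suffix=suffix
    ) as fdst:
        copyfileobj(fsrc, fdst)
        return fdst.name


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as d:
        src = Path(d, 'data.bin')
        src.write_bytes(b'payload' * 1000)
        path = download_to_tempfile(src.as_uri())
        try:
            print(Path(path).read_bytes() == src.read_bytes())  # True
        finally:
            os.remove(path)  # delete=False means cleanup is our job
```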

I watched the process:

watch 'ps -p 18647 -o pid,ppid,pmem,rsz,vsz,comm,args; ls -al *.iso'

And I saw the file growing, but memory usage stayed at 17 MB. Am I missing something?
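To check the same observation from inside Python rather than with ps, tracemalloc can report peak Python-level allocation during a streamed copy. Note that this is not the same as the RSS that ps shows. A minimal offline sketch: ZeroStream is a hypothetical stand-in for a network stream that produces zero bytes on demand without holding the whole payload, and the data is copied to os.devnull.

```python
import io
import os
import shutil
import tracemalloc


class ZeroStream(io.RawIOBase):
    """Hypothetical stand-in for a network stream: yields `size` zero bytes
    on demand without ever holding the whole payload in memory."""

    def __init__(self, size: int) -> None:
        self.remaining = size

    def readable(self) -> bool:
        return True

    def readinto(self, b) -> int:
        n = min(len(b), self.remaining)
        b[:n] = bytes(n)
        self.remaining -= n
        return n


def peak_copy_memory(total: int, bufsize: int = 64 * 1024) -> int:
    """Copy `total` bytes to os.devnull and return peak traced bytes."""
    tracemalloc.start()
    try:
        with open(os.devnull, 'wb') as dst:
            shutil.copyfileobj(ZeroStream(total), dst, length=bufsize)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == '__main__':
    # 50 MB copied; the peak should be on the order of a few buffers.
    print(peak_copy_memory(50 * 1024 * 1024))
```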