Download large file in python with requests
With the following streaming code, Python's memory usage stays bounded regardless of the size of the downloaded file:
import requests

def download_file(url):
    local_filename = url.split('/')[-1]
    # NOTE the stream=True parameter below
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                # If you have a chunk-encoded response, uncomment the `if`
                # below and set chunk_size to None.
                #if chunk:
                f.write(chunk)
    return local_filename
Note that the number of bytes returned by iter_content is not necessarily equal to the chunk_size; it can differ from one iteration to the next, and the final chunk is usually shorter.
See body-content-workflow and Response.iter_content for further reference.
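The varying chunk size is easy to see with a purely local sketch of the same read loop (io.BytesIO stands in for the response body here; no network involved, and iter_chunks is just an illustrative stand-in for iter_content):

```python
import io

def iter_chunks(fobj, chunk_size=8192):
    """Yield successive chunks from a file-like object, like iter_content."""
    while True:
        chunk = fobj.read(chunk_size)
        if not chunk:  # empty bytes => end of stream
            break
        yield chunk

# 20,000 bytes of fake payload: two full chunks plus a short final one.
payload = io.BytesIO(b"x" * 20_000)
sizes = [len(c) for c in iter_chunks(payload)]
print(sizes)  # the last chunk is smaller than chunk_size
```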
It's much easier if you use Response.raw and shutil.copyfileobj():
import requests
import shutil

def download_file(url):
    local_filename = url.split('/')[-1]
    with requests.get(url, stream=True) as r:
        with open(local_filename, 'wb') as f:
            shutil.copyfileobj(r.raw, f)
    return local_filename
This streams the file to disk without using excessive memory, and the code is simple.
Note: According to the documentation, Response.raw will not decode the gzip and deflate transfer-encodings, so you will need to do this manually.
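One way to do that decoding by hand is a zlib.decompressobj set up for the gzip wrapper and fed chunk by chunk. This is a local sketch: gzip.compress stands in for a gzip-encoded response body, and the chunk size is arbitrary.

```python
import gzip
import zlib

# Some gzip-compressed bytes standing in for a raw response body.
compressed = gzip.compress(b"hello " * 1000)

# 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
out = bytearray()
for i in range(0, len(compressed), 1024):  # feed the stream in small chunks
    out += decomp.decompress(compressed[i:i + 1024])
out += decomp.flush()
```

The same loop works when the chunks come from r.raw.read() instead of a local buffer.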
Not exactly what OP was asking, but... it's ridiculously easy to do that with urllib:
from urllib.request import urlretrieve

url = 'http://mirror.pnl.gov/releases/16.04.2/ubuntu-16.04.2-desktop-amd64.iso'
dst = 'ubuntu-16.04.2-desktop-amd64.iso'
urlretrieve(url, dst)
Or this way, if you want to save it to a temporary file:
from urllib.request import urlopen
from shutil import copyfileobj
from tempfile import NamedTemporaryFile

url = 'http://mirror.pnl.gov/releases/16.04.2/ubuntu-16.04.2-desktop-amd64.iso'
with urlopen(url) as fsrc, NamedTemporaryFile(delete=False) as fdst:
    copyfileobj(fsrc, fdst)
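Both urllib variants also accept file:// URLs, which makes them easy to try offline. A small sketch (the temp-file names are generated, nothing here is a real mirror):

```python
from pathlib import Path
from tempfile import NamedTemporaryFile
from urllib.request import urlretrieve

# Create a small local "remote" file, then download it via a file:// URL.
src = NamedTemporaryFile(delete=False, suffix=".bin")
src.write(b"A" * 4096)
src.close()

dst = src.name + ".copy"
urlretrieve(Path(src.name).as_uri(), dst)  # same call as with http://
```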
I watched the process:
watch 'ps -p 18647 -o pid,ppid,pmem,rsz,vsz,comm,args; ls -al *.iso'
And I saw the file growing, but memory usage stayed at 17 MB. Am I missing something?
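Instead of watching with ps from outside, the peak resident set size can also be sampled from inside the process with the POSIX-only resource module (a sketch; note ru_maxrss is reported in kilobytes on Linux but bytes on macOS):

```python
import resource

def peak_rss():
    # ru_maxrss: peak resident set size of this process so far.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = peak_rss()
# ... stream a download here ...
after = peak_rss()
print(after - before)  # stays small if the copy really streams
```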