How can I download a file on a click event using selenium?

python selenium selenium-webdriver web-scraping

Find the link using one of the find_element(s)_by_* methods, then call its click() method.

    from selenium import webdriver

    # To prevent download dialog
    profile = webdriver.FirefoxProfile()
    profile.set_preference('browser.download.folderList', 2) # custom location
    profile.set_preference('browser.download.manager.showWhenStarting', False)
    profile.set_preference('browser.download.dir', '/tmp')
    profile.set_preference('browser.helperApps.neverAsk.saveToDisk', 'text/csv')

    browser = webdriver.Firefox(profile)
    browser.get("http://www.drugcite.com/?q=ACTIMMUNE")

    browser.find_element_by_id('exportpt').click()
    browser.find_element_by_id('exporthlgt').click()

The profile preferences are there to prevent the download dialog from appearing.
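In newer Selenium 4 releases the find_element_by_* helpers have been removed and passing a FirefoxProfile straight to the driver is deprecated, so the same preferences are usually set on an Options object instead. Here is a minimal sketch of that variant, assuming Selenium 4 is installed. The helper names firefox_download_prefs, start_firefox and click_exports are made up for illustration; the URL and element ids are the ones from the answer.

```python
def firefox_download_prefs(download_dir, mime_types):
    """Return the Firefox preferences that suppress the download dialog."""
    return {
        'browser.download.folderList': 2,  # 2 = use the custom directory below
        'browser.download.manager.showWhenStarting': False,
        'browser.download.dir': download_dir,
        # Comma-separated MIME types that are saved without asking.
        'browser.helperApps.neverAsk.saveToDisk': ','.join(mime_types),
    }


def start_firefox(prefs):
    """Start Firefox with the given preferences (assumes Selenium 4)."""
    # Imported lazily so firefox_download_prefs works without Selenium.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    for name, value in prefs.items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)


def click_exports():
    """Hypothetical driver routine: open the page and click both export links."""
    from selenium.webdriver.common.by import By

    browser = start_firefox(firefox_download_prefs('/tmp', ['text/csv']))
    try:
        browser.get('http://www.drugcite.com/?q=ACTIMMUNE')
        browser.find_element(By.ID, 'exportpt').click()
        browser.find_element(By.ID, 'exporthlgt').click()
    finally:
        browser.quit()
```

Calling click_exports() runs the whole flow. Note that it quits the browser right after the clicks, so a real script would probably need to wait for the files to land in /tmp first.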


I'll admit this solution is a little more "hacky" than the Firefox Profile saveToDisk alternative, but it works across both Chrome and Firefox, and doesn't rely on a browser-specific feature which could change at any time. And if nothing else, maybe this will give someone a little different perspective on how to solve future challenges.

Prerequisites: Ensure you have selenium and pyvirtualdisplay installed...

Python 2: sudo pip install selenium pyvirtualdisplay
Python 3: sudo pip3 install selenium pyvirtualdisplay

The Magic

    import base64
    import json
    import time

    import pyvirtualdisplay
    import selenium
    import selenium.webdriver

    root_url = 'https://www.google.com'
    download_url = 'https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png'

    print('Opening virtual display')
    display = pyvirtualdisplay.Display(visible=0, size=(1280, 1024,))
    display.start()
    print('\tDone')

    print('Opening web browser')
    driver = selenium.webdriver.Firefox()
    #driver = selenium.webdriver.Chrome() # Alternately, give Chrome a try
    print('\tDone')

    print('Retrieving initial web page')
    driver.get(root_url)
    print('\tDone')

    print('Injecting retrieval code into web page')
    driver.execute_script("""
        window.file_contents = null;
        var xhr = new XMLHttpRequest();
        xhr.responseType = 'blob';
        xhr.onload = function() {
            var reader  = new FileReader();
            reader.onloadend = function() {
                window.file_contents = reader.result;
            };
            reader.readAsDataURL(xhr.response);
        };
        xhr.open('GET', %(download_url)s);
        xhr.send();
    """.replace('\r\n', ' ').replace('\r', ' ').replace('\n', ' ') % {
        'download_url': json.dumps(download_url),
    })

    print('Looping until file is retrieved')
    downloaded_file = None
    while downloaded_file is None:
        # Returns the file retrieved base64 encoded (perfect for downloading binary)
        downloaded_file = driver.execute_script('return (window.file_contents !== null ? window.file_contents.split(\',\')[1] : null);')
        if not downloaded_file:
            print('\tNot downloaded, waiting...')
            time.sleep(0.5)
    print('\tDone')

    print('Writing file to disk')
    with open('google-logo.png', 'wb') as fp:
        fp.write(base64.b64decode(downloaded_file))
    print('\tDone')

    driver.quit() # quit web browser, or it'll persist after python exits.
    display.stop() # stop virtual display, or it'll persist after python exits.

Explanation

We first load a URL on the domain we're targeting a file download from. This allows us to perform an AJAX request on that domain without running into cross-origin (same-origin policy) restrictions.

Next, we inject some JavaScript into the page which fires off an AJAX request. Once the AJAX request returns a response, we load the response into a FileReader object. From there we can extract the base64-encoded content of the file by calling readAsDataURL(). We then store the base64-encoded content on window, a globally accessible object.

Finally, because the AJAX request is asynchronous, we enter a Python while loop waiting for the content to be appended to the window. Once it's appended, we decode the base64 content retrieved from the window and save it to a file.
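To make the decoding step concrete: readAsDataURL() yields a string of the form data:<mime type>;base64,<payload>, and the script keeps only the part after the comma. Here is a small sketch of doing that split and decode on the Python side instead. The function name data_url_to_bytes is hypothetical.

```python
import base64


def data_url_to_bytes(data_url):
    """Decode a base64 data URL, as produced by readAsDataURL(), to raw bytes."""
    # Everything before the first comma is the header, the rest is the payload.
    header, sep, payload = data_url.partition(',')
    if not sep or not header.startswith('data:'):
        raise ValueError('not a data URL')
    if not header.endswith(';base64'):
        raise ValueError('data URL is not base64 encoded')
    return base64.b64decode(payload)
```

For example, data_url_to_bytes('data:text/plain;base64,aGVsbG8=') returns b'hello'. With this helper the injected script could return window.file_contents unchanged and leave the parsing to Python.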

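This solution should work in any modern browser supported by Selenium, for text or binary files of any MIME type, as long as the file can be requested from the page you loaded (same origin, or a server that allows cross-origin requests).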

Alternate Approach

While I haven't tested this, Selenium does let you wait until an element is present in the DOM. Rather than looping until a globally accessible variable is populated, you could have the injected script create an element with a particular ID and use the appearance of that element as the trigger to retrieve the downloaded file.
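Here is a rough, untested-against-a-real-browser sketch of that idea. It assumes the injected JavaScript has been changed to create an element (the id 'download-result' is made up here) whose text content is the base64 payload. The helper name wait_for_marker_text is also hypothetical; it only relies on the driver's execute_script method and gives up after a timeout instead of looping forever.

```python
import time


def wait_for_marker_text(driver, marker_id, timeout=30.0, interval=0.5):
    """Poll until an element with the given id exists, then return its text."""
    script = (
        'var el = document.getElementById(arguments[0]);'
        'return el ? el.textContent : null;'
    )
    deadline = time.monotonic() + timeout
    while True:
        # execute_script passes marker_id to the page as arguments[0].
        text = driver.execute_script(script, marker_id)
        if text is not None:
            return text
        if time.monotonic() >= deadline:
            raise TimeoutError(
                'no element with id %r after %s seconds' % (marker_id, timeout)
            )
        time.sleep(interval)
```

Usage would look like payload = wait_for_marker_text(driver, 'download-result'). Selenium's own WebDriverWait with expected_conditions.presence_of_element_located could replace the hand-written loop.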


In Chrome, I download the files by clicking on the links, then open the chrome://downloads page and retrieve the list of downloaded files from the shadow DOM like this:

    docs = document
      .querySelector('downloads-manager')
      .shadowRoot.querySelector('#downloads-list')
      .getElementsByTagName('downloads-item')

This solution is limited to Chrome. The returned data also contains information such as the file path and download date. (Note: the snippet above is JavaScript, not Python.)
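From Python, the JavaScript would have to be run through the driver. Here is a minimal sketch, assuming a Chrome driver and that chrome://downloads still has the structure the selectors above expect (it is an internal page and may differ between Chrome versions). The names DOWNLOAD_ITEMS_SCRIPT and list_download_items are made up for illustration.

```python
# Same selectors as the JavaScript above, wrapped in Array.from so the
# result comes back to Python as a plain list.
DOWNLOAD_ITEMS_SCRIPT = """
    return Array.from(
        document
            .querySelector('downloads-manager')
            .shadowRoot.querySelector('#downloads-list')
            .getElementsByTagName('downloads-item')
    );
"""


def list_download_items(driver):
    """Open chrome://downloads and return its <downloads-item> elements."""
    driver.get('chrome://downloads')
    return driver.execute_script(DOWNLOAD_ITEMS_SCRIPT)
```

With a real driver, the returned list should hold one element per download, which can then be inspected with further execute_script calls.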
