Selenium pdf automatic download not working
Disable the built-in pdfjs plugin and navigate to the URL; the PDF file will then be downloaded automatically. The code:
    from selenium import webdriver

    fp = webdriver.FirefoxProfile()
    # 2 = save to the custom directory given in browser.download.dir
    fp.set_preference("browser.download.folderList", 2)
    fp.set_preference("browser.download.manager.showWhenStarting", False)
    fp.set_preference("browser.download.dir", "/home/jill/Downloads/Dinamalar")
    # MIME types to save without showing the "open or save" dialog
    fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/pdf,application/x-pdf")
    fp.set_preference("pdfjs.disabled", True)  # < KEY PART HERE (a boolean, not the string "true")

    browser = webdriver.Firefox(firefox_profile=fp)
    browser.get("http://epaper.dinamalar.com/PUBLICATIONS/DM/MADHURAI/2015/05/26/PagePrint//26_05_2015_001_b2b69fda315301809dda359a6d3d9689.pdf")
Update (the complete code that worked for me):
    from selenium import webdriver

    mime_types = "application/pdf,application/vnd.adobe.xfdf,application/vnd.fdf,application/vnd.adobe.xdp+xml"

    fp = webdriver.FirefoxProfile()
    fp.set_preference("browser.download.folderList", 2)
    fp.set_preference("browser.download.manager.showWhenStarting", False)
    fp.set_preference("browser.download.dir", "/home/aafanasiev/Downloads")
    fp.set_preference("browser.helperApps.neverAsk.saveToDisk", mime_types)
    fp.set_preference("plugin.disable_full_page_plugin_for_types", mime_types)
    fp.set_preference("pdfjs.disabled", True)

    browser = webdriver.Firefox(firefox_profile=fp)
    browser.get("http://epaper.dinamalar.com/")

    webobj_get_link = browser.find_element_by_id("liSavePdf")
    webobj_get_object = webobj_get_link.find_element_by_tag_name("a")
    webobj_get_object.click()
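The code above uses the Selenium 2/3 API. In recent Selenium 4 releases the `firefox_profile` argument and the `find_element_by_*` helpers were deprecated and later removed, so the same preferences may need to be set through `Options.set_preference` instead. Below is a minimal sketch, assuming Selenium 4 and geckodriver are installed. `build_download_prefs`, `make_driver` and `save_front_page_pdf` are hypothetical helper names, not part of Selenium, and the download directory is whatever you pass in.

```python
def build_download_prefs(download_dir, mime_types=("application/pdf",)):
    """Return the Firefox download preferences from this answer as a dict."""
    joined = ",".join(mime_types)
    return {
        "browser.download.folderList": 2,  # 2 = use browser.download.dir
        "browser.download.manager.showWhenStarting": False,
        "browser.download.dir": download_dir,
        "browser.helperApps.neverAsk.saveToDisk": joined,
        "plugin.disable_full_page_plugin_for_types": joined,
        "pdfjs.disabled": True,  # must be a boolean
    }


def make_driver(download_dir):
    """Start Firefox with the preferences applied (Selenium 4 style)."""
    # Imported lazily so build_download_prefs works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    for name, value in build_download_prefs(download_dir).items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)


def save_front_page_pdf(download_dir):
    """Click the same 'liSavePdf' link as the code above; returns the driver."""
    from selenium.webdriver.common.by import By

    driver = make_driver(download_dir)
    driver.get("http://epaper.dinamalar.com/")
    link = driver.find_element(By.ID, "liSavePdf").find_element(By.TAG_NAME, "a")
    link.click()
    return driver
```

Calling `save_front_page_pdf("/home/aafanasiev/Downloads")` should behave like the profile-based version, although this sketch has not been run against the site.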
I tested the following code and successfully downloaded your PDF on Windows 7:
    from selenium import webdriver

    # Placeholder path: set this to the folder where the PDF should be saved.
    download_location = r"C:\Downloads"

    fp = webdriver.FirefoxProfile()
    fp.set_preference("browser.download.folderList", 2)
    fp.set_preference("browser.download.manager.showWhenStarting", False)
    fp.set_preference("browser.download.dir", download_location)
    fp.set_preference("plugin.disable_full_page_plugin_for_types", "application/pdf")
    fp.set_preference("pdfjs.disabled", True)
    fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/pdf")

    driver = webdriver.Firefox(fp)
    driver.implicitly_wait(10)
    driver.maximize_window()
    driver.get("http://epaper.dinamalar.com/")

    element = driver.find_element_by_css_selector("li#liSavePdf>a>img")
    element.click()
Since no HTML is available, my guess is that this line
    webobj = browser.find_element_by_id("download").click()
actually triggers the onclick event, but the resulting download isn't handled properly. In other words, what's missing is the location where the .pdf file will be stored. I have very little experience with Python programming, but one solution could be to use an HTTP client library, which would let you download the file automatically, something like C#'s WebClient.DownloadFile(String, String) method. Used properly, this lets you skip the Selenium commands for this action entirely.
Maybe something like this post will be a good start.
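The idea of skipping Selenium for the download itself can be sketched with Python's standard library. This is a minimal sketch rather than tested code for this site: `download_file` is a hypothetical helper, and it assumes the PDF URL can be fetched directly, without cookies or a login session. If the site requires a session, the browser's cookies would have to be sent along with the request.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url, dest_path):
    """Stream the resource at `url` to `dest_path` and return the path."""
    dest = Path(dest_path)
    # Create the target folder if it does not exist yet.
    dest.parent.mkdir(parents=True, exist_ok=True)
    # urlopen returns a file-like response; copy it to disk in chunks
    # instead of reading the whole PDF into memory.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```

For example, `download_file(pdf_url, "/home/jill/Downloads/Dinamalar/page.pdf")`, where `pdf_url` is a direct link to the PDF such as the one passed to `browser.get` in the first answer.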