How can I download a PDF file from a URL where the PDF is embedded in the HTML?
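You can download the PDF using the requests and BeautifulSoup libraries. The page is an ASP.NET form, so the PDF is only returned after posting back its hidden __VIEWSTATE, __VIEWSTATEGENERATOR and __EVENTVALIDATION fields together with the btnDocument button. In the code below, replace /Users/../aaa.pdf with the full path where the document should be saved: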
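```python
import requests
from bs4 import BeautifulSoup

url = 'http://www.nebraskadeedsonline.us/document.aspx?g5savSPtTDnumMn1bRBWoKqN6Gu65tBhDE9%2fVs5YdPg='

# A session keeps any cookies set by the first request for the postback
with requests.Session() as session:
    # Load the page and read the ASP.NET hidden form fields
    response = session.get(url)
    response.raise_for_status()
    page = BeautifulSoup(response.text, "html.parser")

    VIEWSTATE = page.select_one("#__VIEWSTATE").attrs["value"]
    VIEWSTATEGENERATOR = page.select_one("#__VIEWSTATEGENERATOR").attrs["value"]
    EVENTVALIDATION = page.select_one("#__EVENTVALIDATION").attrs["value"]
    btnDocument = page.select_one("[name=btnDocument]").attrs["value"]

    # Post the form back as if the btnDocument button had been clicked
    data = {
        '__VIEWSTATE': VIEWSTATE,
        '__VIEWSTATEGENERATOR': VIEWSTATEGENERATOR,
        '__EVENTVALIDATION': EVENTVALIDATION,
        'btnDocument': btnDocument,
    }
    response = session.post(url, data=data)
    response.raise_for_status()

# The response body is the raw PDF, so write it in binary mode
with open('/Users/../aaa.pdf', 'wb') as f:
    f.write(response.content)
```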
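If the form has more hidden fields than the three listed above, or their names differ on another page, a more generic option is to send every hidden input, which is what a browser does on postback. The sketch below is one way to do that using only the standard library's html.parser instead of BeautifulSoup; build_postback_data and looks_like_pdf are hypothetical helper names, and the button name is assumed to be btnDocument as in the code above. The PDF check relies on PDF files starting with the bytes %PDF-, which helps catch the case where the server returns an HTML error page instead of the document.

```python
from html.parser import HTMLParser


class HiddenFieldCollector(HTMLParser):
    """Collect name/value pairs of hidden <input> elements plus one named button."""

    def __init__(self, button_name=None):
        super().__init__()
        self.button_name = button_name
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        name = attrs.get("name")
        if not name:
            return
        is_hidden = (attrs.get("type") or "").lower() == "hidden"
        if is_hidden or name == self.button_name:
            # A missing value attribute is sent as an empty string
            self.fields[name] = attrs.get("value") or ""


def build_postback_data(html, button_name):
    """Return form data for an ASP.NET postback triggered by button_name (hypothetical helper)."""
    collector = HiddenFieldCollector(button_name)
    collector.feed(html)
    collector.close()
    return collector.fields


def looks_like_pdf(content):
    """Return True if the bytes start with the PDF signature %PDF-."""
    return content.startswith(b"%PDF-")
```

With requests, this would be used as data = build_postback_data(response.text, "btnDocument") before session.post(url, data=data), and looks_like_pdf(response.content) can be checked before writing the file.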