Scrape AJAX web page with Python and/or Scrapy
The 'random link' looks like it has the form:
https://petitions.whitehouse.gov/signatures/more/petitionid/pagenum/lastpetition
where petitionid is static for a single petition, pagenum increments each time, and lastpetition is returned each time from the request.
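As a rough sketch of that URL pattern, something like the following could build each request URL. The function name and argument names are my own, and I'm assuming the three parts are simply joined with slashes as the observed link suggests; check this against what your browser actually sends.

```python
from urllib.parse import quote

# Base path taken from the observed 'random link'.
BASE = "https://petitions.whitehouse.gov/signatures/more"


def more_signatures_url(petition_id, page_num, last_id):
    """Build a URL of the form BASE/petitionid/pagenum/lastpetition.

    Hypothetical helper: the names are illustrative, and the segment order
    is an assumption based on the link form described above.
    """
    parts = [str(petition_id), str(page_num), str(last_id)]
    # Percent-encode each segment so unexpected characters cannot break the path.
    return BASE + "/" + "/".join(quote(p, safe="") for p in parts)
```

You would keep petition_id fixed, bump page_num on each call, and feed in whatever lastpetition value the previous response gave you.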
My usual approach would be to use the requests library to emulate a session for cookies and then work out what requests the browser is making.
    import requests

    s = requests.session()
    url = 'http://httpbin.org/get'
    params = {'cat': 'Persian', 'age': 3, 'name': 'Furball'}
    s.get(url, params=params)
I'd pay particular attention to the following link:
<a href="/petition/shut-down-tar-sands-project-utah-it-begins-and-reject-keystone-xl-pipeline/H1MQJGMW?page=2&last=50b5a1f9ee140f227a00000b" class="load-next no-follow active" rel="50ae9207eab72aed25000003">Load Next 20 Signatures</a>
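To pull the values out of a link like that one, a minimal sketch using only the standard library might look like this. The class and function names are mine, and I'm only assuming the anchor keeps the load-next class and the page/last query parameters seen above; what rel actually identifies is something you'd have to confirm yourself.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, parse_qs


class LoadNextParser(HTMLParser):
    """Collect <a> tags whose class list contains 'load-next' (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if "load-next" not in classes:
            return
        # Split the href's query string into its page/last parameters.
        query = parse_qs(urlsplit(attr.get("href") or "").query)
        self.links.append({
            "href": attr.get("href"),
            "page": query.get("page", [None])[0],
            "last": query.get("last", [None])[0],
            "rel": attr.get("rel"),
        })


def find_load_next(html):
    """Return a list of dicts describing every load-next link in the HTML."""
    parser = LoadNextParser()
    parser.feed(html)
    return parser.links
```

Feeding the page source (or the fragment returned by each request) into find_load_next gives you the page and last values to use for the next request; an empty list would suggest there is nothing more to load.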
It's hard to fully emulate jQuery/JavaScript with Python. You could have a look at SpiderMonkey, or at web-testing automation tools like Selenium, which can automate browser actions.
Previous question on SO: How can Python work with javascript