Scrape ajax web page with python and/or scrapy Scrape ajax web page with python and/or scrapy ajax ajax

Scrape ajax web page with python and/or scrapy


The 'random link' looks like it has the form:

https://petitions.whitehouse.gov/signatures/more/petitionid/ pagenum/ lastpetitionwhere petitionid is static for a single petition, pagenum increments each time and lastpetition is returned each time from the request.

My usual approach would be to use the requests library to emulate a session for cookies and then work out what requests the browser is making.

import requestss=requests.session()url='http://httpbin.org/get'params = {'cat':'Persian',          'age':3,          'name':'Furball'}             s.get(url, params=params)

I'd pay particular attention to the following link:

<a href="/petition/shut-down-tar-sands-project-utah-it-begins-and-reject-keystone-xl-pipeline/H1MQJGMW?page=2&last=50b5a1f9ee140f227a00000b" class="load-next no-follow active" rel="50ae9207eab72aed25000003">Load Next 20 Signatures</a>


It's hard to fully emulate Jquery/Javascript with Python.You could have a look on spidermonkeyor on web-testing automation tools like Selenium , which can fully automate any browser actions.Previous question on SO:How can Python work with javascript