Node.js: scraping a website after JavaScript has loaded the values
You would be better off using something like CasperJS (http://casperjs.org/). It is a testing utility built on PhantomJS: it is essentially like opening the page in a WebKit browser, just without the GUI. I don't think it runs inside Node itself, but it should be easy enough to run a Casper script as a child process and pipe its output back to Node. You could write something like:
var casper = require('casper').create({
    loadImages: true,
    loadPlugins: true,
    verbose: true,
    //logLevel: 'info',
    clientScripts: ['jquery-1.7.1.min.js'],
    viewportSize: { width: 1366, height: 768 },
    pageSettings: {
        javascriptEnabled: true,
        userAgent: 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.9 Safari/536.5'
    }
});

casper.start(url);

casper.thenEvaluate(function () {
    // JavaScript code to run in the scope of the page
});

casper.run(); // without run(), the queued steps never execute
First off, how are you using jsdom? Apparently, jsdom.env does not execute the scripts already in the DOM, only the scripts that you add in the call to jsdom.env. If you want the page's own scripts to run, I think you should use jsdom.jsdom.
Second, you need to specify an onload handler. This should execute after the document is ready, and hopefully any scripts will have changed the DOM to your liking.
Something like this:
var jsdom = require('jsdom').jsdom
  , document = jsdom(html)
  , window = document.createWindow();

document.onload = function () {
    // Do your stuff
};