
Nodejs scraping website after javascript has loaded the values


You would be better off using something like CasperJS (http://casperjs.org/). It is a testing utility based on PhantomJS, so it is basically exactly like opening the page in a WebKit browser, just without the GUI. I don't think it works with Node directly, but it should be easy enough to run a Casper script and pipe the output back to Node. You could write something like:

    var casper = require('casper').create({
        loadImages: true,
        loadPlugins: true,
        verbose: true,
        //logLevel: 'info',
        clientScripts: [
            'jquery-1.7.1.min.js'
        ],
        viewportSize: {
            width: 1366,
            height: 768
        },
        pageSettings: {
            javascriptEnabled: true,
            userAgent: 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.9 Safari/536.5'
        }
    });

    // url must hold the address of the page to scrape
    casper.start(url);

    casper.thenEvaluate(function () {
        // JavaScript code to run in the scope of the page
    });

    // Nothing is executed until run() is called
    casper.run();
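The "pipe the output back to Node" step could look roughly like the sketch below. It uses only Node's built-in child_process module. The helper name runAndCapture and the script name scrape.js are hypothetical, and it assumes the casperjs executable is on the PATH and that the Casper script prints its result to standard output.

```javascript
const { execFileSync } = require('child_process');

// Hypothetical helper: run an external command, wait for it to exit,
// and return everything it wrote to stdout as a string.
// Throws if the command cannot be started or exits with a non-zero code.
function runAndCapture(command, args) {
  return execFileSync(command, args, {
    encoding: 'utf8',            // return a string instead of a Buffer
    maxBuffer: 10 * 1024 * 1024  // allow up to 10 MB of output
  });
}

// Assumed usage (scrape.js is a placeholder for your own Casper script,
// which is assumed to print JSON to stdout):
//
//   const output = runAndCapture('casperjs', ['scrape.js']);
//   const data = JSON.parse(output);
```

This version blocks until the child process finishes, which is usually fine for a one-off scraping script. Inside a server, the asynchronous execFile or spawn from the same module would be the better fit.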


First off, how are you using jsdom? Apparently, jsdom.env does not execute the scripts that are already in the DOM, only the scripts that you add in the call to jsdom.env. If you want the page's own scripts to execute, I think you should use jsdom.jsdom instead.

Second, you need to specify an onload handler. It should run after the document is ready, by which point any scripts will hopefully have changed the DOM to your liking.

Something like this:

    var jsdom = require('jsdom').jsdom
      , document = jsdom(html)
      , window = document.createWindow();

    document.onload = function() {
      // Do your stuff
    };