Scrape web pages in real time with Node.js

javascript jquery node.js screen-scraping web-scraping

Node.io seems to take the cake :-)

javascript jquery node.js screen-scraping web-scraping

All aforementioned solutions presume running the scraper locally. This means you will be severely limited in performance (due to running them in sequence or in a limited set of threads). A better approach, imho, is to rely on an existing, albeit commercial, scraping grid.

Here is an example:

var bobik = new Bobik("YOUR_AUTH_TOKEN");bobik.scrape({  urls: ['amazon.com', 'zynga.com', 'http://finance.google.com/', 'http://shopping.yahoo.com'],  queries:  ["//th", "//img/@src", "return document.title", "return $('script').length", "#logo", ".logo"]}, function (scraped_data) {  if (!scraped_data) {    console.log("Data is unavailable");    return;  }  var scraped_urls = Object.keys(scraped_data);  for (var url in scraped_urls)    console.log("Results from " + url + ": " + scraped_data[scraped_urls[url]]);});

Here, scraping is performed remotely and a callback is issued to your code only when results are ready (there is also an option to collect results as they become available).

You can download Bobik client proxy SDK at https://github.com/emirkin/bobik_javascript_sdk

javascript jquery node.js screen-scraping web-scraping

I've been doing research myself, and https://npmjs.org/package/wscraper boasts itself as a

a web scraper agent based on cheerio.js a fast, flexible, and lean implementation of core jQuery; built on top of request.js; inspired by http-agent.js

Very low usage (according to npmjs.org) but worth a look for any interested parties.

CodeHunter

Scrape web pages in real time with Node.js

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last