
PHP Magento Screen Scraping


With stream_context_create you can specify the headers that are sent when you call file_get_contents.

What I'd suggest: open your browser and log in to the site. Then open Firebug (or your favorite cookie viewer), grab the cookies, and send them with your request.

Edit: Here's an example from PHP.net:

    <?php
    // Create a stream
    $opts = array(
      'http'=>array(
        'method'=>"GET",
        'header'=>"Accept-language: en\r\n" .
                  "Cookie: foo=bar\r\n"
      )
    );

    $context = stream_context_create($opts);

    // Open the file using the HTTP headers set above
    $file = file_get_contents('http://www.example.com/', false, $context);
    ?>

Edit (2): This is outside the scope of your question, but if you are wondering how to scrape the site afterwards, look into the DOMDocument::loadHTML method. Once the page is loaded into a DOMDocument, you get the functions you need to scrape it (e.g. getElementsByTagName, getElementById, and XPath queries through DOMXPath).
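
As a rough sketch of the DOMDocument route: the markup, class names, and variable names below are made up for illustration, so adjust the XPath expressions to whatever the real page contains.

```php
<?php
// Hypothetical markup standing in for the HTML you fetched.
$html = '<html><body>'
      . '<div class="product"><h2 class="name">Widget</h2><span class="price">$9.99</span></div>'
      . '<div class="product"><h2 class="name">Gadget</h2><span class="price">$19.50</span></div>'
      . '</body></html>';

$doc = new DOMDocument();
// Real-world HTML is rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$products = array();
foreach ($xpath->query('//div[@class="product"]') as $node) {
    // Passing $node as the context makes the relative paths search inside this product only.
    $products[] = array(
        'name'  => trim($xpath->evaluate('string(.//h2[@class="name"])', $node)),
        'price' => trim($xpath->evaluate('string(.//span[@class="price"])', $node)),
    );
}

print_r($products);
```

An XPath query tends to survive small layout changes better than string matching, since it depends on the element structure rather than on exact whitespace or attribute order.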

If you want to scrape something simple, you can also use RegEx with preg_match_all.
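
For instance, something like the following could pull prices out of a page. The HTML and the pattern are invented for illustration; a regex like this only holds up while the markup stays exactly this regular.

```php
<?php
// Hypothetical snippet of fetched HTML.
$html = '<span class="price">$9.99</span> <span class="price">$19.50</span>';

// Capture the number inside each price span.
// PREG_SET_ORDER yields one array per match, with the captured group at index 1.
preg_match_all(
    '/<span class="price">\s*\$([0-9]+\.[0-9]{2})\s*<\/span>/',
    $html,
    $matches,
    PREG_SET_ORDER
);

$prices = array();
foreach ($matches as $m) {
    $prices[] = (float) $m[1];
}

print_r($prices);
```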


If you're familiar with cURL, this should be relatively simple and take a day or so. I've built similar apps that log in to banks to retrieve data, which of course also requires authentication.

The link below has an example of how to use cURL with cookies for authentication:

http://coderscult.com/php/php-curl/2008/05/20/php-curl-cookies-example/
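
In case the link goes stale, here is a minimal sketch of the general idea (not the code from that article): log in once with a POST, let cURL store the session cookies in a file, then reuse that file for later requests. The function name, URLs, and form field names are hypothetical; check the real login form for the fields it expects, including any hidden token fields.

```php
<?php
/**
 * Fetch a URL with cURL, persisting cookies in $cookieFile between calls.
 * Pass an array as $postFields to send a POST; leave it null for a GET.
 */
function fetch_with_cookies($url, $cookieFile, $postFields = null)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,        // follow the redirect most logins issue
        CURLOPT_COOKIEFILE     => $cookieFile, // send cookies saved by earlier calls
        CURLOPT_COOKIEJAR      => $cookieFile, // save cookies when the handle is released
    ));
    if ($postFields !== null) {
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($postFields));
    }

    $body  = curl_exec($ch);
    $error = curl_error($ch);
    unset($ch); // releasing the handle flushes cookies to $cookieFile

    if ($body === false) {
        throw new RuntimeException('Request failed: ' . $error);
    }
    return $body;
}

// Usage sketch (hypothetical URLs and field names):
// $jar = tempnam(sys_get_temp_dir(), 'cookies');
// fetch_with_cookies('https://shop.example.com/login', $jar, array(
//     'username' => 'me@example.com',
//     'password' => 'secret',
// ));
// $page = fetch_with_cookies('https://shop.example.com/account/orders', $jar);
```

Keeping the cookies in a file rather than copying them by hand from the browser means the script can log in again on its own when the session expires.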

If you can grab the output of the page you can parse for your results with a regex. Alternatively, you can use a class like Snoopy to do this work for you:

http://sourceforge.net/projects/snoopy/