cURL Scrape then Parse/Find Specific Content


You can divide your problem into three parts.

  1. Retrieving the data from the data source. For that, you can use cURL or file_get_contents(), depending on your requirements: file_get_contents() is enough for a simple GET, while cURL gives you control over headers, cookies, timeouts, and redirects. Code examples are in the PHP manual: http://php.net/manual/en/function.file-get-contents.php and http://php.net/manual/en/curl.examples-basic.php

  2. Parsing the retrieved data. For that, I would start by looking into "PHP Simple HTML DOM Parser". You can use it to extract data from an HTML string. http://simplehtmldom.sourceforge.net/

  3. Building and generating the output. This is simply a question of what you want to do with the data you have extracted. For example, you can print it, reformat it, or store it in a database or file.
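The three steps above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it uses PHP's built-in DOMDocument and DOMXPath for the parsing step instead of PHP Simple HTML DOM Parser, so that it runs without any extra library. The function names fetchHtml() and extractTextById() are made up for illustration, and the URL and element id are expected as command-line arguments.

```php
<?php
// Step 1: retrieve the page with cURL. Returns the response body or throws.
function fetchHtml(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects
        CURLOPT_TIMEOUT        => 15,   // give up after 15 seconds
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $body;
}

// Step 2: parse the HTML and return the trimmed text of the element with
// the given id, or null if there is no such element.
// Assumption: $id contains no double quotes (it is placed into an XPath string).
function extractTextById(string $html, string $id): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query(sprintf('//*[@id="%s"]', $id));
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Step 3: build the output. Here the extracted text is simply printed.
// Runs only when a URL and an element id are passed on the command line.
if (PHP_SAPI === 'cli' && isset($argv[1], $argv[2])) {
    $text = extractTextById(fetchHtml($argv[1]), $argv[2]);
    echo $text ?? 'No element with that id was found', PHP_EOL;
}
```

Saved as, say, scrape.php, it could be run as `php scrape.php https://example.com/ content`. Swapping step 3 for a database insert or a file write leaves the first two steps untouched, which is the point of splitting the problem this way.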


I suggest you use a ready-made scraper. I use Goutte (https://github.com/FriendsOfPHP/Goutte), which lets me load website content and traverse it the same way you do with jQuery. For example, if I want the content of the <div id="content">, I load the page with $crawler = $client->request('GET', $url) and then call $crawler->filter('#content')->text()

It even allows me to find and 'click' on links and submit forms to retrieve and process the content.

It makes life so much easier than using cURL or file_get_contents() and working your way through the HTML manually.
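A rough sketch of the Goutte workflow described above, assuming Goutte was installed with Composer (composer require fabpot/goutte). The URL, link text, button label, form field name, and the h2.result selector are all placeholders I made up; replace them with whatever the target site actually uses.

```php
<?php
// Assumes Composer's autoloader is in ./vendor next to this script.
require __DIR__ . '/vendor/autoload.php';

use Goutte\Client;

$client = new Client();

// request() fetches the page and returns a Crawler for traversing it.
$crawler = $client->request('GET', 'https://example.com/'); // placeholder URL

// Select with a CSS selector, jQuery style, and read the text.
// text() throws an InvalidArgumentException if nothing matches.
echo $crawler->filter('#content')->text(), PHP_EOL;

// Find a link by its visible text and 'click' it; click() returns a new Crawler.
$link = $crawler->selectLink('Next page')->link(); // hypothetical link text
$crawler = $client->click($link);

// Find a form by its submit button, fill it in, and submit it.
$form = $crawler->selectButton('Search')->form();   // hypothetical button label
$crawler = $client->submit($form, ['q' => 'curl']); // hypothetical field name

// Collect the text of every match, much like jQuery's .each().
$titles = $crawler->filter('h2.result')->each(function ($node) {
    return $node->text();
});
print_r($titles);
```

Note that filter() is called on the Crawler returned by request(), click(), or submit(), not on the client itself.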