Combining CURL and simple html dom


Try changing this:

$html->load($curl_scraped_page);

To this:

$html->load($curl_scraped_page, true, false);

The problem is that simple_html_dom strips all \r and \n characters by default (the third argument, $stripRN), and in this case that breaks the page's JavaScript, since Yahoo doesn't end its statements with semicolons.

You can see the resulting error in the browser console, and viewing the page source shows that simple_html_dom has removed the line breaks.
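To illustrate why this breaks, here is a minimal sketch that needs neither the library nor a network connection. It assumes that stripping amounts to replacing each \r and \n with a space. The $script string is a made-up example, not Yahoo's actual code.

```php
<?php
// Hypothetical inline script that relies on line breaks instead of semicolons.
$script = "var a = 1\nvar b = 2\n";

// Assumption: stripping \r and \n is equivalent to replacing each with a space.
$stripped = str_replace(array("\r", "\n"), ' ', $script);

// Both statements now sit on one line with nothing separating them,
// which a JavaScript engine rejects as a syntax error.
echo $stripped, PHP_EOL;
```

With the line breaks intact, the browser inserts the missing semicolons itself. Once they are gone it cannot, which is why passing false as the third argument to load() avoids the error.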


I think I would add a function to the class that parses the markup without stripping anything out:

function loadWithoutRemovingStuff($str, $lowercase=true, $stripRN=true, $defaultBRText=DEFAULT_BR_TEXT, $defaultSpanText=DEFAULT_SPAN_TEXT)
{
    $this->prepare($str, $lowercase, $stripRN, $defaultBRText, $defaultSpanText);
    while ($this->parse());
    $this->root->_[HDOM_INFO_END] = $this->cursor;
    $this->parse_charset();
    // return the object so the call can be chained
    return $this;
}

and then call that function instead of the default load function.

Or, since everything is public in this class,

$html = new simple_html_dom();
$html->prepare($str, $lowercase, $stripRN, $defaultBRText, $defaultSpanText);
while ($html->parse());
$html->root->_[HDOM_INFO_END] = $html->cursor;
$html->parse_charset();

but the first way is cleaner.