Combining CURL and simple html dom
Try changing this:
$html->load($curl_scraped_page);
To this:
$html->load($curl_scraped_page, true, false);
The problem is that simple_html_dom strips all \r and \n characters by default (the third argument, $stripRN, defaults to true). In this case that breaks the JavaScript code, because Yahoo doesn't end its statements with semicolons.
You can see the resulting error in the browser console, and viewing the page source shows that simple_html_dom has removed the line breaks.
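To make the failure concrete, here is a minimal sketch that does not use the library at all. The $script string is a made-up stand-in for Yahoo's inline JavaScript, and the str_replace call only approximates the stripping step, assuming each line break is replaced by a space:

```php
<?php
// Hypothetical inline script: no semicolons, so the line breaks are
// the only thing separating the two statements.
$script = "var a = 1\nvar b = 2\n";

// Rough approximation of what happens when $stripRN is true
// (assumption: every \r and \n becomes a single space).
$stripped = str_replace(array("\r", "\n"), ' ', $script);

echo $stripped; // prints "var a = 1 var b = 2 "
```

With both statements on one line and no semicolon between them, the browser reports a syntax error, which matches what shows up in the console.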
Alternatively, I think I would add a function to the class:
function loadWithoutRemovingStuff($str, $lowercase=true, $stripRN=true, $defaultBRText=DEFAULT_BR_TEXT, $defaultSpanText=DEFAULT_SPAN_TEXT)
{
    // set up the parser state for the given string
    $this->prepare($str, $lowercase, $stripRN, $defaultBRText, $defaultSpanText);
    // parse until the whole input is consumed
    while ($this->parse());
    // mark where the root node ends
    $this->root->_[HDOM_INFO_END] = $this->cursor;
    $this->parse_charset();
    return $this;
}
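and then call that function instead of the default load function. Note that $stripRN still defaults to true here, so pass false for it if you want to keep the line breaks.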
Or, since everything is public in this class,
$html = new simple_html_dom();
$html->prepare($str, $lowercase, $stripRN, $defaultBRText, $defaultSpanText);
while ($html->parse());
$html->root->_[HDOM_INFO_END] = $html->cursor;
$html->parse_charset();
but the first way is better (cleaner).