Trouble fetching some title from a webpage


It could very well be that there are more issues with your code than I have covered in this answer, but the most prominent issue that I see is the following:

DOMDocument::loadHTML() is not a static method, but an instance method (which returns a boolean). You should first create an instance of DOMDocument and then call loadHTML() on that instance:

    $dom = new DOMDocument;
    $dom->loadHTML($xml);

However, since you have suppressed errors with the @ operator on that particular line, you are not receiving a warning about this. Although the error suppression operator @ is very commonly used to silence HTML validation errors like this, you should look into using libxml_use_internal_errors() [1] instead, as it does not suppress general PHP errors.

    $dom = new DOMDocument;
    $oldSetting = libxml_use_internal_errors(true);
    $dom->loadHTML($xml);
    libxml_use_internal_errors($oldSetting);
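
If you want to see what libxml complained about instead of discarding it, the buffered errors can be read back before the old setting is restored. This is only a minimal sketch: the $html string is made-up, deliberately sloppy markup, and which messages you get (if any) depends on your libxml version.

```php
<?php
// Made-up, deliberately sloppy markup so libxml may have something to report.
$html = '<html><body><p>Unclosed paragraph<div></span></body></html>';

$dom = new DOMDocument;
$oldSetting = libxml_use_internal_errors(true);
$loaded = $dom->loadHTML($html);

// Read whatever libxml buffered, then empty the buffer and restore the old setting.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($oldSetting);

foreach ($errors as $error) {
    // Each entry is a LibXMLError object with level, code, line and message properties.
    printf("line %d: %s\n", $error->line, trim($error->message));
}
```

Calling libxml_clear_errors() keeps the buffer from growing when you parse many documents in one script.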

As a final note:
It's possible to load a DOM document from a URL directly (without the need for cURL) with DOMDocument::loadHTMLFile(), if your PHP installation is configured to allow loading of URLs via the configuration setting allow_url_fopen. Be aware, though, that this setting is often disabled for security reasons, so use it with care if you plan on using it.
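
As a rough illustration of that approach, the sketch below wraps loadHTMLFile() in a helper that refuses remote sources when allow_url_fopen is off. The function name loadDocument() and the temporary file are my own assumptions for the sake of a self-contained example; with the setting enabled, you would pass a URL instead of a local path.

```php
<?php
/**
 * Hypothetical helper: load HTML from a local path or a URL via loadHTMLFile().
 * Remote sources are refused up front when allow_url_fopen is disabled.
 */
function loadDocument(string $source): DOMDocument
{
    $isRemote = preg_match('#^https?://#i', $source) === 1;
    if ($isRemote && !filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        throw new RuntimeException('allow_url_fopen is disabled; fetch the page with cURL instead.');
    }

    $dom = new DOMDocument;
    $oldSetting = libxml_use_internal_errors(true);
    $loaded = $dom->loadHTMLFile($source);
    libxml_clear_errors();
    libxml_use_internal_errors($oldSetting);

    if ($loaded === false) {
        throw new RuntimeException("Could not load HTML from {$source}");
    }
    return $dom;
}

// Local demonstration; a URL would go through the same call.
$path = tempnam(sys_get_temp_dir(), 'dom');
file_put_contents($path, '<html><head><title>loadHTMLFile test</title></head><body></body></html>');
$dom = loadDocument($path);
$pageTitle = $dom->getElementsByTagName('title')->item(0)->textContent;
echo $pageTitle, "\n";
unlink($path);
```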


Here's a simple test-case which should work as expected:

    <?php
    $html = '<html>
    <head>
      <title>DOMDocument test-case</title>
    </head>
    <body>
      <div class="dummy-container">
        <h1 _ngcontent-c0="" class="br-hdng"><span _ngcontent-c0="" class="pr dib">hair fall shamboo<!----></span></h1>
      </div>
    </body>';

    $dom = new DOMDocument;
    $oldSetting = libxml_use_internal_errors(true);
    $dom->loadHTML( $html );
    libxml_use_internal_errors($oldSetting);

    $xpath = new DOMXPath( $dom );
    $title = $xpath->query( '//h1[@class="br-hdng"]/span' )->item( 0 )->nodeValue;
    echo $title;

See this example interpreted online on 3v4l.org

You should replace the contents of $html with the output of your get_content() call. If it doesn't work, then either:

  1. there's something wrong with fetching the HTML with cURL (do var_dump( $html ); before loading into DOMDocument, for instance, to see the contents you retrieved), or...

  2. perhaps you are working inside a namespace, in which case you should prepend a backslash before DOMDocument and DOMXPath, i.e.: new \DOMDocument; and new \DOMXPath( $dom );.
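
One more thing that makes this kind of failure easier to diagnose: when the XPath expression matches nothing, item( 0 ) returns null, and reading nodeValue from it only raises a notice or warning instead of telling you the element is missing. A small guard makes that case explicit. This is just a sketch; firstNodeText() is a hypothetical helper name, not part of the DOM extension.

```php
<?php
/**
 * Hypothetical helper: return the trimmed text of the first node matching
 * $expression, or null when nothing matches.
 */
function firstNodeText(DOMXPath $xpath, string $expression): ?string
{
    $nodes = $xpath->query($expression);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->nodeValue);
}

$dom = new DOMDocument;
$oldSetting = libxml_use_internal_errors(true);
$dom->loadHTML('<html><body><h1 class="br-hdng"><span>hair fall shamboo</span></h1></body></html>');
libxml_use_internal_errors($oldSetting);
$xpath = new DOMXPath($dom);

$found = firstNodeText($xpath, '//h1[@class="br-hdng"]/span');
$missing = firstNodeText($xpath, '//h2[@class="no-such-class"]/span');

var_dump($found);   // string(17) "hair fall shamboo"
var_dump($missing); // NULL
```

If the helper returns null for HTML you fetched with cURL, the element simply isn't in the markup you received.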


[1] LibXML is the XML library that DOMDocument uses to parse XML/HTML documents.


What's wrong with PHP then?

PHP doesn't run JavaScript. Presumably, Puppeteer from your JavaScript code, as well as requests_html from your Python code, both run JavaScript.

Your problem is that this page loads the br-hdng title and the products with JavaScript; they're not part of the HTML at all. They're actually loaded from https://www.purplle.com/api/shop/itemsv3 with a bunch of GET parameters, so you need to do JSON parsing here, not HTML parsing :) But before you can access that API, you need the cookies given by the search page, and the search string must match the API search string (otherwise the API will just return errors). Check this:

    <?php
    declare(strict_types = 0);
    header ( "Content-Type: text/plain;charset=UTF-8" );
    $ch = curl_init ();
    curl_setopt_array ( $ch, array (
            CURLOPT_ENCODING => '',
            CURLOPT_COOKIEFILE => '', // enables cookie handling without saving them anywhere. this page requires cookie handling.
            CURLOPT_USERAGENT => 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0', // 'libcurl/? PHP/' . PHP_VERSION, // many websites block requests without a useragent
            CURLOPT_RETURNTRANSFER => 1
    ) );
    // we don't care what's on this page, we just need to fetch it to create a cookie session.
    $search_query = 'hair fall shamboo';
    curl_setopt ( $ch, CURLOPT_URL, 'https://www.purplle.com/search?q=' . rawurlencode ( $search_query ) );
    curl_exec ( $ch );
    $url = 'https://www.purplle.com/api/shop/itemsv3?' . http_build_query ( array (
            'list_type' => 'search',
            'custom' => '',
            'list_type_value' => $search_query,
            'page' => '1',
            'sort_by' => 'rel',
            'elite' => '0'
    ) );
    // $url = 'https://www.purplle.com/api/shop/itemsv3?list_type=search&custom=&list_type_value=hair%20fall%20shamboo&page=1&sort_by=rel&elite=0';
    // $out = tmpfile ();
    // curl_setopt_array ( $ch, array (
    // CURLOPT_HTTPHEADER => array (
    // 'Accept: application/json, text/plain, */*',
    // 'Accept-Language: en-US,en;q=0.5',
    // 'Referer: https://www.purplle.com/search?q=hair%20fall%20shamboo',
    // // Cookie: __cfduid=d3199415b5ce18cbff2779802b1f843331544901552; csrftoken=f8f18b5deae92972f63343e13c6a460b; purpllesession=hedxkc%2FkdGye%2BYi6ebmJktUN1LeqA5rdVXu96%2F0j0yqtP2xZ8LfwpK8daXqPSkeZulO9ZvqpMYXTmY8oMD03VcG9vdKGBm30R9fU%2FQygtXBFhZvfvsu0scyaL3FqHbePp2zG45MevWU961eg82KAkCuHk0qFM8URQBRyYV5gg8TeqnTPgI3tF87H5nJ%2BmfO4pn%2BRWmIuWXvgNXAO%2F8GEaH6lJVl17QZm9c5vwi10OYeLfmSdIMy6V2Pp0ZjLTzuFw2de5jpR0zsbHHKZ0C2e548PiDl3taHIE5wuZO4HYIeXUqTpE98%2Fo3kztoU1bTlXGZgu%2FxVQ3EWLRFWQ2t57UawA%2FuERlD8vvOyFGbYHGAWVxgFTR%2FObAhFLHns5kqoj; _autm30d=null; visitorppl=NZ5tqQpGlFYWg2MrDl1302113161544901552; session_initiated=Direct; _tmpsess=1; token=desktop_5c1553b07c61c_7955_16122018; __uzma=5c1553b085a480.63440826; __uzmb=1544901552; __uzmc=632121030774; __uzmd=1544901552
    // 'Connection: keep-alive'
    // ),
    // // CURLOPT_CONNECT_TO=>array('www.purplle.com:443:dumpinput.ratma.net:80'),
    // CURLOPT_STDERR => $out,
    // CURLOPT_VERBOSE => 1
    // ) );
    // var_dump ( $url );
    curl_setopt ( $ch, CURLOPT_URL, $url );
    $json = curl_exec ( $ch );
    $data = json_decode ( $json, true );
    // var_dump ($json, $data );
    $title = $data ['list_title'];
    echo 'title: ' . $title . "\n";
    foreach ( $data ['items'] as $item ) {
        echo "name: ", $item ['name'], "\n";
    }

output:

    title: hair fall shamboo
    name: VLCC Hair fall Shampoo 350 ML (Buy1 Get1) & Ayurveda Hair Oil Combo (470 ml)
    name: Biotique Bio Kelp Protein Shampoo For Falling Hair (190 ml)
    name: Biotique Fresh Texture Shampoo - Bio Henna Leaf (120 ml)
    name: Good Vibes Scalp Purifying Shampoo -Neem And Aloe Vera (200 ml)
    name: Khadi Shikakai Sat Hair Cleanser Scalp Therapy (210 ml) By Swati Gramodyog
    name: Good Vibes Apple Cider Vinegar Shampoo (120 ml)
    name: Good Vibes Refreshing Shampoo - Green Apple (200 ml)
    name: Good Vibes Hydrating Shampoo -Marigold (200 ml)
    name: Alps Goodness Smoothening Shampoo - Keratin (50 ml)
    name: Alps Goodness Softening Shampoo - Coconut & Almond (50 ml)
    name: Alps Goodness Split End Control Shampoo - Coconut, Garlic & Shea Butter (50 ml)
    name: Passion Indulge Papain Shampoo & Conditioner For Soft & Shiny Hair (200 ml + 100 ml)
    name: Good Vibes Apple Cider Vinegar Shampoo (200 ml)
    name: Alps Goodness Split End Control Shampoo - Coconut, Garlic & Shea Butter (200 ml)
    name: Alps Goodness Nourishing Shampoo - Argan Oil & Olive (200 ml)
    name: Alps Goodness Moisturizing Shampoo - Ginger & Egg (200 ml)
    name: Alps Goodness Conditioning Shampoo - Pure Honey (200 ml)
    name: Alps Goodness Hydrating Shampoo - Tea Tree (200 ml)
    name: Alps Goodness Smoothening Shampoo - Keratin (200 ml)
    name: Alps Goodness Softening Shampoo - Coconut & Almond (200 ml)
    name: Good Vibes Scalp Purifying Shampoo -Neem And Aloe Vera (120 ml)
    name: Good Vibes Hydrating Shampoo - Marigold (120 ml)
    name: Alps Goodness Conditioning Shampoo - Pure Honey (50 ml)
    name: Alps Goodness Moisturizing Shampoo - Ginger & Egg (50 ml)
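
The script above trusts that the request succeeded and that the response has the expected shape. If you'd rather fail loudly when the API returns an error or something that isn't JSON, a small validation step helps. This is only a sketch: extractNames() is a hypothetical helper, the sample payload is made up, and the only keys I'm relying on are the ones the script already reads (list_title, items, name).

```php
<?php
/**
 * Hypothetical helper: decode the API response and pull out the list title
 * and the item names, throwing when the payload is not what we expect.
 */
function extractNames(string $json): array
{
    $data = json_decode($json, true);
    if (!is_array($data)) {
        throw new RuntimeException('Response is not a JSON object: ' . json_last_error_msg());
    }
    if (!isset($data['list_title'], $data['items']) || !is_array($data['items'])) {
        throw new RuntimeException('Unexpected response shape.');
    }
    return [
        'title' => $data['list_title'],
        'names' => array_column($data['items'], 'name'),
    ];
}

// Made-up sample payload standing in for the real curl_exec() result.
$sample = '{"list_title":"hair fall shamboo","items":[{"name":"Shampoo A"},{"name":"Shampoo B"}]}';
$result = extractNames($sample);
echo 'title: ', $result['title'], "\n";
foreach ($result['names'] as $name) {
    echo 'name: ', $name, "\n";
}
```

Keep in mind that curl_exec() returns false on failure, so check for that (and look at curl_error( $ch )) before handing the result to a function like this.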