Transform Javascript XPath in valid PHP query() XPath | normalize JS XPath --> PHP

Just seeing that Salathe actually answered the same, but taking your comment into account and to stress this a bit more:

You do not need to specify any DTD. As long as you use the DOMDocument::loadHTML or DOMDocument::loadHTMLFile functions, the HTML id attribute is actually registered for the XPath id() function. With the demo HTML given, you even get a warning when you load the document:

Warning: DOMDocument::loadHTMLFile(): ID priceInfo already defined in ...

That is already a sign that this is a true ID attribute, because the parser moans about duplicates. Related sample code looks like:

$xpath = 'id("priceInfo")/div[@class="standardProdPricingGroup"]/span[1]';
$doc = new DOMDocument();
$doc->loadHTMLFile(__DIR__ . '/../data/file-11796340.html');
$xp = new DOMXPath($doc);
$r = $xp->query($xpath);
echo $xpath, "\n";
echo $r ? $r->length : 0, ' elements found', "\n";
if (!$r) return;
foreach ($r as $node) {
    echo " - ", $node->nodeValue, "\n";
}

The output is:

id("priceInfo")/div[@class="standardProdPricingGroup"]/span[1]
1 elements found
 - hello
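For anyone without the demo file at hand, here is a self-contained variant of the same check. The inline markup is only an assumed stand-in for the original page, so treat it as a sketch of the behaviour rather than a reproduction of the real document:

```php
<?php
// Assumed stand-in for the demo page; the real file is not reproduced here.
$html = '<!DOCTYPE html><html><body>'
      . '<div id="priceInfo"><div class="standardProdPricingGroup">'
      . '<span>hello</span><span>world</span></div></div>'
      . '</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html); // HTML loader: id attributes are registered as IDs

$xp = new DOMXPath($doc);
$r  = $xp->query('id("priceInfo")/div[@class="standardProdPricingGroup"]/span[1]');

echo $r->length, " elements found\n";
foreach ($r as $node) {
    echo " - ", $node->nodeValue, "\n";
}
```

Because the string is loaded with loadHTML, no DTD and no setIdAttribute call is involved; the id() lookup works directly.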

In case you need more control, first run an xpath to mark all HTML id attributes as ID for xpath:

$r = $xp->query("//*[@id]");
if ($r) foreach ($r as $node) {
    $node->setIdAttribute('id', true);
}

You can then use the same xpath with the id() function, no need to change it.
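Marking the attributes by hand matters most when the document is loaded as XML rather than HTML, since loadXML does not treat a plain id attribute as an ID without a DTD. A minimal sketch, assuming a made-up two-item XML document:

```php
<?php
// Made-up sample document, loaded as XML (no DTD, so no IDs are registered).
$doc = new DOMDocument();
$doc->loadXML('<root><item id="a">first</item><item id="b">second</item></root>');
$xp = new DOMXPath($doc);

// Before marking: id() has nothing to look up.
$before = $xp->query('id("b")')->length;

// Mark every id attribute as a real ID.
foreach ($xp->query('//*[@id]') as $node) {
    $node->setIdAttribute('id', true);
}

// After marking: the same expression resolves.
$after = $xp->query('id("b")');

echo "before: $before, after: {$after->length}\n";
```

The XPath expression itself is identical in both queries; only the document's ID registration changes between them.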

Can't you just translate id("...") to //*[@id="..."][1] at the start of your expression?

For instance, if you can assume you won't have any parentheses inside the id(...) expressions:

$queryRewritten = preg_replace('/^id\(([^\)]+)\)/', '//*[@id=$1][1]', $query);

EDIT: corrected the replacement; id() must come first in the expression.
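To see the rewrite end to end, here is a small sketch that wraps the replacement in a function and runs the result against a document loaded as XML, where id() alone would not resolve. The function name rewriteLeadingId and the sample XML are made up for illustration:

```php
<?php
// Hypothetical helper: rewrites a leading id("...") call into an attribute test.
function rewriteLeadingId(string $query): string
{
    // Anchored at the start, so id() calls later in the expression are left alone.
    return preg_replace('/^id\(([^\)]+)\)/', '//*[@id=$1][1]', $query);
}

// Made-up sample document; loaded as XML, so id attributes are not registered as IDs.
$doc = new DOMDocument();
$doc->loadXML('<root><box id="priceInfo"><span>hello</span></box></root>');
$xp = new DOMXPath($doc);

$rewritten = rewriteLeadingId('id("priceInfo")/span[1]');

echo $rewritten, "\n";
echo $xp->query($rewritten)->length, " elements found\n";
```

One caveat: in //*[@id=...][1] the [1] is applied per parent, not to the whole document. With unique ids that makes no difference; if duplicates are possible and only the first match in document order is wanted, (//*[@id=...])[1] is the variant to consider.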

This isn't a full answer but it's too big to put as a comment and it may help you a little.

If you have control over the input XML, then instead of using a DTD to declare id attributes, you can declare them explicitly in the XML document itself by prefixing id attributes with xml:.

For example, if you had XML of

<foo id="x27"/>

and changed it to

<foo xml:id="x27"/>

then the id() function would recognise that attribute as a formal XML id type, not just as an attribute with the name id.

I know this "trick" works on the Saxon processor, but I must admit I've not tried it with PHP.
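For anyone who wants to try it in PHP, a quick check could look like the sketch below. It assumes the underlying libxml2 parser registers xml:id attributes as IDs at load time, which is my understanding but worth verifying on your own build; the sample XML is made up:

```php
<?php
// Made-up sample: the attribute is xml:id instead of a plain id.
$doc = new DOMDocument();
$doc->loadXML('<root><foo xml:id="x27">found me</foo></root>');

$xp = new DOMXPath($doc);
$r  = $xp->query('id("x27")'); // no DTD and no setIdAttribute call

echo $r->length, " elements found\n";
```

If the count comes back as zero on your setup, the parser is not treating xml:id as an ID and one of the other approaches in this thread is the safer route.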

W3C xml:id