
Transform JavaScript XPath into valid PHP query() XPath | normalize JS XPath --> PHP


I just saw that Salathe already gave the same answer, but taking your comment into account, I want to stress this a bit more:

You do not need to specify any DTD. As long as you use the DOMDocument::loadHTML or DOMDocument::loadHTMLFile methods, the HTML id attribute is registered as an ID for the XPath id() function. With the demo HTML given in http://jsbin.com/elatum/2/edit, you even get a warning when you load the document:

Warning: DOMDocument::loadHTMLFile(): ID priceInfo already defined in ...

That is already a sign that this is a true ID attribute, because the parser complains about duplicates. Related sample code looks like this:

$xpath = 'id("priceInfo")/div[@class="standardProdPricingGroup"]/span[1]';

$doc = new DOMDocument();
$doc->loadHTMLFile(__DIR__ . '/../data/file-11796340.html');
$xp = new DOMXPath($doc);
$r = $xp->query($xpath);

echo $xpath, "\n";
echo $r ? $r->length : 0, ' elements found', "\n";
if (!$r) return;

foreach($r as $node) {
    echo " - ", $node->nodeValue, "\n";
}

The output is:

id("priceInfo")/div[@class="standardProdPricingGroup"]/span[1]
1 elements found
 - hello

In case you need more control, first run an XPath query that marks all HTML id attributes as IDs for XPath:

$r = $xp->query("//*[@id]");
if ($r) foreach($r as $node) {
    $node->setIdAttribute('id', true);
}

You can then use the same XPath with the id() function; there is no need to change it.
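To illustrate the effect of setIdAttribute(), here is a minimal sketch. It assumes a tiny made-up document loaded with loadXML(), where a plain id attribute is not treated as an ID without a DTD. The element name foo and the value x27 are only placeholders.

```php
<?php
// Hypothetical sample document: loaded as XML, so "id" is just an ordinary attribute.
$doc = new DOMDocument();
$doc->loadXML('<root><foo id="x27">hello</foo></root>');
$xp = new DOMXPath($doc);

// Before marking: id() has no registered ID to look up.
$before = $xp->query('id("x27")')->length;

// Mark every id attribute as a true ID.
foreach ($xp->query('//*[@id]') as $node) {
    $node->setIdAttribute('id', true);
}

// After marking: the same expression can resolve the element.
$after = $xp->query('id("x27")')->length;

echo "before: $before, after: $after\n";
```

The XPath expression itself is identical in both queries; only the document's ID registration changes.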


Can't you just translate id("...") to (//*[@id="..."])[1] at the start of your expression? The parentheses matter: without them, [1] is applied per parent element rather than to the whole result set.

For instance, if you can assume you won't have any parentheses in the id(...) expressions:

$queryRewritten = preg_replace('/^id\(([^\)]+)\)/', '(//*[@id=$1])[1]', $query);


EDIT: corrected the replacement; id() must come first in the expression.
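As a rough sketch, the rewrite could be wrapped in a small helper and checked against a document. The function name rewriteLeadingId and the inline HTML are made up for illustration. This variant puts parentheses around the node-set so that [1] picks the first match in document order. It assumes id() opens the expression and that its argument is a single quoted literal without parentheses.

```php
<?php
/**
 * Rewrite a leading id("...") call into an attribute test.
 * Hypothetical helper: assumes id() starts the expression and its
 * argument is one quoted literal containing no parentheses.
 */
function rewriteLeadingId(string $query): string
{
    // $1 is the quoted literal, quotes included, e.g. "priceInfo".
    return preg_replace('/^id\(([^\)]+)\)/', '(//*[@id=$1])[1]', $query);
}

$query = 'id("priceInfo")/div[@class="standardProdPricingGroup"]/span[1]';
$rewritten = rewriteLeadingId($query);
echo $rewritten, "\n";

// Made-up HTML that mirrors the structure the query expects.
$doc = new DOMDocument();
$doc->loadHTML(
    '<html><body><div id="priceInfo">'
    . '<div class="standardProdPricingGroup"><span>hello</span></div>'
    . '</div></body></html>'
);
$xp = new DOMXPath($doc);
$count = $xp->query($rewritten)->length;
echo $count, " elements found\n";
```

Expressions that do not start with id(...) pass through unchanged, because the pattern is anchored with ^.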


This isn't a full answer but it's too big to put as a comment and it may help you a little.

If you have control over the input XML, then instead of using a DTD to declare id attributes, you can declare them explicitly in the XML document itself by prefixing id attributes with xml:.

For example, if you had XML of

<foo id="x27"/>

and changed it to

<foo xml:id="x27"/>

then the id() function would recognise that attribute as a formal XML id type, not just as an attribute with the name id.

I know this "trick" works on the Saxon processor, but I must admit I've not tried it with PHP.
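For anyone who wants to try this with PHP, a minimal sketch might look like the following. It assumes that PHP's libxml2-based DOM extension registers xml:id as an ID while parsing. I believe it does, but check it on your own setup. The document content is made up.

```php
<?php
// Hypothetical sample document using xml:id instead of a plain id attribute.
$doc = new DOMDocument();
$doc->loadXML('<root><foo xml:id="x27">hello</foo></root>');
$xp = new DOMXPath($doc);

// No DTD and no setIdAttribute() call: this relies on xml:id alone.
$r = $xp->query('id("x27")');
$found = $r ? $r->length : 0;

echo $found, " elements found\n";
```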

W3C xml:id