Transform JavaScript XPath into a valid PHP query() XPath | normalize JS XPath --> PHP
Just seeing that Salathe actually answered the same, but taking your comment into account and to stress this a bit more:
You do not need to specify any DTD. As long as you use the DOMDocument::loadHTML or DOMDocument::loadHTMLFile functions, the HTML id attribute is actually registered for the xpath id() function. With the demo HTML given in http://jsbin.com/elatum/2/edit, you even get a warning when you load the document:
Warning: DOMDocument::loadHTMLFile(): ID priceInfo already defined in ...
Which is already a sign that this is a true ID attribute because it moans about duplicates. A related sample code looks like:
    $xpath = 'id("priceInfo")/div[@class="standardProdPricingGroup"]/span[1]';

    $doc = new DOMDocument();
    $doc->loadHTMLFile(__DIR__ . '/../data/file-11796340.html');
    $xp = new DOMXPath($doc);

    $r = $xp->query($xpath);

    echo $xpath, "\n";
    echo $r ? $r->length : 0, ' elements found', "\n";

    if (!$r) return;

    foreach ($r as $node) {
        echo " - ", $node->nodeValue, "\n";
    }
The output is:
    id("priceInfo")/div[@class="standardProdPricingGroup"]/span[1]
    1 elements found
     - hello
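To try this without the original file, here is a minimal sketch using an inline HTML string. The markup is a made-up stand-in for the jsbin demo (only the id and class names come from it), and libxml_use_internal_errors() is there just to keep parser warnings quiet:

```php
<?php
// Hypothetical stand-in for the demo page; only the id and class names are from the question.
$html = '<html><body>'
      . '<div id="priceInfo"><div class="standardProdPricingGroup">'
      . '<span>hello</span><span>world</span>'
      . '</div></div>'
      . '</body></html>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);            // note: no DTD supplied

$xp = new DOMXPath($doc);
$r  = $xp->query('id("priceInfo")/div[@class="standardProdPricingGroup"]/span[1]');

// If the id attribute was registered as an ID, both lookups should find something.
echo 'getElementById: ', $doc->getElementById('priceInfo') ? 'found' : 'not found', "\n";
echo 'id() matches:   ', $r ? $r->length : 0, "\n";
```

getElementById() is included because it relies on the same ID registration as the xpath id() function, so it makes a handy second check.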
In case you need more control, first run an xpath to mark all HTML id attributes as ID for xpath:
    $r = $xp->query("//*[@id]");
    if ($r) {
        foreach ($r as $node) {
            $node->setIdAttribute('id', true);
        }
    }
You can then use the same xpath with the id() function, no need to change it.
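The effect of that marking step is easiest to see on a document loaded as XML, where nothing is registered as an ID by default. A small sketch, assuming a made-up two-element document; countById() is a hypothetical helper, not part of the DOM extension:

```php
<?php
// Hypothetical sample XML; there is no DTD, so "id" starts out as an ordinary attribute.
$doc = new DOMDocument();
$doc->loadXML('<root><foo id="x27">hello</foo><foo id="x28">world</foo></root>');
$xp = new DOMXPath($doc);

// Hypothetical helper: how many nodes does id("...") select?
function countById(DOMXPath $xp, string $id): int
{
    $r = $xp->query('id("' . $id . '")');
    return $r ? $r->length : 0;
}

$before = countById($xp, 'x27');

// Mark every id attribute as a real ID, as in the snippet above.
foreach ($xp->query('//*[@id]') as $node) {
    $node->setIdAttribute('id', true);
}

$after = countById($xp, 'x27');

echo "before marking: $before, after marking: $after\n";
```

The xpath expression itself is identical in both calls; only the document's ID registration changes in between.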
Can't you just translate id("...") to //*[@id="..."][1] at the start of your expression?
For instance, if you can assume you won't have any parentheses in the id(...) expressions:
$queryRewritten = preg_replace('/^id\(([^\)]+)\)/','//*[@id=$1][1]',$query);
EDIT: corrected the replacement; id() must be the first thing in the expression.
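As a sketch, the rewrite can be wrapped in a small function. rewriteIdCall() is a hypothetical name, and this variant swaps in (//*[@id=...])[1] rather than //*[@id=...][1]: in XPath a trailing [1] on a // step is evaluated per parent, so the parenthesised form is what picks the first match in the whole document. With unique ids the two behave the same.

```php
<?php
// Hypothetical helper wrapping the rewrite; assumes id(...) is at the very start
// of the expression and that its argument contains no closing parenthesis.
function rewriteIdCall(string $query): string
{
    // (//*[@id=...])[1] takes the first match in document order.
    // preg_replace() returns null on a regex error, so fall back to the input.
    return preg_replace('/^id\(([^)]+)\)/', '(//*[@id=$1])[1]', $query) ?? $query;
}

echo rewriteIdCall('id("priceInfo")/div[@class="standardProdPricingGroup"]/span[1]'), "\n";
// (//*[@id="priceInfo"])[1]/div[@class="standardProdPricingGroup"]/span[1]

echo rewriteIdCall('//div[@id="other"]'), "\n";
// //div[@id="other"]   (unchanged: no leading id() call)
```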
This isn't a full answer but it's too big to put as a comment and it may help you a little.
If you have control over the input XML, then instead of using a DTD to declare id attributes, you can declare them explicitly in the XML document itself by prefixing id attributes with xml:.
For example, if you had XML of
<foo id="x27"/>
and changed it to
<foo xml:id="x27"/>
then the id() function would recognise that attribute as a formal XML id type, not just as an attribute with the name id.
I know this "trick" works on the Saxon processor, but I must admit I've not tried it with PHP.
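A quick way to check this in PHP is to load such a document and see whether id() finds the element. This is only a probe, not a confirmed result: whether xml:id is honoured depends on the underlying libxml2 library, and the sample document is made up.

```php
<?php
// Hypothetical sample document using xml:id instead of a DTD-declared id.
$doc = new DOMDocument();
$doc->loadXML('<root><foo xml:id="x27">hello</foo></root>');

$xp = new DOMXPath($doc);
$r  = $xp->query('id("x27")');

// 1 would mean the attribute was treated as a real ID; 0 would mean it was not.
$count = $r ? $r->length : 0;
echo 'id("x27") matched ', $count, " node(s)\n";
```

If the count comes back as 0 on your setup, marking the attributes with setIdAttribute() or rewriting id(...) to an @id test remain the fallbacks.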