
XPath pulling more than one match


The part of your question that I think I understand is this:

Let's say that I want to match XML where each direct child of the root has an attribute begin that is smaller than the next sibling.

<node id="0" begin="2">
    <node id="1" begin="2">
        <node id="2" begin="2"/>
        <node id="3" begin="3"/>
    </node>
    <node id="4" begin="5">
        <node id="5" begin="5"/>
    </node>
    <node id="6" begin="6"/>
</node>

This XPath should match:

/node/node[number(@begin) < number(../node/@begin)]

Now, it's fairly clear why that gives you an error. Within the predicate, .. selects the node with id=0. That node has three child nodes (with ids 1, 4, and 6), each of which has a @begin attribute, so ../node/@begin selects a sequence of three attributes. In XPath 2.0, number() accepts at most one item, so passing it that sequence raises an error.
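To see the cardinality problem concretely, here is a minimal Python sketch. It does not run XPath 2.0 (the standard library's xml.etree.ElementTree only supports a small XPath subset); instead it collects what ../node/@begin yields by hand and passes it to a hypothetical number() helper that, like XPath 2.0's fn:number, rejects a sequence of more than one item.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<node id="0" begin="2">
    <node id="1" begin="2">
        <node id="2" begin="2"/>
        <node id="3" begin="3"/>
    </node>
    <node id="4" begin="5">
        <node id="5" begin="5"/>
    </node>
    <node id="6" begin="6"/>
</node>"""

def number(items):
    """Hypothetical, simplified stand-in for XPath 2.0 number(): at most one item allowed."""
    if len(items) > 1:
        raise TypeError("A sequence of more than one item is not allowed")
    return float(items[0]) if items else float("nan")

root = ET.fromstring(SAMPLE)
# Inside the predicate of /node/node[...], ".." is the root element (id="0"),
# so ../node/@begin collects the begin attribute of each of its children.
begins = [child.get("begin") for child in root.findall("node")]
print(begins)  # ['2', '5', '6']

try:
    number(begins)
except TypeError as exc:
    print(exc)  # A sequence of more than one item is not allowed
```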

Your query doesn't seem in any way related to the prose requirement, namely

where each direct child of the root has an attribute begin that is smaller than the next sibling

The condition for that would be

node[every $n in node[position() lt last()] satisfies (number($n/@begin) lt number($n/following-sibling::node/@begin))]
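A Python sketch of that prose condition, assuming "next sibling" means the immediately following sibling (following-sibling::node[1] in XPath terms; this is my reading, not what the expression above literally says). It uses only xml.etree.ElementTree and the sample XML from the question; the function name children_strictly_ascending is made up for illustration.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<node id="0" begin="2">
    <node id="1" begin="2">
        <node id="2" begin="2"/>
        <node id="3" begin="3"/>
    </node>
    <node id="4" begin="5">
        <node id="5" begin="5"/>
    </node>
    <node id="6" begin="6"/>
</node>"""

def children_strictly_ascending(parent):
    """True if each child <node> has a smaller begin than its next sibling <node>.

    Assumption: "next sibling" = the immediately following <node> sibling.
    The last child has no next sibling, which mirrors node[position() lt last()].
    """
    begins = [float(child.get("begin")) for child in parent.findall("node")]
    # Pair each child with its immediate successor: (b0, b1), (b1, b2), ...
    return all(a < b for a, b in zip(begins, begins[1:]))

root = ET.fromstring(SAMPLE)
print(children_strictly_ascending(root))  # True: 2 < 5 < 6
```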


Regarding the "a sequence of more than one item is not allowed" exception you're facing, note that XPath 2.0 and above, as well as XQuery, allow a function call as a step in a path expression (.../number()). This means you can call number() once per node, passing a single begin attribute at a time, to avoid the exception:

/node/node[number(@begin) < ../node/number(@begin)]

However, the predicate in the XPath above evaluates to true whenever at least one sibling node has a begin value greater than that of the current node, because the general comparison < is true as soon as it holds for any item in the sequence. That does not seem to be the desired behavior.
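The "any item" behaviour of the general comparison < can be emulated in Python with any(). This sketch uses plain xml.etree.ElementTree rather than an XPath engine, and general_less_than is a hypothetical helper; applied to the sample XML from the question, the predicate keeps ids 1 and 4 simply because each of them has some sibling with a larger begin, which says nothing about ordering.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<node id="0" begin="2">
    <node id="1" begin="2">
        <node id="2" begin="2"/>
        <node id="3" begin="3"/>
    </node>
    <node id="4" begin="5">
        <node id="5" begin="5"/>
    </node>
    <node id="6" begin="6"/>
</node>"""

def general_less_than(value, values):
    """Emulates XPath 2.0 general comparison `value < values`: true if ANY item satisfies it."""
    return any(value < other for other in values)

root = ET.fromstring(SAMPLE)
children = root.findall("node")
# ../node/number(@begin): one number per child of the parent.
sibling_begins = [float(c.get("begin")) for c in children]

# /node/node[number(@begin) < ../node/number(@begin)]
selected = [c.get("id") for c in children
            if general_less_than(float(c.get("begin")), sibling_begins)]
print(selected)  # ['1', '4']
```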

You can apply the same fix to the suggested XQuery, but it has another, similar problem: once that fix is applied, the second lt compares a single value against a sequence of values (the begin of every following sibling), and a value comparison such as lt only accepts a single item on each side. You can try the following, slightly modified, XQuery instead:

for $node in node[
    every $n in node[position() lt last()]
    satisfies not($n/following-sibling::node[number(@begin) lt number($n/@begin)])
]
return $node
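As a rough Python equivalent of the modified XQuery (standard library only; the helper name no_following_sibling_smaller is made up), note that the condition only forbids a following sibling with a smaller begin, so equal begin values still pass. That is a non-strict ascending check.

```python
import xml.etree.ElementTree as ET

def no_following_sibling_smaller(parent):
    """Mirrors: every $n in node[position() lt last()]
    satisfies not($n/following-sibling::node[number(@begin) lt number($n/@begin)])"""
    begins = [float(c.get("begin")) for c in parent.findall("node")]
    return all(
        # For each child except the last, no later sibling may have a smaller begin.
        not any(later < current for later in begins[i + 1:])
        for i, current in enumerate(begins[:-1])
    )

ascending = ET.fromstring('<node><node begin="2"/><node begin="5"/><node begin="6"/></node>')
with_tie = ET.fromstring('<node><node begin="2"/><node begin="2"/><node begin="3"/></node>')
out_of_order = ET.fromstring('<node><node begin="1"/><node begin="3"/><node begin="2"/></node>')

print(no_following_sibling_smaller(ascending))     # True
print(no_following_sibling_smaller(with_tie))      # True: equal values are allowed
print(no_following_sibling_smaller(out_of_order))  # False: 2 comes after 3
```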

"One way to go about is to use a numeric comparison of the begin attribute that is available in the corpus. It is numerical ascending, so if we want to ensure the order of XPath is intact, we can say that the numeric value of each child node of @cat="np" should be less than the next by using number()."

If I understand this correctly, you can use the following XPath:

/node/node[
    not(
        node[position() < last()]
            [number(@begin) > following-sibling::node/number(@begin)]
    )
]


The XPath returns all 2nd-level node elements in which, for every child node except the last, no following-sibling node has a numerically lower begin value than that child.

Given the following sample XML:

<node id="0" begin="2">
    <node id="0" begin="1" cat="np">
        <node id="1" begin="1" pt="art" text="the" />
        <node id="2" begin="3" pt="n" text="time" />
        <node id="3" begin="2" pt="adj" text="available" />
    </node>
    <node id="0" begin="1" cat="np">
        <node id="1" begin="1" pt="art" text="the" />
        <node id="2" begin="2" pt="adj" text="concerned" />
        <node id="3" begin="3" pt="n" text="man" />
    </node>
</node>

Only the 2nd node would be selected, since it is the only 2nd-level node whose child nodes have begin attribute values in ascending order:

<node id="0" begin="1" cat="np">
   <node id="1" begin="1" pt="art" text="the"/>
   <node id="2" begin="2" pt="adj" text="concerned"/>
   <node id="3" begin="3" pt="n" text="man"/>
</node>
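To check that expectation without an XPath 2.0 processor, the same filter can be mimicked in Python with xml.etree.ElementTree. This is a sketch of the logic rather than the XPath itself; has_out_of_order_child is a hypothetical helper corresponding to the expression inside not().

```python
import xml.etree.ElementTree as ET

SAMPLE = """<node id="0" begin="2">
    <node id="0" begin="1" cat="np">
        <node id="1" begin="1" pt="art" text="the" />
        <node id="2" begin="3" pt="n" text="time" />
        <node id="3" begin="2" pt="adj" text="available" />
    </node>
    <node id="0" begin="1" cat="np">
        <node id="1" begin="1" pt="art" text="the" />
        <node id="2" begin="2" pt="adj" text="concerned" />
        <node id="3" begin="3" pt="n" text="man" />
    </node>
</node>"""

def has_out_of_order_child(node):
    """True if some child node (other than the last) has a following sibling
    with a numerically lower begin, i.e. the expression inside not() is non-empty."""
    begins = [float(c.get("begin")) for c in node.findall("node")]
    return any(
        current > later
        for i, current in enumerate(begins[:-1])
        for later in begins[i + 1:]
    )

root = ET.fromstring(SAMPLE)
# /node/node[not(...)]: keep the 2nd-level nodes that have no out-of-order child.
selected = [n for n in root.findall("node") if not has_out_of_order_child(n)]
texts = [" ".join(c.get("text") for c in n.findall("node")) for n in selected]
print(texts)  # ['the concerned man']
```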

Update, April 19th, 2017:

"...However, I would like this cat="np" to match, and make the not() function less aggressive, i.e. only require that nodes specified in XPath (in this example rel="det" pt="vnw" lemma="die", and the two rel="mod" pt="adj" nodes) follow the order requirement where the begin attribute should be smaller than the next item of the XPath structure."

Then we need to add another predicate inside the not(), where the attribute order requirement is checked, so that only those nodes are compared:

node[(@rel="det" and @pt="vnw" and @lemma="die") or (@rel="mod" and @pt="adj")]
    [position() < last()]
    [number(@begin) >
        following-sibling::node[(@rel="det" and @pt="vnw" and @lemma="die") or (@rel="mod" and @pt="adj")]/number(@begin)
    ]
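One detail worth noting is that predicates are applied left to right, so [position() < last()] counts only the nodes that survived the first filter, and following-sibling::node[...] likewise only looks at the specified nodes. A Python sketch of that sub-expression, assuming a hypothetical np (the asker's real XML isn't shown here) with an unrelated rel="hd" node placed out of order; is_relevant and out_of_order_relevant are made-up helper names.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the asker's real XML is not shown in the thread.
SAMPLE_NP = """<node cat="np" begin="0">
    <node rel="det" pt="vnw" lemma="die" begin="0"/>
    <node rel="hd" pt="n" begin="9"/>
    <node rel="mod" pt="adj" begin="1"/>
    <node rel="mod" pt="adj" begin="2"/>
</node>"""

def is_relevant(n):
    """(@rel="det" and @pt="vnw" and @lemma="die") or (@rel="mod" and @pt="adj")"""
    return ((n.get("rel") == "det" and n.get("pt") == "vnw" and n.get("lemma") == "die")
            or (n.get("rel") == "mod" and n.get("pt") == "adj"))

def out_of_order_relevant(np):
    """Begin values of relevant nodes that have a later relevant sibling with a lower begin.

    Filtering first mirrors the predicate order: position()/last() and the
    following-sibling comparison only ever see the relevant nodes (in document order).
    """
    relevant = [float(c.get("begin")) for c in np.findall("node") if is_relevant(c)]
    return [
        current
        for i, current in enumerate(relevant[:-1])
        if any(current > later for later in relevant[i + 1:])
    ]

np = ET.fromstring(SAMPLE_NP)
print(out_of_order_relevant(np))  # []: the hd node at begin 9 is ignored
everything = [float(c.get("begin")) for c in np.findall("node")]
print(everything)  # [0.0, 9.0, 1.0, 2.0]: unfiltered, the hd node would break the order
```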

So the complete expression would be as follows:

//node[@cat="np" and
    not(node[(@rel="det" and @pt="vnw" and @lemma="die") or (@rel="mod" and @pt="adj")]
            [position() < last()]
            [number(@begin) >
                following-sibling::node[
                    (@rel="det" and @pt="vnw" and @lemma="die") or (@rel="mod" and @pt="adj")
                ]/number(@begin)
            ]
    )
    and node[@rel="det" and @pt="vnw" and @lemma="die"]
    and count(node[@rel="mod" and @pt="adj"]) > 1
]
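For testing the logic outside an XPath processor, here is a Python sketch of the complete predicate using xml.etree.ElementTree. The two np nodes in SAMPLE are hypothetical, since the asker's corpus XML isn't shown in the thread: the first satisfies every condition, the second has its det node after the mod nodes. The helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; the asker's actual corpus XML is not shown in the thread.
SAMPLE = """<node>
    <node cat="np" begin="0">
        <node rel="det" pt="vnw" lemma="die" begin="0"/>
        <node rel="mod" pt="adj" begin="1"/>
        <node rel="mod" pt="adj" begin="2"/>
        <node rel="hd" pt="n" begin="3"/>
    </node>
    <node cat="np" begin="4">
        <node rel="det" pt="vnw" lemma="die" begin="6"/>
        <node rel="mod" pt="adj" begin="4"/>
        <node rel="mod" pt="adj" begin="5"/>
    </node>
</node>"""

def is_det(n):
    return n.get("rel") == "det" and n.get("pt") == "vnw" and n.get("lemma") == "die"

def is_mod_adj(n):
    return n.get("rel") == "mod" and n.get("pt") == "adj"

def matches(np):
    """Python mirror of the complete //node[...] predicate."""
    if np.get("cat") != "np":
        return False
    children = np.findall("node")
    # Order check restricted to the det/vnw/die and mod/adj children.
    relevant = [float(c.get("begin")) for c in children if is_det(c) or is_mod_adj(c)]
    in_order = not any(
        current > later
        for i, current in enumerate(relevant[:-1])
        for later in relevant[i + 1:]
    )
    has_det = any(is_det(c) for c in children)
    mod_count = sum(1 for c in children if is_mod_adj(c))
    return in_order and has_det and mod_count > 1

root = ET.fromstring(SAMPLE)
# //node: root.iter("node") visits the root element and every descendant <node>.
hits = [n.get("begin") for n in root.iter("node") if matches(n)]
print(hits)  # ['0']
```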



In terms of your recursive search request:

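Using //node[@pt = ("art", "adj", "n")]/ancestor::* searches upwards from the inner levels of your XML tree. Note that ("art" or "adj" or "n") does not list alternatives: it evaluates to the single boolean true, whereas the comma-separated sequence makes the general comparison = match any of the three values (XPath 2.0 and later). In your sample XML this returns every ancestor element of the matching nodes, from their direct parent up to the root.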
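In Python's xml.etree.ElementTree, elements don't keep a reference to their parent, so an ancestor::* query needs a child-to-parent map first (the {child: parent} dictionary idiom shown in the ElementTree documentation). A minimal sketch on the cat="np" sample XML from earlier in this thread; ancestors_of_matches is a hypothetical helper, and the tuple membership test plays the role of @pt = ("art", "adj", "n").

```python
import xml.etree.ElementTree as ET

SAMPLE = """<node id="0" begin="2">
    <node id="0" begin="1" cat="np">
        <node id="1" begin="1" pt="art" text="the" />
        <node id="2" begin="3" pt="n" text="time" />
        <node id="3" begin="2" pt="adj" text="available" />
    </node>
    <node id="0" begin="1" cat="np">
        <node id="1" begin="1" pt="art" text="the" />
        <node id="2" begin="2" pt="adj" text="concerned" />
        <node id="3" begin="3" pt="n" text="man" />
    </node>
</node>"""

def ancestors_of_matches(root, wanted=("art", "adj", "n")):
    """Mirror of //node[@pt = ("art", "adj", "n")]/ancestor::*"""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    ancestors = set()
    for node in root.iter("node"):
        if node.get("pt") in wanted:  # match ANY of the listed values
            parent = parent_of.get(node)
            while parent is not None:  # walk up to the root element
                ancestors.add(parent)
                parent = parent_of.get(parent)
    # Like an XPath result: document order, no duplicates.
    return [e for e in root.iter() if e in ancestors]

root = ET.fromstring(SAMPLE)
result = ancestors_of_matches(root)
print([(e.get("cat"), e.get("begin")) for e in result])
# [(None, '2'), ('np', '1'), ('np', '1')]
```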

For more info: http://www.w3.org/TR/xpath-30/