In R XML Package, what is the difference between xmlParse and xmlTreeParse?



Here is some feedback from using the XML package.

  • xmlParse is a version of xmlTreeParse with the argument useInternalNodes set to TRUE.
  • If you want an R object, use xmlTreeParse. This can be inefficient, and unnecessary if you only want to extract part of the XML document.
  • If you don't need an R object, only a pointer to the underlying C structure, use xmlParse. You will need some basic XPath to work with the result.
  • Use asText=TRUE if your input is a string of XML text rather than a file name or URL.
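A minimal sketch of the first and last points, assuming only the XML package itself: xmlParse and xmlTreeParse(useInternalNodes = TRUE) should produce objects of the same class, and when the XML lives in a file you pass the path directly instead of using asText. The temporary file is just for illustration.

```r
library(XML)

txt <- "<doc><el> aa </el></doc>"

# The same parse done two ways: xmlParse() vs xmlTreeParse(useInternalNodes = TRUE)
a <- xmlParse(txt, asText = TRUE)
b <- xmlTreeParse(txt, asText = TRUE, useInternalNodes = TRUE)
print(class(a))
print(class(b))

# File input: write the same text to a temporary file and pass the path,
# so asText is not needed
f <- tempfile(fileext = ".xml")
writeLines(txt, f)
doc_from_file <- xmlParse(f)
print(class(doc_from_file))
```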

Here is an example showing the difference between the two functions:

library(XML)
txt <- "<doc>          <el> aa </el>       </doc>"
res <- xmlParse(txt, asText = TRUE)
res.tree <- xmlTreeParse(txt, asText = TRUE)

Now inspecting the 2 objects:

> class(res)
[1] "XMLInternalDocument" "XMLAbstractDocument"
> class(res.tree)
[1] "XMLDocument"         "XMLAbstractDocument"

You can see that res is an internal document: a pointer to a C object. res.tree, on the other hand, is an R object, so you can access its components with $ like this:

> res.tree$doc$children$doc
<doc>
 <el>aa</el>
</doc>
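Because res.tree is an ordinary R object, it can also be walked with the package's node accessors. A small sketch, assuming the same txt as above: xmlRoot should return the same top-level node reached through res.tree$doc$children$doc, and the children of an R-level node form a named list, so $ and [[ work on them.

```r
library(XML)

txt <- "<doc>          <el> aa </el>       </doc>"
res.tree <- xmlTreeParse(txt, asText = TRUE)

# xmlRoot() returns the top-level <doc> node of the parsed document
root <- xmlRoot(res.tree)
print(xmlName(root))

# Children of an R-level node are a named list keyed by element name
print(names(xmlChildren(root)))
el <- root[["el"]]
print(xmlValue(el))
```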

To inspect res, use a valid XPath query with one of these functions (xpathApply, xpathSApply, getNodeSet). For example:

xpathApply(res,'//el')
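A slightly larger sketch of the three XPath helpers, using a hypothetical document with two el nodes (the data is made up for illustration): getNodeSet returns a list of matching nodes, xpathApply applies a function to each match and returns a list, and xpathSApply does the same but simplifies the result, here to a character vector.

```r
library(XML)

# Hypothetical document with two <el> nodes, for illustration only
txt <- "<doc><el>aa</el><el>bb</el></doc>"
res <- xmlParse(txt, asText = TRUE)

# getNodeSet(): list of the matching internal nodes
nodes <- getNodeSet(res, "//el")
print(length(nodes))

# xpathApply(): calls xmlValue on each match, returns a list
vals_list <- xpathApply(res, "//el", xmlValue)

# xpathSApply(): same, but simplified to a vector
vals_vec <- xpathSApply(res, "//el", xmlValue)
print(vals_vec)
```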

Once you have a valid XML node, you can apply xmlValue, xmlGetAttr, etc. to extract information from it. So here these two statements are equivalent:

## we already have an R object, so just apply xmlValue to the right child
xmlValue(res.tree$doc$children$doc)
## xpathSApply applies xmlValue to each node matched by the XPath query
xpathSApply(res, '//el', xmlValue)
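xmlGetAttr works the same way. A short sketch, assuming a hypothetical document where each el carries an id attribute: extra arguments given to xpathSApply after the function are passed on to it, so each matched node is handed to xmlGetAttr(node, "id"). The same accessors also work on a single node taken from getNodeSet.

```r
library(XML)

# Hypothetical document with an id attribute on each <el>, for illustration only
txt <- '<doc><el id="a1">aa</el><el id="a2">bb</el></doc>'
res <- xmlParse(txt, asText = TRUE)

# Arguments after the function are forwarded to it:
# this calls xmlGetAttr(node, "id") on every matched node
ids <- xpathSApply(res, "//el", xmlGetAttr, "id")
print(ids)

# The same accessors on one node pulled out of the node set
first <- getNodeSet(res, "//el")[[1]]
print(xmlValue(first))
print(xmlGetAttr(first, "id"))
```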