Parsing HTML file in R
The documentation for xmlValue points to a companion function, xmlName, which extracts just the tag name of a node. Combining the two gives what you want:
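library(XML)

# doc.html is the parsed HTML document from the question.
# For every <h2> and <p> node, return its tag name and its text content.
doc.html.name.value <- xpathApply(doc.html, '//h2|//p', function(x) {
  list(name = xmlName(x), content = xmlValue(x))
})

> doc.html.name.value[[1]]
$name
[1] "h2"

$content
[1] "\r\nGeorge Eliot\r\n"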
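If a flat table is more convenient than a nested list, the same two functions can fill a data frame directly. This is a sketch under a few assumptions: the XML package is installed, the inline HTML string is a hypothetical stand-in for the real file, and xpathSApply and base R's trimws are swapped in here (they are not part of the approach above) to get character vectors and to strip the surrounding line breaks that xmlValue returns.

```r
library(XML)  # assumes install.packages("XML") has been run

# Hypothetical inline HTML standing in for the real file.
html <- "<html><body>
<h2>
George Eliot
</h2>
<p>Novelist.</p>
</body></html>"

doc <- htmlParse(html, asText = TRUE)

# The XPath union returns matching nodes in document order,
# so the two vectors below line up element by element.
path <- "//h2|//p"
result <- data.frame(
  name    = xpathSApply(doc, path, xmlName),
  # trimws() removes leading/trailing whitespace such as "\r\n".
  content = trimws(xpathSApply(doc, path, xmlValue)),
  stringsAsFactors = FALSE
)

print(result)
```

Each row pairs a tag name with its trimmed text, so the "\r\nGeorge Eliot\r\n" seen in the list output becomes plain "George Eliot".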