Parsing HTML file in R


The documentation for xmlValue points to a related function, xmlName, which extracts just the name of a tag. Combining the two gives what you want:

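```r
library(XML)

# doc.html is the parsed HTML document from the question.
# For each <h2> or <p> node, keep both the tag name and its text content.
doc.html.name.value <- xpathApply(doc.html, '//h2|//p', function(x) {
  list(name = xmlName(x), content = xmlValue(x))
})
```

Printing the first element gives:

```
> doc.html.name.value[[1]]
$name
[1] "h2"

$content
[1] "\r\nGeorge Eliot\r\n"
```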
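For a version you can run on its own, here is a minimal sketch assuming the XML package is installed. The inline HTML string is a made-up stand-in for the file from the question, and two steps are additions that are not part of the original answer: wrapping xmlValue() in base R's trimws() to strip the leading and trailing line breaks seen in the "\r\nGeorge Eliot\r\n" output, and collecting the results into a data frame.

```r
library(XML)

# Hypothetical sample document standing in for the HTML file from the question
html <- "<html><body>
<h2>
George Eliot
</h2>
<p>Novelist.</p>
<p>Author of Middlemarch.</p>
</body></html>"

# asText = TRUE makes htmlParse() treat the string as markup, not as a file name
doc.html <- htmlParse(html, asText = TRUE)

# For every <h2> and <p> node, record the tag name and its whitespace-trimmed text
doc.html.name.value <- xpathApply(doc.html, "//h2|//p", function(x) {
  list(name = xmlName(x), content = trimws(xmlValue(x)))
})

# Collect the list of (name, content) pairs into a two-column data frame
df <- data.frame(
  name    = vapply(doc.html.name.value, function(e) e$name, character(1)),
  content = vapply(doc.html.name.value, function(e) e$content, character(1)),
  stringsAsFactors = FALSE
)
print(df)
```

The union expression //h2|//p returns the matching nodes in document order, so each heading stays next to the paragraphs that follow it, which makes it easy to group paragraphs under their headings afterwards.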