Native shell command set to extract node value from XML


--format only reformats (indents, etc.) the document. To extract a node value, use --xpath instead (tested on Ubuntu, libxml v20900):

$ xmllint --xpath "//project/parent/version/text()" pom.xml
1.5.0
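
For anyone who wants to use the value in a script rather than just print it, here is a minimal sketch. It assumes an xmllint build that supports --xpath and a pom.xml with no xmlns declaration. The sample file, its contents and the variable names are made up for illustration.

```shell
# Hypothetical sample file: a pom.xml WITHOUT a namespace declaration.
workdir=$(mktemp -d)
cat > "$workdir/pom.xml" <<'EOF'
<project>
  <parent>
    <groupId>com.example</groupId>
    <artifactId>demo-parent</artifactId>
    <version>1.5.0</version>
  </parent>
</project>
EOF

# Capture the text content of the <version> node in a shell variable.
parent_version=$(xmllint --xpath '//project/parent/version/text()' "$workdir/pom.xml")
echo "$parent_version"
```

Command substitution strips any trailing newline, so the variable holds just the version string whether or not your xmllint prints one.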


I've managed to solve it for the time being with this rather unwieldy script using xmllint --shell.

echo "cat //project/parent/version" | xmllint --shell pom.xml | sed '/^\/ >/d' | sed 's/<[^>]*.//g'

If the XML nodes have namespace attributes like my pom.xml had, things get heavier, basically extracting the node by name:

echo "cat //*[local-name()='project']/*[local-name()='parent']/*[local-name()='version']" | xmllint --shell pom.xml | sed '/^\/ >/d' | sed 's/<[^>]*.//g'
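
If your xmllint does support --xpath, the same local-name() trick should work there too, and adding /text() to the expression removes the need for the sed cleanup. A sketch under that assumption follows. The sample pom.xml and the variable names are hypothetical, and this is not something the --shell-only setup above can run.

```shell
# Hypothetical sample file: a pom.xml WITH the default Maven namespace,
# so unprefixed names such as //project no longer match in XPath 1.0.
workdir=$(mktemp -d)
cat > "$workdir/pom.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>com.example</groupId>
    <artifactId>demo-parent</artifactId>
    <version>1.5.0</version>
  </parent>
  <artifactId>demo</artifactId>
</project>
EOF

# Match each element by local name (ignoring its namespace) and take the text node.
xpath="//*[local-name()='project']/*[local-name()='parent']/*[local-name()='version']/text()"
ns_version=$(xmllint --xpath "$xpath" "$workdir/pom.xml")
echo "$ns_version"
```

It is still verbose, but it is one command with no post-processing.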

Hope it helps. If anyone can simplify these expressions, I'd be grateful.


I came here looking for a nice way to scrape a value from a website. The following example may be useful to those (unlike the poster) who have a version of xmllint which supports --xpath.

I needed to pull the most recent stable version of the elasticsearch .deb file and install it. The maintainers have helpfully put the version number in a span with the class "version".

version=`curl -s http://www.elasticsearch.org/download/ | \
  xmllint --html --xpath '//span[@class="version"]/text()' \
  2>/dev/null - `;

What goes on:

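We use curl's -s (silent) option, which suppresses the progress meter and error messages so that only the page body goes down the pipe.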

curl -s http://www.elasticsearch.org/download/

We use the xmllint --html and --xpath switches. The XPath expression (in single quotes)

'//span[@class="version"]/text()'

... looks for a <span> node with the class attribute (@class) "version", and extracts the text value (/text()).

Since xmllint is (surprise!) a linter, it will squawk about the inevitable garbage in your HTML stream. We redirect stderr to /dev/null in the usual way:

 2>/dev/null

Finally, note the " - " at the end of the xmllint command, which tells xmllint the stream is coming from stdin.
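
To try the pipeline without hitting the network, here is a small sketch that feeds xmllint an inline HTML snippet on stdin instead of the curl output. The snippet and the version number 1.2.3 are invented for the example. It uses $(...) instead of backticks, which behaves the same here.

```shell
# Hypothetical stand-in for the downloaded page (not the real elasticsearch markup).
page='<html><body><p>Latest release: <span class="version">1.2.3</span></p></body></html>'

# Same xmllint invocation as above: parse as HTML, evaluate the XPath,
# read the document from stdin ("-"), and discard parser complaints.
version=$(printf '%s\n' "$page" |
  xmllint --html --xpath '//span[@class="version"]/text()' - 2>/dev/null)
echo "$version"
```

If the page contained several spans with that class, the text nodes would all be printed, so you may want a narrower expression such as (//span[@class="version"])[1]/text().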