Grep and Sed Equivalent for XML Command Line Processing


I've found xmlstarlet to be pretty good at this sort of thing.

http://xmlstar.sourceforge.net/

It should be available in most distribution repositories, too. An introductory tutorial is here:

http://www.ibm.com/developerworks/library/x-starlet.html


Some promising tools:

  • nokogiri: Parses HTML/XML DOMs in Ruby using XPath and CSS selectors.

  • hpricot: Deprecated.

  • fxgrep: Uses its own XPath-like syntax to query documents. Written in SML, so installation may be difficult.

  • LT XML: XML toolkit derived from SGML tools, including sggrep, sgsort, xmlnorm and others. Uses its own query syntax. The documentation is very formal. Written in C. LT XML 2 claims support for XPath, XInclude and other W3C standards.

  • xmlgrep2: Simple and powerful searching with XPath. Written in Perl using XML::LibXML and libxml2.

  • XQSharp: Supports XQuery, an extension of XPath. Written for the .NET Framework.

  • xml-coreutils: Laird Breyer's toolkit, equivalent to GNU coreutils. Discussed in an interesting essay on what the ideal toolkit should include.

  • xmldiff: Simple tool for comparing two XML files.

  • xmltk: Doesn't seem to have a package in Debian, Ubuntu, Fedora, or MacPorts, hasn't had a release since 2007, and uses non-portable build automation.

xml-coreutils seems the best documented and most UNIX-oriented.


To Joseph Holsten's excellent list, I would add the xpath command-line script that comes with the Perl library XML::XPath. It is a great way to extract information from XML files:

 xpath -q -e '/entry[@xml:lang="fr"]' *xml