Is there any XPath processor for SAX model? Is there any XPath processor for SAX model? xml xml

Is there any XPath processor for SAX model?


My current list (compiled from web search results and the other answers) is:

The next step is to use the examples of XMLDog and compare the performance of all these approaches. Then, the test cases should be extended to the supported XPath expressions.


We regularly parse 1GB+ complex XML files by using a SAX parser which extracts partial DOM trees that can be conveniently queried using XPath. I blogged about it here: http://softwareengineeringcorner.blogspot.com/2012/01/conveniently-processing-large-xml-files.html - Sources are available on github - MIT License.


XPath DOES work with SAX, and most XSLT processors (especially Saxon and Apache Xalan) do support executing XPath expressions inside XSLTs on a SAX stream without building the entire dom.

They manage to do this, very roughly, as follows :

  1. Examining the XPath expressions they need to match
  2. Receiving SAX events and testing if that node is needed or will be needed by one of the XPath expressions.
  3. Ignoring the SAX event if it is of no use for the XPath expressions.
  4. Buffering it if it's needed

How they buffer it is also very interesting, cause while some simply create DOM fragments here and there, others use very optimized tables for quick lookup and reduced memory consumption.

How much they manage to optimize largely depends on the kind of XPath queries they find. As the already posted Saxon documentation clearly explain, queries that move "up" and then traverse "horizontally" (sibling by sibling) the document obviously requires the entire document to be there, but most of them require just a few nodes to be kept into RAM at any moment.

I'm pretty sure of this because when I was still making every day webapp using Cocoon, we had the XSLT memory footprint problem each time we used a "//something" expression inside an XSLT, and quite often we had to rework XPath expressions to allow a better SAX optimization.