
Reading a big XML file using stax and dom


You could use a StAX (javax.xml.stream) parser and transform (javax.xml.transform) each section to a DOM node (org.w3c.dom):

    import java.io.*;
    import javax.xml.stream.*;
    import javax.xml.transform.*;
    import javax.xml.transform.dom.DOMResult;
    import javax.xml.transform.stax.StAXSource;
    import org.w3c.dom.*;

    public class Demo {

        public static void main(String[] args) throws Exception {
            XMLInputFactory xif = XMLInputFactory.newInstance();
            XMLStreamReader xsr = xif.createXMLStreamReader(new FileReader("input.xml"));
            xsr.nextTag(); // Advance to statements element

            TransformerFactory tf = TransformerFactory.newInstance();
            Transformer t = tf.newTransformer();
            // Each START_ELEMENT under the root becomes its own small DOM tree
            while (xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
                DOMResult result = new DOMResult();
                t.transform(new StAXSource(xsr), result);
                Node domNode = result.getNode();
            }
        }
    }
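The loop above calls nextTag() after every transform(). Depending on the transformer implementation, the reader may already have moved past the element's end tag when transform() returns, so an extra nextTag() could skip a sibling that is not preceded by whitespace. Here is a minimal sketch of a variant that checks the current event type instead. The names FragmentReader and readChildren and the inline sample XML are illustrative assumptions, and on a JDK affected by JDK-8016914 the transform() call may still fail.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stax.StAXSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper, not part of any library.
class FragmentReader {

    /** Builds one DOM element per child of the root element. */
    static List<Element> readChildren(String xml) throws Exception {
        XMLStreamReader xsr = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        Transformer t = TransformerFactory.newInstance().newTransformer();
        List<Element> children = new ArrayList<>();

        xsr.nextTag(); // move to the root element
        xsr.next();    // step inside the root
        while (xsr.hasNext()) {
            if (xsr.getEventType() == XMLStreamConstants.START_ELEMENT) {
                DOMResult result = new DOMResult();
                // transform() consumes the element, so do not advance the reader here
                t.transform(new StAXSource(xsr), result);
                Node node = result.getNode();
                children.add(node instanceof Document
                        ? ((Document) node).getDocumentElement()
                        : (Element) node);
            } else {
                xsr.next(); // whitespace, comments, or the root's end tag
            }
        }
        return children;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Items><Item>a</Item><Item>b</Item></Items>";
        for (Element item : readChildren(xml)) {
            System.out.println(item.getTagName() + ": " + item.getTextContent());
        }
    }
}
```

For a really big file you would process each element inside the loop rather than collect them in a list; the list is only there to keep the sketch easy to check.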


Blaise Doughan's answer fails on a clean Java 7 or 8 because of JDK-8016914 (https://bugs.openjdk.java.net/browse/JDK-8016914):

    java.lang.NullPointerException
        at com.sun.org.apache.xerces.internal.dom.CoreDocumentImpl.setXmlVersion(CoreDocumentImpl.java:860)
        at com.sun.org.apache.xalan.internal.xsltc.trax.SAX2DOM.setDocumentInfo(SAX2DOM.java:144)

Funny thing: if you use a JAXB Unmarshaller instead, you don't get the NPE:

    package com.common.config;

    import java.io.*;
    import javax.xml.bind.JAXBContext;
    import javax.xml.bind.JAXBElement;
    import javax.xml.bind.Unmarshaller;
    import javax.xml.stream.*;
    import org.w3c.dom.*;

    public class Demo {

        public static void main(String[] args) throws Exception {
            XMLInputFactory xif = XMLInputFactory.newInstance();
            XMLStreamReader xsr = xif.createXMLStreamReader(new FileReader("input.xml"));

            // Advance to root element
            xsr.nextTag(); // TODO: nextTag() can't skip DTD
            xsr.next();    // Advance to first item or end of document

            final JAXBContext jaxbContext = JAXBContext.newInstance();
            final Unmarshaller unm = jaxbContext.createUnmarshaller();
            while (true) {
                // The previous unmarshal() has already advanced to the next element or whitespace
                if (xsr.getEventType() == XMLStreamReader.START_ELEMENT) {
                    JAXBElement<Object> jel = unm.unmarshal(xsr, Object.class);
                    Node domNode = (Node) jel.getValue();
                    System.err.println(domNode.getNodeName());
                } else if (!xsr.hasNext()) {
                    break;
                } else {
                    xsr.next();
                }
            }
        }
    }

The reason: com.sun.xml.internal.bind.v2.runtime.unmarshaller.StAXConnector$1 does not implement Locator2, so it has no getXMLVersion().
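To make the Locator/Locator2 distinction concrete: org.xml.sax.Locator has no version accessor, and only the org.xml.sax.ext.Locator2 extension adds getXMLVersion(). The sketch below is not JDK source. The class LocatorVersionDemo and the helper xmlVersionOf are hypothetical, and they only show that code which checks for Locator2 has nothing to read from a plain Locator.

```java
import org.xml.sax.Locator;
import org.xml.sax.ext.Locator2;
import org.xml.sax.ext.Locator2Impl;
import org.xml.sax.helpers.LocatorImpl;

// Hypothetical illustration, not taken from the JDK.
class LocatorVersionDemo {

    /** Returns the XML version the locator reports, or null if it cannot report one. */
    static String xmlVersionOf(Locator locator) {
        if (locator instanceof Locator2) {
            return ((Locator2) locator).getXMLVersion();
        }
        return null; // a plain Locator has no getXMLVersion() to call
    }

    public static void main(String[] args) {
        Locator2Impl versioned = new Locator2Impl();
        versioned.setXMLVersion("1.0");
        System.out.println(xmlVersionOf(versioned));         // version set above
        System.out.println(xmlVersionOf(new LocatorImpl())); // no version available
    }
}
```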


You can try XMLDog from JLibs.

It evaluates XPath expressions on an XML document using SAX (i.e. without loading the entire document into memory) and returns DOM nodes for the matching nodes as they are hit.

So you can evaluate the XPath /Items/Item on your fat XML document: you are notified as each Item node is parsed, and you can process the current Item DOM node and then continue.

This makes it suitable for evaluating XPath expressions on large documents.