Merge Two XML Files in Java


Not very elegant, but you could do this with the DOM parser and XPath:

    import java.io.File;
    import java.io.IOException;

    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.Result;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathConstants;
    import javax.xml.xpath.XPathExpression;
    import javax.xml.xpath.XPathFactory;

    import org.w3c.dom.Document;
    import org.w3c.dom.Node;

    public class MergeXmlDemo {

      public static void main(String[] args) throws Exception {
        // proper error/exception handling omitted for brevity
        File file1 = new File("merge1.xml");
        File file2 = new File("merge2.xml");
        Document doc = merge("/run/host/results", file1, file2);
        print(doc);
      }

      private static Document merge(String expression,
          File... files) throws Exception {
        XPathFactory xPathFactory = XPathFactory.newInstance();
        XPath xpath = xPathFactory.newXPath();
        XPathExpression compiledExpression = xpath
            .compile(expression);
        return merge(compiledExpression, files);
      }

      private static Document merge(XPathExpression expression,
          File... files) throws Exception {
        DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory
            .newInstance();
        // only has an effect when the parser validates against a DTD
        docBuilderFactory
            .setIgnoringElementContentWhitespace(true);
        DocumentBuilder docBuilder = docBuilderFactory
            .newDocumentBuilder();
        // the first file is the base that all other files are merged into
        Document base = docBuilder.parse(files[0]);

        Node results = (Node) expression.evaluate(base,
            XPathConstants.NODE);
        if (results == null) {
          throw new IOException(files[0]
              + ": expression does not evaluate to node");
        }

        for (int i = 1; i < files.length; i++) {
          Document merge = docBuilder.parse(files[i]);
          Node nextResults = (Node) expression.evaluate(merge,
              XPathConstants.NODE);
          if (nextResults == null) {
            throw new IOException(files[i]
                + ": expression does not evaluate to node");
          }
          // move every child of the matched node into the base document
          while (nextResults.hasChildNodes()) {
            Node kid = nextResults.getFirstChild();
            nextResults.removeChild(kid);
            kid = base.importNode(kid, true);
            results.appendChild(kid);
          }
        }

        return base;
      }

      private static void print(Document doc) throws Exception {
        TransformerFactory transformerFactory = TransformerFactory
            .newInstance();
        Transformer transformer = transformerFactory
            .newTransformer();
        DOMSource source = new DOMSource(doc);
        Result result = new StreamResult(System.out);
        transformer.transform(source, result);
      }
    }

This assumes that you can hold at least two of the documents in RAM simultaneously.
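To see what the merge does without creating any files, here is a minimal sketch of the same importNode/appendChild idea applied to two made-up in-memory documents. The class name InMemoryMergeSketch, the sample XML and the String-based merge helper are illustrative assumptions, not part of the code above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class InMemoryMergeSketch {

  // Parses an XML string into a DOM document.
  static Document parse(DocumentBuilder builder, String xml) throws Exception {
    return builder.parse(new InputSource(new StringReader(xml)));
  }

  // Appends copies of the children of the node selected by `expression` in
  // `second` to the node selected by the same expression in `first`, then
  // returns the merged document as a string.
  static String merge(String expression, String first, String second) throws Exception {
    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    XPath xpath = XPathFactory.newInstance().newXPath();
    Document base = parse(builder, first);
    Document other = parse(builder, second);

    Node target = (Node) xpath.evaluate(expression, base, XPathConstants.NODE);
    Node source = (Node) xpath.evaluate(expression, other, XPathConstants.NODE);
    if (target == null || source == null) {
      throw new IllegalArgumentException("expression does not select a node in both documents");
    }

    for (Node kid = source.getFirstChild(); kid != null; kid = kid.getNextSibling()) {
      // importNode makes a deep copy owned by `base`; `other` is left untouched
      target.appendChild(base.importNode(kid, true));
    }

    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    StringWriter out = new StringWriter();
    transformer.transform(new DOMSource(base), new StreamResult(out));
    return out.toString();
  }

  public static void main(String[] args) throws Exception {
    // hypothetical sample documents shaped to fit the /run/host/results expression
    String a = "<run><host><results><r id=\"1\"/></results></host></run>";
    String b = "<run><host><results><r id=\"2\"/></results></host></run>";
    System.out.println(merge("/run/host/results", a, b));
  }
}
```

Unlike the loop above, this variant copies the children instead of removing them from the second document, so the second document stays intact at the cost of a little extra memory.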


I use XSLT to merge XML files. It lets me adjust the merge operation, either to just slam the content together or to merge at a specific level. It is a little more work (and XSLT syntax is kind of special), but it is super flexible. You need a few things here:

a) Include an additional file
b) Copy the original file 1:1
c) Design your merge point with or without duplication avoidance

a) At the beginning of the stylesheet I have:

    <xsl:param name="mDocName">yoursecondfile.xml</xsl:param>
    <xsl:variable name="mDoc" select="document($mDocName)" />

This lets you refer to the second file as $mDoc.

b) The instructions to copy a source tree 1:1 are 2 templates:

    <!-- Copy everything including attributes as default action -->
    <xsl:template match="*">
        <xsl:element name="{name()}">
            <xsl:apply-templates select="@*" />
            <xsl:apply-templates />
        </xsl:element>
    </xsl:template>

    <xsl:template match="@*">
        <xsl:attribute name="{name()}"><xsl:value-of select="." /></xsl:attribute>
    </xsl:template>

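With nothing else, you get a 1:1 copy of the elements, attributes and text of your first source file (comments and processing instructions are not copied, because XSLT's built-in rules skip them). This works with any type of XML. The merging part is file-specific. Let's presume you have event elements with an id attribute and you do not want duplicate IDs. The template would look like this: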

    <xsl:template match="events">
        <xsl:variable name="allEvents" select="descendant::*" />
        <events>
            <!-- copies all events from the first file -->
            <xsl:apply-templates />
            <!-- Merge the new events in. You need to adjust the select clause -->
            <xsl:for-each select="$mDoc/logbook/server/events/event">
                <xsl:variable name="curID" select="@id" />
                <xsl:if test="not ($allEvents[@id=$curID]/@id = $curID)">
                    <xsl:element name="event">
                        <xsl:apply-templates select="@*" />
                        <xsl:apply-templates />
                    </xsl:element>
                </xsl:if>
            </xsl:for-each>
        </events>
    </xsl:template>

Of course, you can compare other things, such as tag names. It is also up to you how deep the merge goes. If you don't have a key to compare, the construct becomes simpler, e.g. for logs:

    <xsl:template match="logs">
        <xsl:element name="logs">
            <xsl:apply-templates select="@*" />
            <xsl:apply-templates />
            <xsl:apply-templates select="$mDoc/logbook/server/logs/log" />
        </xsl:element>
    </xsl:template>

To run XSLT in Java use this:

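    import java.util.Map.Entry;
    import javax.xml.transform.Result;
    import javax.xml.transform.Source;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;

    // xmlFile, xsltFile and resultFile are defined elsewhere (e.g. as java.io.File);
    // ParameterMap is a Map<String, String> of stylesheet parameters
    Source xmlSource = new StreamSource(xmlFile);
    Source xsltSource = new StreamSource(xsltFile);
    Result xmlResult = new StreamResult(resultFile);
    TransformerFactory transFact = TransformerFactory.newInstance();
    Transformer trans = transFact.newTransformer(xsltSource);
    // Load Parameters if we have any
    if (ParameterMap != null) {
        for (Entry<String, String> curParam : ParameterMap.entrySet()) {
            trans.setParameter(curParam.getKey(), curParam.getValue());
        }
    }
    trans.transform(xmlSource, xmlResult);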

Or you download the Saxon XSLT processor and run the transformation from the command line (Linux shell example):

    #!/bin/bash
    notify-send -t 500 -u low -i gtk-dialog-info "Transforming $1 with $2 into $3 ..."
    # That's actually the only relevant line below
    java -cp saxon9he.jar net.sf.saxon.Transform -t -s:"$1" -xsl:"$2" -o:"$3"
    notify-send -t 1000 -u low -i gtk-dialog-info "Extraction into $3 done!"
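Putting the Java route together, here is a minimal end-to-end sketch of the keyed merge with the JDK's built-in XSLT 1.0 processor. Several things in it are assumptions for illustration: the class name XsltMergeSketch, the sample logbook/server/events structure, and the use of a temporary file so that document() has a URI to load. It also swaps the two copy templates above for the standard identity template (xsl:copy) and uses xsl:copy-of for the merged events, which is shorter but not the same stylesheet as shown earlier.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltMergeSketch {

  // Stylesheet: identity copy plus a keyed merge of <event> elements by @id.
  static final String STYLESHEET = String.join("\n",
      "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
      "  <xsl:param name='mDocName'/>",
      "  <xsl:variable name='mDoc' select='document($mDocName)'/>",
      "  <!-- identity template: copies every node and attribute unchanged -->",
      "  <xsl:template match='@*|node()'>",
      "    <xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>",
      "  </xsl:template>",
      "  <xsl:template match='events'>",
      "    <xsl:variable name='allEvents' select='event'/>",
      "    <xsl:copy>",
      "      <xsl:apply-templates select='@*|node()'/>",
      "      <!-- append events from the second file whose id is not present yet -->",
      "      <xsl:for-each select='$mDoc/logbook/server/events/event'>",
      "        <xsl:variable name='curID' select='@id'/>",
      "        <xsl:if test='not($allEvents[@id = $curID])'>",
      "          <xsl:copy-of select='.'/>",
      "        </xsl:if>",
      "      </xsl:for-each>",
      "    </xsl:copy>",
      "  </xsl:template>",
      "</xsl:stylesheet>");

  // Merges secondXml into firstXml and returns the serialized result.
  static String merge(String firstXml, String secondXml) throws Exception {
    // document() needs a URI, so the second document goes into a temp file
    Path second = Files.createTempFile("merge2-", ".xml");
    try {
      Files.write(second, secondXml.getBytes(StandardCharsets.UTF_8));
      Transformer trans = TransformerFactory.newInstance()
          .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
      // absolute file: URI, so it does not depend on the stylesheet's base URI
      trans.setParameter("mDocName", second.toUri().toString());
      StringWriter out = new StringWriter();
      trans.transform(new StreamSource(new StringReader(firstXml)), new StreamResult(out));
      return out.toString();
    } finally {
      Files.deleteIfExists(second);
    }
  }

  public static void main(String[] args) throws Exception {
    // hypothetical sample data: event 2 exists in both files and should appear once
    String a = "<logbook><server><events><event id='1'/><event id='2'/></events></server></logbook>";
    String b = "<logbook><server><events><event id='2'/><event id='3'/></events></server></logbook>";
    System.out.println(merge(a, b));
  }
}
```

Passing an absolute URI for mDocName sidesteps the question of how a relative file name would be resolved when the stylesheet itself has no system ID.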

YMMV


Thanks to everyone for their suggestions. Unfortunately, none of the methods suggested turned out to be suitable in the end, as I needed rules governing how the different nodes of the structure were merged.

So I took the DTD for the XML files I was merging and created a number of classes reflecting its structure. I then used XStream to deserialize the XML files into those classes.

I then annotated the classes with merge rules, and used those annotations together with some reflection to merge the objects, rather than merging the XML structure itself.

If anyone is interested in the code, which in this case merges Nmap XML files, please see http://fluxnetworks.co.uk/NmapXMLMerge.tar.gz. The code is not perfect and, I will admit, not massively flexible, but it definitely works. I'm planning to reimplement the system so that it parses the DTD automatically when I have some free time.
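For anyone who just wants the gist of the annotation-plus-reflection idea, it could look roughly like the sketch below. This is not the code from the archive: the @MergeRule annotation, the Strategy values and the Host class are all hypothetical, and the XStream deserialization step is left out so the example only needs the JDK.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

public class AnnotationMergeSketch {

  // Hypothetical merge strategies; the real rule set is not described above.
  enum Strategy { KEEP_FIRST, APPEND }

  // Hypothetical field-level annotation carrying the merge rule.
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.FIELD)
  @interface MergeRule {
    Strategy value();
  }

  // Hypothetical class standing in for one derived from the DTD.
  static class Host {
    @MergeRule(Strategy.KEEP_FIRST) String address;
    @MergeRule(Strategy.APPEND) List<String> ports = new ArrayList<>();
  }

  // Merges `other` into `target`, field by field, according to each field's rule.
  @SuppressWarnings("unchecked")
  static <T> void mergeInto(T target, T other) throws IllegalAccessException {
    for (Field field : target.getClass().getDeclaredFields()) {
      MergeRule rule = field.getAnnotation(MergeRule.class);
      if (rule == null) {
        continue; // fields without a rule are left alone
      }
      field.setAccessible(true);
      Object mine = field.get(target);
      Object theirs = field.get(other);
      switch (rule.value()) {
        case KEEP_FIRST:
          // only take the other value when the target has none
          if (mine == null) {
            field.set(target, theirs);
          }
          break;
        case APPEND:
          // concatenate list contents, target entries first
          if (mine instanceof List && theirs instanceof List) {
            ((List<Object>) mine).addAll((List<Object>) theirs);
          }
          break;
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Host a = new Host();
    a.address = "10.0.0.1";
    a.ports.add("22");
    Host b = new Host();
    b.address = "10.0.0.2";
    b.ports.add("80");
    mergeInto(a, b);
    System.out.println(a.address + " " + a.ports);
  }
}
```

Keeping the rules on the fields means the merge logic itself stays generic; adding a new rule is a new Strategy value plus one more case in the switch.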