
XML Diff and Merge


In my last job, we had a similar problem: We had to detect changes, insertions, and deletions of specific items between two XML files. The files weren't arbitrary XML; they had to adhere to our XSD.

Our solution was to implement a kind of merge sort: Parse the files (using a SAX parser rather than a DOM parser, so that arbitrarily large files could be handled), store the parsed data in separate HashMaps, and then compare the contents of the two maps using a merge-sort style algorithm.
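A minimal Java sketch of that approach follows. It is not the original code, and it assumes a hypothetical flat schema in which each item is an `<item id="...">` element whose text is its value (the real XSD isn't shown). Because `HashMap` iteration order is unspecified, the sketch sorts each map's keys first and then walks the two sorted key lists in tandem, which is the merge step of merge sort. `XmlItemDiff`, `parseItems`, and `compare` are illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.*;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class XmlItemDiff {

    // Streams the document with SAX (no DOM tree in memory) and collects
    // id -> text for each <item id="..."> element (hypothetical schema).
    static Map<String, String> parseItems(InputStream in) throws Exception {
        Map<String, String> items = new HashMap<>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(in, new DefaultHandler() {
            private String currentId;
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("item".equals(qName)) {
                    currentId = attrs.getValue("id");
                    text.setLength(0);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (currentId != null) text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("item".equals(qName) && currentId != null) {
                    items.put(currentId, text.toString().trim());
                    currentId = null;
                }
            }
        });
        return items;
    }

    // Merge step: sort both key sets, then advance through them together,
    // classifying each key as deleted, inserted, or (if values differ) changed.
    static List<String> compare(Map<String, String> oldItems, Map<String, String> newItems) {
        List<String> oldKeys = new ArrayList<>(oldItems.keySet());
        List<String> newKeys = new ArrayList<>(newItems.keySet());
        Collections.sort(oldKeys);
        Collections.sort(newKeys);
        List<String> changes = new ArrayList<>();
        int i = 0, j = 0;
        while (i < oldKeys.size() || j < newKeys.size()) {
            int cmp;
            if (i == oldKeys.size()) cmp = 1;          // only new keys remain
            else if (j == newKeys.size()) cmp = -1;    // only old keys remain
            else cmp = oldKeys.get(i).compareTo(newKeys.get(j));

            if (cmp < 0) {
                changes.add("DELETED " + oldKeys.get(i++));
            } else if (cmp > 0) {
                changes.add("INSERTED " + newKeys.get(j++));
            } else {
                String key = oldKeys.get(i);
                if (!oldItems.get(key).equals(newItems.get(key))) changes.add("CHANGED " + key);
                i++;
                j++;
            }
        }
        return changes;
    }

    public static void main(String[] args) throws Exception {
        String a = "<items><item id='1'>apple</item><item id='2'>banana</item></items>";
        String b = "<items><item id='2'>blueberry</item><item id='3'>cherry</item></items>";
        Map<String, String> oldItems = parseItems(new ByteArrayInputStream(a.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> newItems = parseItems(new ByteArrayInputStream(b.getBytes(StandardCharsets.UTF_8)));
        compare(oldItems, newItems).forEach(System.out::println);
    }
}
```

Running `main` prints `DELETED 1`, `CHANGED 2`, and `INSERTED 3`, one per line. Keys are compared as strings, so "10" sorts before "9"; that doesn't affect correctness, only the order in which changes are reported.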

Naturally, the larger the files got, the more memory pressure we experienced, so I ultimately wrote a FileHashMap class that moved the map's values out to random-access files. Although slower in theory, this solution allowed our comparisons to work with very large files, without thrashing or OutOfMemoryError conditions. (A version of that FileHashMap class is available in this library: http://www.clapper.org/software/java/util/)
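The general idea of pushing a map's values to disk can be sketched in a few lines: keep each key, plus the offset and length of its value, in an in-memory `HashMap`, and store the value bytes themselves in a `RandomAccessFile`. This is a hypothetical illustration (`DiskBackedStringMap` is an invented name), not the API or implementation of the clapper.org FileHashMap, and it assumes `String` values encoded as UTF-8.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical class; NOT the API of the FileHashMap in the clapper.org library.
class DiskBackedStringMap implements AutoCloseable {
    // Where a value's bytes live in the backing file.
    private static final class Location {
        final long offset;
        final int length;
        Location(long offset, int length) { this.offset = offset; this.length = length; }
    }

    // Keys and locations stay in memory; value bytes stay on disk.
    private final Map<String, Location> index = new HashMap<>();
    private final RandomAccessFile file;

    DiskBackedStringMap(File backingFile) throws IOException {
        this.file = new RandomAccessFile(backingFile, "rw");
        file.setLength(0); // start with an empty value file
    }

    // Appends the value at the end of the file; an overwritten value's old bytes are abandoned.
    void put(String key, String value) throws IOException {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        long offset = file.length();
        file.seek(offset);
        file.write(bytes);
        index.put(key, new Location(offset, bytes.length));
    }

    // Seeks to the recorded offset and reads exactly the stored number of bytes.
    String get(String key) throws IOException {
        Location loc = index.get(key);
        if (loc == null) return null;
        byte[] bytes = new byte[loc.length];
        file.seek(loc.offset);
        file.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    boolean containsKey(String key) { return index.containsKey(key); }

    int size() { return index.size(); }

    @Override
    public void close() throws IOException { file.close(); }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("values", ".dat");
        tmp.deleteOnExit();
        try (DiskBackedStringMap map = new DiskBackedStringMap(tmp)) {
            map.put("item-1", "<item id=\"1\">apple</item>");
            System.out.println(map.get("item-1"));
        }
    }
}
```

Every lookup now costs a seek and a read, which is the slowdown mentioned above, but the heap only holds keys and small offset records. Appending on every `put` keeps writes simple, but it never reclaims the space of overwritten values; a long-lived version would need compaction or free-space reuse.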

I have no idea whether what I just described is even remotely close to what you need, but I thought I'd share it, just in case.

Good luck.


Side note: there is now a standard format for XML-aware "patches", RFC 5261. There is at least one free software program, xmlpatch, that implements it. It is written in C, but you can still call it from Java.
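To give a feel for the RFC 5261 format, here is a Java sketch that writes a small patch document with the standard StAX `XMLStreamWriter`. It uses `<diff>` as the root element, as the RFC's examples do, and the `<remove>`, `<replace>`, and `<add>` operations, each with a `sel` attribute holding an XPath selector. It assumes a hypothetical target document whose root is `<items>` with `<item id="...">` children, and ids that contain no single quotes (they are spliced into an XPath string literal). `PatchWriter` and `buildPatch` are invented names; the sketch does not use or reflect xmlpatch.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class PatchWriter {

    // Hypothetical target layout: <items><item id="...">text</item>...</items>
    private static String itemSel(String id) {
        return "items/item[@id='" + id + "']"; // assumes id has no single quotes
    }

    // Emits one RFC 5261 operation per deleted, changed, or inserted item.
    static String buildPatch(List<String> deletedIds,
                             Map<String, String> changed,
                             Map<String, String> inserted) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartDocument();
        w.writeStartElement("diff");
        for (String id : deletedIds) {
            w.writeEmptyElement("remove");            // <remove sel="..."/>
            w.writeAttribute("sel", itemSel(id));
        }
        for (Map.Entry<String, String> e : changed.entrySet()) {
            w.writeStartElement("replace");           // replace the item's text node
            w.writeAttribute("sel", itemSel(e.getKey()) + "/text()");
            w.writeCharacters(e.getValue());
            w.writeEndElement();
        }
        for (Map.Entry<String, String> e : inserted.entrySet()) {
            w.writeStartElement("add");               // append a new child to <items>
            w.writeAttribute("sel", "items");
            w.writeStartElement("item");
            w.writeAttribute("id", e.getKey());
            w.writeCharacters(e.getValue());
            w.writeEndElement(); // item
            w.writeEndElement(); // add
        }
        w.writeEndElement(); // diff
        w.writeEndDocument();
        w.flush();
        w.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        Map<String, String> changed = new LinkedHashMap<>();
        changed.put("2", "blueberry");
        Map<String, String> inserted = new LinkedHashMap<>();
        inserted.put("3", "cherry");
        System.out.println(buildPatch(List.of("1"), changed, inserted));
    }
}
```

The writer handles escaping of text and attribute values, which is easy to get wrong when concatenating XML strings by hand. Note that RFC 5261 expects each `sel` to match exactly one node, so selectors built this way rely on the ids being unique.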


There are any number of open-source XML diff tools written in Java that you can crib from. One list of such tools is here.