
Merge multiple XML files from command line


High-tech answer:

Save this Python script as xmlcombine.py:

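#!/usr/bin/env python3
import sys
from xml.etree import ElementTree

def run(files):
    first = None
    for filename in files:
        data = ElementTree.parse(filename).getroot()
        if first is None:
            # The first file's root element becomes the container
            first = data
        else:
            # Append the children of every later root to the first root
            first.extend(data)
    if first is not None:
        # encoding="unicode" makes tostring return str rather than bytes
        print(ElementTree.tostring(first, encoding="unicode"))

if __name__ == "__main__":
    run(sys.argv[1:])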

To combine files whose names match ?.xml (a single character before the extension, such as 1.xml), run:

python xmlcombine.py ?.xml > combined.xml
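To see what the merge does, here is a minimal sketch that applies the same logic to two in-memory documents instead of files. The helper name combine_strings and the sample documents are made up for illustration.

```python
from xml.etree import ElementTree

def combine_strings(documents):
    """Merge XML documents given as strings, mirroring the script's logic."""
    first = None
    for text in documents:
        root = ElementTree.fromstring(text)
        if first is None:
            first = root        # the first root becomes the container
        else:
            first.extend(root)  # append the children of every later root
    return first

# Hypothetical sample inputs
a = "<products><product>a</product></products>"
b = "<products><product>b</product></products>"
merged = combine_strings([a, b])
print(ElementTree.tostring(merged, encoding="unicode"))
# prints: <products><product>a</product><product>b</product></products>
```

Note that only the first file's root element survives; the root tags and root attributes of later files are dropped, and only their children are kept.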

For further enhancement, consider using:

  • chmod +x xmlcombine.py: Makes the script executable, so you can omit python on the command line

  • xmlcombine.py !(combined).xml > combined.xml: Collects all XML files except the output, but requires bash's extglob option (shopt -s extglob)

  • xmlcombine.py *.xml | sponge combined.xml: Also includes the existing contents of combined.xml in the result, but requires the sponge program (from moreutils)

  • import lxml.etree as ElementTree: Replaces the standard-library import with lxml, a potentially faster XML parser
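If you would rather not depend on extglob or sponge, the script itself can skip the output file and open it for writing only after every input has been parsed. This is just a sketch of that idea, not part of the original script; the function name combine_to and its parameters are hypothetical.

```python
import os
from xml.etree import ElementTree

def combine_to(output_path, input_paths):
    """Merge input files into output_path, skipping the output file itself."""
    out_abs = os.path.abspath(output_path)
    first = None
    for path in input_paths:
        if os.path.abspath(path) == out_abs:
            continue  # same effect as the !(combined).xml glob
        root = ElementTree.parse(path).getroot()
        if first is None:
            first = root
        else:
            first.extend(root)
    # Every input has been parsed by now, so overwriting the output is safe
    if first is not None:
        with open(output_path, "w", encoding="utf-8") as out:
            out.write(ElementTree.tostring(first, encoding="unicode"))
    return first
```

Called as combine_to("combined.xml", sys.argv[1:]), this lets a plain *.xml glob be passed even when combined.xml already exists. Unlike the sponge variant, the old contents of combined.xml are not merged in.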


xml_grep

http://search.cpan.org/dist/XML-Twig/tools/xml_grep/xml_grep

xml_grep --pretty_print indented --wrap products --descr '' --cond "product" *.xml > combined.xml

  • --wrap: wraps the XML result in the given tag (here: products)
  • --cond: the XML subtree to grep for (here: product)
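For comparison, the wrap-and-select idea can be approximated in Python with the standard library. This is a rough sketch under my own assumptions, not a reimplementation of xml_grep: the function name wrap_matches is hypothetical, it matches elements by tag name only, and its handling of nested matches may differ from what xml_grep does.

```python
from xml.etree import ElementTree

def wrap_matches(documents, wrap_tag, cond_tag):
    """Collect every cond_tag element from the documents under a new wrap_tag root."""
    wrapper = ElementTree.Element(wrap_tag)
    for text in documents:
        root = ElementTree.fromstring(text)
        # iter() walks the whole tree, so a match nested inside another
        # match would show up both nested and at the top level
        wrapper.extend(list(root.iter(cond_tag)))
    return wrapper

# Hypothetical sample inputs
docs = [
    "<catalog><product>a</product><misc>x</misc></catalog>",
    "<catalog><product>b</product></catalog>",
]
combined = wrap_matches(docs, "products", "product")
print(ElementTree.tostring(combined, encoding="unicode"))
# prints: <products><product>a</product><product>b</product></products>
```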


Low-tech simple answer:

echo '<products>' > combined.xml
grep -vh '</\?products>\|<?xml' *.xml >> combined.xml
echo '</products>' >> combined.xml

Limitations:

  • The opening and closing tags need to be on their own line.
  • The files need to all have the same outer tags.
  • The outer tags must not have attributes.
  • The files must not have inner tags that match the outer tags.
  • Any current contents of combined.xml will be wiped out instead of getting included.

Each of these limitations can be worked around, but not all of them easily.