Merge multiple XML files from command line
High-tech answer:
Save this Python script as xmlcombine.py:
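#!/usr/bin/env python3
import sys
from xml.etree import ElementTree

def run(files):
    first = None
    for filename in files:
        data = ElementTree.parse(filename).getroot()
        if first is None:
            # The first file's root becomes the container for everything else.
            first = data
        else:
            # Append the children of each later root to the first root.
            first.extend(data)
    if first is not None:
        # encoding="unicode" returns str, so print() emits plain XML text.
        print(ElementTree.tostring(first, encoding="unicode"))

if __name__ == "__main__":
    run(sys.argv[1:])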
To combine files, run:
python xmlcombine.py ?.xml > combined.xml
For further enhancement, consider using:
- chmod +x xmlcombine.py : Allows you to omit python in the command line
- xmlcombine.py !(combined).xml > combined.xml : Collects all XML files except the output, but requires bash's extglob option
- xmlcombine.py *.xml | sponge combined.xml : Collects everything in combined.xml as well, but requires the sponge program
- import lxml.etree as ElementTree : Uses a potentially faster XML parser
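If you would rather not depend on extglob or sponge, the same effect can be sketched inside the script. This is only a rough variant under my own assumptions: the function name combine and the idea of passing the output path as a separate argument are not part of the script above. It skips the output file if the glob picked it up and writes only after every input has been read.

```python
import os
from xml.etree import ElementTree


def combine(paths, output):
    """Merge the children of every root in `paths` into the first root,
    then write the result to `output` (hypothetical helper)."""
    out_abs = os.path.abspath(output)
    first = None
    for path in paths:
        # Skip the output file itself, like !(combined).xml does.
        if os.path.abspath(path) == out_abs:
            continue
        root = ElementTree.parse(path).getroot()
        if first is None:
            first = root
        else:
            first.extend(root)
    # Write only after all inputs are parsed, so nothing is truncated early.
    if first is not None:
        ElementTree.ElementTree(first).write(
            output, encoding="utf-8", xml_declaration=True
        )
    return first
```

Called as combine(sys.argv[2:], sys.argv[1]), it could be run with something like python xmlcombine.py combined.xml *.xml.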
xml_grep
http://search.cpan.org/dist/XML-Twig/tools/xml_grep/xml_grep
xml_grep --pretty_print indented --wrap products --descr '' --cond "product" *.xml > combined.xml
- --wrap : encloses/wraps the XML result with the given tag (here: products)
- --cond : the XML subtree to grep (here: product)
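For comparison, here is roughly what the two options above do, sketched in Python with the standard library. It is an approximation, not a reimplementation of xml_grep: the function name grep_and_wrap and its parameters are made up for illustration, and only a plain tag name is supported as the condition.

```python
from xml.etree import ElementTree


def grep_and_wrap(paths, cond="product", wrap="products"):
    """Collect every <cond> element found in `paths` under a new <wrap> root
    (hypothetical helper, loosely mirroring --cond and --wrap)."""
    wrapper = ElementTree.Element(wrap)
    for path in paths:
        root = ElementTree.parse(path).getroot()
        # iter() walks the whole tree, so a <product> nested inside another
        # <product> would be appended a second time on its own.
        wrapper.extend(list(root.iter(cond)))
    # ElementTree.indent exists only on Python 3.9+; skip it on older versions.
    if hasattr(ElementTree, "indent"):
        ElementTree.indent(wrapper)
    return wrapper
```

Printing ElementTree.tostring(grep_and_wrap(files), encoding="unicode") then gives output that can be redirected to combined.xml.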
Low-tech simple answer:
echo '<products>' > combined.xml
grep -vh '</\?products>\|<?xml' *.xml >> combined.xml
echo '</products>' >> combined.xml
Limitations:
- The opening and closing tags need to be on their own line.
- The files need to all have the same outer tags.
- The outer tags must not have attributes.
- The files must not have inner tags that match the outer tags.
- Any current contents of combined.xml will be wiped out instead of getting included.
Each of these limitations can be worked around, but not all of them easily.