merging xml files using python's ElementTree merging xml files using python's ElementTree xml xml

merging xml files using python's ElementTree


Although this is mostly a duplicate and the answer can be found here, I already did this so i can share this python code:

import os, os.path, sysimport globfrom xml.etree import ElementTreedef run(files):    xml_files = glob.glob(files +"/*.xml")    xml_element_tree = None    for xml_file in xml_files:        data = ElementTree.parse(xml_file).getroot()        # print ElementTree.tostring(data)        for result in data.iter('results'):            if xml_element_tree is None:                xml_element_tree = data                 insertion_point = xml_element_tree.findall("./results")[0]            else:                insertion_point.extend(result)     if xml_element_tree is not None:        print ElementTree.tostring(xml_element_tree)

However this question contains another problem not present in the other post. The sample XML files are not valid XML so its not possible to have a XML tag with:

<sample="1">    ...</sample>

is not possible change to something like:

<sample id="1">    ...</sample>


You could try this solution:

import globfrom xml.etree import ElementTreedef newRunRun(folder):    xml_files = glob.glob(folder+"/*.xml")    node = None    for xmlFile in xml_files:              tree = ElementTree.parse(xmlFile)        root = tree.getroot()        if node is None:            node = root        else:            elements = root.find("./results")                       for element in elements._children:                node[1].append(element)                    print ElementTree.tostring(node)folder = "resources"newRunRun(folder) 

As you can see, I´m using the first doc as a container, inserting inside it the elements of others docs... This is the ouput generated:

<sample id="1"><workflow value="x" version="1" />  <results>   <result type="Q">      <result_data type="value" value="11" />      <result_data type="value" value="21" />      <result_data type="value" value="13" />      <result_data type="value" value="12" />      <result_data type="value" value="15" />    </result>  <result type="T">      <result_data type="value" value="19" />      <result_data type="value" value="15" />      <result_data type="value" value="14" />      <result_data type="value" value="13" />      <result_data type="value" value="12" />    </result>  </results></sample>

Using the version: Python 2.7.15