Faithfully Preserve Comments in Parsed XML
Tested with Python 2.7 and 3.5, the following code should work as intended.
#!/usr/bin/env python
# CommentedTreeBuilder.py
from xml.etree import ElementTree

class CommentedTreeBuilder(ElementTree.TreeBuilder):
    def comment(self, data):
        # Store the comment as an element whose tag is ElementTree.Comment
        self.start(ElementTree.Comment, {})
        self.data(data)
        self.end(ElementTree.Comment)
Then, in the main code use
parser = ElementTree.XMLParser(target=CommentedTreeBuilder())
as the parser instead of the current one.
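As a quick check of this approach, here is a minimal self-contained sketch that parses a string instead of a file and serializes it again. The sample XML and the variable names are made up for illustration; the class is the same one shown above.

```python
from xml.etree import ElementTree


class CommentedTreeBuilder(ElementTree.TreeBuilder):
    def comment(self, data):
        # Store the comment as an element whose tag is ElementTree.Comment
        self.start(ElementTree.Comment, {})
        self.data(data)
        self.end(ElementTree.Comment)


# Hypothetical sample document, used only for this demonstration
sample_xml = "<root><!-- keep me --><child>text</child></root>"

# Parse with the comment-preserving builder as the parser target
parser = ElementTree.XMLParser(target=CommentedTreeBuilder())
root = ElementTree.fromstring(sample_xml, parser=parser)

# Serialize back to a string; the comment should still be present
output = ElementTree.tostring(root, encoding="unicode")
print(output)
```

When reading from a file, the same parser object can be passed as the second argument to ElementTree.parse(filename, parser).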
By the way, comments work correctly out of the box with lxml. That is, you can just do
import lxml.etree as ET
tree = ET.parse(filename)
without needing any of the above.
Python 3.8 added the insert_comments argument to TreeBuilder. From the documentation:
class xml.etree.ElementTree.TreeBuilder(element_factory=None, *, comment_factory=None, pi_factory=None, insert_comments=False, insert_pis=False)
When insert_comments and/or insert_pis is true, comments/pis will be inserted into the tree if they appear within the root element (but not outside of it).
Example:
parser = ElementTree.XMLParser(target=ElementTree.TreeBuilder(insert_comments=True))
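For a fuller picture, the following sketch (assuming Python 3.8 or later) round-trips a small document through this parser. The sample XML is made up for illustration; it has one comment before the root element and one inside it, to show the "within the root element" behaviour quoted above.

```python
from xml.etree import ElementTree

# Hypothetical sample: one comment outside the root element, one inside it
sample_xml = "<!-- outside --><root><!-- inside --><child/></root>"

# insert_comments=True makes TreeBuilder keep comments found inside the root
parser = ElementTree.XMLParser(
    target=ElementTree.TreeBuilder(insert_comments=True)
)
root = ElementTree.fromstring(sample_xml, parser=parser)

# Serialize again; only the comment inside <root> is part of the tree
output = ElementTree.tostring(root, encoding="unicode")
print(output)
```

Because the comment before the root element is never attached to the tree, it does not survive the round trip.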
Martin's code didn't work for me. I modified it as follows, and this version works as intended.
import xml.etree.ElementTree as ET

class CommentedTreeBuilder(ET.XMLTreeBuilder):
    def __init__(self, *args, **kwargs):
        super(CommentedTreeBuilder, self).__init__(*args, **kwargs)
        # Route expat comment events to our handler
        self._parser.CommentHandler = self.comment

    def comment(self, data):
        self._target.start(ET.Comment, {})
        self._target.data(data)
        self._target.end(ET.Comment)
This is the test:
parser = CommentedTreeBuilder()
tree = ET.parse(filename, parser)
tree.write('out.xml')