Faithfully Preserve Comments in Parsed XML


The following code was tested with Python 2.7 and 3.5 and should work as intended.

#!/usr/bin/env python
# CommentedTreeBuilder.py
from xml.etree import ElementTree

class CommentedTreeBuilder(ElementTree.TreeBuilder):
    def comment(self, data):
        self.start(ElementTree.Comment, {})
        self.data(data)
        self.end(ElementTree.Comment)

Then, in the main code use

parser = ElementTree.XMLParser(target=CommentedTreeBuilder())

as the parser instead of the current one.
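
For a complete picture, here is a minimal sketch of how the builder and the parser might fit together end to end. The sample XML string and the names SAMPLE and output are made up for illustration, encoding="unicode" assumes Python 3, and the comment is deliberately placed inside the root element rather than before it.

```python
from xml.etree import ElementTree


class CommentedTreeBuilder(ElementTree.TreeBuilder):
    def comment(self, data):
        # Store the comment as an element whose tag is ElementTree.Comment.
        self.start(ElementTree.Comment, {})
        self.data(data)
        self.end(ElementTree.Comment)


# Made-up sample document; the comment sits inside the root element.
SAMPLE = "<config><!-- default port --><port>8080</port></config>"

# Hand the custom builder to the parser, then hand the parser to fromstring().
parser = ElementTree.XMLParser(target=CommentedTreeBuilder())
root = ElementTree.fromstring(SAMPLE, parser=parser)

# Serialize again; the comment should still be part of the output.
output = ElementTree.tostring(root, encoding="unicode")
print(output)
```

The same parser object can be passed to ElementTree.parse(filename, parser) when reading from a file. A parser instance is consumed by one parse, so create a new one for each document.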

By the way, comments work correctly out of the box with lxml. That is, you can just do

import lxml.etree as ET
tree = ET.parse(filename)

without needing any of the above.


Python 3.8 added the insert_comments argument to TreeBuilder:

class xml.etree.ElementTree.TreeBuilder(element_factory=None, *, comment_factory=None, pi_factory=None, insert_comments=False, insert_pis=False)

When insert_comments and/or insert_pis is true, comments/pis will be inserted into the tree if they appear within the root element (but not outside of it).

Example:

parser = ElementTree.XMLParser(target=ElementTree.TreeBuilder(insert_comments=True))
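
To see the "within the root element" rule in practice, here is a small sketch for Python 3.8 or later. The sample document and the names SAMPLE and kept are invented for illustration; only the comment inside the root element is expected to survive.

```python
import xml.etree.ElementTree as ElementTree

# Invented sample: one comment before the root element, one inside it.
SAMPLE = "<!-- before root --><root><!-- inside --><child/></root>"

parser = ElementTree.XMLParser(
    target=ElementTree.TreeBuilder(insert_comments=True)
)
root = ElementTree.fromstring(SAMPLE, parser=parser)

# Collect the text of every comment that made it into the tree.
kept = [c.text for c in root.iter(ElementTree.Comment)]
print(kept)

# Serialize the tree to check what a round trip would write out.
output = ElementTree.tostring(root, encoding="unicode")
print(output)
```

If comments outside the root element matter for your files, this option alone will not keep them.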


Martin's code didn't work for me. I modified it as follows, and this version works as intended.

import xml.etree.ElementTree as ET

class CommentedTreeBuilder(ET.XMLTreeBuilder):
    def __init__(self, *args, **kwargs):
        super(CommentedTreeBuilder, self).__init__(*args, **kwargs)
        # Route the underlying expat parser's comment events to our handler.
        self._parser.CommentHandler = self.comment

    def comment(self, data):
        # Forward the comment to the tree builder as a Comment element.
        self._target.start(ET.Comment, {})
        self._target.data(data)
        self._target.end(ET.Comment)

This is the test

parser = CommentedTreeBuilder()
tree = ET.parse(filename, parser)
tree.write('out.xml')
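
A round-trip check along the same lines can be sketched so that it does not depend on one Python version. This is only an illustration under assumptions: make_comment_parser and comment_texts are hypothetical helper names, the sample file content is made up, and the fallback branch assumes that a TreeBuilder without insert_comments support rejects the keyword with a TypeError.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def make_comment_parser():
    """Return a fresh XMLParser that keeps comments (hypothetical helper)."""
    try:
        # Python 3.8+: TreeBuilder can insert comments itself.
        builder = ET.TreeBuilder(insert_comments=True)
    except TypeError:
        # Assumed fallback for older versions: a TreeBuilder subclass
        # that stores each comment as an ET.Comment element.
        class CommentedTreeBuilder(ET.TreeBuilder):
            def comment(self, data):
                self.start(ET.Comment, {})
                self.data(data)
                self.end(ET.Comment)

        builder = CommentedTreeBuilder()
    return ET.XMLParser(target=builder)


def comment_texts(path):
    """Parse the file at path and list the text of each comment in the tree."""
    tree = ET.parse(path, make_comment_parser())
    return [c.text for c in tree.getroot().iter(ET.Comment)]


# Write a made-up input file into a temporary directory.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "in.xml")
dst = os.path.join(workdir, "out.xml")
with open(src, "w") as f:
    f.write("<root><!-- keep me --><item/></root>")

# Parse with comments preserved, write the tree back out, then compare.
ET.parse(src, make_comment_parser()).write(dst)
print(comment_texts(src) == comment_texts(dst))
```

Each call builds a new parser because a parser instance cannot be reused once a parse has finished.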