Python xml minidom. generate <text>Some text</text> element Python xml minidom. generate <text>Some text</text> element xml xml

Python xml minidom. generate <text>Some text</text> element


Can this easily be done?

It depends what exact rule you want, but generally no, you get little control over pretty-printing. If you want a specific format you'll usually have to write your own walker.

The DOM Level 3 LS parameter format-pretty-print in pxdom comes pretty close to your example. Its rule is that if an element contains only a single TextNode, no extra whitespace will be added. However it (currently) uses two spaces for an indent rather than four.

>>> doc= pxdom.parseString('<a><b>c</b></a>')>>> doc.domConfig.setParameter('format-pretty-print', True)>>> print doc.pxdomContent<?xml version="1.0" encoding="utf-16"?><a>  <b>c</b></a>

(Adjust encoding and output format for whatever type of serialisation you're doing.)

If that's the rule you want, and you can get away with it, you might also be able to monkey-patch minidom's Element.writexml, eg.:

>>> from xml.dom import minidom>>> def newwritexml(self, writer, indent= '', addindent= '', newl= ''):...     if len(self.childNodes)==1 and self.firstChild.nodeType==3:...         writer.write(indent)...         self.oldwritexml(writer) # cancel extra whitespace...         writer.write(newl)...     else:...         self.oldwritexml(writer, indent, addindent, newl)... >>> minidom.Element.oldwritexml= minidom.Element.writexml>>> minidom.Element.writexml= newwritexml

All the usual caveats about the badness of monkey-patching apply.


I was looking for exactly the same thing, and I came across this post. (the indenting provided in xml.dom.minidom broke another tool that I was using to manipulate the XML, and I needed it to be indented) I tried the accepted solution with a slightly more complex example and this was the result:

In [1]: import pxdomIn [2]: xml = '<a><b>fda</b><c><b>fdsa</b></c></a>'In [3]: doc = pxdom.parseString(xml)In [4]: doc.domConfig.setParameter('format-pretty-print', True)In [5]: print doc.pxdomContent<?xml version="1.0" encoding="utf-16"?><a>  <b>fda</b><c>    <b>fdsa</b>  </c></a>

The pretty printed XML isn't formatted correctly, and I'm not too happy about monkey patching (i.e. I barely know what it means, and understand it's bad), so I looked for another solution.

I'm writing the output to file, so I can use the xmlindent program for Ubuntu ($sudo aptitude install xmlindent). So I just write the unformatted to the file, then call the xmlindent from within the python program:

from subprocess import Popen, PIPEPopen(["xmlindent", "-i", "2", "-w", "-f", "-nbe", file_name],       stderr=PIPE,       stdout=PIPE).communicate()

The -w switch causes the file to be overwritten, but annoyingly leaves a named e.g. "myfile.xml~" which you'll probably want to remove. The other switches are there to get the formatting right (for me).

P.S. xmlindent is a stream formatter, i.e. you can use it as follows:

cat myfile.xml | xmlindent > myfile_indented.xml

So you might be able to use it in a python script without writing to a file if you needed to.


This could be done with toxml(), using regular expressions to tidy things up.

>>> from xml.dom.minidom import Document>>> import re>>> doc = Document()>>> root = doc.createElement('root')>>> _ = doc.appendChild(root)>>> main = doc.createElement('Text')>>> _ = root.appendChild(main)>>> text = doc.createTextNode('Some text here')>>> _ = main.appendChild(text)>>> out = doc.toxml()>>> niceOut = re.sub(r'><', r'>\n<', re.sub(r'(<\/.*?>)', r'\1\n', out))>>> print niceOut<?xml version="1.0" ?><root><Text>Some text here</Text></root>