XML Parsing: Element Tree (etree) vs. minidom [duplicate]


The DOM and SAX interfaces are the classic ways to work with XML. Python had to provide them because they are well-known and standard.

The ElementTree package was intended to provide a more Pythonic interface. It is all about making things easier for the programmer.
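To make the difference concrete, here is a minimal sketch that reads the same document with both APIs. The sample XML and the variable names are made up for illustration; only the standard-library calls are real.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical sample document, used only for this comparison.
XML = (
    "<library>"
    "<book id='1'><title>Dune</title></book>"
    "<book id='2'><title>Emma</title></book>"
    "</library>"
)

# DOM style (minidom): walk nodes and read the text out of child text nodes.
dom = minidom.parseString(XML)
dom_titles = [
    book.getElementsByTagName("title")[0].firstChild.data
    for book in dom.getElementsByTagName("book")
]

# ElementTree style: elements are iterable, attributes are read with get().
root = ET.fromstring(XML)
et_titles = [book.findtext("title") for book in root.iter("book")]
et_ids = [book.get("id") for book in root.iter("book")]

print(dom_titles)
print(et_titles, et_ids)
```

Both approaches give the same titles; the ElementTree version just needs less ceremony to get there.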

Depending on your build, each of those is backed by C code (the expat parser, plus a C accelerator for ElementTree), which makes them run fast.

None of the above tools is being deprecated. They each have their merits (SAX doesn't need to read the whole input into memory, for example).
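As a rough illustration of the SAX point, the sketch below collects titles through callbacks without ever building a tree. The `TitleHandler` class and the sample XML are hypothetical; a real input would usually be a file or stream passed to `xml.sax.parse` rather than an in-memory string.

```python
import xml.sax


class TitleHandler(xml.sax.ContentHandler):
    """Hypothetical handler that collects the text of every <title> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._chunks = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._chunks = []

    def characters(self, content):
        # Text may arrive in several pieces, so buffer it until the element ends.
        if self._in_title:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._chunks))
            self._in_title = False


# Sample input for illustration only.
XML = (
    b"<library>"
    b"<book id='1'><title>Dune</title></book>"
    b"<book id='2'><title>Emma</title></book>"
    b"</library>"
)

handler = TitleHandler()
xml.sax.parseString(XML, handler)
print(handler.titles)
```

The price of the low memory footprint is that you have to track state yourself (here, the `_in_title` flag).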

There is also a third-party module called lxml, which is a popular choice (full-featured and fast).


Python has two interfaces probably because ElementTree was integrated into the standard library a good deal later than minidom. The reason for adding it was likely its far more "Pythonic" API compared to the W3C-controlled DOM.

If you're concerned about speed, there's also lxml, which builds an ElementTree-compatible tree using libxml2 and should be quite fast. The lxml project publishes a benchmark suite comparing it to ElementTree's Python and C implementations.
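Because the APIs overlap, a common pattern is to try lxml first and fall back to the standard library. This is only a sketch: it assumes lxml may or may not be installed, the sample XML is made up, and it sticks to the subset of calls both modules share.

```python
try:
    # Third-party package; assumed to be installed separately (e.g. with pip).
    from lxml import etree
except ImportError:
    # Fall back to the standard-library implementation.
    import xml.etree.ElementTree as etree

# Hypothetical sample document.
root = etree.fromstring(
    "<library><book id='1'><title>Dune</title></book></library>"
)

title = root.find("book/title").text
book_id = root.find("book").get("id")

# Shows which implementation was actually picked up.
print(etree.__name__, title, book_id)
```

Code written this way runs unchanged on either implementation, as long as it avoids lxml-only extras.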

If you're concerned about memory use, you shouldn't be using a tree API anyway; PullDOM might be a better choice, but I'm extrapolating from experience using Java's excellent pull parser – there doesn't seem to be much current information on PullDOM.
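For what it's worth, the standard library's PullDOM lives in `xml.dom.pulldom`. Here is a minimal sketch of how it can be used, with a made-up sample document: events are pulled one at a time, and only the elements you care about are expanded into small DOM subtrees.

```python
from xml.dom import pulldom

# Hypothetical sample document; a large file would be passed to pulldom.parse().
XML = (
    "<library>"
    "<book id='1'><title>Dune</title></book>"
    "<book id='2'><title>Emma</title></book>"
    "</library>"
)

titles = []
events = pulldom.parseString(XML)
for event, node in events:
    if event == pulldom.START_ELEMENT and node.tagName == "book":
        # Build a DOM subtree for this <book> only, not the whole document.
        events.expandNode(node)
        titles.append(node.getElementsByTagName("title")[0].firstChild.data)

print(titles)
```

Memory use then depends on the size of the largest expanded element rather than on the size of the whole input.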