XML Parsing: Element Tree (etree) vs. minidom [duplicate]

python xml-parsing elementtree minidom

DOM and Sax interfaces for XML parsing are the classic ways to work with XML. Python had to provide those interfaces because they are well-known and standard.

The ElementTree package was intended to provide a more Pythonic interface. It is all about making things easier for the programmer.

Depending on your build, each of those has an underlying C implementation that makes them run fast.

None of the above tools is being deprecated. They each have their merits (Sax doesn't need to read the whole input into memory, for example).

There is also third-party module called lxml which is also a popular choice (full featured and fast).

python xml-parsing elementtree minidom

Python has two interfaces probably because Element Tree was integrated into the standard library a good deal later after minidom came to be. The reason for this was likely its far more "Pythonic" API compared to the W3C-controlled DOM.

If you're concerned about speed, there's also lxml, which builds an ElementTree-compatible DOM using libxml2 and should be quite fast – they have a benchmark suite comparing themselves to ElementTree's Python and C implementations available.

If you're concerned about memory use, you shouldn't be using a tree API anyway; PullDOM might be a better choice, but I'm extrapolating from experience using Java's excellent pull parser – there doesn't seem to be much current information on PullDOM.

CodeHunter

XML Parsing: Element Tree (etree) vs. minidom [duplicate]

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last