Comparing XML in a unit test in Python

python xml elementtree

This is an old question, but Kozyarchuk's accepted answer doesn't work for me because of attribute order, and the minidom solution doesn't work as-is either (no idea why, I haven't debugged it).

This is what I finally came up with:

    from doctest import Example
    from unittest import TestCase

    from lxml.doctestcompare import LXMLOutputChecker


    class XmlTest(TestCase):
        def assertXmlEqual(self, got, want):
            checker = LXMLOutputChecker()
            # check_output returns False when the two documents differ
            if not checker.check_output(want, got, 0):
                message = checker.output_difference(Example("", want), got, 0)
                raise AssertionError(message)

This also produces a diff, which can be helpful with large XML files.
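For readers without lxml, a similar diff-on-failure helper can be sketched with the standard library alone. This is not the LXMLOutputChecker approach above; it swaps in `xml.etree.ElementTree.canonicalize` and `difflib`, and assumes Python 3.9+ (for `ET.indent`) and that whitespace-only differences should be ignored. The names `normalized_lines` and `assert_xml_equal` are made up for this illustration.

```python
import difflib
import xml.etree.ElementTree as ET


def normalized_lines(xml_text):
    """Return the document as indented lines with attributes in canonical order."""
    # C14N sorts attributes; strip_text drops whitespace around text content.
    canonical = ET.canonicalize(xml_text, strip_text=True)
    root = ET.fromstring(canonical)
    ET.indent(root)  # one element per line, so a line-based diff is readable
    return ET.tostring(root, encoding="unicode").splitlines()


def assert_xml_equal(got, want):
    """Raise AssertionError with a unified diff if the documents differ."""
    got_lines = normalized_lines(got)
    want_lines = normalized_lines(want)
    if got_lines != want_lines:
        diff = "\n".join(
            difflib.unified_diff(want_lines, got_lines, "want", "got", lineterm="")
        )
        raise AssertionError("XML differs:\n" + diff)


# Attribute order and indentation are ignored, so this passes silently.
assert_xml_equal('<a y="2" x="1"><b>t</b></a>', '<a x="1" y="2">\n  <b>t</b>\n</a>')
```

Because the comparison is done on normalized text, the diff reflects the canonical form rather than the original formatting of either input.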


First normalize the two XML documents, then compare them. I've used the following with lxml:

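    from lxml import etree, objectify

    # Inside a TestCase method; `expect` and `xml` are XML strings.
    obj1 = objectify.fromstring(expect)
    expect = etree.tostring(obj1)
    obj2 = objectify.fromstring(xml)
    result = etree.tostring(obj2)
    self.assertEqual(expect, result)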


If the problem is really just the whitespace and attribute order, and you have no other constructs than text and elements to worry about, you can parse the strings using a standard XML parser and compare the nodes manually. Here's an example using minidom, but you could write the same in etree pretty simply:

    from xml.dom import minidom


    def isEqualXML(a, b):
        da, db = minidom.parseString(a), minidom.parseString(b)
        return isEqualElement(da.documentElement, db.documentElement)


    def isEqualElement(a, b):
        if a.tagName != b.tagName:
            return False
        # Sort the (name, value) pairs so attribute order is ignored.
        if sorted(a.attributes.items()) != sorted(b.attributes.items()):
            return False
        if len(a.childNodes) != len(b.childNodes):
            return False
        for ac, bc in zip(a.childNodes, b.childNodes):
            if ac.nodeType != bc.nodeType:
                return False
            if ac.nodeType == ac.TEXT_NODE and ac.data != bc.data:
                return False
            # Recurse into child elements.
            if ac.nodeType == ac.ELEMENT_NODE and not isEqualElement(ac, bc):
                return False
        return True
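The etree version mentioned above might look roughly like this. It is a sketch, not a drop-in replacement: unlike the minidom code, it strips leading and trailing whitespace from text before comparing (an assumption about what "equal" should mean), and the names `xml_equal` and `elements_equal` are illustrative.

```python
import xml.etree.ElementTree as ET


def elements_equal(a, b):
    """Recursively compare two Elements, ignoring attribute order."""
    if a.tag != b.tag:
        return False
    # attrib is a dict, so the comparison is order-independent.
    if a.attrib != b.attrib:
        return False
    # Assumption: surrounding whitespace in text and tail is insignificant.
    if (a.text or "").strip() != (b.text or "").strip():
        return False
    if (a.tail or "").strip() != (b.tail or "").strip():
        return False
    if len(a) != len(b):
        return False
    return all(elements_equal(ca, cb) for ca, cb in zip(a, b))


def xml_equal(a, b):
    """Parse two XML strings and compare their root elements."""
    return elements_equal(ET.fromstring(a), ET.fromstring(b))


same = xml_equal('<a x="1" y="2"><b>t</b></a>', '<a y="2" x="1">\n  <b>t</b>\n</a>')
different = xml_equal("<a><b/></a>", "<a><c/></a>")
```

Note that ElementTree drops comments and processing instructions by default when parsing, so this only compares elements, attributes and text.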

If you need a more thorough equivalence comparison, covering the possibilities of other types of nodes including CDATA, PIs, entity references, comments, doctypes, namespaces and so on, you could use the DOM Level 3 Core method isEqualNode. Neither minidom nor etree has that, but pxdom is one implementation that supports it:

    import pxdom


    def isEqualXML(a, b):
        # Parse both inputs; the second document must come from b, not a.
        da, db = pxdom.parseString(a), pxdom.parseString(b)
        return da.isEqualNode(db)

(You may want to change some of the DOMConfiguration options on the parse if you need to specify whether entity references and CDATA sections match their replaced equivalents.)

A slightly more roundabout way of doing it would be to parse, then re-serialise to canonical form and do a string comparison. Again, pxdom supports the DOM Level 3 LS option ‘canonical-form’, which you could use to do this; an alternative using the stdlib's minidom implementation is to use c14n. However, you must have the PyXML extensions installed for this, so you still can't quite do it within the stdlib:

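    from xml.dom import minidom
    from xml.dom.ext import c14n


    def isEqualXML(a, b):
        da, db = minidom.parseString(a), minidom.parseString(b)
        # Serialise both documents to canonical form, then compare as strings.
        a, b = c14n.Canonicalize(da), c14n.Canonicalize(db)
        return a == b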
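On newer Pythons the same idea no longer needs PyXML: since Python 3.8 the standard library includes `xml.etree.ElementTree.canonicalize`, which implements C14N 2.0. A minimal sketch, assuming whitespace-only differences should be ignored (hence `strip_text=True`); the function name `is_equal_xml` is illustrative.

```python
import xml.etree.ElementTree as ET


def is_equal_xml(a, b):
    """Compare two XML strings by their canonical (C14N 2.0) serialisation."""
    # Canonical form sorts attributes and expands empty-element tags;
    # strip_text=True additionally drops whitespace around text content.
    canonical_a = ET.canonicalize(a, strip_text=True)
    canonical_b = ET.canonicalize(b, strip_text=True)
    return canonical_a == canonical_b


same = is_equal_xml('<a y="2" x="1"><b/></a>', '<a x="1" y="2">\n  <b></b>\n</a>')
different = is_equal_xml("<a>1</a>", "<a>2</a>")
```

Comments are dropped by default in this canonical form; pass `with_comments=True` if they should take part in the comparison.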
