
XML (.xsd) feed validation against a schema


Definitely lxml.

Define an XMLParser with a predefined schema, load the file with fromstring(), and catch any XML Schema errors:

    from lxml import etree


    def validate(xmlparser, xmlfilename):
        """Return True if the file parses and validates against the parser's schema."""
        try:
            with open(xmlfilename, 'r') as f:
                etree.fromstring(f.read(), xmlparser)
            return True
        except (etree.XMLSchemaError, etree.XMLSyntaxError):
            # A schema-validating parser reports validation failures
            # as XMLSyntaxError, so catch that as well.
            return False


    # Build the schema object once and attach it to a parser.
    schema_file = 'schema.xsd'
    with open(schema_file, 'r') as f:
        schema_root = etree.XML(f.read())

    schema = etree.XMLSchema(schema_root)
    xmlparser = etree.XMLParser(schema=schema)

    filenames = ['input1.xml', 'input2.xml', 'input3.xml']
    for filename in filenames:
        if validate(xmlparser, filename):
            print("%s validates" % filename)
        else:
            print("%s doesn't validate" % filename)

Note about encoding

If the schema file starts with an XML declaration that specifies an encoding (e.g. <?xml version="1.0" encoding="UTF-8"?>), the code above raises the following error:

    Traceback (most recent call last):
      File "<input>", line 2, in <module>
        schema_root = etree.XML(f.read())
      File "src/lxml/etree.pyx", line 3192, in lxml.etree.XML
      File "src/lxml/parser.pxi", line 1872, in lxml.etree._parseMemoryDocument
    ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.

A solution is to open the files in binary mode, open(..., 'rb'), so that lxml receives bytes instead of a decoded string:

    [...]
    def validate(xmlparser, xmlfilename):
        try:
            with open(xmlfilename, 'rb') as f:
    [...]
    with open(schema_file, 'rb') as f:
    [...]
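If you also want to know why a document failed, one option is to parse it without a schema-aware parser and call the schema's validate() method, which returns a boolean and fills schema.error_log. The following is only a minimal sketch: the inline schema, the sample documents and the helper name validation_errors are made up for illustration, and the input is passed as bytes in line with the encoding note above.

```python
from lxml import etree

# Hypothetical inline schema, used only to keep the sketch self-contained.
XSD = b"""<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="count" type="xs:integer"/>
</xs:schema>"""


def validation_errors(schema, xml_bytes):
    """Return a list of (line, message) tuples; empty if the document is valid."""
    # Plain parse with no schema attached; malformed XML still raises
    # etree.XMLSyntaxError here.
    doc = etree.fromstring(xml_bytes)
    # validate() returns True or False instead of raising.
    if schema.validate(doc):
        return []
    # After a failed validation, error_log holds one entry per problem.
    return [(entry.line, entry.message) for entry in schema.error_log]


schema = etree.XMLSchema(etree.XML(XSD))

print(validation_errors(schema, b"<count>5</count>"))     # valid: empty list
print(validation_errors(schema, b"<count>five</count>"))  # invalid: at least one entry
```

Compared with the try/except version, this trades a simple yes/no answer for a list of messages you can log or show to whoever maintains the feed.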


The Python snippet is good, but an alternative is to use xmllint:

    xmllint --schema sample.xsd --noout sample.xml
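If the check has to run from a Python script anyway, the command above can be wrapped with subprocess. This is a rough sketch that assumes xmllint is installed and on the PATH; the helper names xmllint_command and validate_with_xmllint are invented for this example, and the schema option is written with a double dash.

```python
import subprocess


def xmllint_command(xsd_path, xml_path):
    """Build the xmllint invocation as an argument list (no shell quoting needed)."""
    return ["xmllint", "--schema", xsd_path, "--noout", xml_path]


def validate_with_xmllint(xsd_path, xml_path):
    """Return (is_valid, messages). Assumes the xmllint binary is available."""
    result = subprocess.run(
        xmllint_command(xsd_path, xml_path),
        capture_output=True,
        text=True,
    )
    # xmllint exits with 0 when the document validates and with a
    # non-zero status otherwise; its diagnostics are written to stderr.
    return result.returncode == 0, result.stderr.strip()


# Example call (requires xmllint and the two files to exist):
# ok, messages = validate_with_xmllint("sample.xsd", "sample.xml")
```

Passing the arguments as a list avoids shell interpretation of unusual file names.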


    import xmlschema


    def get_validation_errors(xml_file, xsd_file):
        """Print and return every validation error as a string."""
        schema = xmlschema.XMLSchema(xsd_file)
        errors = []
        # iter_errors() yields one error object per problem found.
        for validation_error in schema.iter_errors(xml_file):
            err = str(validation_error)
            errors.append(err)
            print(err)
        return errors