
lxml.etree.XML ValueError for Unicode string


data = open(module_path+'/data/ex-fire.xslt')
xslt_content = data.read()

This implicitly decodes the bytes in the file to Unicode text, using the default encoding. (This can give wrong results if the file isn't actually in that encoding.)
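A minimal sketch of that implicit decoding, using only the standard library. The file name and sample document are made up for illustration, and latin-1 is passed explicitly to stand in for a platform default that does not match the file's real encoding:

```python
import os
import tempfile

# Hypothetical sample document, stored as UTF-8 just as its declaration says.
raw = '<?xml version="1.0" encoding="UTF-8"?><name>caf\u00e9</name>'.encode("utf-8")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.xml")
    with open(path, "wb") as f:
        f.write(raw)

    # Binary mode: the bytes come back untouched.
    with open(path, "rb") as f:
        as_bytes = f.read()

    # Text mode: Python decodes the bytes before you ever see them.
    # latin-1 simulates a default encoding that differs from the file's.
    with open(path, encoding="latin-1") as f:
        as_text = f.read()

print(as_bytes == raw)            # True
print("caf\u00e9" in as_text)     # False: the accented character was mis-decoded
```

The text-mode read does not fail here; it silently produces the wrong characters, which is why the mismatch is easy to miss.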

xslt_root = etree.XML(xslt_content)

XML has its own handling and signalling for encodings: the <?xml version="..." encoding="..."?> declaration. If you pass a Unicode string starting with such a declaration to a parser, the parser would like to reinterpret the rest of the byte string using that encoding... but can't, because you've already decoded the byte input to a Unicode string. lxml refuses to guess and raises the ValueError instead.
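The behaviour can be reproduced with a tiny document. This is a sketch that assumes lxml is installed and behaves like the versions discussed in this thread; the sample document is made up:

```python
from lxml import etree

# Hypothetical document: a str (already decoded) that still carries
# an encoding declaration.
doc = '<?xml version="1.0" encoding="UTF-8"?><root/>'

try:
    etree.XML(doc)          # Unicode string plus encoding declaration
except ValueError as exc:
    error = str(exc)        # lxml rejects this combination
else:
    error = None

# The same document as bytes parses fine: the parser applies the
# declared encoding itself.
root = etree.XML(doc.encode("utf-8"))

print(error)
print(root.tag)
```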

Instead, you should either pass the undecoded byte string to the parser:

data = open(module_path+'/data/ex-fire.xslt', 'rb')
xslt_content = data.read()
xslt_root = etree.XML(xslt_content)

or, better, just have the parser read straight from the file:

xslt_root = etree.parse(module_path+'/data/ex-fire.xslt')
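As a rough illustration of why reading straight from the file is the safer route, the sketch below writes a small ISO-8859-1 document to a temporary file and lets the parser work out the encoding from the declaration. The file name and content are hypothetical, and lxml is assumed to be installed:

```python
import os
import tempfile

from lxml import etree

# Hypothetical document stored in ISO-8859-1, as its declaration states.
raw = '<?xml version="1.0" encoding="ISO-8859-1"?><name>caf\u00e9</name>'.encode("latin-1")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.xml")
    with open(path, "wb") as f:
        f.write(raw)

    # The parser opens the file itself and honours the declared encoding.
    tree = etree.parse(path)

text = tree.getroot().text
print(text == "caf\u00e9")
```

Note that etree.parse returns an ElementTree rather than an element; call getroot() on it if the root element itself is needed.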


You can also decode the UTF-8 byte string and re-encode it as ASCII before passing it to etree.XML:

xslt_content = data.read()
xslt_content = xslt_content.decode('utf-8').encode('ascii')
xslt_root = etree.XML(xslt_content)
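One caveat with the ASCII round trip is that it only works when the document contains nothing but ASCII characters. A small standard-library sketch with a made-up input shows what happens otherwise:

```python
# Hypothetical input containing one non-ASCII character.
raw = "<name>caf\u00e9</name>".encode("utf-8")

try:
    raw.decode("utf-8").encode("ascii")
except UnicodeEncodeError:
    ascii_ok = False        # the accented character has no ASCII form
else:
    ascii_ok = True

# Re-encoding as UTF-8 instead keeps every character intact.
utf8_bytes = raw.decode("utf-8").encode("utf-8")

print(ascii_ok)             # False
print(utf8_bytes == raw)    # True
```

If the stylesheet may contain non-ASCII text, passing the original bytes through unchanged avoids the problem entirely.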


I made it work by simply re-encoding with the default options (UTF-8 in Python 3):

xslt_content = data.read().encode()