Error 'failed to load external entity' when using Python lxml

In concert with what mzjn said, if you do want to pass a string to etree.parse(), just wrap it in a StringIO object.

Example:

from lxml import etreefrom StringIO import StringIOmyString = "<html><p>blah blah blah</p></html>"tree = etree.parse(StringIO(myString))

This method is used in the lxml documentation.

python xml lxml elementtree

etree.parse(source) expects source to be one of

a file name/path
a file object
a file-like object
a URL using the HTTP or FTP protocol

The problem is that you are supplying the XML content as a string.

You can also do without urllib2.urlopen(). Just use

tree = etree.parse("http://www.greenbuttondata.org/data/15MinLP_15Days.xml")

Demonstration (using lxml 2.3.4):

>>> from lxml import etree>>> tree = etree.parse("http://www.greenbuttondata.org/data/15MinLP_15Days.xml")>>> tree.getroot()<Element {http://www.w3.org/2005/Atom}feed at 0xedaa08>>>>

In a competing answer, it is suggested that lxml fails because of the stylesheet referenced by the processing instruction in the document. But that is not the problem here. lxml does not try to load the stylesheet, and the XML document is parsed just fine if you do as described above.

If you want to actually load the stylesheet, you have to be explicit about it. Something like this is needed:

from lxml import etreetree = etree.parse("http://www.greenbuttondata.org/data/15MinLP_15Days.xml")# Create an _XSLTProcessingInstruction objectpi = tree.xpath("//processing-instruction()")[0] # Parse the stylesheet and return an ElementTreexsl = pi.parseXSL()

python xml lxml elementtree

lxml docs for parse says To parse from a string, use the fromstring() function instead.

parse(...)    parse(source, parser=None, base_url=None)    Return an ElementTree object loaded with source elements.  If no parser    is provided as second argument, the default parser is used.    The ``source`` can be any of the following:    - a file name/path    - a file object    - a file-like object    - a URL using the HTTP or FTP protocol    To parse from a string, use the ``fromstring()`` function instead.    Note that it is generally faster to parse from a file path or URL    than from an open file object or file-like object.  Transparent    decompression from gzip compressed sources is supported (unless    explicitly disabled in libxml2).

CodeHunter

Error 'failed to load external entity' when using Python lxml

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last