error with parse function in lxml
lxml.html.parse does not fetch URLs. Here's how to do it with urllib2:
>>> from urllib2 import urlopen
>>> from lxml.html import parse
>>> page = urlopen('http://www.google.com')
>>> p = parse(page)
>>> p.getroot()
<Element html at 1304050>
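For reference, the same pattern in Python 3 (a sketch of my own, not part of the original answer): urllib2 became urllib.request, and lxml.html.parse accepts any file-like object. Here a BytesIO stands in for the network response so the example runs without a connection:

```python
from io import BytesIO
from lxml.html import parse

# In real use this would be:
#   from urllib.request import urlopen
#   page = urlopen('http://www.google.com')
# A BytesIO with a tiny document stands in for the HTTP response here.
page = BytesIO(b'<html><body><p>hello</p></body></html>')

doc = parse(page)    # parse() reads from the file-like object
root = doc.getroot()
print(root.tag)      # -> 'html'
```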
Update
Steven is right: lxml.etree.parse should accept and load URLs, and I missed that. I've tried deleting this answer, but I'm not allowed to, so I retract my statement about it not fetching URLs.
According to the API docs, it should work: http://lxml.de/api/lxml.html-module.html#parse
This seems to be a bug in lxml 2.2.2. I just tested on Windows with Python 2.6 and 2.7, and it does work with lxml 2.3.0.
So: upgrade your lxml and you'll be fine.
I don't know exactly which versions of lxml are affected, but I believe the problem was not so much with lxml itself as with the version of libxml2 used to build the Windows binary (certain versions of libxml2 had a problem with HTTP on Windows).
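Since the fix depends on which libraries you actually have, it helps to check: lxml exposes both its own version and the libxml2 version it was built against as tuples on lxml.etree. A quick sketch:

```python
from lxml import etree

# lxml's own version, as a tuple, e.g. (2, 3, 0, 0)
print(etree.LXML_VERSION)

# The libxml2 version lxml is running against, e.g. (2, 7, 8)
print(etree.LIBXML_VERSION)

# The same information as a string is in etree.__version__
print(etree.__version__)
```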
Since line breaks are not allowed in comments, here's my implementation of MattH's answer:
from urllib2 import urlopen
from lxml.html import parse

site_url = 'http://www.google.com'
try:
    page = parse(site_url).getroot()
except IOError:
    page = parse(urlopen(site_url)).getroot()
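An alternative sketch (my own suggestion, not from the answers above): if you fetch the page content yourself anyway, you can skip parse() and feed the raw bytes to lxml.html.fromstring, which returns the root element directly:

```python
from lxml.html import fromstring

# Pretend this came back from urlopen(...).read()
html = b'<html><body><a href="/x">link</a></body></html>'

root = fromstring(html)  # no file object or URL handling needed
print(root.tag)          # -> 'html'
print([a.get('href') for a in root.iter('a')])  # -> ['/x']
```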