jquery-like HTML parsing in Python?


If you are already comfortable with BeautifulSoup, you can just add soupselect to your libraries.
soupselect is a CSS selector extension for BeautifulSoup.

Usage:

    # Written for Python 2; in Python 3, urllib.urlopen lives at urllib.request.urlopen.
    from bs4 import BeautifulSoup as Soup
    from soupselect import select
    import urllib

    soup = Soup(urllib.urlopen('http://slashdot.org/'))
    select(soup, 'div.title h3')
    # [<h3><span><a href='//science.slashdot.org/'>Science</a>:</span></h3>,
    #  <h3><a href='//slashdot.org/articles/07/02/28/0120220.shtml'>Star Trek</h3>,
    #  ..]
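
For comparison, recent BeautifulSoup 4 releases also ship their own `select()` method, so the same kind of CSS query may work without soupselect. The sketch below is only an illustration: the HTML string and the `example` hostnames are made up so that it runs offline, and it assumes `bs4` is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical inline markup standing in for a fetched page.
HTML = """
<div class="title"><h3><a href="//science.example/">Science</a></h3></div>
<div class="title"><h3><a href="//news.example/">News</a></h3></div>
<div class="other"><h3>Ignored</h3></div>
"""

# Use the stdlib parser so no extra parser package is needed.
soup = BeautifulSoup(HTML, "html.parser")

# Same selector as in the soupselect example: h3 elements inside div.title.
headings = soup.select("div.title h3")
titles = [h3.get_text(strip=True) for h3 in headings]

# Attribute selectors work too: collect the href of each matching link.
links = [a["href"] for a in soup.select("div.title h3 a[href]")]

print(titles)
print(links)
```

The selector string is the only part that carries over unchanged from the soupselect call; everything else here is illustrative.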


Consider PyQuery:

http://packages.python.org/pyquery/

    >>> from pyquery import PyQuery as pq
    >>> from lxml import etree
    >>> import urllib
    >>> d = pq("<html></html>")
    >>> d = pq(etree.fromstring("<html></html>"))
    >>> d = pq(url='http://google.com/')
    >>> d = pq(url='http://google.com/', opener=lambda url: urllib.urlopen(url).read())
    >>> d = pq(filename=path_to_html_file)
    >>> d("#hello")
    [<p#hello.hello>]
    >>> p = d("#hello")
    >>> p.html()
    'Hello world !'
    >>> p.html("you know <a href='http://python.org/'>Python</a> rocks")
    [<p#hello.hello>]
    >>> p.html()
    u'you know <a href="http://python.org/">Python</a> rocks'
    >>> p.text()
    'you know Python rocks'