Finding html element with class using lxml


The TopGear page that you use for testing doesn't have any <div class="channel"> elements. But this works (for example):

el = doc.xpath("//div[@class='channel-title-container']")

Or this:

el = doc.xpath("//div[@class='a yb xr']")

To find <div> elements whose class attribute contains the substring channel (this also matches classes such as channel-title-container), you could use:

el = doc.xpath("//div[contains(@class, 'channel')]") 
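To make the difference between the exact-match and substring queries concrete, here is a minimal sketch. The markup is a made-up sample (not taken from the TopGear page), and the variable names are only illustrative.

```python
from lxml import html

# Hypothetical sample markup, used only to illustrate the queries above.
SAMPLE = """
<html><body>
  <div class="channel-title-container">title</div>
  <div class="a yb xr">misc</div>
  <div class="channel">exact</div>
  <div class="subchannel wide">other</div>
</body></html>
"""

doc = html.fromstring(SAMPLE)

# @class='...' compares the whole attribute value, character for character.
exact = doc.xpath("//div[@class='channel-title-container']")
multi = doc.xpath("//div[@class='a yb xr']")

# contains() is a plain substring test on the attribute value.
substring = doc.xpath("//div[contains(@class, 'channel')]")
substring_classes = [el.get("class") for el in substring]

print(len(exact), len(multi))
print(substring_classes)
```

Note that the substring query also picks up subchannel wide, which may or may not be what you want.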


You can use lxml.cssselect to simplify class and id lookups: http://lxml.de/dev/cssselect.html


HTML uses classes (a lot), which makes them convenient hooks for XPath queries. However, XPath has no knowledge of CSS classes (or even of space-separated lists), which makes classes a pain to check. The canonically correct way to look for elements having a specific class is:

//*[contains(concat(' ', normalize-space(@class), ' '), ' $className ')]

The spaces around the class name matter: the class attribute is padded with a space on each side, so searching for the name with surrounding spaces matches only a whole class token, not a substring of a longer class name.

In your case this is:

el = doc.xpath(
    "//div[contains(concat(' ', normalize-space(@class), ' '), ' channel ')]"
)
# print(el) gives a list of <Element div at 0x...> objects, one per matching div

Or register your own XPath function, e.g. hasaclass(*classes):

from lxml import etree

def _hasaclass(context, *cls):
    return "your implementation ..."

# None registers the function without a namespace prefix
xpath_utils = etree.FunctionNamespace(None)
xpath_utils['hasaclass'] = _hasaclass

el = doc.xpath("//div[hasaclass('channel')]")
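The body of the function is left open above. One possible way to fill it in is sketched below. This is an assumption, not the answer's actual implementation: it treats hasaclass as "has at least one of the given classes" and reads the current element from context.context_node. The sample markup is made up.

```python
from lxml import etree, html


def _hasaclass(context, *cls):
    # context.context_node is the element the predicate is being evaluated on.
    # split() with no arguments splits on any run of whitespace.
    tokens = (context.context_node.get("class") or "").split()
    # Assumed semantics: true if any of the requested classes is present.
    return any(name in tokens for name in cls)


# None registers the function without a namespace prefix.
xpath_utils = etree.FunctionNamespace(None)
xpath_utils["hasaclass"] = _hasaclass

# Hypothetical sample markup for illustration only.
doc = html.fromstring("""
<html><body>
  <div class="channel">one</div>
  <div class="big  channel wide">two</div>
  <div class="channel-title-container">three</div>
  <div>four</div>
</body></html>
""")

custom = [el.text for el in doc.xpath("//div[hasaclass('channel')]")]
canonical = [
    el.text
    for el in doc.xpath(
        "//div[contains(concat(' ', normalize-space(@class), ' '), ' channel ')]"
    )
]

print(custom)
print(canonical)
```

Both queries should select the same elements. If you need "has all of these classes" instead, swapping any for all would be the natural change.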