Finding an HTML element by class using lxml
The TopGear page that you use for testing doesn't have any <div class="channel">
elements. But this works (for example):
el = doc.xpath("//div[@class='channel-title-container']")
Or this:
el = doc.xpath("//div[@class='a yb xr']")
To find <div> elements with a class attribute that contains the string channel, you could use:
el = doc.xpath("//div[contains(@class, 'channel')]")
HTML uses classes (a lot), which makes them convenient hooks for XPath queries. However, XPath has no knowledge of CSS classes (or even of space-separated lists), which makes classes a pain in the ass to check. The canonically correct way to look for elements having a specific class is:
//*[contains(concat(' ', normalize-space(@class), ' '), ' $className ')]
Note the spaces around $className: they make the test match a whole class token rather than a substring such as channel-title-container. In your case this is:
el = doc.xpath("//div[contains(concat(' ', normalize-space(@class), ' '), ' channel ')]")
print(el)
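To make the difference between the three approaches concrete, here is a small sketch. The markup is a made-up sample (it is not taken from the TopGear page), and the ids helper is only there for illustration.

```python
from lxml import html

# Hypothetical sample markup, NOT taken from the TopGear page.
SAMPLE = """
<html><body>
  <div id="a" class="channel">exact</div>
  <div id="b" class="big channel  wide">token among others</div>
  <div id="c" class="channel-title-container">substring only</div>
  <div id="d" class="other">no match</div>
</body></html>
"""

doc = html.fromstring(SAMPLE)


def ids(expr):
    """Return the id attributes of the elements matched by expr."""
    return [el.get("id") for el in doc.xpath(expr)]


# 1. Exact attribute comparison: only class="channel" and nothing else.
exact = ids("//div[@class='channel']")

# 2. Substring test: also matches channel-title-container.
substring = ids("//div[contains(@class, 'channel')]")

# 3. Space-padded token test: matches "channel" as a whole class name.
token = ids(
    "//div[contains(concat(' ', normalize-space(@class), ' '), ' channel ')]"
)

print(exact)      # ['a']
print(substring)  # ['a', 'b', 'c']
print(token)      # ['a', 'b']
```

Only the third query treats the class attribute as a space-separated list, which is usually what you mean by "has class channel".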
Or register your own XPath function, hasaclass(*classes):
from lxml import etree

def _hasaclass(context, *cls):
    return "your implementation ..."

# Register the function in the default (empty) function namespace.
xpath_utils = etree.FunctionNamespace(None)
xpath_utils['hasaclass'] = _hasaclass
el = doc.xpath("//div[hasaclass('channel')]")
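One possible way to fill in that function is sketched below. It is my own guess at a reasonable implementation, not something lxml ships: it assumes that hasaclass should return true only when the element carries every class passed in, and it reads the current element from context.context_node. The sample markup is made up.

```python
from lxml import etree, html


def _hasaclass(context, *classes):
    """Return True if the context node has every class in `classes`.

    Assumption: "all classes must be present" semantics; swap all() for
    any() if you want "at least one of them" instead.
    """
    node = context.context_node
    tokens = (node.get("class") or "").split()  # split on any whitespace
    return all(cls in tokens for cls in classes)


# Register in the default (empty) function namespace.
xpath_utils = etree.FunctionNamespace(None)
xpath_utils["hasaclass"] = _hasaclass

# Hypothetical sample markup for illustration.
doc = html.fromstring(
    '<div>'
    '<div id="a" class="big channel"></div>'
    '<div id="c" class="channel-title"></div>'
    '</div>'
)

matched = [el.get("id") for el in doc.xpath("//div[hasaclass('channel')]")]
print(matched)  # ['a']
```

Because the split happens in Python, there is no need for the concat/normalize-space trick, and the XPath expression itself stays readable.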