Get text content of an HTML element using XPath?


You want to select all descendant text, not just child text:

//div[a[contains(., "Add to cart")]]/p//text()

Note the double slash between p and text() there.
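To see what the extra slash changes, here is a minimal sketch comparing the two queries with lxml. The one-line sample markup is made up for illustration and is not the asker's actual page:

```python
import lxml.etree as ET

# Hypothetical sample markup, loosely modelled on a product listing.
doc = ET.fromstring(
    '<div><p>Price: <b>$300</b> each</p><a href="/add">Add to cart</a></div>'
)

# /text() returns only the text nodes that are direct children of <p>.
child_text = doc.xpath('//div[a[contains(., "Add to cart")]]/p/text()')

# //text() returns every text node below <p>, including the one inside <b>.
all_text = doc.xpath('//div[a[contains(., "Add to cart")]]/p//text()')

print(child_text)  # ['Price: ', ' each']
print(all_text)    # ['Price: ', '$300', ' each']
```

The single-slash version silently drops the price because it sits inside the nested b element.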

Selecting all descendant text this way will potentially also include a lot of inter-tag whitespace, so you'll need to clean that up. Example using lxml:

>>> import lxml.etree as ET
>>> tree = ET.fromstring('''<div>
... <div>
...     <p>
...     <span class="abc">Monitor</span> <b>$300</b>
...     </p>
...     <a href="/add">Add to cart</a>
... </div>
... <div>
...     <p>
...     <span class="abc">Keyboard</span> $20 
...     </p>
...     <a href="/add">Add to cart</a>
... </div>
... </div>''')
>>> tree.xpath('//div[a[contains(., "Add to cart")]]/p//text()')
['\n    ', 'Monitor', ' ', '$300', '\n    ', '\n    ', 'Keyboard', ' $20 \n    ']
>>> res = _
>>> [txt for txt in (txt.strip() for txt in res) if txt]
['Monitor', '$300', 'Keyboard', '$20']
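
If you would rather end up with one string per p element than a flat list of fragments, one option is to join each element's text nodes first and collapse the whitespace afterwards. A minimal sketch in plain Python: the helper name collapse_whitespace is hypothetical, and the input lists are the text nodes from the output above, split per paragraph by hand:

```python
def collapse_whitespace(text_nodes):
    """Join the text nodes of one element and squeeze runs of whitespace."""
    # str.split() with no arguments splits on any run of whitespace and
    # ignores leading/trailing whitespace, so re-joining with a single
    # space normalizes the string.
    return " ".join("".join(text_nodes).split())


# Text nodes of each <p>, copied from the //text() output above.
monitor_nodes = ['\n    ', 'Monitor', ' ', '$300', '\n    ']
keyboard_nodes = ['\n    ', 'Keyboard', ' $20 \n    ']

print(collapse_whitespace(monitor_nodes))   # Monitor $300
print(collapse_whitespace(keyboard_nodes))  # Keyboard $20
```

With lxml you would call this once per matched p element, passing it the result of p.xpath('.//text()'), so that the name and the price of each product stay together.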