Get text content of an HTML element using XPath?
You want to select all descendant text, not just child text:
//div[a[contains(., "Add to cart")]]/p//text()
Note the double slash between p and text() there.
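To see the difference between the two steps, here is a minimal sketch using lxml. The one-line sample markup (including the "in stock" text) is made up for illustration and is not from the question:

```python
import lxml.etree as ET

# Hypothetical one-product sample, loosely modelled on the markup in the question.
p = ET.fromstring('<p><span class="abc">Monitor</span> <b>$300</b> in stock</p>')

# Single slash: only text nodes that are direct children of <p>.
child_text = p.xpath('text()')

# Double slash: text nodes at any depth below <p>.
descendant_text = p.xpath('.//text()')

print(child_text)       # [' ', ' in stock']
print(descendant_text)  # ['Monitor', ' ', '$300', ' in stock']
```

The child-only version misses 'Monitor' and '$300' because those strings live inside the span and b elements rather than directly in p.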
This will potentially also include a lot of inter-tag whitespace, though, so you'll need to clean that up. Example using lxml:
>>> import lxml.etree as ET
>>> tree = ET.fromstring('''<div>
...   <div>
...     <p>
...       <span class="abc">Monitor</span> <b>$300</b>
...     </p>
...     <a href="/add">Add to cart</a>
...   </div>
...   <div>
...     <p>
...       <span class="abc">Keyboard</span> $20
...     </p>
...     <a href="/add">Add to cart</a>
...   </div>
... </div>''')
>>> tree.xpath('//div[a[contains(., "Add to cart")]]/p//text()')
['\n      ', 'Monitor', ' ', '$300', '\n    ', '\n      ', 'Keyboard', ' $20\n    ']
>>> res = _
>>> [txt for txt in (txt.strip() for txt in res) if txt]
['Monitor', '$300', 'Keyboard', '$20']
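If you would rather get one cleaned-up string per product instead of a flat list of fragments, one possible alternative (not part of the approach above) is to select the p elements themselves and let XPath's normalize-space() function collapse the whitespace. A minimal sketch, assuming a sample document shaped like the one in the session above:

```python
import lxml.etree as ET

# Sample document modelled on the one used in the interactive session.
tree = ET.fromstring('''<div>
  <div>
    <p>
      <span class="abc">Monitor</span> <b>$300</b>
    </p>
    <a href="/add">Add to cart</a>
  </div>
  <div>
    <p>
      <span class="abc">Keyboard</span> $20
    </p>
    <a href="/add">Add to cart</a>
  </div>
</div>''')

# Select the <p> elements rather than their text nodes.
paragraphs = tree.xpath('//div[a[contains(., "Add to cart")]]/p')

# normalize-space(.) takes the full descendant text of each <p>,
# trims both ends, and collapses internal whitespace runs to one space.
lines = [p.xpath('normalize-space(.)') for p in paragraphs]

print(lines)  # ['Monitor $300', 'Keyboard $20']
```

This keeps the name and price of each item together, at the cost of losing the boundaries between the individual text nodes.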