Get all text inside a tag in lxml
Try:
from lxml.etree import tostring

def stringify_children(node):
    # node.text is the text before the first child; tostring() serializes
    # each child with its markup and, by default, its trailing tail text.
    # encoding="unicode" makes tostring() return str rather than bytes.
    parts = [node.text] + [tostring(c, encoding="unicode") for c in node]
    # filter drops node.text when it is None
    return "".join(filter(None, parts))
Example:
from lxml import etree

node = etree.fromstring("""<content>
Text outside tag <div>Text <em>inside</em> tag</div>
</content>""")
stringify_children(node)
Produces: '\nText outside tag <div>Text <em>inside</em> tag</div>\n'
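If you only want the text itself, without the child markup, a minimal sketch along these lines may be enough. It relies on `itertext()`, which walks the element's text and its descendants' text and tails in document order. The helper name `inner_text` is made up for illustration, and the fallback import assumes the standard library's ElementTree behaves the same way for these calls.

```python
try:
    from lxml import etree  # the library this answer is about
except ImportError:
    # Assumption: the stdlib ElementTree API covers the calls used below.
    import xml.etree.ElementTree as etree


def inner_text(node):
    """Return only the text inside node, dropping all child markup."""
    # itertext() yields node.text, then each descendant's text and tail.
    return "".join(node.itertext())


node = etree.fromstring("""<content>
Text outside tag <div>Text <em>inside</em> tag</div>
</content>""")

print(repr(inner_text(node)))
# '\nText outside tag Text inside tag\n'
```

Call `.strip()` on the result if the surrounding newlines are not wanted.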