Get all text inside a tag in lxml


Just use the node.itertext() method, as in:

    ''.join(node.itertext())
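A minimal sketch of what itertext() yields, using a small sample document invented for illustration. It returns plain text only, with no tags: node.text, then every descendant's text and tail in document order. The node's own tail is not included. The same string can also be obtained with `etree.tostring(..., method="text")`, which needs `with_tail=False` to drop the node's own tail, or with the XPath `string()` function.

```python
from lxml import etree

# Sample document (illustrative only)
node = etree.fromstring(
    "<content>Text outside tag <div>Text <em>inside</em> tag</div></content>"
)

# itertext() yields node.text, then each descendant's text and tail, in document order
pieces = list(node.itertext())
text = "".join(pieces)

# Two equivalent ways to get the same string value of the element
via_tostring = etree.tostring(node, method="text", encoding="unicode", with_tail=False)
via_xpath = str(node.xpath("string()"))

print(pieces)  # ['Text outside tag ', 'Text ', 'inside', ' tag']
print(text)    # Text outside tag Text inside tag
```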


If you want the inner markup (child tags included) rather than just the text, try:

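    from lxml.etree import tostring

    def stringify_children(node):
        """Return the inner markup of node: its leading text plus each child serialized."""
        # node.text is the text before the first child (None if there is none)
        parts = [node.text]
        for child in node:
            # encoding="unicode" returns str instead of bytes (needed for join in Python 3).
            # The serialized child already contains child.text, and with_tail=True appends
            # child.tail, so neither is added separately (that would duplicate text).
            parts.append(tostring(child, encoding="unicode", with_tail=True))
        # node.tail lies outside the tag, so it is left out;
        # filter removes a possible None in node.text
        return ''.join(filter(None, parts))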

Example:

    from lxml import etree
    node = etree.fromstring("""<content>
    Text outside tag <div>Text <em>inside</em> tag</div>
    </content>""")
    stringify_children(node)

Produces: '\nText outside tag <div>Text <em>inside</em> tag</div>\n'