Only extracting text from this element, not its children


What about .find(text=True)?

>>> BeautifulSoup.BeautifulSOAP('<html>yes<b>no</b></html>').find(text=True)
u'yes'
>>> BeautifulSoup.BeautifulSOAP('<html><b>no</b>yes</html>').find(text=True)
u'no'

EDIT:

I think that I've understood what you want now. Try this:

>>> BeautifulSoup.BeautifulSOAP('<html><b>no</b>yes</html>').html.find(text=True, recursive=False)
u'yes'
>>> BeautifulSoup.BeautifulSOAP('<html>yes<b>no</b></html>').html.find(text=True, recursive=False)
u'yes'


You could use contents

>>> print soup.html.contents[0]
yes

Or, to get all the text nodes directly under html, use findAll(text=True, recursive=False):

>>> soup = BeautifulSoup.BeautifulSOAP('<html>x<b>no</b>yes</html>')
>>> soup.html.findAll(text=True, recursive=False)
[u'x', u'yes']

The same results joined into a single string:

>>> ''.join(soup.html.findAll(text=True, recursive=False))
u'xyes'
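
For readers on bs4 rather than BeautifulSoup 3, a rough equivalent of the calls above might look like the sketch below. It assumes a recent bs4 release, where find_all accepts a string argument (the newer name for text), and it assumes the stdlib html.parser backend. The variable names direct and joined are only for illustration.

```python
from bs4 import BeautifulSoup

# Assumption: html.parser is used so no <body> wrapper is added around the text.
soup = BeautifulSoup("<html>x<b>no</b>yes</html>", "html.parser")

# recursive=False restricts the search to direct children of <html>;
# string=True matches text nodes of any content.
direct = soup.html.find_all(string=True, recursive=False)

# Join the direct text nodes into one plain string.
joined = "".join(direct)
print(joined)  # xyes
```

The text inside the b tag is skipped because it is a grandchild of html, not a direct child.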


This works for me in bs4:

import bs4

node = bs4.BeautifulSoup('<html><div>A<span>B</span>C</div></html>').find('div')
# Keep only the div's direct text nodes, skipping child tags such as <span>
print("".join([t for t in node.contents if type(t)==bs4.element.NavigableString]))

output:

AC
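
If this is needed in more than one place, the bs4 approach above could be wrapped in a small helper. This is only a sketch: own_text is a hypothetical name, not part of bs4, and the sample markup (including the comment) is made up to show that the exact type check also leaves out HTML comments, which bs4 represents as a subclass of NavigableString.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString


def own_text(node):
    """Return the text that belongs directly to node, ignoring child tags.

    Hypothetical helper. The exact type check skips Comment and other
    NavigableString subclasses as well as Tag children.
    """
    return "".join(
        child for child in node.children if type(child) is NavigableString
    )


# Assumed sample input; html.parser is the stdlib-backed parser.
soup = BeautifulSoup("<div>A<!-- note --><span>B</span>C</div>", "html.parser")
div = soup.find("div")

print(own_text(div))   # AC
print(div.get_text())  # ABC (includes the span's text)
```

By contrast, get_text() walks the whole subtree, which is why it also returns the B from the span.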