Extracting Text Between HTML Comments with BeautifulSoup


Iterate over all the comments in the document, check whether each one is among the comments you are looking for, and then print the text node that immediately follows it:

    from bs4 import BeautifulSoup, Comment

    html = """
    <html>
    <body>
    <p>p tag text</p>
    <!--UNIQUE COMMENT-->
    I would like to get this text
    <!--SECOND UNIQUE COMMENT-->
    I would also like to find this text
    </body>
    </html>
    """

    soup = BeautifulSoup(html, 'lxml')

    # Visit every comment node and keep only the ones of interest
    for comment in soup.find_all(text=lambda text: isinstance(text, Comment)):
        if comment in ['UNIQUE COMMENT', 'SECOND UNIQUE COMMENT']:
            # next_element is the node parsed right after the comment
            print(comment.next_element.strip())

This would display the following:

    I would like to get this text
    I would also like to find this text
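Note that next_element returns only the single node that follows the comment. If the text between two comments can contain nested tags, one option is to walk the comment's siblings until the next comment is reached. The following is a minimal sketch of that idea, not part of the answer above: text_until_next_comment is a hypothetical helper name, the markup with the <b> tag is made up for illustration, and it assumes a Beautiful Soup release that accepts the string= argument (4.4.0 or later) and uses the built-in "html.parser" so that lxml is not required.

```python
from bs4 import BeautifulSoup, Comment, Tag


def text_until_next_comment(comment):
    """Hypothetical helper: join the text of every sibling that follows
    `comment`, stopping at the next comment or at the end of the parent."""
    parts = []
    for sibling in comment.next_siblings:
        if isinstance(sibling, Comment):
            break  # the next comment marks the end of this section
        if isinstance(sibling, Tag):
            parts.append(sibling.get_text())  # text inside nested tags
        else:
            parts.append(str(sibling))  # plain text node
    return "".join(parts).strip()


# Illustrative markup (an assumption): the first section contains a nested tag
html = (
    "<body><!--UNIQUE COMMENT-->I would <b>like</b> to get this text"
    "<!--SECOND UNIQUE COMMENT-->I would also like to find this text</body>"
)
soup = BeautifulSoup(html, "html.parser")

results = {}
for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
    results[str(comment)] = text_until_next_comment(comment)

for name, text in results.items():
    print(name, "->", text)
```

With comment.next_element alone, the first section would stop at "I would", because the <b> tag starts a new node.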


An improvement on Martin's answer: you can search for the specific comments directly. There is no need to iterate over all the comments and then check their values; do both in one pass:

    comments_to_search_for = {'UNIQUE COMMENT', 'SECOND UNIQUE COMMENT'}

    # The filter accepts a node only if it is a Comment with one of the wanted values
    for comment in soup.find_all(text=lambda text: isinstance(text, Comment) and text in comments_to_search_for):
        print(comment.next_element.strip())

Prints:

    I would like to get this text
    I would also like to find this text
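One thing to watch for with this kind of exact match: Beautiful Soup keeps the whitespace inside a comment, so <!-- UNIQUE COMMENT --> does not equal 'UNIQUE COMMENT'. A small sketch of a filter that tolerates such padding follows. The name is_wanted_comment and the sample markup are assumptions made for illustration, and the string= argument assumes Beautiful Soup 4.4.0 or later.

```python
from bs4 import BeautifulSoup, Comment

# Illustrative markup (an assumption): the first comment is padded with spaces
html = (
    "<body><!-- UNIQUE COMMENT -->first text"
    "<!--OTHER-->ignored"
    "<!--SECOND UNIQUE COMMENT-->second text</body>"
)
soup = BeautifulSoup(html, "html.parser")

wanted = {"UNIQUE COMMENT", "SECOND UNIQUE COMMENT"}


def is_wanted_comment(node):
    """Hypothetical filter: accept Comment nodes whose stripped text is wanted."""
    return isinstance(node, Comment) and node.strip() in wanted


found = [c.next_element.strip() for c in soup.find_all(string=is_wanted_comment)]
print(found)
```

Passing a named function instead of a lambda keeps the find_all call short and makes the matching rule easy to reuse.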


The bs4 package provides a Comment class. You can use it to extract the comments.

    from bs4 import BeautifulSoup, Comment

    html = """
    <html>
    <body>
    <p>p tag text</p>
    <!--UNIQUE COMMENT-->
    I would like to get this text
    <!--SECOND UNIQUE COMMENT-->
    I would also like to find this text
    </body>
    </html>
    """

    soup = BeautifulSoup(html, 'lxml')

    # Collect every node that is an instance of Comment
    comments = soup.find_all(text=lambda text: isinstance(text, Comment))

This gives you the Comment elements. Under Python 3 the list prints without the u prefix:

    ['UNIQUE COMMENT', 'SECOND UNIQUE COMMENT']
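Once you have the Comment elements, you still need the text that follows each one. Below is a hedged sketch of one way to pair them up in a dictionary. The helper following_text and the extra <!--EMPTY--> comment are made up for illustration; the guard is there because next_element can be a tag (or None at the end of the document), in which case calling .strip() on it would fail. It uses string=, which assumes Beautiful Soup 4.4.0 or later, and the built-in "html.parser".

```python
from bs4 import BeautifulSoup, Comment, NavigableString

# Same markup as above, plus a hypothetical comment that is followed by a tag
html = (
    "<html><body><p>p tag text</p>"
    "<!--UNIQUE COMMENT-->I would like to get this text"
    "<!--SECOND UNIQUE COMMENT-->I would also like to find this text"
    "<!--EMPTY--><p>no bare text here</p>"
    "</body></html>"
)
soup = BeautifulSoup(html, "html.parser")


def following_text(comment):
    """Hypothetical helper: return the stripped text node right after
    `comment`, or None if the next node is not plain text."""
    nxt = comment.next_element
    if isinstance(nxt, NavigableString) and not isinstance(nxt, Comment):
        return nxt.strip()
    return None


comments = soup.find_all(string=lambda s: isinstance(s, Comment))
text_by_comment = {str(c): following_text(c) for c in comments}

for name, text in text_by_comment.items():
    print(name, "->", text)
```

Here the EMPTY comment maps to None because the node after it is a <p> tag rather than a text node.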