Beautiful Soup and extracting a div and its contents by ID


You should post your example document, because the code works fine:

>>> import BeautifulSoup
>>> soup = BeautifulSoup.BeautifulSoup('<html><body><div id="articlebody"> ... </div></body></html>')
>>> soup.find("div", {"id": "articlebody"})
<div id="articlebody"> ... </div>

Finding <div>s inside <div>s works as well:

>>> soup = BeautifulSoup.BeautifulSoup('<html><body><div><div id="articlebody"> ... </div></div></body></html>')
>>> soup.find("div", {"id": "articlebody"})
<div id="articlebody"> ... </div>
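
The sessions above use the Beautiful Soup 3 import style (`import BeautifulSoup`). As a rough sketch, the same lookup under Beautiful Soup 4 might look like the following. The sample markup and the choice of the built-in `"html.parser"` backend are assumptions for illustration, not taken from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document: a div with an id, nested inside another div.
html = '<html><body><div><div id="articlebody"><p>Hello</p></div></div></body></html>'

# "html.parser" is the parser bundled with Python; other parsers may
# normalise the markup slightly differently.
soup = BeautifulSoup(html, "html.parser")

# Same call as in the sessions above: tag name plus an attribute dict.
article = soup.find("div", {"id": "articlebody"})
print(article)
```

If `find()` still returns nothing on a real page, the document itself is the likely cause (for example, the div being added by JavaScript after load), which is why posting the actual markup helps.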


To find an element by its id:

div = soup.find(id="articlebody")
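
Once the div has been found, its contents can be pulled out in a couple of ways. The snippet below is a minimal sketch assuming Beautiful Soup 4 and a made-up document; `decode_contents()` and `get_text()` are existing `Tag` methods, but the sample markup and variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
html = '<div id="articlebody"><p>First</p><p>Second</p></div>'
soup = BeautifulSoup(html, "html.parser")

div = soup.find(id="articlebody")

# Inner HTML of the div, without the enclosing <div> tag itself.
inner_html = div.decode_contents()

# Text only, with whitespace stripped and pieces joined by a space.
text = div.get_text(" ", strip=True)

print(inner_html)
print(text)
```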


Beautiful Soup 4 supports most CSS selectors through the .select() method, so you can use an id selector such as:

soup.select('#articlebody')

If you need to specify the element's type, you can add a type selector before the id selector:

soup.select('div#articlebody')

The .select() method will return a collection of elements, which means that it would return the same results as the following .find_all() method example:

soup.find_all('div', id="articlebody")
# or
soup.find_all(id="articlebody")
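
To check that claim on a small case, here is a hedged sketch assuming Beautiful Soup 4 with its default CSS selector support installed. The sample markup is invented for the comparison.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one matching div and one unrelated element.
html = '<div id="articlebody">A</div><p id="other">B</p>'
soup = BeautifulSoup(html, "html.parser")

by_selector = soup.select("div#articlebody")
by_find_all = soup.find_all("div", id="articlebody")

# Both calls return a list-like collection; compare them element by element.
same = list(by_selector) == list(by_find_all)
print(len(by_selector), same)
```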

If you only want to select a single element, then you could just use the .find() method:

soup.find('div', id="articlebody")
# or
soup.find(id="articlebody")
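
One thing worth guarding against: `find()` returns `None` when nothing matches, so chaining attribute access onto the result can fail. The sketch below assumes Beautiful Soup 4; it also uses `select_one()`, which is not mentioned above but is the single-element counterpart of `.select()`. The ids and markup are made up.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<div id="articlebody">text</div>', "html.parser")

found = soup.find("div", id="articlebody")
missing = soup.find("div", id="no-such-id")  # no match, so this is None

# CSS-selector equivalent that returns a single element (or None).
first = soup.select_one("#articlebody")

if missing is None:
    print("no div with that id")
print(found.get_text())
```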