Why do we still need a parser like BeautifulSoup if we can use Selenium?
Selenium itself is quite powerful at locating elements, and it basically has everything you need for extracting data from HTML. The problem is that it is slow: every single Selenium command goes over HTTP via the JSON Wire protocol, which adds substantial overhead per call.
To improve the performance of the HTML-parsing step, it is usually much faster to let BeautifulSoup or lxml parse the page source retrieved from `.page_source`.
In other words, a common workflow for a dynamic web page is something like:
- open the page in a browser controlled by Selenium
- perform the necessary browser actions
- once the desired data is on the page, get `driver.page_source` and close the browser
- pass the page source to an HTML parser for further parsing
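As a rough sketch of that workflow: the browser is used only to render the dynamic page, and all of the repetitive extraction work happens in BeautifulSoup afterwards. The function names, the URL, and the `a.title` CSS selector below are hypothetical placeholders for your own target page:

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url):
    """Use Selenium only to render the dynamic page and return the final HTML."""
    # Imported here so the parsing half works without a browser installed.
    from selenium import webdriver  # requires Chrome + a matching chromedriver
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        return driver.page_source  # the rendered HTML after JavaScript ran
    finally:
        driver.quit()  # close the browser as soon as the source is captured


def extract_titles(html):
    """Do the heavy extraction outside Selenium, in BeautifulSoup."""
    soup = BeautifulSoup(html, "html.parser")
    # One cheap in-process call instead of many round-trips to the browser.
    return [a.get_text(strip=True) for a in soup.select("a.title")]


# Usage (hypothetical URL and selector):
# html = fetch_rendered_html("https://example.com/listing")
# print(extract_titles(html))
```

The key point is that `driver.page_source` is fetched once; had the same loop been written with `driver.find_elements(...)`, each element access would be a separate HTTP round-trip to the browser.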