Why do we still need a parser like BeautifulSoup if we can use Selenium?
Selenium itself is quite powerful at locating elements, and it basically has everything you need for extracting data from HTML. The problem is that it is slow: every single Selenium command goes over HTTP via the JSON Wire protocol, which adds substantial overhead per call.
To improve the performance of the HTML-parsing step, it is usually much faster to let BeautifulSoup or lxml parse the page source retrieved from `.page_source`.
In other words, a common workflow for a dynamic web page is something like:
- open the page in a browser controlled by Selenium
- perform the necessary browser actions
- once the desired data is on the page, get `driver.page_source` and close the browser
- pass the page source to an HTML parser for further parsing
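As a rough sketch of that workflow: the browser is used only to render the dynamic page, and all of the repetitive extraction work happens in BeautifulSoup afterwards. The function names, the URL, and the `a.title` CSS selector below are hypothetical placeholders for your own target page:

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url):
    """Use Selenium only to render the dynamic page and return the final HTML."""
    # Imported here so the parsing half works without a browser installed.
    from selenium import webdriver  # requires Chrome + a matching chromedriver
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        return driver.page_source  # the rendered HTML after JavaScript ran
    finally:
        driver.quit()  # close the browser as soon as the source is captured


def extract_titles(html):
    """Do the heavy extraction outside Selenium, in BeautifulSoup."""
    soup = BeautifulSoup(html, "html.parser")
    # One cheap in-process call instead of many round-trips to the browser.
    return [a.get_text(strip=True) for a in soup.select("a.title")]


# Usage (hypothetical URL and selector):
# html = fetch_rendered_html("https://example.com/listing")
# print(extract_titles(html))
```

The key point is that `driver.page_source` is fetched once; had the same loop been written with `driver.find_elements(...)`, each element access would be a separate HTTP round-trip to the browser.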