Selenium vs Jsoup performance


If you already have the DOM parsed into Jsoup, then I would recommend using Jsoup. It is much faster than Selenium, since it does not have to deal with a "living" DOM: Selenium must check that its element handles are still valid before doing any operation with them.

If you can, avoid Selenium altogether, since its overhead becomes really noticeable once you scrape at any serious volume. Selenium shines, however, if your content is generated dynamically by JavaScript in the client. Jsoup can't handle this at all, since it does not execute JavaScript.
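To illustrate the last point, here is a minimal sketch, assuming Jsoup is on the classpath. The class name `StaticParseDemo`, the helper `textOfId`, and the sample HTML are made up for this example. Jsoup sees the markup exactly as it was delivered, so an element that a script would fill in later stays empty:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class StaticParseDemo {

    // Hypothetical helper: text of the element with the given id,
    // or "" if the element is missing or has no text.
    static String textOfId(String html, String id) {
        Document doc = Jsoup.parse(html);
        Element el = doc.getElementById(id);
        return el == null ? "" : el.text();
    }

    public static void main(String[] args) {
        // Made-up page: one paragraph is in the served markup,
        // the other would only be filled in by JavaScript in a browser.
        String html = "<html><body>"
                + "<p id=\"static\">Served with the page</p>"
                + "<div id=\"dynamic\"></div>"
                + "<script>document.getElementById('dynamic').textContent = 'Added by JS';</script>"
                + "</body></html>";

        System.out.println("static:  [" + textOfId(html, "static") + "]");
        // Jsoup never runs the script, so this element is still empty.
        System.out.println("dynamic: [" + textOfId(html, "dynamic") + "]");
    }
}
```

If the data you need is in the part of the page that behaves like `#static` here, Jsoup alone is enough and you can skip the browser entirely.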

Addendum in response to a comment

Short answer: It depends!

Longer: If the website you are scraping is generated by JavaScript and does not change after generation, it is perfectly fine to access it with Selenium directly, especially if the DOM is complex and would take a long time to read into Jsoup (although Jsoup is fairly fast). If you do hand the HTML over to Jsoup, it will build the DOM in memory again, so a huge DOM ends up being held twice: once, in a memory-hungry form, in Selenium, and once more in Jsoup. This may or may not be an issue in your case, but it is worth keeping in mind.

From my personal experience, I would kill the Selenium process as soon as possible after getting the final HTML and parse that HTML again in Jsoup, since it is as you say: scraping with Jsoup is way easier than the corresponding Selenium selector constructs. This holds especially if you are sure that any changes in the DOM after the initial creation are irrelevant to your scraping.
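That hand-off could look roughly like the sketch below. It assumes Selenium (with Chrome and a matching driver) and Jsoup are available; the class name `RenderThenParse`, both helper methods, and the example URL are placeholders of mine, not part of either library. Note that `getPageSource()` usually reflects the current DOM with the common drivers, but Selenium does not strictly guarantee this for pages modified by JavaScript.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class RenderThenParse {

    // Hypothetical helper: use Selenium only to obtain the rendered HTML.
    static String renderedHtml(String url) {
        WebDriver driver = new ChromeDriver(); // assumption: Chrome is installed
        try {
            driver.get(url);
            // A real page may need an explicit wait here until the
            // JavaScript has finished building the DOM.
            return driver.getPageSource();
        } finally {
            // Shut the browser down as early as possible to free its
            // memory, even if loading the page failed.
            driver.quit();
        }
    }

    // Hypothetical helper: all further scraping happens in Jsoup.
    static List<String> linkTargets(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<String> targets = new ArrayList<>();
        for (Element a : doc.select("a[href]")) {
            targets.add(a.absUrl("href")); // resolved against baseUri
        }
        return targets;
    }

    public static void main(String[] args) {
        String url = "https://example.com/"; // placeholder URL
        String html = renderedHtml(url);
        linkTargets(html, url).forEach(System.out::println);
    }
}
```

By the time `linkTargets` runs, the browser is already gone, so only the Jsoup copy of the DOM is held in memory and the selectors are plain CSS queries.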