Selenium HtmlUnitDriver Web Scrape Got Captcha Page From EC2 Server


Have you covered these points?

-Which user agent are you sending? Make sure it is the same user agent a real browser would send during human navigation.

-Are you inserting waits in your navigation? If you click or navigate as soon as a page loads, you are not simulating regular human navigation.

-Which driver are you using? There is a trick with chromedriver: rename the internal variable "cdc_" to something else, like "aaa_". If JavaScript served by the site tries to detect that variable (cdc_), the check will then fail.
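The chromedriver rename mentioned above is usually done by patching the driver binary with a same-length replacement string. Here is a minimal sketch of that idea in plain Java. The class name `CdcPatcher` and its methods are my own, not part of Selenium or chromedriver, and whether the patch still defeats a given site's check depends on the chromedriver version, so treat it as an illustration and run it on a copy of the binary.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Selenium or chromedriver.
class CdcPatcher {

    // Returns a copy of data with every occurrence of from replaced by to.
    // Both patterns must have the same length so the binary keeps its size.
    static byte[] replaceAll(byte[] data, byte[] from, byte[] to) {
        if (from.length == 0 || from.length != to.length) {
            throw new IllegalArgumentException("patterns must be non-empty and of equal length");
        }
        byte[] out = data.clone();
        int i = 0;
        while (i <= out.length - from.length) {
            if (matchesAt(out, i, from)) {
                System.arraycopy(to, 0, out, i, to.length);
                i += from.length;
            } else {
                i++;
            }
        }
        return out;
    }

    // True if pattern occurs in data starting at offset.
    static boolean matchesAt(byte[] data, int offset, byte[] pattern) {
        for (int j = 0; j < pattern.length; j++) {
            if (data[offset + j] != pattern[j]) {
                return false;
            }
        }
        return true;
    }

    // Rewrites the file in place, renaming "cdc_" to "aaa_" as in the answer above.
    static void patchFile(Path binary) throws IOException {
        byte[] original = Files.readAllBytes(binary);
        byte[] patched = replaceAll(
                original,
                "cdc_".getBytes(StandardCharsets.US_ASCII),
                "aaa_".getBytes(StandardCharsets.US_ASCII));
        Files.write(binary, patched);
    }
}
```

You would call `CdcPatcher.patchFile(...)` once on a copy of the chromedriver executable and then point Selenium at the patched copy. This only matters if you switch from HtmlUnitDriver to ChromeDriver.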

There is more to investigate if you really need to avoid detection by the server:

-Is there a honeypot in place?

-Is your IP (the EC2 IP) already blocked? You could route your traffic through a VPN tunnel.
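One way to check whether the EC2 IP itself is the problem is to fetch the same page once directly and once through another exit point, then compare the results. The sketch below uses an HTTP proxy rather than a VPN tunnel, because a proxy can be configured from code. It assumes Java 11+ (`java.net.http.HttpClient`); the class `ProxyCheck`, the proxy host and port you pass in, and the crude "contains the word captcha" heuristic are all my own assumptions, not anything the target site documents.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ProxySelector;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.Locale;

// Hypothetical diagnostic helper; names and heuristic are assumptions.
class ProxyCheck {

    // Builds a client that sends all requests through the given HTTP proxy.
    static HttpClient clientVia(String proxyHost, int proxyPort) {
        return HttpClient.newBuilder()
                .proxy(ProxySelector.of(new InetSocketAddress(proxyHost, proxyPort)))
                .connectTimeout(Duration.ofSeconds(10))
                .build();
    }

    // Fetches the page body as text with a plain GET.
    static String fetch(HttpClient client, String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Very rough guess: treat any page mentioning "captcha" as a challenge page.
    static boolean looksLikeCaptcha(String body) {
        return body.toLowerCase(Locale.ROOT).contains("captcha");
    }
}
```

If `looksLikeCaptcha(fetch(HttpClient.newHttpClient(), url))` is true from EC2 but false through `clientVia(...)`, that suggests the IP range is being challenged rather than your browser fingerprint.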

Interesting articles:

https://www.kdnuggets.com/2018/02/web-scraping-tutorial-python.html

https://antoinevastel.com/bot%20detection/2017/08/05/detect-chrome-headless.html

https://intoli.com/blog/making-chrome-headless-undetectable/