Selenium HtmlUnitDriver Web Scrape Got Captcha Page From EC2 Server
Have you checked the following points?
-Which user agent are you sending? Make sure it is the same user-agent string your browser sends when you navigate the site by hand.
-Are you inserting waits into your navigation? If you click or navigate the instant a page finishes loading, you are not simulating how a person browses.
-Which driver are you using? With chromedriver there is a trick: rename its internal variable "cdc_" to something else, such as "aaa_". If the server sends JavaScript that tries to detect this variable (cdc_), the check will then fail.
-If you really need to avoid detection by the server, there is more to investigate:
-Is there a honeypot in place (for example, hidden links or form fields that only a bot would follow or fill in)?
-Is your IP (the EC2 IP) already blocked? You could route your traffic through a VPN tunnel.
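The first two points can be sketched with the JDK alone. This is only an illustration, not Selenium code: the class name HumanLikeNavigation, its method names, the user-agent string and the example URL are all assumptions made for this sketch. With a real driver you would call pause(...) between driver actions and set the user agent through the driver's own configuration.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper; every name and value here is illustrative. */
final class HumanLikeNavigation {

    // Assumed example value: replace it with the exact string your own browser sends.
    static final String BROWSER_USER_AGENT =
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36";

    /** Returns a random delay in [minMillis, maxMillis], both ends inclusive. */
    static long nextDelayMillis(long minMillis, long maxMillis) {
        if (minMillis < 0 || maxMillis < minMillis) {
            throw new IllegalArgumentException("require 0 <= minMillis <= maxMillis");
        }
        // The upper bound of nextLong is exclusive, hence the + 1.
        return ThreadLocalRandom.current().nextLong(minMillis, maxMillis + 1);
    }

    /** Sleeps for a random interval so actions do not fire the instant a page loads. */
    static void pause(long minMillis, long maxMillis) throws InterruptedException {
        Thread.sleep(nextDelayMillis(minMillis, maxMillis));
    }

    /** Builds (but does not send) a GET request carrying the browser-like User-Agent. */
    static HttpRequest browserLikeGet(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", BROWSER_USER_AGENT)
                .GET()
                .build();
    }

    public static void main(String[] args) throws InterruptedException {
        HttpRequest request = browserLikeGet("https://example.com/");
        System.out.println(request.headers().firstValue("User-Agent").orElse("<none>"));
        pause(500, 1500); // wait roughly 0.5 to 1.5 seconds before the next action
        System.out.println("next action would run here");
    }
}
```

Randomising the interval rather than sleeping for a fixed time avoids a perfectly regular request rhythm. The 500 to 1500 ms range is an arbitrary starting point, not a value known to pass any particular site's checks.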
Interesting articles:
https://www.kdnuggets.com/2018/02/web-scraping-tutorial-python.html
https://antoinevastel.com/bot%20detection/2017/08/05/detect-chrome-headless.html
https://intoli.com/blog/making-chrome-headless-undetectable/