Can a website detect when you are using Selenium with chromedriver?
Basically, the way the Selenium detection works, is that they test for predefined JavaScript variables which appear when running with Selenium. The bot detection scripts usually look anything containing word "selenium" / "webdriver" in any of the variables (on window object), and also document variables called $cdc_
and $wdc_
. Of course, all of this depends on which browser you are on. All the different browsers expose different things.
For me, I used Chrome, so, all that I had to do was to ensure that $cdc_
didn't exist anymore as a document variable, and voilà (download chromedriver source code, modify chromedriver and re-compile $cdc_
under different name.)
This is the function I modified in chromedriver:
File call_function.js:
function getPageCache(opt_doc) { var doc = opt_doc || document; //var key = '$cdc_asdjflasutopfhvcZLmcfl_'; var key = 'randomblabla_'; if (!(key in doc)) doc[key] = new Cache(); return doc[key];}
(Note the comment. All I did I turned $cdc_
to randomblabla_
.)
Here is pseudocode which demonstrates some of the techniques that bot networks might use:
runBotDetection = function () { var documentDetectionKeys = [ "__webdriver_evaluate", "__selenium_evaluate", "__webdriver_script_function", "__webdriver_script_func", "__webdriver_script_fn", "__fxdriver_evaluate", "__driver_unwrapped", "__webdriver_unwrapped", "__driver_evaluate", "__selenium_unwrapped", "__fxdriver_unwrapped", ]; var windowDetectionKeys = [ "_phantom", "__nightmare", "_selenium", "callPhantom", "callSelenium", "_Selenium_IDE_Recorder", ]; for (const windowDetectionKey in windowDetectionKeys) { const windowDetectionKeyValue = windowDetectionKeys[windowDetectionKey]; if (window[windowDetectionKeyValue]) { return true; } }; for (const documentDetectionKey in documentDetectionKeys) { const documentDetectionKeyValue = documentDetectionKeys[documentDetectionKey]; if (window['document'][documentDetectionKeyValue]) { return true; } }; for (const documentKey in window['document']) { if (documentKey.match(/\$[a-z]dc_/) && window['document'][documentKey]['cache_']) { return true; } } if (window['external'] && window['external'].toString() && (window['external'].toString()['indexOf']('Sequentum') != -1)) return true; if (window['document']['documentElement']['getAttribute']('selenium')) return true; if (window['document']['documentElement']['getAttribute']('webdriver')) return true; if (window['document']['documentElement']['getAttribute']('driver')) return true; return false;};
According to user szx, it is also possible to simply open chromedriver.exe in a hex editor, and just do the replacement manually, without actually doing any compiling.
As we've already figured out in the question and the posted answers, there is an anti Web-scraping and a Bot detection service called "Distil Networks" in play here. And, according to the company CEO's interview:
Even though they can create new bots, we figured out a way to identify Selenium the a tool they’re using, so we’re blocking Selenium no matter how many times they iterate on that bot. We’re doing that now with Python and a lot of different technologies. Once we see a pattern emerge from one type of bot, then we work to reverse engineer the technology they use and identify it as malicious.
It'll take time and additional challenges to understand how exactly they are detecting Selenium, but what can we say for sure at the moment:
- it's not related to the actions you take with selenium - once you navigate to the site, you get immediately detected and banned. I've tried to add artificial random delays between actions, take a pause after the page is loaded - nothing helped
- it's not about browser fingerprint either - tried it in multiple browsers with clean profiles and not, incognito modes - nothing helped
- since, according to the hint in the interview, this was "reverse engineering", I suspect this is done with some JS code being executed in the browser revealing that this is a browser automated via selenium webdriver
Decided to post it as an answer, since clearly:
Can a website detect when you are using selenium with chromedriver?
Yes.
Also, what I haven't experimented with is older selenium and older browser versions - in theory, there could be something implemented/added to selenium at a certain point that Distil Networks bot detector currently relies on. Then, if this is the case, we might detect (yeah, let's detect the detector) at what point/version a relevant change was made, look into changelog and changesets and, may be, this could give us more information on where to look and what is it they use to detect a webdriver-powered browser. It's just a theory that needs to be tested.
Replacing cdc_
string
You can use vim
or perl
to replace the cdc_
string in chromedriver
. See answer by @Erti-Chris Eelmaa to learn more about that string and how it's a detection point.
Using vim
or perl
prevents you from having to recompile source code or use a hex-editor.
Make sure to make a copy of the original chromedriver
before attempting to edit it.
Our goal is to alter the cdc_
string, which looks something like $cdc_lasutopfhvcZLmcfl
.
The methods below were tested on chromedriver version 2.41.578706
.
Using Vim
vim /path/to/chromedriver
After running the line above, you'll probably see a bunch of gibberish. Do the following:
- Replace all instances of
cdc_
withdog_
by typing:%s/cdc_/dog_/g
.dog_
is just an example. You can choose anything as long as it has the same amount of characters as the search string (e.g.,cdc_
), otherwise thechromedriver
will fail.
- To save the changes and quit, type
:wq!
and pressreturn
.- If you need to quit without saving changes, type
:q!
and pressreturn
.
- If you need to quit without saving changes, type
Using Perl
The line below replaces all cdc_
occurrences with dog_
. Credit to Vic Seedoubleyew:
perl -pi -e 's/cdc_/dog_/g' /path/to/chromedriver
Make sure that the replacement string (e.g., dog_
) has the same number of characters as the search string (e.g., cdc_
), otherwise the chromedriver
will fail.
Wrapping Up
To verify that all occurrences of cdc_
were replaced:
grep "cdc_" /path/to/chromedriver
If no output was returned, the replacement was successful.
Go to the altered chromedriver
and double click on it. A terminal window should open up. If you don't see killed
in the output, you've successfully altered the driver.
Make sure that the name of the altered chromedriver
binary is chromedriver
, and that the original binary is either moved from its original location or renamed.
My Experience With This Method
I was previously being detected on a website while trying to log in, but after replacing cdc_
with an equal sized string, I was able to log in. Like others have said though, if you've already been detected, you might get blocked for a plethora of other reasons even after using this method. So you may have to try accessing the site that was detecting you using a VPN, different network, etc.