Can a website detect when you are using Selenium with chromedriver?


Basically, Selenium detection works by testing for predefined JavaScript variables that appear when the browser is running under Selenium. Bot detection scripts usually look for anything containing the word "selenium" or "webdriver" in any of the variables on the window object, and also for document variables called $cdc_ and $wdc_. Of course, all of this depends on which browser you are using; different browsers expose different things.

I used Chrome, so all I had to do was ensure that $cdc_ no longer existed as a document variable, and voilà. (Download the chromedriver source code, rename $cdc_ to something else in it, and recompile.)

This is the function I modified in chromedriver:

File call_function.js:

    function getPageCache(opt_doc) {
      var doc = opt_doc || document;
      //var key = '$cdc_asdjflasutopfhvcZLmcfl_';
      var key = 'randomblabla_';
      if (!(key in doc))
        doc[key] = new Cache();
      return doc[key];
    }

(Note the comment. All I did was change $cdc_ to randomblabla_.)

Here is pseudocode that demonstrates some of the techniques that bot detection services might use:

    runBotDetection = function () {
        // Property names checked on the document object.
        var documentDetectionKeys = [
            "__webdriver_evaluate",
            "__selenium_evaluate",
            "__webdriver_script_function",
            "__webdriver_script_func",
            "__webdriver_script_fn",
            "__fxdriver_evaluate",
            "__driver_unwrapped",
            "__webdriver_unwrapped",
            "__driver_evaluate",
            "__selenium_unwrapped",
            "__fxdriver_unwrapped",
        ];
        // Property names checked on the window object.
        var windowDetectionKeys = [
            "_phantom",
            "__nightmare",
            "_selenium",
            "callPhantom",
            "callSelenium",
            "_Selenium_IDE_Recorder",
        ];

        for (const key of windowDetectionKeys) {
            if (window[key]) {
                return true;
            }
        }
        for (const key of documentDetectionKeys) {
            if (window.document[key]) {
                return true;
            }
        }
        // Look for a document property such as $cdc_... that holds a cache_ field.
        for (const documentKey in window.document) {
            if (documentKey.match(/\$[a-z]dc_/) && window.document[documentKey].cache_) {
                return true;
            }
        }

        if (window.external && window.external.toString() && window.external.toString().indexOf('Sequentum') != -1) return true;

        // Attributes checked on the <html> element.
        const root = window.document.documentElement;
        if (root.getAttribute('selenium')) return true;
        if (root.getAttribute('webdriver')) return true;
        if (root.getAttribute('driver')) return true;

        return false;
    };
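Since the question is tagged python, here is a rough Python sketch of just the name-matching part of that pseudocode. It assumes you have already collected the property names of window and document yourself (how you gather them is up to you), and the function name flag_suspicious_names is made up for this illustration. Unlike the JavaScript above, it only looks at names; it does not check property values or the cache_ field.

```python
import re

# Name lists copied from the pseudocode above.
DOCUMENT_KEYS = {
    "__webdriver_evaluate", "__selenium_evaluate",
    "__webdriver_script_function", "__webdriver_script_func",
    "__webdriver_script_fn", "__fxdriver_evaluate",
    "__driver_unwrapped", "__webdriver_unwrapped",
    "__driver_evaluate", "__selenium_unwrapped",
    "__fxdriver_unwrapped",
}
WINDOW_KEYS = {
    "_phantom", "__nightmare", "_selenium",
    "callPhantom", "callSelenium", "_Selenium_IDE_Recorder",
}
# Same pattern as the pseudocode: "$", one lowercase letter, then "dc_".
CACHE_KEY_PATTERN = re.compile(r"\$[a-z]dc_")


def flag_suspicious_names(window_names, document_names):
    """Return the names a detector like the one above would react to.

    Hypothetical helper: both arguments are plain lists of property
    names that the caller has collected from the page.
    """
    hits = [name for name in window_names if name in WINDOW_KEYS]
    hits += [
        name for name in document_names
        if name in DOCUMENT_KEYS or CACHE_KEY_PATTERN.search(name)
    ]
    return hits
```

With the renamed key from the modified call_function.js (randomblabla_), nothing in document_names would match the pattern, which is the whole point of the rename.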

According to user szx, it is also possible to simply open chromedriver.exe in a hex editor, and just do the replacement manually, without actually doing any compiling.


As we've already figured out in the question and the posted answers, there is an anti-web-scraping and bot detection service called "Distil Networks" in play here. And, according to an interview with the company's CEO:

Even though they can create new bots, we figured out a way to identify Selenium, the tool they’re using, so we’re blocking Selenium no matter how many times they iterate on that bot. We’re doing that now with Python and a lot of different technologies. Once we see a pattern emerge from one type of bot, then we work to reverse engineer the technology they use and identify it as malicious.

It'll take time and more experimenting to understand exactly how they are detecting Selenium, but here is what we can say for sure at the moment:

  • It's not related to the actions you take with Selenium: once you navigate to the site, you get detected and banned immediately. I've tried adding artificial random delays between actions and pausing after the page has loaded. Nothing helped.
  • It's not about the browser fingerprint either: I tried multiple browsers, with and without clean profiles, and incognito modes. Nothing helped.
  • Since, according to the hint in the interview, this was "reverse engineering", I suspect it is done with some JS code executed in the browser that reveals that the browser is being automated via Selenium WebDriver.

Decided to post it as an answer, since clearly:

Can a website detect when you are using selenium with chromedriver?

Yes.


Also, what I haven't experimented with is older Selenium and older browser versions. In theory, something could have been implemented or added to Selenium at a certain point that the Distil Networks bot detector currently relies on. If that is the case, we might detect (yeah, let's detect the detector) at what point/version the relevant change was made, look into the changelog and changesets, and maybe that would give us more information on where to look and what it is they use to detect a webdriver-powered browser. It's just a theory that needs to be tested.


Replacing cdc_ string

You can use vim or perl to replace the cdc_ string in chromedriver. See answer by @Erti-Chris Eelmaa to learn more about that string and how it's a detection point.

Using vim or perl saves you from having to recompile the source code or use a hex editor.

Make sure to make a copy of the original chromedriver before attempting to edit it.

Our goal is to alter the cdc_ string, which looks something like $cdc_lasutopfhvcZLmcfl.

The methods below were tested on chromedriver version 2.41.578706.


Using Vim

vim /path/to/chromedriver

After running the line above, you'll probably see a bunch of gibberish. Do the following:

  1. Replace all instances of cdc_ with dog_ by typing :%s/cdc_/dog_/g.
    • dog_ is just an example. You can choose anything as long as it has the same number of characters as the search string (e.g., cdc_); otherwise the chromedriver will fail.
  2. To save the changes and quit, type :wq! and press return.
    • If you need to quit without saving changes, type :q! and press return.

Using Perl

The line below replaces all cdc_ occurrences with dog_. Credit to Vic Seedoubleyew:

perl -pi -e 's/cdc_/dog_/g' /path/to/chromedriver

Make sure that the replacement string (e.g., dog_) has the same number of characters as the search string (e.g., cdc_); otherwise the chromedriver will fail.
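If you would rather do the same replacement from Python, something like the following should be equivalent in spirit to the perl one-liner. This is only a sketch: the function name patch_marker and the paths are placeholders I made up, and it has not been tested against any particular chromedriver version. It refuses to run if the replacement has a different length, and it always writes to a separate copy so the original binary stays untouched.

```python
import shutil
from pathlib import Path


def patch_marker(src, dst, old=b"cdc_", new=b"dog_"):
    """Copy src to dst with every `old` byte string replaced by `new`.

    Returns the number of occurrences that were replaced.
    """
    # A different length would shift every byte after the first match
    # in the binary, which is why the driver would fail.
    if len(old) != len(new):
        raise ValueError("replacement must be the same length as the search string")

    data = Path(src).read_bytes()
    count = data.count(old)

    # copy2 keeps the permission bits (including the executable bit);
    # it raises an error if src and dst are the same file.
    shutil.copy2(src, dst)
    Path(dst).write_bytes(data.replace(old, new))
    return count
```

For example, patch_marker("/path/to/chromedriver", "/path/to/chromedriver_patched") returns how many cdc_ occurrences it replaced in the copy.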


Wrapping Up

To verify that all occurrences of cdc_ were replaced:

grep "cdc_" /path/to/chromedriver

If no output was returned, the replacement was successful.
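The same check can be done from Python if grep is not available (on Windows, for instance). A minimal sketch; find_marker_offsets is a name I picked for illustration, not part of any tool:

```python
from pathlib import Path


def find_marker_offsets(path, marker=b"cdc_"):
    """Return the byte offsets of every occurrence of `marker` in the file."""
    data = Path(path).read_bytes()
    offsets = []
    start = data.find(marker)
    while start != -1:
        offsets.append(start)
        # Continue searching just past the start of the previous match.
        start = data.find(marker, start + 1)
    return offsets
```

An empty list for the altered binary means no cdc_ strings are left. Running it on the original copy instead shows where the strings sit in the file.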

Go to the altered chromedriver and double-click it. A terminal window should open. If you don't see killed in the output, you've successfully altered the driver.

Make sure that the name of the altered chromedriver binary is chromedriver, and that the original binary is either moved from its original location or renamed.


My Experience With This Method

I was previously being detected on a website while trying to log in, but after replacing cdc_ with an equal-sized string, I was able to log in. As others have said, though, if you've already been detected, you might still get blocked for a plethora of other reasons even after using this method. So you may have to try accessing the site that was detecting you through a VPN, a different network, etc.

