How to detect selenium #182

Open
Zen33 opened this issue May 27, 2022 · 27 comments
Labels
enhancement New feature or request Research

Comments

@Zen33

Zen33 commented May 27, 2022

`from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

chrome_options = Options()
chrome_options.add_argument('--disable-blink-features=AutomationControlled')
# Selenium 4 takes the driver path through Service and the options through options=
chrome = webdriver.Chrome(service=Service('./chromedriver.exe'), options=chrome_options)
chrome.get('https://abrahamjuliot.github.io/creepjs')`

When I launch Chrome through the selenium package in Python with the options above, creepjs does not seem to detect it. Any suggestions? Thanks.

@abrahamjuliot
Owner

This is on my mind. I will look into this and see what I can find.

@Zen33
Author

Zen33 commented Jun 7, 2022

Thanks for your reply. I found that the project named botD can detect this case, but I guess that might depend on server-side analytics.

@abrahamjuliot
Owner

Nice. There might be some new tricks at botD. Here are some resources by Antoine Vastel:

@Zen33
Author

Zen33 commented Jun 8, 2022

I tried fp-collect months ago. Since that project has not been maintained since 2019, the option above, chrome_options.add_argument('--disable-blink-features=AutomationControlled'), is not detected by https://antoinevastel.com/bots/

@abrahamjuliot
Owner

abrahamjuliot commented Oct 14, 2022

I finally got around to testing this more in depth, and we do detect Selenium in headless. Even with Web Driver and the User Agent hidden, there are many headless signals available.

Detection of non-headless Selenium is missed, but I think that it is an unnecessary detection. Automated patterns can be detected through event listeners, but that's not a focus yet. I might create a test page for that.

Similarly, Puppeteer and Playwright can run Google Chrome in non-headless and use automation without being detected. I think all that is fine, as long as the web traffic is producing good activity and okay fingerprints.

This is the script I used.

import time
from selenium import webdriver
from selenium.webdriver.common.by import By


options = webdriver.ChromeOptions()
options.add_argument('--no-sandbox')
options.add_argument('--disable-blink-features=AutomationControlled') # web driver off
# the options.headless setter was deprecated in later Selenium 4 releases; pass the flag directly
options.add_argument('--headless')
options.add_argument("--window-size=800,600")
# make sure you download the driver that supports the chrome.exe
# raw string so the backslashes in the Windows path are not read as escapes
options.binary_location = r"C:\Program Files\Google\Chrome Beta\Application\chrome.exe"
driver = webdriver.Chrome(options=options)


def save_screenshot(driver: webdriver.Chrome, path: str = 'selen_screenshot.png') -> None:
  # Ref: https://stackoverflow.com/a/52572919/
  original_size = driver.get_window_size()
  required_width = driver.execute_script('return document.body.parentNode.scrollWidth')
  required_height = driver.execute_script('return document.body.parentNode.scrollHeight')
  driver.set_window_size(required_width, required_height)
  # driver.save_screenshot(path)  # has scrollbar
  # find_element_by_tag_name was removed in Selenium 4.3; use find_element with By
  driver.find_element(By.TAG_NAME, 'body').screenshot(path)  # avoids scrollbar
  driver.set_window_size(original_size['width'], original_size['height'])

try:
  driver.get('https://abrahamjuliot.github.io/creepjs/')
  time.sleep(10)  # give the page time to finish its tests
  save_screenshot(driver)
  input("press Enter to exit...")
finally:
  driver.quit()

@abrahamjuliot
Owner

image

@Zen33
Author

Zen33 commented Oct 17, 2022

Good job! I'll take a look at the latest version of creepjs over the rest of this month, thanks.

@abrahamjuliot abrahamjuliot added the enhancement New feature or request label Jan 5, 2023
@Thorin-Oakenpants

just fyi

@abrahamjuliot
Owner

Nice. Just started researching this.

@Zen33
Author

Zen33 commented Feb 23, 2023

Thanks.

@kaliiiiiiiiii

kaliiiiiiiiii commented Mar 18, 2023

@abrahamjuliot for selenium, and all chromedriver-driven browsers, check the two following values:

navigator.webdriver ==> remote debugging enabled resource

  • enabled?
  • lied?
    • spoofed with undetected-chromedriver script ? :
Object.defineProperty(window, 'navigator', {
    value: new Proxy(navigator, {
         has: (target, key) => (key === 'webdriver' ? false : key in target),
         get: (target, key) =>
              key === 'webdriver' ?
              false :
              typeof target[key] === 'function' ?
              target[key].bind(target) :
              target[key]
         })
    });                

cdc_adoQpoasnfa76pfcZLmcfl_Xxxxxxx

  • gets added to every new page with the following script:
(function () {
    window.cdc_adoQpoasnfa76pfcZLmcfl_Array = window.Array;
    window.cdc_adoQpoasnfa76pfcZLmcfl_Object = window.Object;
    window.cdc_adoQpoasnfa76pfcZLmcfl_Promise = window.Promise;
    window.cdc_adoQpoasnfa76pfcZLmcfl_Proxy = window.Proxy;
    window.cdc_adoQpoasnfa76pfcZLmcfl_Symbol = window.Symbol;
}) ();
  • the "random" strings seem to be hardcoded, but maybe directly using regex as following:
let objectToInspect = window,
    result = [];
while(objectToInspect !== null)
    { result = result.concat(Object.getOwnPropertyNames(objectToInspect));
    objectToInspect = Object.getPrototypeOf(objectToInspect); }
return result.filter(i => i.match(/.+_.+_(Array|Promise|Symbol)/ig))
  • exist ?
  • lied?
    • spoofed with old (Version<=V3.2) undetected-chromedriver script ? :
let objectToInspect = window,
    result = [];
    while(objectToInspect !== null)
        { result = result.concat(Object.getOwnPropertyNames(objectToInspect));
        objectToInspect = Object.getPrototypeOf(objectToInspect); }
    result.forEach(p => p.match(/.+_.+_(Array|Promise|Symbol)/ig)
        &&delete window[p]&&console.log('removed',p))
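
For the "lied?" case on navigator.webdriver, here is a minimal sketch of one way to notice the Proxy override shown above. It rests on two assumptions that should be verified per browser: that an untouched window exposes `navigator` through a getter rather than a plain value, and that `webdriver` lives on the Navigator prototype rather than on the instance. The function name `inspectNavigatorOverride` is hypothetical, and the window is passed in as a parameter so the check can be tried against mock objects.

```javascript
// Sketch: collect signs that `navigator` was replaced on a window-like object.
// `win` is a parameter (instead of the global window) so mocks can be used.
function inspectNavigatorOverride(win) {
  const findings = [];
  const desc = Object.getOwnPropertyDescriptor(win, 'navigator');
  if (!desc) {
    findings.push('navigator is not an own property of window');
  } else if ('value' in desc) {
    // Assumption: an untouched browser defines navigator as an accessor.
    // Object.defineProperty(window, 'navigator', { value: ... }) turns it
    // into a data property, which is what this branch catches.
    findings.push('navigator is a data property (likely redefined)');
  }
  const nav = win.navigator;
  if (nav != null && Object.prototype.hasOwnProperty.call(nav, 'webdriver')) {
    // Assumption: webdriver normally sits on the prototype, not the instance.
    findings.push('webdriver is an own property of navigator');
  }
  return findings;
}
```

In a page this would be called as `inspectNavigatorOverride(window)`; an empty array means neither sign was found, which is not proof that nothing was spoofed.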

@abrahamjuliot
Owner

abrahamjuliot commented Mar 18, 2023

Nice. These have been on my mind. There's also a way to get the cdc_... properties from the descriptors and bypass any random names added. I might add this at some point (a general detection).
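
One possible reading of the descriptor idea, as a rough sketch rather than the creepjs implementation: instead of matching the cdc_ name, look for window properties whose value is a built-in constructor stored under a different key. The function name `findBuiltinAliases` and the default list of built-ins are assumptions for illustration; the list mirrors the five aliases in the injected script quoted earlier.

```javascript
// Sketch: find own data properties of a window-like object that hold a
// built-in constructor under a name other than the built-in's own name.
// This ignores the property name entirely, so a renamed prefix still matches.
function findBuiltinAliases(
  win,
  builtinNames = ['Array', 'Object', 'Promise', 'Proxy', 'Symbol']
) {
  const aliases = [];
  const descriptors = Object.getOwnPropertyDescriptors(win);
  for (const [key, desc] of Object.entries(descriptors)) {
    if (!('value' in desc)) continue; // skip accessors so no getter runs
    for (const name of builtinNames) {
      const builtin = win[name];
      if (builtin !== undefined && desc.value === builtin && key !== name) {
        aliases.push({ key, aliasOf: name });
      }
    }
  }
  return aliases;
}
```

Calling `findBuiltinAliases(window)` in a page returns entries such as `{ key: 'cdc_..._Array', aliasOf: 'Array' }` when the aliases are present. A page's own scripts could create similar aliases, so a hit is a hint and not a verdict.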

undetected-chromedriver detection is excellent, but too specific for public concepts. The devs can fix the code and the detection becomes obsolete.

Good tips.

@vxuv

vxuv commented Mar 18, 2023

Some additional flags for detecting Selenium and Selenium-adjacent software:

window["__nightmare"]
window["cdc_adoQpoasnfa76pfcZLmcfl_Array"]
window["cdc_adoQpoasnfa76pfcZLmcfl_Promise"]
window["cdc_adoQpoasnfa76pfcZLmcfl_Symbol"]
window["OSMJIF"]
window["_Selenium_IDE_Recorder"]
window["__$webdriverAsyncExecutor"]
window["__driver_evaluate"]
window["__driver_unwrapped"]
window["__fxdriver_evaluate"]
window["__fxdriver_unwrapped"]
window["__lastWatirAlert"]
window["__lastWatirConfirm"]
window["__lastWatirPrompt"]
window["__phantomas"]
window["__selenium_evaluate"]
window["__selenium_unwrapped"]
window["__webdriverFuncgeb"]
window["__webdriver__chr"]
window["__webdriver_evaluate"]
window["__webdriver_script_fn"]
window["__webdriver_script_func"]
window["__webdriver_script_function"]
window["__webdriver_unwrapped"]
window["awesomium"]
window["callSelenium"]
window["calledPhantom"]
window["calledSelenium"]
window["domAutomationController"]
window["watinExpressionError"]
window["watinExpressionResult"]
window["spynner_additional_js_loaded"]
document["$chrome_asyncScriptInfo"]
window["fmget_targets"]
window["geb"]
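
A short sketch of how a list like this might be checked. Only a handful of the names above are included to keep it brief; the function name `findAutomationFlags` and the two constant names are hypothetical, and the window and document are passed in so the function can be exercised with plain objects.

```javascript
// A few of the property names from the list above; extend as needed.
const WINDOW_FLAGS = [
  '__nightmare',
  '_Selenium_IDE_Recorder',
  '__webdriver_evaluate',
  '__selenium_unwrapped',
  'callSelenium',
  'domAutomationController',
];
const DOCUMENT_FLAGS = ['$chrome_asyncScriptInfo'];

// Returns the known automation property names that exist on the given
// window-like and document-like objects. The `in` operator is used so a
// property counts as present even when its value is undefined.
function findAutomationFlags(win, doc) {
  const hits = [];
  for (const name of WINDOW_FLAGS) {
    if (name in win) hits.push('window.' + name);
  }
  for (const name of DOCUMENT_FLAGS) {
    if (name in doc) hits.push('document.' + name);
  }
  return hits;
}
```

In a page, `findAutomationFlags(window, document)` would return an empty array when none of the listed names are present.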

@vxuv

vxuv commented Mar 18, 2023

Also worthwhile to check the types of navigator.plugins to ensure that it hasn't been tampered with.
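
A rough sketch of such a type check, assuming the browser reports `[object PluginArray]` and `[object Plugin]` for the genuine objects. The function name `pluginsLookTampered` is hypothetical, and the PluginArray constructor is passed in as a parameter so the sketch does not depend on browser globals.

```javascript
// Sketch: returns true when `plugins` does not look like a native PluginArray.
// `PluginArrayCtor` would be window.PluginArray in a page.
function pluginsLookTampered(plugins, PluginArrayCtor) {
  const tag = (value) => Object.prototype.toString.call(value);
  if (tag(plugins) !== '[object PluginArray]') return true;
  if (Object.getPrototypeOf(plugins) !== PluginArrayCtor.prototype) return true;
  for (let i = 0; i < plugins.length; i++) {
    // Each entry is expected to be a Plugin, not a plain object.
    if (tag(plugins[i]) !== '[object Plugin]') return true;
  }
  return false;
}
```

In a page this would be `pluginsLookTampered(navigator.plugins, PluginArray)`. A careful spoof can fake both the tag and the prototype, so this only catches the simpler replacements, such as a plain array of objects.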

@kaliiiiiiiiii

@vxuv @abrahamjuliot

Other relevant values can be found here:
https://github.com/HMaker/HMaker.github.io/blob/master/selenium-detector/chromedriver.js

It detects objects created or used by chromedriver.

@JWally
Contributor

JWally commented May 5, 2023

> @abrahamjuliot for selenium, and all chromedriver-driven browsers, check the two following values: [...]

I was trying to hide the "webdriver=true" navigator attribute, and asked chatgpt. Its answer is spookily similar to yours.

const originalNavigator = navigator;

const proxyNavigator = new Proxy(originalNavigator, {
  get(target, prop) {
    if (prop === 'webdriver') {
      return false;
    }
    return target[prop];
  },
  ownKeys(target) {
    const keys = Reflect.ownKeys(target);
    return keys.filter((key) => key !== 'webdriver');
  },
  getOwnPropertyDescriptor(target, prop) {
    if (prop === 'webdriver') {
      return undefined;
    }
    return Reflect.getOwnPropertyDescriptor(target, prop);
  },
});

// Replace the global navigator object with the proxy object
Object.defineProperty(window, 'navigator', {
  value: proxyNavigator,
  configurable: false,
  enumerable: false,
  writable: false,
});

I'm hoping against hope, but any way to see if an object is a Proxy?

I feel like chrome is teasing me in the console.

image

@abrahamjuliot
Owner

That is funny. I asked Bing Chat (gpt4) about our detecting JS Proxies and what it thought about our methods here. It didn't like our code and insisted we try outdated techniques on stack overflow.

@JWally
Contributor

JWally commented May 5, 2023

That's pretty solid btw!!!!

Reflect.setPrototypeOf(navigator, Object.create(navigator)) seems to be an approach for differentiating a proxied navigator object and the real deal.

Returns true if it's proxied; false if it's not.
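
A minimal sketch of that idea with a cleanup step added. On an ordinary object the call is refused because the new prototype chain would loop back to the object itself. On a Proxy with no setPrototypeOf trap the call goes through to the target, and the cycle check stops when it reaches the proxy, so it succeeds. When it succeeds the target's prototype really is changed, so the sketch puts the original prototype back. The name `looksProxied` is hypothetical.

```javascript
// Sketch: returns true when setting a self-referencing prototype is accepted,
// which an ordinary (non-proxied) object refuses because of the cycle check.
function looksProxied(obj) {
  const originalProto = Reflect.getPrototypeOf(obj);
  let accepted = false;
  try {
    accepted = Reflect.setPrototypeOf(obj, Object.create(obj));
  } catch (err) {
    accepted = false; // a trap threw; treat the result as inconclusive
  }
  if (accepted) {
    // Undo the change, otherwise later property lookups on the target
    // would walk a prototype chain that loops through the proxy.
    Reflect.setPrototypeOf(obj, originalProto);
  }
  return accepted;
}
```

In a page the call would be `looksProxied(navigator)`. A proxy that defines its own setPrototypeOf trap and returns false would not be caught by this, so a false result is not conclusive.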

Honest question - why doesn't webdriver = true set the bot-score at 100?

@abrahamjuliot
Owner

The bot score has some game elements and includes tags like friend and stranger. By default, everyone is treated as a bot. From there, we just want to establish some level of trust. The more transparent and normal the player, the less they are perceived as untrustworthy. This allows use of web driver and headless UAs since these are designed for transparency.

@JWally
Contributor

JWally commented May 5, 2023

I'm sorry, I meant to say the 'headlessRating'.

Instead of just having 20% weight if true (I think it's 1 in 5 attributes), it feels like it should automatically trip the value to 100% when it's true.

I could be wrong, but I doubt you'd get too many false positives where "normal" users have webdriver set to true - seems like a really strong signal when present.
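
To make the suggestion concrete, here is a small sketch of the two weighting schemes. It is not the creepjs code; the function name `headlessRating`, the `overrideKeys` parameter, and any signal names other than `webdriver` are made up for illustration.

```javascript
// Sketch: percentage of true signals, except that any signal listed in
// overrideKeys forces the rating to 100 when it is true.
function headlessRating(signals, overrideKeys = ['webdriver']) {
  const keys = Object.keys(signals);
  if (keys.length === 0) return 0;
  if (overrideKeys.some((key) => signals[key] === true)) return 100;
  const hits = keys.filter((key) => signals[key] === true).length;
  return Math.round((hits / keys.length) * 100);
}
```

With five signals where only `webdriver` is true, the plain average would be 20, while this version returns 100. Passing an empty `overrideKeys` array gives back the equal-weight behaviour.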

Just a thought as I'm going through your library trying to distil heuristics I can steal :-)
Outstanding work btw!

@abrahamjuliot
Owner

Ah yes, that's a good idea. I might change that at some point.
Attempted overrides of navigator.webdriver can maybe additionally give weight to the stealth rating.

@abrahamjuliot
Owner

BTW, I need to remove these from the headless rating and move them to the like headless rating. These can appear in Android WebView, Smart TVs and other Chromium flavors.

noChrome
hasPermissionsBug

@JWally
Contributor

JWally commented May 6, 2023

That's a really good point. I totally forgot about the plethora of platforms that can legitimately ping a service, but look "weird".

I don't know if it's worth the investment, but it might be interesting to have somewhere an attribute called "framework" or "automated" or something.

I'm sure some clever AI could parse out good rules, but things like you're using Windows, Chromium, and Webdriver is true? Minimum 99.8% chance you're automated. It's not good or bad, but it's definitely not normal and it'd be useful to flag.

I could be here for weeks MMQB'ing this thing into the ground; dorking out over what-if's and things I think would be useful :-P

Thanks again for maintaining this thing!!

@vxuv

vxuv commented May 6, 2023

> Ah yes, that's a good idea. I might change that at some point. Attempted overrides of navigator.webdriver can maybe additionally give weight to the stealth rating.

It's possible to override the types for these objects, if I remember correctly. Probably still nice as an extra measure.

@NCLnclNCL

> I'm sorry, I meant to say the 'headlessRating'.
>
> Instead of just having 20% weight if true (I think it's 1 in 5 attributes), it feels like it should automatically trip the value to 100% when it's true.
>
> I could be wrong, but I doubt you'd get too many false positives where "normal" users have webdriver set to true - seems like a really strong signal when present.
>
> Just a thought as I'm going through your library trying to distil heuristics I can steal :-) Outstanding work btw!

The new headless mode (headless=new) cannot be detected.

@JWally
Contributor

JWally commented Jun 30, 2023

@NCLnclNCL you're 100% right.

Right now, I'm keeping a Bayesian score, and if the browser is chromium and the OS isn't Linux, that's a big red flag it's a bot. Not 100%, but definitely worth paying attention to. /shrug
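
For readers unfamiliar with the approach, a generic sketch of a Bayesian score of this kind follows. It is not the code referred to above: the function name `updateBotProbability` is hypothetical, and any likelihood ratios fed to it are values the caller has to estimate from their own traffic. Each observed signal multiplies the odds of "bot" by its likelihood ratio, where a ratio above 1 makes "bot" more likely and a ratio below 1 makes it less likely.

```javascript
// Sketch: combine a prior probability (strictly between 0 and 1) with
// per-signal likelihood ratios, treating the signals as independent.
function updateBotProbability(prior, likelihoodRatios) {
  let odds = prior / (1 - prior); // convert probability to odds
  for (const ratio of likelihoodRatios) {
    odds *= ratio; // each signal scales the odds
  }
  return odds / (1 + odds); // convert odds back to probability
}
```

For example, starting from a prior of 0.5, a single signal with a likelihood ratio of 4 moves the probability to 0.8. The independence assumption is a simplification, since browser signals are often correlated.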

@NCLnclNCL

> @NCLnclNCL you're 100% right.
>
> Right now, I'm keeping a Bayesian score, and if the browser is chromium and the OS isn't Linux, that's a big red flag it's a bot. Not 100%, but definitely worth paying attention to. /shrug

I think headless=new is very hard to detect. It can be slower than the old headless mode, but it is ideal for evading detection.

7 participants