just opening one for my research on bot detection and stuff #190

Open
vis2021t opened this issue Jul 12, 2022 · 138 comments

@vis2021t

I looked over the TLS fingerprinting you talked about, but I read in Akamai's research that bots are able to tamper with TLS to get on the good side:
https://www.akamai.com/blog/security/bots-tampering-with-tls-to-avoid-detection

I came across a 2-step TLS fingerprinting paper, but I lost that PDF 🥲🥲 dammit

Will try to find it, but do you know about it?

@vis2021t
Author

vis2021t commented Jul 12, 2022

True, bots can still bypass it. I have some good resources. Have not heard of the 2 step.

Everything is bypassable in the world of JavaScript. Well, thanks for the resources, I am looking into them right now.

@vis2021t
Author

vis2021t commented Jul 12, 2022

1-s2 0-S0167404821003990-ga1_lrg

I found this chart, which may be of interest to us.

@vis2021t
Author

vis2021t commented Jul 13, 2022

I was wondering about looking over CVEs for specific browsers and their versions.

For demo purposes, we could go ahead and identify a lot of info about the device/browser.

I know it's actually creepy, but come on, it's in the name too lol

It's not a bad idea, you know. We can identify many things if we play it well. It's definitely a good area to look into, but I'm still not sure it's a good idea to implement.

What do you think?

  • I think platform lies should be considered part of bot lies. We can keep them, as I have noticed bots get different levels in the CreepJS bot detection section.

@abrahamjuliot
Owner

Not a bad idea. Maybe start with a test page. What I sometimes do is begin with a test page and experiment/research there. If we get stable results, we can release on the main page. If it has good performance and good fingerprinting, we can implement it in the main fingerprint.

Platform lies part of bot lies

I like this idea. I will look into it.

@vis2021t
Author

vis2021t commented Jul 13, 2022

I am really interested in chrome://chrome-urls/. There are many things there which can make things go really, really deep.

++ I am looking over CVEs which could verify the browser version for us, but I was thinking more about the bot detection section. Also, I saw that many features are not supported in Chrome on Android: on the Chrome flags page there is a section for what is not supported on my device. Maybe that is something worth noticing? I guess we can look into it.

@abrahamjuliot
Owner

This one is interesting… till it gets patched. In Chrome, it can be used to validate if a device is really on macOS.

https://developer.mozilla.org/en-US/docs/Web/API/Web_Share_API#api.navigator.canshare
https://bugs.chromium.org/p/chromium/issues/detail?id=1144920

@vis2021t
Author

See, I told you CVEs and bugs are a great place for us to look. Even if a bug gets patched in later versions, it will still be there for people who don't usually update (I was one of them), and I know many who don't update.

@vis2021t
Author

vis2021t commented Jul 14, 2022

Btw, do you have anything in mind for bot detection going forward?

I mean, in the end CreepJS is sort of a bot detection repo itself,

from the lies section to browsers losing their expected features.

So I was curious if you had something in research lately.

Note: real Android and iOS devices never come with ANGLE as their GPU. Google's Mobile-Friendly Test emulator had it, and so far I have seen it only in bots when it comes to these 2 OSes.

It can be a small point.

I mean, imagine seeing Intel as the GPU of an Android user 😂 aah dude, never mind, I just want to convey that hardware filters are an essential part of GPU checks too.

Combined with the confidence methodology, it can work like a charm.

@vis2021t
Author

I think I would love to go ahead with a bugs and CVE section for CreepJS. Look at this: 😈

This place is really a treasure for us

Screenshot_20220714-105929_Kiwi Browser

@vis2021t
Author

Hmm, don't you think we should bring geckodriver into the headless section too? So far it is focused on chromedriver.

@abrahamjuliot
Owner

Good idea. We should absolutely include geckodriver and more.

@abrahamjuliot
Owner

abrahamjuliot commented Jul 17, 2022

bot detection and research

Nothing on my mind, atm. But, ideas are welcome.

gpu hardware filter

This is on my mind. I've been slow to get to it. We should definitely look out for GPU lies in reported mobile devices. Samsung Xclipse 920 has Angle, but I think we can determine Angle is not iOS.
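
A minimal sketch of the GPU hardware filter idea, assuming the renderer string has already been read (for example via WebGL's `WEBGL_debug_renderer_info` extension). The function name `gpuLooksSuspicious`, the platform labels, and the rules themselves are hypothetical. They only encode the claims made in this thread (ANGLE or a desktop vendor on a reported mobile device is odd, Samsung Xclipse being the known exception) and may not hold for current browsers.

```javascript
// Hypothetical helper, not CreepJS code: flags renderer strings that look
// out of place for the platform a client claims to be.
function gpuLooksSuspicious(claimedPlatform, renderer) {
  const r = String(renderer).toLowerCase();
  const isAngle = r.startsWith('angle');
  // Assumption: these vendor names indicate desktop-class hardware.
  const isDesktopVendor = /\b(intel|nvidia|geforce|radeon)\b/.test(r);

  if (claimedPlatform === 'iOS') {
    // Per the thread: ANGLE is not expected on iOS.
    return isAngle || isDesktopVendor;
  }
  if (claimedPlatform === 'Android') {
    // Per the thread: Samsung Xclipse reports ANGLE on real devices,
    // so it is excluded from the ANGLE rule.
    if (isAngle && r.includes('xclipse')) return false;
    return isAngle || isDesktopVendor;
  }
  // No opinion on desktop platforms in this sketch.
  return false;
}
```

A result of `true` would be one weak signal to combine with others, not proof of a bot.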

@vis2021t
Author

bot detection and research

Nothing on my mind, atm. But, ideas are welcome.

gpu hardware filter

This is on my mind. I've been slow to get to it. We should definitely look out for GPU lies in reported mobile devices. Samsung Xclipse 920 has Angle, but I think we can determine Angle is not iOS.

Hmm, but except for that device, almost every device comes with real hardware, like MediaTek Helio or Qualcomm.

@vis2021t
Author

Hi, I was busy with something. Well, let's get back to research.

I found something interesting to look at:-

mdn/content#6849

@vis2021t
Author

https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=927531

found something to look at

it's regarding 2 step tls fingerprinting

@abrahamjuliot
Owner

Nice. I wonder if TLS fingerprint is distinct on mobile devices vs desktop. I presume no.

@vis2021t
Author

vis2021t commented Aug 2, 2022

Do you have a report of the top 5 browser versions CreepJS usually sees?

I am curious whether people use older versions, since old ones have bugs and vulnerabilities. That might be an interesting approach if we go about it in an ethical way.

@abrahamjuliot
Owner

It depends on the date, but the top 5 versions usually consist of versions at or near the latest stable releases of Blink, Gecko, and WebKit. Here's yesterday, for example:

image

We do get a lot of older browsers, though. The window test page contains a pool of browser versions seen in the last 40 days.

I'm sure we would see even older browsers if the code was geared for ES5. Right now, the target is ES2019.

@vis2021t
Author

vis2021t commented Aug 6, 2022

found something

navigator.connection.type is only there on Android and iOS

It could be a part of this, as it is something people don't usually hide.

On Windows and Linux it's not there. They say privacy issues... like they still gave it to Android and iOS.
Well, better for us, enjoy

@abrahamjuliot
Owner

abrahamjuliot commented Aug 7, 2022

Nice. I plan to add this. Looks like type is only on Android and Chrome OS, but we could use this to determine if a device is really Android/Chrome OS. There are a lot of interesting ways this API can be used for fingerprinting. These are also in client hint headers.

https://wicg.github.io/netinfo/#privacy-considerations
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers#network_client_hints

rtt in Headless Chrome is 0, but I'm not sure if that is always the case and exclusive to headless.
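
A minimal sketch of how the two observations above could be turned into signals, assuming a Chromium-based browser. `connection` stands in for `navigator.connection` (it may be `undefined`), and the function name, platform labels, and signal strings are hypothetical. As noted above, an `rtt` of 0 is not known to be exclusive to headless, so it is reported only as a weak hint.

```javascript
// Hypothetical helper, not CreepJS code. `connection` is a
// navigator.connection-like object, or undefined when the API is missing.
function networkInfoSignals(connection, claimedPlatform) {
  const signals = [];
  const hasType = !!connection && typeof connection.type === 'string';
  // Per the thread: `type` appears to be exposed only on Android and Chrome OS.
  const typeExpected = ['Android', 'Chrome OS'].includes(claimedPlatform);

  if (hasType && !typeExpected) signals.push('unexpected connection.type');
  if (!hasType && typeExpected) signals.push('missing connection.type');
  // Weak hint only: rtt of 0 was observed in Headless Chrome.
  if (connection && connection.rtt === 0) signals.push('rtt is 0');
  return signals;
}
```

In a page this could be called as `networkInfoSignals(navigator.connection, 'Android')`, where the platform label comes from whatever the client reports about itself.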

@vis2021t
Author

vis2021t commented Aug 7, 2022

I want to test the NetworkInformation type against Google's Mobile-Friendly Test.

I think the majority of the big-brand bots use simulation instead of emulation, so this could help with bots that claim to be Android but are not. We can consider those suspicious.

I am currently learning TypeScript, as we are switching to that.

I will explore Navigator more deeply, into every inner part of it.

@vis2021t
Author

vis2021t commented Aug 7, 2022

What is your net speed? Hope it's not in Gbps lol
rtt? I checked navigator.connection in my browser (Kiwi Browser, an Android Chromium-based browser with PC dev tools) for research.

here is my result:
Screenshot_20220807-065732_Kiwi Browser

@abrahamjuliot
Owner

abrahamjuliot commented Aug 8, 2022

I need to test more in Kiwi. Here's Chrome canary

image

Chrome OS

image

@vis2021t
Author

vis2021t commented Aug 8, 2022

mm I wonder if brave mobile is different from normal brave in a way

I wasn't aware of jsconsole.com so I was using this for other browsers

javascript:(function () {
    /* Load the Eruda mobile console from jsDelivr and start it once loaded. */
    var script = document.createElement('script');
    script.src = 'https://cdn.jsdelivr.net/npm/eruda';
    script.onload = function () {
        eruda.init();
    };
    document.body.appendChild(script);
})();

@vis2021t
Author

vis2021t commented Aug 8, 2022

I need to test more in Kiwi. Here's Chrome canary

Does it mean headless rtt is 0 as a special case?

I tested on Chrome, Brave, Kiwi, and Chromium on Android, Windows, and Linux.

All results normally show an rtt greater than 0.

@abrahamjuliot
Owner

abrahamjuliot commented Aug 8, 2022

Does it mean headless rtt is 0 as a special case?

I imagine 0 is very rare. I read somewhere that 0 was seen in some Edge browsers. Not sure if that is accurate, though. 0 could be a result of dev tools network emulation or other rare network patterns. I have a commit incoming soon that will include network info and more.

@abrahamjuliot
Owner

I did some research on 192.168... and it seems to be exclusive to home WIFI networks. Something very interesting is the first set of characters following candidate:... is a hash string that actually contains the base IP address, but only on the host connection and only in Chrome and more recent versions of Safari.

Here's the ComputeFoundation function in the Chromium source code which contains this method, base_address.ipaddr().ToString().

the draft outlines the computation in greater detail (section "5.1.1.3. Computing Foundations")
https://datatracker.ietf.org/doc/id/draft-ietf-ice-rfc5245bis-16.txt
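
To make the "first set of characters following candidate:" concrete, here is a minimal sketch that splits an ICE candidate line into its leading fields, following the `candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type>` layout from the ICE drafts. The function name `parseIceCandidate` is hypothetical, and the sketch does not try to reproduce how Chromium computes the foundation.

```javascript
// Hypothetical helper: parses the leading fields of an ICE candidate line.
// Returns null when the line does not match the expected layout.
function parseIceCandidate(line) {
  const match = /^(?:a=)?candidate:(\S+) (\d+) (\S+) (\d+) (\S+) (\d+) typ (\S+)/.exec(line);
  if (!match) return null;
  const [, foundation, component, transport, priority, address, port, type] = match;
  return {
    foundation, // the token discussed above, kept as a string
    component: Number(component),
    transport: transport.toLowerCase(),
    priority: Number(priority),
    address,
    port: Number(port),
    type, // e.g. "host", "srflx", "relay"
  };
}
```

Since the observation above applies only to host candidates, a caller would filter on `type === 'host'` before comparing `foundation` values across sessions.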

@vis2021t
Author

vis2021t commented Aug 9, 2022

I imagine 0 is very rare. I read somewhere that 0 was seen in some Edge browsers. Not sure if that is accurate, though. 0 could be a result of dev tools network emulation or other rare network patterns. I have a commit incoming soon that will include network info and more.

Hmm, what can we do? I think we can take it as a suspicious point, maybe.

If it's unusually rare, it can be a signal, but I'm not sure if we should.

It's sort of similar to the likeHeadless one in CreepJS. We could do likeUnusual or something.

@abrahamjuliot
Owner

Good idea. Added to like headless.

image

@abrahamjuliot
Owner

It's all good.

Dark Reader is great. That's a good detection, too. It can be a human indicator if it is on. Something like this, maybe.

image

@vis2021t
Author

That's a good detection, too. It can be a human indicator if it is on.

True

I use Dark Reader all the time. I was working on a website, so I saw it while debugging haha. I will look for more interesting plugins which may leak things into documents etc.

@vis2021t
Author

vis2021t commented Oct 5, 2022

Hi, I was looking around Gmail and saw that they are able to detect whether a browser is secure or suspicious, somewhat like we do in CreepJS. I am curious about their mechanism. After we enter a Gmail address, there is a detection script
that checks whether the browser is OK or not (including bot detection). It's always good to take inspiration haha

Wanna explore together?

@abrahamjuliot
Owner

Sure, I imagine they use UA client hints to detect unseen devices and then warn the backup email about an unknown device logging in to the account. The difficulty is de-obfuscating their code. This repo has a lot we can also look at.

@vis2021t
Author

Sure
finally you're back haha, kinda missed us.

Anyway, I think Gmail uses something more complex.

Even puppeteer stealth can't get through the login, even in normal (non-headless) mode with the same user agent and no "headless" written there.

I think that's why I want us to see what's interesting there.

While you were inactive, I was learning about dev tools detection from this repo:
https://github.com/AEPKILL/devtools-detector
I tested it, and the detections work smoothly.

But for now I'm really more interested in Gmail's detection

because of the above reason.

That's why I got interested. Maybe there is something more we could learn? Who knows

@vis2021t
Author

Sure This repo has a lot we can also look at.

Damn, that repo. I can sense some awesome things right there.

@vis2021t
Author

Is the obfuscators absolutely foolproof?
No, while it's impossible to recover the exact original source code, someone with the time, knowledge and patience can reverse-engineer it.

Since the JavaScript runs on the browser, the browser's JavaScript engine must be able to read and interpret it, so there's no way to prevent that. And any tool that promises that is not being honest.

-- mentioned in https://obfuscator.io/#FAQ

one of the best obfuscators I have seen so far

@vis2021t
Author

https://github.com/chris124567/commercial-bot-detectors/blob/master/files/google_botguard_deobfuscated.js

lol this is exactly what we needed

@abrahamjuliot
Owner

abrahamjuliot commented Oct 12, 2022

devtools-detector

Nice. I ran into that recently. That's a good detection.

https://obfuscator.io/#FAQ

Good points. The Googlebot code looks like a challenge. I can see it collects the error stack here.

@vis2021t
Author

vis2021t commented Oct 12, 2022

devtools-detector

Good points. The Googlebot code looks like a challenge. I can see it collects the error stack here.

Agreed. I am working on a small project right now which has me using EJS and Express, plus a CDN build of maybe Vue.js, React Native, or any front-end framework.

I literally learned all 3 (Vue, Angular, and React) within 5 days. You can imagine it's been a mind-blowing week for me.
Vue.js and React meet my requirements.
I will be done with the work the day after tomorrow.

I will probably start looking at the Googlebot one the day after tomorrow.

haaah ~ sigh in tiredness ~

@vis2021t
Author

Hii , I'm done with my project.

Let's research 💝

I'm gonna look at the Google BotGuard.
Any information you discovered, maybe?

@vis2021t
Author

vis2021t commented Oct 18, 2022

I found something. I even opened an issue there as research, and I noticed the owner is kinda active too.

This is the latest code of a Google BotGuard reverse-engineering attempt:

https://github.com/icetroll/botguard-RE

we can learn from here

@abrahamjuliot
Owner

Nice. That is a lot of code. I think it has to do with behavioral fingerprints. I see a few event listeners connected to DOM elements.

@abrahamjuliot
Owner

I've been researching ways to detect Selenium and found some interesting leaks. Fascinating article here. Those values seem to be manipulated by different bots, but the object prototype contains unique keys that are important to the internal code. I haven't tested it, but I think it's possible to override those functions with eval code and use them to get internal values.

image

@vis2021t
Author

vis2021t commented Oct 21, 2022

Naughty Eval
image

Very well, I see now. Can it also be referred to as an info disclosure, if it works as we expect? I am looking into the Google bot code pattern detection (it is interesting but really nested), and also looking at the previous code challenge of the Google bot.

@abrahamjuliot
Owner

The prototype functions might only reveal Selenium code and possibly different versions of the code.

@vis2021t
Author

vis2021t commented Oct 22, 2022

The prototype functions might only reveal Selenium code and possibly different versions of the code.

That too will be really interesting for CreepJS. Maybe a sure bot detection haha.

Right now I am giving names to the g-BotGuard code to understand how it works.

@vis2021t
Author

vis2021t commented Oct 25, 2022

I have understood quite a lot about Google BotGuard. I will give you a proper summary here. It is interesting ngl

@vis2021t
Author

Any update on your research?

@abrahamjuliot
Owner

Nothing yet. But, a lot on my mind. I think the storage bytes are an incredible high entropy fingerprint in Chrome. It depends on the machine and what it's used for, but if there are no changes in storage, the fingerprints can categorize a machine in 1 trillion possible fingerprints (to put it lightly). In private tabs, chromium reduces entropy (unstable per session and low bytes available).
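
A minimal sketch of turning the storage bytes into a fingerprint component, assuming the value comes from the `quota` field of `navigator.storage.estimate()`. The function name `quotaComponent` and the shape of the returned object are hypothetical.

```javascript
// Hypothetical helper: turns a StorageManager estimate into a component.
// Returns null when no usable quota is present.
function quotaComponent(estimate) {
  const quota = estimate && estimate.quota;
  if (typeof quota !== 'number' || !Number.isFinite(quota)) return null;
  return {
    bytes: quota, // raw value, the high-entropy part
    gib: Number((quota / 2 ** 30).toFixed(2)), // rounded, for display only
  };
}
```

In a page this could be used as `navigator.storage.estimate().then(quotaComponent)`. Given the note above about private tabs, the raw byte count is probably best treated as stable only outside private browsing.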

Unrelated, I have this idea I might experiment with at some point. It's essentially a soft/superfast fingerprinting (less than 10ms and mostly low entropy), then it progressively slows down and expands into high entropy if anomalous hashes are detected. The idea is to make bad fingerprints move more slowly and good fingerprints move more quickly.
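
The idea can be sketched as a two-tier collector: cheap checks always run, and expensive checks run only when the cheap result looks anomalous. Everything here is hypothetical, including the name `progressiveFingerprint`, the collector maps, and the `isAnomalous` callback, which stands in for whatever hash comparison would decide that a result is unusual.

```javascript
// Hypothetical sketch: run cheap collectors first, and run the expensive
// ones only when the cheap result looks anomalous.
function progressiveFingerprint(fastCollectors, slowCollectors, isAnomalous) {
  const result = {};
  for (const [name, collect] of Object.entries(fastCollectors)) {
    result[name] = collect();
  }
  // Decide whether to expand based on the low-entropy result alone.
  const expanded = Boolean(isAnomalous(result));
  if (expanded) {
    for (const [name, collect] of Object.entries(slowCollectors)) {
      result[name] = collect();
    }
  }
  return { expanded, result };
}
```

With this shape, an ordinary visitor pays only for the fast tier, while a suspicious one is slowed down by the high-entropy tier.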

@vis2021t
Author

I looked over how Gmail currently works. I found that they monitor and use the Performance API very well, which I hadn't thought of. I am exploring more, but I saw the new v3 is

I am exploring others' antibot and behavior monitoring to expand CreepJS. Right now I am looking at this:

https://developer.chrome.com/docs/extensions/mv3/intro/

@vis2021t
Author

Looking at their website, it's interesting how they use APIs and clever JavaScript. (What is more interesting is a comment in their code saying it was written in 2016. If that is true, it's quite fascinating.) So far I am noting which APIs they use, and I will summarize things here as I go.

@vis2021t
Author

vis2021t commented Nov 5, 2022

I am resuming my research summary from today. Let's see if I can put up some interesting points.

@HMaker

HMaker commented Nov 13, 2022

about chromedriver detection, check https://github.com/HMaker/HMaker.github.io/tree/master/selenium-detector

most of the tests can be easily bypassed by patching chromedriver src though.

@abrahamjuliot
Owner

chromedriver detection

Very nice detection there. Can these functions be patched or removed? The function names can be modified, but wouldn't the prototype still leak the names?

@HMaker

HMaker commented Nov 13, 2022

You can also change the prototype completely. You could also make chromedriver store that on window instead of document.

chromedriver is just a CDP wrapper, but it sits at a higher level of the Chromium architecture, so it uses the page's global JS state to store automation-related vars.
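
A minimal sketch of scanning an object and its prototype chain for automation-related property names. The function name `findSuspectKeys` is hypothetical, and the default `cdc_` pattern is an assumption based on commonly reported chromedriver variable naming. It is not taken from this thread and, as noted above, can be changed by patching the chromedriver source.

```javascript
// Hypothetical helper: lists property names on `target` and its prototype
// chain that match a pattern associated with automation tooling.
function findSuspectKeys(target, pattern = /^\$?cdc_/) {
  const hits = new Set();
  for (let obj = target; obj !== null && obj !== undefined; obj = Object.getPrototypeOf(obj)) {
    // getOwnPropertyNames also sees non-enumerable properties.
    for (const name of Object.getOwnPropertyNames(obj)) {
      if (pattern.test(name)) hits.add(name);
    }
  }
  return [...hits];
}
```

In a page this would be called on both `document` and `window`, since the variables could be stored on either.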

@vis2021t
Author

vis2021t commented Nov 16, 2022

Gmail stuff summary

They use proxy detection (mostly based on the Performance API), and workers are their focus, just like here. They also have a few basic feature detections with error detection, buckets, etc. The rest is just made lengthy.

@vis2021t
Author

vis2021t commented Nov 16, 2022

        Chromedriver Detector
        Detected!

funnnnnnnnn

@vis2021t
Author

vis2021t commented Nov 16, 2022

about chromedriver detection

get a hug dude lol it's a good repo great effort really loved it

@vis2021t
Author

I was thinking of challenging myself against CreepJS techniques hehe


3 participants