curl-impersonate: An alternative to utls? #345

Open
mmmray opened this issue Mar 21, 2024 · 21 comments

@mmmray

mmmray commented Mar 21, 2024

In a similar theme as #336, I see significant overlap between the technical solutions the scraping/botting community comes up with, and the needs for censorship circumvention. This is interesting because there is some commercial/industrial power (and therefore developer time) behind developing web scraping techniques, more so than behind censorship circumvention.

This week I came across https://github.com/lwthiker/curl-impersonate, which appears to be identical in function to uTLS, except written by completely different people, written for a different purpose, and using the same TLS libraries as the mimicked browsers. It seems to me that the level of impersonation it provides might lie somewhere between naiveproxy and utls because of that.

I wonder if anybody has the resources to try this tool in the field and see how well it performs against e.g. xray's integration of utls.

@mmmray
Author

mmmray commented Mar 21, 2024

related thread: refraction-networking/utls#103

@gaukas

gaukas commented Mar 22, 2024

You raised a very good question: what are the better approaches to TLS (client library) parroting than uTLS?

One common challenge in maintaining a parroting library such as uTLS is that popular implementations are becoming highly volatile as new technologies are rolled out, and it can take a huge effort to keep up with the latest version. Also, there might be behaviors that we don't completely understand and/or cannot efficiently mimic (e.g., TLS handshake message fragmentation), resulting in imperfect parrots.

Per The Parrot is Dead, parroting will never be an efficient approach; instead, we should use as much of the "real implementation" as possible: instead of pretending to do a TLS handshake (shadow-tls), do a real TLS handshake and TLS communication (XTLS/Reality); instead of pretending to be Chrome (uTLS), use Chrome (naiveproxy).

So now the challenge is: how do we efficiently use these existing popular implementations in tools that are written in a different programming language? (also consider the cross-platform compatibility)

@gaukas

gaukas commented Mar 22, 2024

how do we efficiently use these existing popular implementations in tools that are written in a different programming language?

I hope water can be one of the answers (although it could be heavier than native code).

@0x391F

0x391F commented Mar 22, 2024

how do we efficiently use these existing popular implementations in tools that are written in a different programming language?

I hope water can be one of the answers (although it could be heavier than native code).

How about NSS? It's the crypto library of Firefox, Thunderbird, etc.

@gaukas

gaukas commented Mar 22, 2024

How about NSS? It's the crypto library of Firefox, Thunderbird, etc.

It's good, but it still faces the same challenge as naiveproxy (cronet): you need a cross-platform, cross-language interface.

That's true for any library, no matter how well adopted it is.

@mmmray
Author

mmmray commented Mar 23, 2024

You raised a very good question: what are the better approaches to TLS (client library) parroting than uTLS?

I think more important than "the best approach" in an academic sense (parroting vs not, full browser dialler vs naiveproxy approach vs only using BoringSSL/NSS) is which approach has the most resources thrown behind it, is most easily deployable, and therefore has a chance of continuous funding. I see curl-impersonate in an interesting position here where it is already sufficiently easy to use for commercial applications across a variety of languages, since bindings to curl exist for a bunch of them. E.g. just the Python bindings for curl-impersonate have 1.2k stars.

curl-impersonate provides builds for a variety of SSL libraries, one of which is NSS, which is necessary to impersonate Firefox.

@gaukas

gaukas commented Mar 23, 2024

I guess we are almost on the same page. I totally agree with you that curl-impersonate shows a very promising future, given its flexibility as a drop-in replacement for curl (including libcurl), which is very popular and well supported across different programming languages.

However, in practice, how hard is it to integrate curl-impersonate with Go (since most circumvention tools are written in it)? Not very easy without CGO, I guess. A common challenge for such multi-binding libraries is that they almost always rely on C, which is fine on flexible platforms such as PCs or full-scale servers, but not necessarily on less flexible platforms like mobile phones, gateways/routers, IoT devices, etc.

@markpash

markpash commented Mar 23, 2024

However, in practice, how hard is it to integrate curl-impersonate with Go (since most circumvention tools are written in it)? Not very easy without CGO, I guess. A common challenge for such multi-binding libraries is that they almost always rely on C, which is fine on flexible platforms such as PCs or full-scale servers, but not necessarily on less flexible platforms like mobile phones, gateways/routers, IoT devices, etc.

This is a problem I've been thinking about for over a year. While WATER and wasm is obviously an elegant solution, it requires embedding some kind of runtime in any software that intends to use the wasm binary. So it's fundamentally different from the traditional library/binding approach.

So is the concern that libraries and software written in various languages are not exactly portable/buildable on all platforms that require them? Do we need better build and cross-build tools? Cross compilation is a massive pain, I solve this with my oven. I think through the use of Nix as the build environment and Zig as a C toolchain and cross-compiler, there's a staggering number of libraries and tools that can be cross-compiled and linked for a lot of platforms. (wasm can of course be a build target too!)

Would the perfect world be that transports can be written in various languages, and are cross-compiled to static libraries and wasm binaries?

(sorry if this diverges from the topic)

@gaukas

gaukas commented Mar 23, 2024

Would the perfect world be that transports can be written in various languages, and are cross-compiled to static libraries and wasm binaries?

Ideally, yes, we want natively written transport modules. Pluggable Transports is a good example of that, and it also reveals some inherent challenges in doing so, such as high maintenance cost. Cross-compilation (into a shared library), on the other hand, shifts the challenge to how to efficiently integrate the library with various programming languages on various platforms.

This is not trivial, and if it is solved, the development and maintenance cost of different circumvention tools can be greatly reduced, leading to much better efficiency in developing and deploying new techniques.

@markpash

Ideally, yes, we want natively written transport modules. Pluggable Transports is a good example of that, and it also reveals some inherent challenges in doing so, such as high maintenance cost. Cross-compilation (into a shared library), on the other hand, shifts the challenge to how to efficiently integrate the library with various programming languages on various platforms.

To me this sounds like a combination of a tooling problem (lack of widespread use of versatile build tools) and a code problem (lack of bindings for each library).

It is not trivial and if solved, the development and maintenance cost of different circumvention tools can be greatly reduced, leading to much better efficiency in developing and deploying new techniques.

Indeed, we wouldn't be worried and limited by something like CGO if the libraries we intend to import were static and would build on any platform/arch. Reaching out to whatever project to improve their code/build system is a high barrier to entry for those who may only wish to write Go or Rust or whatever language they choose.

@klzgrad

klzgrad commented Mar 24, 2024

It's one thing to have a cross compile toolchain for oneself, it's another thing to convince downstream application developers to migrate their original toolchains to your toolchain to link this library just to have a marginally better defense against the theoretical threat of being an imperfect parrot, which almost never works because being a better parrot is almost never a higher priority than the user facing features that the application developers deal with.

@mmmray
Author

mmmray commented Mar 30, 2024

I remember those conversations: @klzgrad proposed to integrate chromium network stack into v2ray (or was it another fork?), and @RPRX was pushing back against using CGO. If I find them again I will link them here.

I wonder if it would've been easier to integrate naiveproxy as a separate process into a GUI wrapper like qv2ray or v2rayN, as a generic TLS reverse proxy. But then it would have to be done per-GUI. Then again, there are not that many popular GUIs. naiveproxy is integrated in v2rayN as a separate "core" apparently, but I don't think it can be chained with v2ray.

I started a new project in Rust to play around with a few tls-fingerprinting ideas that can evolve independently of v2ray, but still bundled in a single project/"platform" so at least there are not too many "microservices" flying around.

https://github.com/mmmray/minidialer

I hope that with Rust as the basis, it's easier to link various C libraries. And since at least some parts of it are very generic and not married to specific v2ray transports, it can hopefully find use outside of censorship circumvention and attract contributors from other communities as well. (See also the Future Ideas section)

Maybe the solution all along was to bundle v2ray + chrome into a docker container, then convert the container back to a binary...

@markpash

I hope that with Rust as the basis, it's easier to link various C libraries.

I think regardless of Go or Rust, if we try hard enough, we can static link whatever we want. Just need to configure the linker correctly. It just sucks for massive projects.

@gaukas

gaukas commented Mar 30, 2024

we can static link whatever we want

Not without CGO. One reason why we don't like CGO is that all the dependency libraries also need to be built for the target (which involves an excessive amount of troubleshooting), and it is not realistic to automate the cross-build on every update.

Plus, the interfaces are almost never C-compatible unless they were explicitly designed with that in mind, so a C-compatible wrapper for each dependency is usually required.

@klzgrad

klzgrad commented Mar 30, 2024

proposed to integrate chromium network stack into...

I asked nekohasekai to integrate Cronet into Go, there was something half to almost done, and he seemed to have lost interest because this was all pain and no gain. Maybe he can explain why this is too much for a Go developer.

I asked v2fly people about remote calling a Cronet process without messing with Go ABI. Haven't seen real interest in it.

I don't know if RPRX did any pushback except his empty fork of naiveproxy.

if we try hard enough, we can static link whatever we want. Just need to configure the linker correctly. It just sucks for massive projects.

If it sucks enough, it becomes intractable. If it's a smaller project, CGO may be easier. Chromium's build system and Go's build system are two large beasts with extreme impedance mismatch. First, it takes a large amount of special casing (klzgrad/naiveproxy@4e5adba) just to convince Go/CGO to complete the basic linking of libcronet.a into an executable by "translating" the linker flags Chromium uses to link libcronet.a. Then these translated linker flags have to be transformed again into a Go package (SagerNet/cronet-go@61c4e93) with hardcoded linker flags so it can be used by other Go packages. Then, on each Chromium update or Go update, these two transformations of linker flags could break on any of the tens of platforms/architectures. The linker flags required by Chromium look like this:
// #cgo LDFLAGS: -fuse-ld=lld -Wl,--build-id=sha1 -fPIC -Wl,-z,noexecstack -Wl,-z,relro -Wl,-z,now -Wl,--icf=all -Wl,--color-diagnostics -Wl,-mllvm,-instcombine-lower-dbg-declare=0 -flto=thin -Wl,--thinlto-jobs=all -Wl,--thinlto-cache-dir=thinlto-cache -Wl,--thinlto-cache-policy=cache_size=10%:cache_size_bytes=40g:cache_size_files=100000 -Wl,-mllvm,-import-instr-limit=30 -fwhole-program-vtables -m64 -no-canonical-prefixes -rdynamic -Wl,-z,defs -Wl,--as-needed -Wl,--lto-O2 -pie -Wl,--disable-new-dtags -ldl -lpthread -lrt -latomic -lresolv -lm ./build/linux/amd64/libcronet_static.a

@mmmray
Author

mmmray commented Mar 30, 2024

I asked v2fly people about remote calling a Cronet process without messing with Go ABI. Haven't seen real interest in it.

I mean, if there is already a separate cronet process and that's acceptable UX, I am not sure that v2fly or v2ray have to do anything for integration. You can re-implement xray's dialer (the JS part) in C, or wrap chromium network stack in a basic HTTP reverse proxy. Then it will work with all of them.
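
The second option, a basic HTTP reverse proxy running next to the proxy core, might look roughly like the following Go sketch. It uses only the standard library. `newSidecarProxy` and `fetchThroughSidecar` are hypothetical names, and the `transport` parameter only marks the seam where a browser-grade network stack would have to be plugged in. Passing nil uses Go's own TLS stack, so the sketch by itself provides no fingerprinting benefit.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
)

// newSidecarProxy returns a handler that forwards every request to upstream.
// transport is the (hypothetical) place to plug in a different network stack;
// nil selects Go's default transport.
func newSidecarProxy(upstream *url.URL, transport http.RoundTripper) http.Handler {
	proxy := httputil.NewSingleHostReverseProxy(upstream)
	proxy.Transport = transport
	return proxy
}

// fetchThroughSidecar starts a stand-in upstream server and a sidecar proxy
// in-process, then fetches the upstream page through the sidecar.
func fetchThroughSidecar() (string, error) {
	upstream := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "hello from upstream")
	}))
	defer upstream.Close()

	target, err := url.Parse(upstream.URL)
	if err != nil {
		return "", err
	}

	sidecar := httptest.NewServer(newSidecarProxy(target, nil))
	defer sidecar.Close()

	resp, err := http.Get(sidecar.URL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	body, err := fetchThroughSidecar()
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

Any tool that can speak plain HTTP to a local address could use such a sidecar without linking against it, which is the point of keeping it a separate process.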

@klzgrad

klzgrad commented Mar 30, 2024

re-implement xray's dialer (the JS part) in C, or wrap chromium network stack in a basic HTTP reverse proxy

I think people expect some tighter integration, something more efficient, or it's indistinguishable from just running naiveproxy.exe along with v2ray.exe with the latter being a thin wrapper.

@RPRX

RPRX commented Mar 31, 2024

I remember those conversations: @klzgrad proposed to integrate chromium network stack into v2ray (or was it another fork?), and @RPRX was pushing back against using CGO. If I find them again I will link them here.

v2ray/discussion#754 (comment)

I don't know if RPRX did any pushback except his empty fork of naiveproxy.

This empty fork refers to https://github.com/XTLS/naiveproxy-reality. It is intended to add REALITY to naiveproxy's TLS and is still waiting for someone to do the work.

That said, it is a very simple job: on the client side it mainly means modifying the session ID and the certificate verification logic. Very little code is required, just like adding REALITY to uTLS.

@RPRX

RPRX commented Mar 31, 2024

In addition:

XTLS/Xray-core#1794 (comment)

A small fingerprint of existing TLS proxies is that they only ever connect to the main domain and do not load related resources as a browser would; even the traffic pattern to the main domain is not browser-like.

In my experience with WSS, the GFW already takes your recent behavior into account when judging whether a request is plausible, so one of the add-ons is to automatically load the web page at the same time.

However, the trend towards whitelisting is becoming more and more apparent, so in the future using only an unmodified browser may no longer be enough.

XTLS/Xray-core#2219 (comment)

I'll say it anyway: if only a single connection is observed, then for the vast majority of websites the uTLS Chrome fingerprint is fine (I have checked it in Wireshark), but:

  1. With real-time interference there are many ways to expose uTLS, though this may be destructive to the connection.
  2. When multiple connections are observed, our current use of uTLS has some statistical characteristics. For example, some websites send Session Tickets, which we currently ignore. However, (I think) RFC 8446 also says that this mechanism lets an observer correlate different connections, so Chrome uses each ticket only once, and we need to imitate that.
  3. For a very small number of websites, such as dl.google.com, Chrome's Client Finished carries additional information, and the corresponding mechanism in uTLS is incomplete.

@mmmray
Author

mmmray commented Mar 31, 2024

A small fingerprint of existing TLS proxies is that they always connect only to the main domain name, rather than loading related resources like a browser. Even the traffic characteristics to the main domain name are not like those of a browser.

I think this traffic characteristic could be solved if the browser dialer was reversed. Instead of having the browser open localhost:3000 to launch the dialer to connect to server example.com, the user opens https://example.com/#localhost:3000 and lets the dialer connect from the server website to the local xray. This:

  • forces the user to open the webpage genuinely
  • maybe allows the browser to reuse connections
  • allows the server to update the JS dialer code without forcing the user to update xray, or change protocols from websocket to something proprietary
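
To make the URL convention concrete: the part after `#` would carry the address of the local xray instance. In the proposed design this parsing would happen in the page's JavaScript. The Go sketch below only illustrates the logic, with `localEndpoint` as a hypothetical helper and the `ws://` scheme as an assumption about how the page would reach the local listener.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// localEndpoint extracts the host:port carried in the fragment of a page URL
// such as https://example.com/#localhost:3000 and returns a WebSocket URL
// pointing at that local listener. The ws:// scheme is an assumption.
func localEndpoint(pageURL string) (string, error) {
	u, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	host, port, err := net.SplitHostPort(u.Fragment)
	if err != nil {
		return "", fmt.Errorf("fragment %q is not host:port: %w", u.Fragment, err)
	}
	if host == "" || port == "" {
		return "", fmt.Errorf("fragment %q is missing host or port", u.Fragment)
	}
	ws := url.URL{Scheme: "ws", Host: net.JoinHostPort(host, port), Path: "/"}
	return ws.String(), nil
}

func main() {
	endpoint, err := localEndpoint("https://example.com/#localhost:3000")
	if err != nil {
		panic(err)
	}
	fmt.Println(endpoint) // prints "ws://localhost:3000/"
}
```

Because the fragment is never sent to the server, the local address stays on the client side; only the page's script sees it.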

I am also optimistic that browser dialers are not yet a dead end with regard to performance, even if they are JS. I get 1 Mbps download speed with the one from xray, but minidialer can do 300 Mbps (saturating bandwidth). I am hoping for somebody to reproduce those results.

@mmmray
Author

mmmray commented Apr 13, 2024

continuing some xray-specific questions here: XTLS/Xray-core#3263
