Active probing weakness found in the Xray implementation of Shadowsocks #625

Open

gfw-report opened this issue Jun 29, 2021 · 56 comments

@gfw-report

Dear Xray-core developers,

@Maker2002 reported that:

I'm using shadowsocks with chacha20-ietf-poly1305 by https://github.com/XTLS/Xray-core.
Only the listening port was blocked 40 mins ago.

While we have no evidence that the blocking was caused by an active probing attack, our initial testing indeed suggests that the Xray implementation of Shadowsocks has active probing weaknesses. Such active probing attacks were previously proposed by Frolov et al. and have been found in use by the GFW (Alice et al.).

We document how we spotted the weakness because similar active probing weaknesses may be quickly spotted in other parts of Xray or in other circumvention tools.

First, we get the latest xray binary (v1.4.2):

wget https://github.com/XTLS/Xray-core/releases/download/v1.4.2/Xray-linux-64.zip
unzip Xray-linux-64.zip xray

Second, we save the following configurations to config.json. The server listens on port 12345, and uses aes-128-gcm:

{
    "log": {
        "loglevel": "debug"
    },
    "inbounds": [
        {
            "port": 12345,
            "protocol": "shadowsocks",
            "settings": {
                "clients": [
			{
                        "password": "example_user_1",
                        "method": "aes-128-gcm"
                    }
                ],
                "network": "tcp,udp"
            }
        }
    ],
    "outbounds": [
        {
            "protocol": "freedom"
        }
    ]
}

Third, we start Xray with the configuration above: ./xray < config.json. One can also open another terminal to monitor the traffic with: sudo tcpdump -i lo port 12345 -v.

Fourth, we open another terminal and send random/invalid bytes to its listening port 12345:

python3 -c "print('a' * 215)" | nc -v localhost 12345

The Xray implementation of Shadowsocks using aes-128-gcm exhibits the following fingerprint:

  • when sending more than 214 bytes of data, the server closes the connection immediately with a FIN/ACK;
  • when sending between 209 and 214 bytes of data, the server either closes the connection immediately with a FIN/ACK or reads forever;
  • when sending fewer than 209 bytes of data, the server reads forever.

The thresholds vary across encryption methods, and they can become more complex when different encryption methods are used on the same port (a feature Xray supports). The probe harness sketched below makes this easy to check.
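Below is a small probe harness in Go (our own sketch, not part of the original report; the address, payload lengths, and the 5-second deadline are arbitrary choices). It sends random payloads of various sizes and reports whether the server closes promptly or is still reading at the deadline:

package main

import (
	"crypto/rand"
	"fmt"
	"net"
	"time"
)

// probe sends n random bytes (invalid for any cipher) and reports whether
// the server closes the connection quickly or keeps reading at the deadline.
func probe(addr string, n int, wait time.Duration) string {
	conn, err := net.Dial("tcp", addr)
	if err != nil {
		return "dial error: " + err.Error()
	}
	defer conn.Close()

	payload := make([]byte, n)
	rand.Read(payload)
	conn.Write(payload)

	conn.SetReadDeadline(time.Now().Add(wait))
	buf := make([]byte, 1)
	if _, err := conn.Read(buf); err != nil {
		if ne, ok := err.(net.Error); ok && ne.Timeout() {
			return "still draining at deadline (read-forever)"
		}
		return "closed: " + err.Error()
	}
	return "unexpected data received"
}

func main() {
	for _, n := range []int{1, 50, 208, 209, 214, 215, 500, 1500} {
		fmt.Printf("%4d bytes -> %s\n", n, probe("127.0.0.1:12345", n, 5*time.Second))
	}
}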

We understand that Xray already has some mitigations in place, such as read-forever (https://github.com/XTLS/Xray-core/blob/main/proxy/shadowsocks/protocol.go#L162-L165) and a varied drain size (https://github.com/XTLS/Xray-core/blob/main/proxy/shadowsocks/protocol.go#L61-L67). However, more effort seems to be required to eliminate the distinguishable fingerprints demonstrated above.
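For readers unfamiliar with the technique, the varied-drain-size idea works roughly as follows (a conceptual sketch under our own assumptions, not the actual Xray code behind the links above): after an authentication failure, the server keeps reading a deterministic but connection-dependent number of extra bytes before closing, so a prober cannot find one fixed byte threshold.

package drain

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/binary"
	"io"
)

// drainSize maps a per-install secret and the first bytes of a connection
// to a pseudo-random drain length in [base, base+spread).
func drainSize(secret, firstBytes []byte, base, spread uint64) uint64 {
	mac := hmac.New(sha256.New, secret)
	mac.Write(firstBytes)
	sum := mac.Sum(nil)
	return base + binary.BigEndian.Uint64(sum[:8])%spread
}

// drainThenClose reads and discards up to n bytes before the caller closes;
// errors are ignored because the peer may close first.
func drainThenClose(r io.Reader, n uint64) {
	io.CopyN(io.Discard, r, int64(n))
}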

You may find the following links helpful:

@RPRX (Member) commented Jun 29, 2021

Solution

First of all, thank you for your work. This is the protocol boundary-probing problem we already know about. For this class of problem, my plan is a global error->drain. I previously discovered several vulnerabilities in the current VMess AEAD protocol and reported them to the v2fly team, including the proposal for a global error->drain: v2fly/v2ray-core#940 (comment)

On the Shadowsocks protocol

Xray-core's Shadowsocks implementation was originally inherited from v2ray-core. On top of it, I have worked on:

  1. Overall performance optimization (done)
  2. FullCone NAT support (done)
  3. AEAD single-port multi-user support (done)
  4. Security and anti-probing improvements (still in progress)

The fourth item is still in progress, which is also why I raised the following proposals at shadowsocks-org:

In addition, as is well known, I also discovered two vulnerabilities/weaknesses in the current Shadowsocks & AEAD:

Overall, I consider Shadowsocks a protocol riddled with holes. Fundamentally, it lacks modern forward secrecy and cannot fully resist replay attacks. (In fact, Xray-core's Shadowsocks has no bloom filter either, because that approach treats the symptoms rather than the root cause; Victoria Raymond presumably saw this too. Moreover, SS is investigating whether to remove the bloom filter: shadowsocks/shadowsocks-rust#556.) Furthermore, the "request and response traffic are unrelated" design that runs through Shadowsocks has caused a pile of avoidable vulnerabilities, and 0-RTT brings a set of side effects that force awkward trade-offs. It also has to be said that running unknown traffic over the same port on both TCP and UDP can be identified with precision.

The Shadowsocks protocol is historically significant, so Xray-core will continue to support it as far as possible. In the long run, however, I lean toward designing a Securesocks as a replacement.

@moranno commented Jun 30, 2021

4. Security and anti-probing improvements

I hope to see this aspect improved in your Shadowsocks implementation soon.

@AkinoKaede (Contributor) commented Jul 1, 2021

Dear GFW Report,

Thanks for your work. After a quick code review, I think this problem was caused by incorrect code I wrote. I will run some tests and try to fix it in my free time.

Yours sincerely,
AkinoKaede

@wc7086 commented Jul 3, 2021

On one of my servers, a Shadowsocks-libev port using a weak password was not blocked, while the port running the Xray-core implementation of Shadowsocks was blocked. This seems to be indirect evidence of this issue's impact. Next, I will replace Xray-core on all my servers with the version patched by #629.

@moranno commented Jul 3, 2021

@RPRX Could you review PR #629? If it looks good, it would be even better to publish a release. Thanks!

@RPRX (Member) commented Jul 3, 2021

@wc7086 More information is needed, e.g. how heavily it was used.

@gfw-report This may be because the original "drain based on the first user's information" behavior was removed when the API for dynamically adding/removing SS users was introduced (I previously thought it was the latter issue). Please test whether #629 fixes it. Thanks.

@maker2002 Awaiting your test.

@RPRX (Member) commented Jul 3, 2021

@gfw-report BTW, if needed, I can add an active-probing behavior recorder with byte-level timing data for Xray-core's Shadowsocks and VMess protocols.

@gfw-report (Author)

@gfw-report This may be because the original "drain based on the first user's information" behavior was removed when the API for dynamically adding/removing SS users was introduced (I previously thought it was the latter issue). Please test whether #629 fixes it. Thanks.

It seems that the problem has not been fixed completely. We will be more than happy to help with more testing.

@gfw-report BTW, if needed, I can add an active-probing behavior recorder with byte-level timing data for Xray-core's Shadowsocks and VMess protocols.

That would be great for monitoring active probing against Xray-core in the future. The following design may be sufficient to alert us when active probing occurs:

./xray --enable-probe-logging probe.csv --whitelist 1.1.1.1 2.2.2.2/24

where --whitelist takes a list of IP addresses or IP ranges. When --enable-probe-logging is passed, Xray writes any data sent from IP addresses not on the whitelist to probe.csv.

The priority of this feature can be wishlist. It helps the Xray community stay alert to active probing. Without this feature, we can still capture traffic and analyze potential probes from pcap files (though it will take us longer to find the active probing).
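A minimal sketch of what such a logger could look like (purely hypothetical: neither the flags above nor this code exist in Xray, and the type names and CSV layout are our invention):

package probelog

import (
	"encoding/csv"
	"encoding/hex"
	"net"
	"os"
	"time"
)

// Logger records suspected probes, mirroring the proposed flags above.
type Logger struct {
	allowed []*net.IPNet // parsed from the hypothetical --whitelist flag
	path    string       // e.g. "probe.csv"
}

func (l *Logger) whitelisted(ip net.IP) bool {
	for _, n := range l.allowed {
		if n.Contains(ip) {
			return true
		}
	}
	return false
}

// Record appends one CSV row per suspected probe:
// timestamp, source address, hex-encoded initial bytes.
func (l *Logger) Record(src net.Addr, data []byte) error {
	host, _, err := net.SplitHostPort(src.String())
	if err == nil {
		if ip := net.ParseIP(host); ip != nil && l.whitelisted(ip) {
			return nil // whitelisted source: do not log
		}
	}
	f, err := os.OpenFile(l.path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o600)
	if err != nil {
		return err
	}
	defer f.Close()
	w := csv.NewWriter(f)
	defer w.Flush()
	return w.Write([]string{
		time.Now().UTC().Format(time.RFC3339),
		src.String(),
		hex.EncodeToString(data),
	})
}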

@wc7086 commented Jul 4, 2021

@wc7086 More information is needed, e.g. how heavily it was used.

That server uses up nearly all of its 300 GB of monthly traffic. The traffic going through Xray-core's SS was no more than a tenth of the monthly total. Before the Xray-core SS port was blocked, port 443 had already been blocked. At least eight people were using ss-libev. Before May, only the ss-libev and SSH ports on that server were open to the outside, and it had run continuously on the same port with the same password for almost two years.

It seems that the problem has not been fixed completely. We will be more than happy to help with more testing.

It looks like I should stop using Xray-core's Shadowsocks implementation.

@RPRX (Member) commented Jul 4, 2021

@wc7086 Which encryption method does your ss-libev use?

@wc7086 commented Jul 5, 2021

@wc7086 Which encryption method does your ss-libev use?

All chacha20-ietf-poly1305.

@moranno commented Jul 15, 2021

@RPRX Is there any progress on fixing this issue? Thanks!

@moranno commented Jul 25, 2021

On one of my servers, a Shadowsocks-libev port using a weak password was not blocked, while the port running the Xray-core implementation of Shadowsocks was blocked. This seems to be indirect evidence of this issue's impact. Next, I will replace Xray-core on all my servers with the version patched by #629.

Using Xray v1.4.2, Shadowsocks with chacha20 encryption, no obfuscation, no plugins, and about 3 GB of traffic per day, one of my ports has been blocked every day over the past few days. Have you tested how things behave with the #629 patch applied?

@wc7086 commented Jul 28, 2021

Using Xray v1.4.2, Shadowsocks with chacha20 encryption, no obfuscation, no plugins, and about 3 GB of traffic per day, one of my ports has been blocked every day over the past few days. Have you tested how things behave with the #629 patch applied?

The patch does not fully solve this problem, so I have decided not to use Xray's Shadowsocks implementation on my servers.

@cwyin7788 commented Jul 30, 2021

I was recently searching for information online and stumbled on this issue. It seems to say that Xray's SS has a vulnerability, and some people's SS setups have apparently been blocked (port bans)...

I don't really understand code, packet capture, debugging and the like; I'm just an ordinary user who sets up servers to get around the firewall.

Recently I set up SS on Xray (behind WARP, with haproxy load balancing). It has been running stably for 5 days and has handled over 43 GB of data. In the haproxy logs I can see active probes from GFW IPs (not many, maybe two or three every few hours), yet it has survived all these days without problems. So I wonder: did the GFW suddenly show mercy and spare my server? Is the encryption I use undetectable? Or did this Xray+SS setup accidentally defeat the GFW's detection? (I searched all over the internet and couldn't find anyone else with this setup.)

I can't read packets or code, so please forgive me if the idea that I "accidentally" defeated the GFW's detection sounds laughable; since I don't know better, I'm bringing it up for discussion.

This is the configuration of the SS I set up in Xray: encryption aes-256-gcm, an 11-digit password, no obfuscation, no plugins, TCP only, Xray 1.4.2.

This is how I set it up:

I have 6 servers (all foreign VPSes, no domestic relays). All 6 run Xray+SS with identical configurations, all behind Cloudflare WARP.

One server (Server A) opens port 9999 as the sacrificial front: haproxy listens on 9999 with load balancing (balance roundrobin) and forwards incoming data to SS port 60000 on the 6 backend servers (including Server A's own localhost).

Port 60000 on every server is locked down with iptables so that no IP may connect except CF WARP's range 8.0.0.0/8.

Port 9999, however, has no restrictions (I am on a dynamic IP; with restrictions even my own client could not reach the server from outside).

As it stands, I connect over SS to Server A:9999; browsing works normally and the speed is decent. After five days there has been no interference or port blocking, and the load balancing genuinely works (rotating across 6 IPs).

[Screenshot: 2021-07-30, 10:42 PM]

Since all my servers sit behind WARP and the SS ports only accept connections from WARP IPs, the client IPs recorded in every server's logs are all 8.x.x.x. I can see GFW IPs scanning me. (I identified the GFW IPs by taking the timestamps of errors in the Xray logs and then finding, in the haproxy logs for the same period, the IPs that were not mine.) Whenever a GFW IP scans me, one of the following three errors appears:

2021/07/30 15:21:04 8.37.43.15:55436 rejected proxy/shadowsocks: failed to match an user > cipher: message authentication failed

2021/07/30 14:49:44 8.37.43.15:59740 rejected proxy/shadowsocks: failed to read 50 bytes > read tcp xxx.xxx.xxx.xxx:60000->8.37.43.15:59740: i/o timeout

2021/07/30 23:17:22 8.39.127.139:50422 rejected proxy/shadowsocks: failed to read 50 bytes > unexpected EOF

These three errors show up across all 6 servers, but the scans are few and infrequent.

Could the fact that my sacrificial server (Server A) has survived 5 days without dying be an effect of WARP disrupting the GFW's active detection?

By "searching for information" I meant looking for anyone online running SS with the same setup as mine, to see whether it has any blocking resistance, but I could not find a setup like mine.

Of course, my "blocking resistance" may be wishful thinking, and Server A might be blocked tomorrow. But since I don't know better, I raised this silly topic for discussion; again, please don't laugh.

@AkinoKaede (Contributor)

@cwyin7788 Your setup does nothing to help resist boundary protocol probing...

@cwyin7788 commented Jul 31, 2021

@cwyin7788 Your setup does nothing to help resist boundary protocol probing...

Thanks for the reply. I saw an article saying that running SS with an IP whitelist can effectively stop the GFW's active probing, but since I am on a dynamic IP and have no relay, this was the only method I could think of. I don't understand the underlying principles in depth, so I came here to ask. I hope my SS can keep holding on.

🙏🙏🙏🙏🙏🙏

@moranno commented Aug 18, 2021

Using Xray v1.4.2, Shadowsocks with chacha20 encryption, no obfuscation, no plugins, and about 3 GB of traffic per day, one of my ports has been blocked every day over the past few days. Have you tested how things behave with the #629 patch applied?

Around the time of my last reply, ports were being blocked almost daily. Each time a port was blocked I switched IPs, and the blocked IPs and ports were always concentrated on a few servers (these servers probably have a fixed user base, likely mostly mobile-phone users). Later I saw the Clash 1.6.1 vulnerability (https://github.com/Dreamacro/clash/issues/1468) and told all users to upgrade their clients. Over the last 2-3 weeks the rate of port blocking has dropped a lot, to roughly once a week (so I suspect being detected is also related to the client?). I hope my findings give @RPRX @gfw-report @AkinoKaede some directions and inspiration for the investigation.
PS: I use XrayR as the service backend; XrayR uses Xray-core 1.4.2.

@moranno commented Sep 11, 2021

v2ray-core's Shadowsocks implementation has fixed some issues; could Xray consider merging some of those updates as well? @AkinoKaede
https://github.com/v2fly/v2ray-core/commits/master/proxy/shadowsocks

use shadowsocket's bloomring for shadowsocket's replay protection
added shadowsockets iv check for tcp socket
unified drain support for vmess and shadowsockets

Thanks again for your hard work!

@AkinoKaede (Contributor)

@moranno A Bloom filter can prevent replay attacks, but as large amounts of data are added to it, the false-positive probability gradually rises. shadowsocks-rust has already removed its Bloom filter, and V2Ray does not enable this feature by default.
Most importantly, I currently have no permissions on this project, so whether anything gets merged is not up to me.
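To make the trade-off concrete, here is a toy replay check in Go (illustrative only, not V2Ray's bloomring): it remembers recently seen AEAD salts in an exact set with FIFO eviction. A real Bloom filter uses far less memory, but its false-positive rate grows as salts accumulate, which is the drawback described above.

package replay

import "sync"

// SaltFilter rejects connections whose salt was seen recently.
// An exact set has no false positives, at the cost of more memory,
// which is precisely what a Bloom filter trades away.
type SaltFilter struct {
	mu       sync.Mutex
	seen     map[string]struct{}
	order    []string // FIFO queue for eviction
	capacity int
}

func NewSaltFilter(capacity int) *SaltFilter {
	return &SaltFilter{seen: make(map[string]struct{}), capacity: capacity}
}

// Check returns false if the salt was already seen (a likely replay),
// and records the salt otherwise.
func (f *SaltFilter) Check(salt []byte) bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	k := string(salt)
	if _, dup := f.seen[k]; dup {
		return false
	}
	if len(f.order) >= f.capacity {
		oldest := f.order[0]
		f.order = f.order[1:]
		delete(f.seen, oldest)
	}
	f.seen[k] = struct{}{}
	f.order = append(f.order, k)
	return true
}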

@moranno commented Sep 12, 2021

[quoting @cwyin7788's Jul 30 comment above describing the WARP + haproxy setup]

During the time you have been running this setup, have any ports been blocked?

@Johnny256Dawson

@moranno Xray's built-in SS should not have this problem; the problem should be in ss-rust.

@Johnny256Dawson

Not sure whether it has been fixed by now.

@cwyin7788

During the time you have been running this setup, have any ports been blocked?

With the SS ports behind my front machine configured to accept only WARP IPs (8.x.x.x), it ran for over a month with no trouble. But when I removed the "WARP IPs only" rule from iptables (meaning any IP could connect), purely as a test, the SS receiving port on the front machine was blocked within a day. After turning the rule back on for the backend machines, everything was fine again; I am still running with that rule and observing...

But I dare not try running SS without that rule again.

@moranno commented Sep 17, 2021

With the SS ports behind my front machine configured to accept only WARP IPs (8.x.x.x), it ran for over a month with no trouble. But when I removed the "WARP IPs only" rule from iptables (meaning any IP could connect), purely as a test, the SS receiving port on the front machine was blocked within a day. After turning the rule back on for the backend machines, everything was fine again; I am still running with that rule and observing...

But I dare not try running SS without that rule again.

What client are you using? I don't restrict inbound IPs, and I haven't seen blocking within a single day; the times when ports get blocked are quite irregular. You can see the blocking situation in my reply: #625 (comment)

@cwyin7788 commented Sep 17, 2021

What client are you using? I don't restrict inbound IPs, and I haven't seen blocking within a single day; the times when ports get blocked are quite irregular. You can see the blocking situation in my reply: #625 (comment)

Shadowrocket, QX, v2rayNG, and OpenWrt's SSR+ as well. It has been almost 2 months since I set this up: the first month was fine; then I removed the rule and a port was blocked within about a day; with the rule back in place it has run for almost 2 weeks without problems.

Being blocked once a week is still quite frequent. I'd suggest using aes-256-gcm; chacha is said to have become somewhat unreliable.

shadowsocks/shadowsocks-libev#2829 (comment)

@qzydustin

A data point: I previously ran ss-libev with chacha20-ietf-poly1305, a random password and a random port, for about half a year with no problems. Recently I switched to Xray with the SS protocol, chacha20-ietf-poly1305 encryption, same password and port. After setting it up, I used BitTorrent to stress-test stability with 300+ connections and modest traffic; the port was blocked within an hour. I changed the port and disabled UDP; the new port was blocked in about an hour. I changed the port again and changed the password; then the IP was blocked.

@AkinoKaede (Contributor)

@qzydustin Could you try #629?

@qzydustin

@qzydustin Could you try #629?

I only have one VPS and its IP was blocked by the firewall. I am using port forwarding to work around it for now, so I cannot test it at the moment. Thanks for your work; I am very grateful, and your work is meaningful.

@moranno commented Oct 31, 2021

A data point: I previously ran ss-libev with chacha20-ietf-poly1305, a random password and a random port, for about half a year with no problems. Recently I switched to Xray with the SS protocol, chacha20-ietf-poly1305 encryption, same password and port. After setting it up, I used BitTorrent to stress-test stability with 300+ connections and modest traffic; the port was blocked within an hour. I changed the port and disabled UDP; the new port was blocked in about an hour. I changed the port again and changed the password; then the IP was blocked.

Would you be in a position to test v2ray's SS implementation? I can provide a server.

@moranno commented Oct 31, 2021

@qzydustin Could you try #629?

Does PR #791 help mitigate this issue?

@qzydustin

Would you be in a position to test v2ray's SS implementation? I can provide a server.

How should it be tested?

@AkinoKaede (Contributor)

Does PR #791 help mitigate this issue?

No.

@qzydustin

I used Outline before and it was not blocked; the catch is that Outline's configuration is randomly generated and cannot be modified. Perhaps Outline has solved this problem?

@moranno commented Oct 31, 2021

How should it be tested?

I just realized that v2ray's SS implementation does not support multiple users on a single port, so there seems to be no point in testing it. Perhaps merging Outline's code into Xray would be the best solution? @AkinoKaede Because you said:

For the first problem that GFW Report reported, Xray needs to read 50 bytes to authenticate the users; it is caused by the multi-user feature, and I don't have a good idea how to resolve it.

Outline also supports multiple users on a single port. How does Outline avoid this problem, or does Outline have it too?

For the second problem, Xray will wait indefinitely until it has read a random number of bytes if the connection is invalid. It was designed to avoid probing weaknesses, and it is called Drain. The seed of the random size is derived from the user's key and cipher, or from the timestamp when the server received the first connection.

On Outline's anti-probing and replay defenses: https://github.com/Jigsaw-Code/outline-ss-server/blob/master/service/PROBES.md

Could this scheme be implemented here?

@moranno commented Nov 1, 2021

Great news: #629 has finally been merged. @gfw-report Could you please test again whether the problem has been solved?

@AkinoKaede (Contributor)

Outline needs to read 50 bytes to authenticate users, too.

https://github.com/Jigsaw-Code/outline-ss-server/blob/046dbd43cc5e06699297fd5920edc0f448d76b49/service/tcp.go#L60-L72
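The 50-byte figure follows from the AEAD Shadowsocks framing (assuming a cipher with a 32-byte salt, such as aes-256-gcm or chacha20-ietf-poly1305): nothing can be authenticated until the salt plus the first encrypted length word and its tag have arrived.

package aead

// Sketch of the arithmetic; constants assume a 32-byte-salt AEAD cipher.
const (
	saltSize   = 32                              // key-derivation salt
	lengthSize = 2                               // encrypted payload-length field
	tagSize    = 16                              // AEAD tag over the length field
	minAuth    = saltSize + lengthSize + tagSize // = 50 bytes before any check can succeed
)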

@gfw-report (Author) commented Nov 1, 2021

Great news: #629 has finally been merged. @gfw-report Could you please test again whether the problem has been solved?

Thank you @AkinoKaede for spending time and effort trying to fix this problem.
And thank you @moranno for reporting your servers' status and following up on this issue.

Our testing shows that the problem has not been fixed completely as of commit 63d0cb1. Specifically, the server still reacts inconsistently, as reported in #629 (comment). Below is how we tested it, so you can try to reproduce it yourself:

Open the first terminal to build and run Xray. The config.json is the same as in #625 (comment):

git clone https://github.com/XTLS/Xray-core.git
cd Xray-core
go build -o xray -trimpath -ldflags "-s -w -buildid=" ./main
./xray < config.json

Open the second terminal to capture traffic:

sudo tcpdump -i lo port 12345

Open the third terminal to send our own probes:

Case 1: After receiving 1 byte of invalid data, the server waits for 60 seconds and then times out, sending a FIN+ACK to close the connection.

(python3 -c "print('a' * 1, end='')"; cat) | ncat -v localhost 12345

The server log shows:

[::1]:45826 rejected  proxy/shadowsocks: failed to read 50 bytes > read tcp [::1]:12345->[::1]:45826: i/o timeout
[Info] [2353414061] app/proxyman/inbound: connection ends > proxy/shadowsocks: failed to create request from: [::1]:45826 > proxy/shadowsocks: failed to read 50 bytes > read tcp [::1]:12345->[::1]:45826: i/o timeout

Case 2: After receiving 50 bytes of invalid data, the server waits for 60 seconds and then times out, sending a FIN+ACK to close the connection:

(python3 -c "print('a' * 50, end='')"; cat) | ncat -v localhost 12345

The server log shows:

[Info] [1350332059] app/proxyman/inbound: connection ends > proxy/shadowsocks: failed to create request from: [::1]:45856 > proxy/shadowsocks: failed to match an user > proxy/shadowsocks: Not Found
[::1]:45856 rejected  proxy/shadowsocks: failed to match an user > proxy/shadowsocks: Not Found

Note that in cases 1 and 2, sending more bytes during the 60 seconds does not refresh the server's timeout. This is good, because a refreshed timeout would enable a new active-probing attack.
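One way to verify the no-refresh behavior (our own sketch; the address and intervals are assumptions): keep trickling bytes after the first invalid byte and check that the FIN still arrives about 60 seconds after the first write.

package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	conn, err := net.Dial("tcp", "127.0.0.1:12345")
	if err != nil {
		panic(err)
	}
	start := time.Now()
	// Trickle one invalid byte every 10 seconds.
	go func() {
		for {
			if _, err := conn.Write([]byte{'a'}); err != nil {
				return // server closed the connection
			}
			time.Sleep(10 * time.Second)
		}
	}()
	buf := make([]byte, 1)
	conn.Read(buf) // blocks until the server sends FIN (or data)
	// ~60s here means the timer was not refreshed by the later writes.
	fmt.Printf("connection closed after %v\n", time.Since(start).Round(time.Second))
}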

Case 3: When receiving 500 bytes, the server will, with some probability, either 1) close the connection immediately with a FIN+ACK, or 2) wait until the 60-second timeout:

(python3 -c "print('a' * 500, end='')"; cat) | ncat -v localhost 12345

In both outcomes, the server log is the same:

[Info] [3382172859] app/proxyman/inbound: connection ends > proxy/shadowsocks: failed to create request from: [::1]:45864 > proxy/shadowsocks: failed to match an user > proxy/shadowsocks: Not Found
[::1]:45864 rejected  proxy/shadowsocks: failed to match an user > proxy/shadowsocks: Not Found

Don't get discouraged by not having it fixed completely yet, @AkinoKaede: the problem can be hard to fix and you've already done a great job on changing the active probing fingerprints.

We would suggest having a new release. This is because, although the active probing vulnerability is not fixed completely yet, it usually takes time for the censor to adapt to the new fingerprint. We will thus gain more time to have a complete fix. What do you think, @yuhan6665 ?

@yuhan6665 (Member)

@gfw-report Thanks for your testing and the detailed explanation. I will continue studying the issue and try some work on it. Awaiting a release from @badO1a5A90.

@AkinoKaede (Contributor)

Hmm, I think the timeout is caused by the default policy config.

@gfw-report (Author)

Hmm, I think the timeout is caused by the default policy config.

You are probably right that the timeout is controlled by the config.

The timeout in cases 1 and 2 is not the problem. In fact, our goal is to have the server always time out when receiving a probe.

The problem is that the server behaves inconsistently in Case 3. We don't want the server to sometimes close the connection immediately and sometimes time out. We want the server to always time out.

@AkinoKaede (Contributor)

Hmm, I think the timeout is caused by the default policy config.

You are probably right that the timeout is controlled by the config.

The timeout in cases 1 and 2 is not the problem. In fact, our goal is to have the server always time out when receiving a probe.

The problem is that the server behaves inconsistently in Case 3. We don't want the server to sometimes close the connection immediately and sometimes time out. We want the server to always time out.

Thanks. Although I think this is the expected result, I will try to make the server always time out.

https://github.com/AkinoKaede/Xray-core/tree/feat-shadowsocks-unlimited-drain

@AkinoKaede (Contributor)

Thanks. Although I think this is the expected result, I will try to make the server always time out.
https://github.com/AkinoKaede/Xray-core/tree/feat-shadowsocks-unlimited-drain

I tested it on my computer and I think it works.

@gfw-report (Author)

I tested it on my computer and I think it works.

We confirm that, as of commit https://github.com/AkinoKaede/Xray-core/commit/c136ff0bf554c6d7ad9b0f8f6e06ea4783b51529, the server always times out when receiving random probes of varied length. Specifically, we tested by sending random probes with lengths varying from 1 byte to 1500 bytes, and the server sent a FIN/ACK to close each connection after 60 seconds.

Thank you so much for your time and effort, @AkinoKaede! You've made an awesome contribution to the community!

@yuhan6665 (Member)

@AkinoKaede Thanks for your work! I'd like to merge your code but I have some questions.
https://github.com/AkinoKaede/Xray-core/commit/c136ff0bf554c6d7ad9b0f8f6e06ea4783b51529#diff-127a1c1c69013eee12f494919915940d9819438f9a9329df0ed42ad31650bdebR122

It seems your new drain method doesn't end by itself. Could this approach open other attack opportunities, or enable denial of service simply by creating many connections with random characters?
The other question is: why doesn't the VMess drain have the same probing vulnerability? I do apologize if my questions are silly :)

@AkinoKaede (Contributor)

It seems your new drain method doesn't end by itself. Could this approach open other attack opportunities, or enable denial of service simply by creating many connections with random characters? The other question is: why doesn't the VMess drain have the same probing vulnerability?

In fact, VMess has the same problem.

@yuhan6665 (Member)

I did some tests with Xray and v2fly using the method mentioned above.

Xray - 63d0cb1 - VMess

  • 1 ~ 900 bytes: 60 seconds
  • 920 bytes: 60 seconds OR close immediately
  • over 930 bytes: close immediately

Xray - 63d0cb1 - Shadowsocks

  • 1, 50 bytes: 60 seconds
  • 500 bytes: 60 seconds OR close immediately
  • over 530 bytes: close immediately

v2fly v4.43.0 - VMess

  • 1 ~ 900 bytes: 60 seconds
  • 920 bytes: 60 seconds OR close immediately
  • over 930 bytes: close immediately

v2fly v4.43.0 - Shadowsocks

  • 1 ~ 1030 bytes: 60 seconds
  • 1050 bytes: 60 seconds OR close immediately
  • over 1070 bytes: close immediately

Does it mean we have broad problems with xray/v2fly's drain logic? @gfw-report @AkinoKaede

@AkinoKaede (Contributor) commented Nov 6, 2021

That is why I said this is the expected result: the boundary between closing the connection and waiting is random, which is the very purpose of the drain design.

Certainly, read-forever may be a better approach.
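The read-forever reaction is simple to state (a minimal sketch of the idea, not the code in the feat-shadowsocks-unlimited-drain branch): once authentication fails, discard whatever arrives and never close from the server side; the connection then ends only when the peer gives up or a policy-level idle timeout fires.

package drain

import (
	"io"
	"net"
)

// drainForever discards everything the peer sends after an authentication
// failure; it returns only when the peer closes or a read error occurs.
func drainForever(conn net.Conn) {
	io.Copy(io.Discard, conn)
}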

@moranno commented Nov 6, 2021

Certainly, read-forever may be a better approach.

Could you update your code to read-forever? Thanks for your extraordinary work!

@moranno commented Nov 8, 2021

Thanks. Although I think this is the expected result, I will try to make the server always time out.
https://github.com/AkinoKaede/Xray-core/tree/feat-shadowsocks-unlimited-drain

I tested it on my computer and I think it works.

Hi @AkinoKaede, could you submit this fix as a PR? Thanks!

@AkinoKaede (Contributor)

Hi @AkinoKaede, could you submit this fix as a PR? Thanks!

That requires the maintainers to decide whether to modify the code.

@Fangliding (Member)

Our archaeological team has discovered this relic. May I ask whether the ancient cache of technology in this relic is applicable to modern stuff?

@RPRX (Member) commented Apr 21, 2024

Our archaeological team has discovered this relic. May I ask whether the ancient cache of technology in this relic is applicable to modern stuff?

I once discussed with @yuhan6665 whether to remove all the anti-active-probing measures from the fully encrypted protocols, given that the GFW is now blocking them with near-perfect accuracy based on traffic detection alone.

Moreover, full randomness is an inherent feature of fully encrypted protocols; there is no way around it. Unlike TLS, where one can still do something about packet lengths and timing, which is why the fallback mechanism is useful.

@gfw-report (Author) commented Apr 22, 2024

Hi @Fangliding, thank you for reporting this issue. You did a great job testing and discovering it.

Hi @RPRX,

I once discussed with @yuhan6665 whether to remove all the anti-active-probing measures from the fully encrypted protocols, given that the GFW is now blocking them with near-perfect accuracy based on traffic detection alone.

I agree with you that the defense against active probing is still necessary. As emphasized by Wu et al. (https://gfw.report/publications/usenixsecurity23/zh/#sec:active-probing):

We want to emphasize that this finding does not mean defenses against active probing are unnecessary or no longer important [34, 5, 9]. On the contrary, we believe the GFW's reliance on purely passive traffic analysis is in part because Shadowsocks, Outline, VMess, and many other circumvention tools have already deployed effective defenses against active probing [34, 5, 9, 19, 43, 32, 71]. The fact that the GFW still sends active probes to servers means that censors are still trying to use active probing to identify circumvention servers as accurately as possible.

I'm curious: by "fall back", what type of service or logic do you have in mind for handling active probes?

What do you think of the prior conclusion by Frolov et al. that "reading forever" should be the reaction to any active probing, as it is "[t]he most popular behavior" for hosts on the Internet? (See Fig. 13: https://censorbib.nymity.ch/pdf/Frolov2020a.pdf#page=11)

Moreover, full randomness is an inherent feature of fully encrypted protocols; there is no way around it. Unlike TLS, where one can still do something about packet lengths and timing, which is why the fallback mechanism is useful.

Speaking of the high entropy of fully encrypted traffic, what do you think of a design that lowers the entropy, like this patch (post and patch)?
