Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UDP 转发流量场景处理 doq 遇到了奇怪的问题 #935

Open
wen-long opened this issue Feb 3, 2024 · 1 comment
Open

UDP 转发流量场景处理 doq 遇到了奇怪的问题 #935

wen-long opened this issue Feb 3, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@wen-long
Copy link

wen-long commented Feb 3, 2024

描述问题
部署 UDP 转发加速 hysteria 的场景, 转发 dns over quic 时会导致 dns 请求失败

如何复现
服务器 T 运行 hysteria 服务端(bbr)
服务器 R 运行 UDP 转发到 T , 测试了 gost 和 socat 没有区别(命令 gost -L=udp://:8080/1.1.1.1:8080)
客户端 C 运行启动 hysteria 客户端和 Surge 增强模式

客户端 C 运行 DNS 查询命令如下(dns 客户端使用 https://github.com/natesales/q)
q @quic://dns.nextdns.io A google.com -S --trace --tls-key-log-file=mitmproxy_sslkeylogfile.txt --pmtud=false

预期: q 正常输出
实际: 随机结果, 大概率 DNS 解析失败, 输出 FATA[0005] opening quic session to dns.nextdns.io:853: timeout: no recent network activity

~ q  @quic://dns.nextdns.io A google.com -S --trace --tls-key-log-file=mitmproxy_sslkeylogfile.txt --pmtud=false
DEBU[0000] Server(s): [quic://dns.nextdns.io]
WARN[0000] TLS secret logging enabled
DEBU[0000] Using server dns.nextdns.io:853 with transport quic
DEBU[0000] Using QUIC transport: dns.nextdns.io:853
DEBU[0000] No ALPN tokens specified, using default: "doq"
DEBU[0000] Dialing with QUIC ALPN tokens: [doq]
FATA[0005] opening quic session to dns.nextdns.io:853: timeout: no recent network activity
~ q  @quic://dns.nextdns.io A google.com -S --trace --tls-key-log-file=mitmproxy_sslkeylogfile.txt --pmtud=false
DEBU[0000] Server(s): [quic://dns.nextdns.io]
WARN[0000] TLS secret logging enabled
DEBU[0000] Using server dns.nextdns.io:853 with transport quic
DEBU[0000] Using QUIC transport: dns.nextdns.io:853
DEBU[0000] No ALPN tokens specified, using default: "doq"
DEBU[0000] Dialing with QUIC ALPN tokens: [doq]
FATA[0010] opening quic session to dns.nextdns.io:853: timeout: handshake did not complete in time

极小概率成功

另外,使用 DNS UDP 协议总是成功q @plain://9.9.9.9 A google.com -S, 一般的 TCP 协议也未见异常

客户端抓包对比
UDP 转发的情况1
SCR-20240204-9zy
UDP 转发的情况2
image

使用 snell 协议的情况(正常,作为参照)
SCR-20240204-a3r

日志
均没有异常日志

设备和操作系统
T,R debian 12
C Mac

额外信息
在两组转发+hysteria服务端进行了测试,均发现该问题
怀疑过是 MTU 或者 fastopen 问题, 切换 disablePathMTUDiscovery fastOpen 开关都没有影响结果

@wen-long wen-long added the bug Something isn't working label Feb 3, 2024
@wen-long
Copy link
Author

wen-long commented Feb 4, 2024

应该找到原因了

func (d *Defragger) Feed(m *protocol.UDPMessage) *protocol.UDPMessage {

这里的 Defragger 只有一个, 会要求被分片的包连续到达, 中间穿插了其他 PacketID 的包就会丢掉了

打上合适的日志可以看到

2024/02/04 17:23:54 frag.go:47: m.PacketID = 48970, m.FragID = 1, m.FragCount = 2, lenOfm.Data = 81
2024/02/04 17:23:54 frag.go:47: m.PacketID = 8662, m.FragID = 1, m.FragCount = 2, lenOfm.Data = 81
2024/02/04 17:23:54 frag.go:47: m.PacketID = 58833, m.FragID = 1, m.FragCount = 2, lenOfm.Data = 81
2024/02/04 17:23:54 frag.go:47: m.PacketID = 48970, m.FragID = 0, m.FragCount = 2, lenOfm.Data = 1171
2024/02/04 17:23:54 frag.go:47: m.PacketID = 8662, m.FragID = 0, m.FragCount = 2, lenOfm.Data = 1171
2024/02/04 17:23:54 frag.go:47: m.PacketID = 58833, m.FragID = 0, m.FragCount = 2, lenOfm.Data = 1171
2024/02/04 17:23:54 frag.go:47: m.PacketID = 0, m.FragID = 0, m.FragCount = 1, lenOfm.Data = 141
2024/02/04 17:23:54 frag.go:47: m.PacketID = 0, m.FragID = 0, m.FragCount = 1, lenOfm.Data = 37
2024/02/04 17:23:54 frag.go:47: m.PacketID = 40212, m.FragID = 1, m.FragCount = 2, lenOfm.Data = 81
2024/02/04 17:23:54 frag.go:47: m.PacketID = 47561, m.FragID = 1, m.FragCount = 2, lenOfm.Data = 81
2024/02/04 17:23:54 frag.go:47: m.PacketID = 40212, m.FragID = 0, m.FragCount = 2, lenOfm.Data = 1171
2024/02/04 17:23:54 frag.go:47: m.PacketID = 47561, m.FragID = 0, m.FragCount = 2, lenOfm.Data = 1171
2024/02/04 17:23:55 frag.go:47: m.PacketID = 15313, m.FragID = 1, m.FragCount = 2, lenOfm.Data = 81
2024/02/04 17:23:55 frag.go:47: m.PacketID = 0, m.FragID = 0, m.FragCount = 1, lenOfm.Data = 1121
2024/02/04 17:23:55 frag.go:47: m.PacketID = 15313, m.FragID = 0, m.FragCount = 2, lenOfm.Data = 1171
2024/02/04 17:23:55 frag.go:66: d.count = 2, d.frags = [1374392740144 1374392739952], m.PacketID = 15313, m.FragID = 0, m.FragCount = 2, lenOfm.Data = 1171
2024/02/04 17:23:55 frag.go:47: m.PacketID = 0, m.FragID = 0, m.FragCount = 1, lenOfm.Data = 37
2024/02/04 17:23:55 frag.go:47: m.PacketID = 44812, m.FragID = 1, m.FragCount = 2, lenOfm.Data = 81
2024/02/04 17:23:55 frag.go:47: m.PacketID = 44812, m.FragID = 0, m.FragCount = 2, lenOfm.Data = 1171
2024/02/04 17:23:55 frag.go:66: d.count = 2, d.frags = [1374391766448 1374391766352], m.PacketID = 44812, m.FragID = 0, m.FragCount = 2, lenOfm.Data = 1171
2024/02/04 17:23:55 frag.go:47: m.PacketID = 27731, m.FragID = 1, m.FragCount = 2, lenOfm.Data = 81
2024/02/04 17:23:55 frag.go:47: m.PacketID = 27731, m.FragID = 0, m.FragCount = 2, lenOfm.Data = 1171
2024/02/04 17:23:55 frag.go:66: d.count = 2, d.frags = [1374391766640 1374391766544], m.PacketID = 27731, m.FragID = 0, m.FragCount = 2, lenOfm.Data = 1171
2024/02/04 17:23:55 frag.go:47: m.PacketID = 15733, m.FragID = 1, m.FragCount = 2, lenOfm.Data = 81
2024/02/04 17:23:55 frag.go:47: m.PacketID = 15733, m.FragID = 0, m.FragCount = 2, lenOfm.Data = 1171
2024/02/04 17:23:55 frag.go:66: d.count = 2, d.frags = [1374393268416 1374393268320], m.PacketID = 15733, m.FragID = 0, m.FragCount = 2, lenOfm.Data = 1171
2024/02/04 17:23:55 frag.go:47: m.PacketID = 0, m.FragID = 0, m.FragCount = 1, lenOfm.Data = 235
2024/02/04 17:23:55 frag.go:47: m.PacketID = 0, m.FragID = 0, m.FragCount = 1, lenOfm.Data = 25
2024/02/04 17:23:55 frag.go:47: m.PacketID = 0, m.FragID = 0, m.FragCount = 1, lenOfm.Data = 72
{"level":"debug","time":1707038635919.616,"msg":"SOCKS5 UDP closed","addr":"127.0.0.1:60362"}


PacketID 48970 8662 58833 这些包应该是被放弃了, 没有走到 all fragments received, assemble, 所以 DOQ 有时候成功有时候失败有时候耗时特别久

修复的话我尝试把放在 map 里面, 日志会如下

2024/02/04 17:26:35 frag.go:47: m.PacketID = 62703, m.FragID = 0, m.FragCount = 2, lenOfm.Data = 1171
2024/02/04 17:26:35 frag.go:47: m.PacketID = 35260, m.FragID = 0, m.FragCount = 2, lenOfm.Data = 1171
2024/02/04 17:26:35 frag.go:47: m.PacketID = 10003, m.FragID = 0, m.FragCount = 2, lenOfm.Data = 1171
2024/02/04 17:26:35 frag.go:47: m.PacketID = 62703, m.FragID = 1, m.FragCount = 2, lenOfm.Data = 81
2024/02/04 17:26:35 frag.go:66: d.count = 2, d.frags = [1374393267696 1374393267984], m.PacketID = 62703, m.FragID = 1, m.FragCount = 2, lenOfm.Data = 81
2024/02/04 17:26:35 frag.go:47: m.PacketID = 35260, m.FragID = 1, m.FragCount = 2, lenOfm.Data = 81
2024/02/04 17:26:35 frag.go:66: d.count = 2, d.frags = [1374393267792 1374393268080], m.PacketID = 35260, m.FragID = 1, m.FragCount = 2, lenOfm.Data = 81
2024/02/04 17:26:35 frag.go:47: m.PacketID = 10003, m.FragID = 1, m.FragCount = 2, lenOfm.Data = 81
2024/02/04 17:26:35 frag.go:66: d.count = 2, d.frags = [1374393267888 1374393268176], m.PacketID = 10003, m.FragID = 1, m.FragCount = 2, lenOfm.Data = 81
2024/02/04 17:26:35 udp.go:77: frag 2
2024/02/04 17:26:35 udp.go:77: frag 1
2024/02/04 17:26:35 udp.go:77: frag 1
2024/02/04 17:26:35 frag.go:47: m.PacketID = 0, m.FragID = 0, m.FragCount = 1, lenOfm.Data = 137
2024/02/04 17:26:35 udp.go:77: frag 1
2024/02/04 17:26:35 udp.go:77: frag 1
2024/02/04 17:26:35 udp.go:77: frag 1
2024/02/04 17:26:35 frag.go:47: m.PacketID = 0, m.FragID = 0, m.FragCount = 1, lenOfm.Data = 240
2024/02/04 17:26:35 frag.go:47: m.PacketID = 0, m.FragID = 0, m.FragCount = 1, lenOfm.Data = 73
2024/02/04 17:26:35 udp.go:77: frag 1
{"level":"debug","time":1707038795163.9712,"msg":"SOCKS5 UDP closed","addr":"127.0.0.1:60533"}

DOQ 也就不会出现超时失败和耗时久了

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant