
The Pod loops back to itself through the Service and cilium currently only supports IPv4 ? #2007

Open
weizhoublue opened this issue Jul 6, 2023 · 15 comments
Labels: enhancement, source codes enhancement, kind/feature

@weizhoublue (Collaborator)

action url: https://github.com/spidernet-io/spiderpool/actions/runs/5472017414

@ty-dc (Collaborator) commented Jul 6, 2023

When the default CNI is Cilium, the Pod's macvlan IP cannot be reached from the host.

https://github.com/spidernet-io/cni-plugins/blob/00af686014ae889f1730b13e3a2c3ae2f1eea0c5/test/e2e/common/const.go#L56

/home/ty-test# kubectl exec -ti spiderdoctor-agent-gthhr -n kube-system -- sh
# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: net1@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 6e:21:3a:70:81:d1 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.18.40.23/16 brd 172.18.255.255 scope global net1
       valid_lft forever preferred_lft forever
    inet6 fc00:f853:ccd:e793:f::97/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::6c21:3aff:fe70:81d1/64 scope link
       valid_lft forever preferred_lft forever
26: eth0@if27: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ba:65:7c:7b:dd:f6 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.244.64.78/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fd00:10:244::21/128 scope global nodad
       valid_lft forever preferred_lft forever
    inet6 fe80::b865:7cff:fe7b:ddf6/64 scope link
       valid_lft forever preferred_lft forever
# exit
root@cyclinder3:/home/ty-test# docker exec -ti 3bec3d6c8287 bash
root@ty-spider-worker:/# ping 172.18.40.23
PING 172.18.40.23 (172.18.40.23) 56(84) bytes of data.
^C
--- 172.18.40.23 ping statistics ---
81 packets transmitted, 0 received, 100% packet loss, time 81898ms

root@ty-spider-worker:/# ping 172.18.40.23 -c 2
PING 172.18.40.23 (172.18.40.23) 56(84) bytes of data.

During actual debugging, accessing the Pod's macvlan IP from the host fails regardless of whether the kdoctor check succeeds or fails.


@weizhoublue (Collaborator, Author)

node request -> pod response -> pod lxc (ebpf drop)

I think this is dropped by cilium
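
For reference, one way to confirm where the reply gets dropped would be to watch drop events on that node's cilium agent while re-running the ping from the host (the agent pod name below is a placeholder):

# list the cilium agent pods and pick the one on the affected node
kubectl -n kube-system get pods -l k8s-app=cilium -o wide
# stream only drop events from that agent
kubectl -n kube-system exec -it cilium-xxxxx -- cilium monitor --type drop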

@weizhoublue (Collaborator, Author)

This is not high priority. In the overlay scenario the host does not need to reach the underlay IP; the host only uses the Cilium Pod IP for health checks.

@cyclinder (Collaborator)

@weizhoublue (Collaborator, Author) commented Jul 10, 2023

This test case failure isn't because the node needs to reach the Pod's underlay IP directly, right? Is it rather that the underlay IPs of two Pods on different hosts can't reach each other?

My guess is that macvlan should not use the eth0 managed by Cilium, but a separate NIC instead, so that Cilium's filtering rules no longer apply to that traffic (Cilium's Helm devices option may also need to be set).
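
For illustration, a rough sketch of that setup, assuming eth0 is the NIC Cilium manages and eth1 is a spare NIC reserved for macvlan (the interface names and the IPAM block are assumptions for this environment):

# pin Cilium's datapath to eth0 only, so it does not attach programs to the macvlan parent
helm upgrade cilium cilium/cilium -n kube-system --reuse-values --set devices='{eth0}'

# Multus NetworkAttachmentDefinition that uses the spare NIC as the macvlan master
cat <<EOF | kubectl apply -f -
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-eth1
  namespace: kube-system
spec:
  config: '{
    "cniVersion": "0.3.1",
    "type": "macvlan",
    "master": "eth1",
    "mode": "bridge",
    "ipam": { "type": "spiderpool" }
  }'
EOF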

@ty-dc (Collaborator) commented Jul 10, 2023

The underlay IPs of two Pods on different hosts can reach each other?

/home/ty-test/spiderpool# kubectl get po -n kube-system |grep spiderdoctor
spiderdoctor-agent-62jjz                          1/1     Running     0          5m57s
spiderdoctor-agent-6xxdt                          1/1     Running     0          5m57s

/home/ty-test/spiderpool# kubectl exec -ti spiderdoctor-agent-62jjz -n kube-system -- ip a show net1
2: net1@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether c6:f9:2b:d9:30:3c brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.18.40.121/16 brd 172.18.255.255 scope global net1
       valid_lft forever preferred_lft forever
    inet6 fc00:f853:ccd:e793:f::a2/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::c4f9:2bff:fed9:303c/64 scope link
       valid_lft forever preferred_lft forever

kubectl exec -ti spiderdoctor-agent-6xxdt  -n kube-system -- ip a show net1
2: net1@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 7a:f5:ed:5b:b1:06 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.18.40.146/16 brd 172.18.255.255 scope global net1
       valid_lft forever preferred_lft forever
    inet6 fc00:f853:ccd:e793:f::57/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::78f5:edff:fe5b:b106/64 scope link
       valid_lft forever preferred_lft forever

/home/ty-test/spiderpool# kubectl exec -ti spiderdoctor-agent-62jjz -n kube-system -- ping 172.18.40.146 -c 2
PING 172.18.40.146 (172.18.40.146) 56(84) bytes of data.
64 bytes from 172.18.40.146: icmp_seq=1 ttl=64 time=0.083 ms
64 bytes from 172.18.40.146: icmp_seq=2 ttl=64 time=0.103 ms

--- 172.18.40.146 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1019ms
rtt min/avg/max/mdev = 0.083/0.093/0.103/0.010 ms

When running the spiderdoctor test, disabling this target should fix it, right?

  success:
    meanAccessDelayInMs: 15000
    successRate: 1
  target:
    targetAgent:
      testClusterIp: true
      testEndpoint: true
      testIPv4: true
      testIPv6: true
      testIngress: false
      testLoadBalancer: false
      testMultusInterface: false
      testNodePort: true

@cyclinder (Collaborator)

This may be caused by testNodePort failing; accessing a NodePort in macvlan-underlay mode is a known issue:
spidernet-io/cni-plugins#142

@weizhoublue (Collaborator, Author)

This may be caused by testNodePort failing; accessing a NodePort in macvlan-underlay mode is a known issue:
spidernet-io/cni-plugins#142

This is an overlay-mode test case, so why would it be related to macvlan-underlay mode?

@weizhoublue (Collaborator, Author)

Has a root-cause analysis been done? We can't just disable whatever fails.

@cyclinder (Collaborator)

Looking into it; there is an issue with IPv6 ClusterIP access.

@cyclinder (Collaborator) commented Jul 10, 2023

In this environment, a Cilium Pod cannot reach a Service whose endpoint is the Pod itself:

root@cyclinder3:~/cyclinder# kubectl  get po -o wide
NAME                        READY   STATUS    RESTARTS   AGE    IP             NODE                      NOMINATED NODE   READINESS GATES
test-6ccdcc86df-g78dn       1/1     Running   0          3m4s   10.244.65.2    ty-spider-control-plane   <none>           <none>
test-6ccdcc86df-vvgz6       1/1     Running   0          4m9s   10.244.64.25   ty-spider-worker          <none>           <none>
test-pod-85c445cb44-9nnv9   1/1     Running   0          4d     172.18.40.38   ty-spider-worker          <none>           <none>
root@cyclinder3:~/cyclinder# kubectl  describe svc test-ipv6
Name:                     test-ipv6
Namespace:                default
Labels:                   <none>
Annotations:              <none>
Selector:                 app=test
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv6
IP:                       fd00:10:233::e5f
IPs:                      fd00:10:233::e5f
Port:                     http  80/TCP
TargetPort:               80/TCP
NodePort:                 http  32686/TCP
Endpoints:                [fd00:10:244::128]:80,[fd00:10:244::70]:80
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>
root@cyclinder3:~/cyclinder# kubectl exec -it test-6ccdcc86df-g78dn -- curl [fd00:10:233::e5f]:80
^Ccommand terminated with exit code 130
root@cyclinder3:~/cyclinder# kubectl exec -it test-6ccdcc86df-g78dn -- curl [fd00:10:233::e5f]:80
^Ccommand terminated with exit code 130
root@cyclinder3:~/cyclinder# kubectl exec -it test-6ccdcc86df-g78dn -- curl [fd00:10:233::e5f]:80
{"clientIp":"[fd00:10:244::128]:42540","otherDetail":{"/spiderdoctoragent":"route to print request"},"requestHeader":{"Accept":" */* ","User-Agent":" curl/7.81.0 "},"requestUrl":"/","serverName":"test-6ccdcc86df-vvgz6"}
root@cyclinder3:~/cyclinder# kubectl exec -it test-6ccdcc86df-g78dn -- curl [fd00:10:233::e5f]:80
^Ccommand terminated with exit code 130
root@cyclinder3:~/cyclinder# kubectl exec -it test-6ccdcc86df-g78dn -- ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
43: eth0@if44: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 22:4a:3d:eb:1e:7f brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.244.65.2/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fd00:10:244::128/128 scope global nodad
       valid_lft forever preferred_lft forever
    inet6 fe80::204a:3dff:feeb:1e7f/64 scope link
       valid_lft forever preferred_lft forever
root@ty-spider-control-plane:/home/cilium# cilium monitor --from 541
Listening for events on 16 CPUs with 64x4096 of shared memory
Press Ctrl-C to quit
level=info msg="Initializing dissection cache..." subsys=monitor
-> endpoint 541 flow 0xb522b447 , identity 13262->13262 state new ifindex lxc3c38e607befa orig-ip fd00:10:244::128: [fd00:10:244::128]:51788 -> [fd00:10:244::128]:80 tcp SYN
-> endpoint 541 flow 0x8f414a02 , identity 13262->13262 state established ifindex lxc3c38e607befa orig-ip fd00:10:244::128: [fd00:10:244::128]:51788 -> [fd00:10:244::128]:80 tcp SYN
-> endpoint 541 flow 0xcf3d762b , identity 13262->13262 state established ifindex lxc3c38e607befa orig-ip fd00:10:244::128: [fd00:10:244::128]:51788 -> [fd00:10:244::128]:80 tcp SYN
-> endpoint 541 flow 0xcf988979 , identity 13262->13262 state established ifindex lxc3c38e607befa orig-ip fd00:10:244::128: [fd00:10:244::128]:51788 -> [fd00:10:244::128]:80 tcp SYN
-> endpoint 541 flow 0x8aa61064 , identity 13262->13262 state established ifindex lxc3c38e607befa orig-ip fd00:10:244::128: [fd00:10:244::128]:51788 -> [fd00:10:244::128]:80 tcp SYN
-> endpoint 541 flow 0xa34f043f , identity 13262->13262 state established ifindex lxc3c38e607befa orig-ip fd00:10:244::128: [fd00:10:244::128]:51788 -> [fd00:10:244::128]:80 tcp SYN

root@cyclinder3:~/cyclinder# kubectl exec -it test-6ccdcc86df-g78dn bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
root@test-6ccdcc86df-g78dn:/# curl [fd00:10:233::e5f]:80
curl: (28) Failed to connect to fd00:10:233::e5f port 80 after 129673 ms: Connection timed out
root@test-6ccdcc86df-g78dn:/# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
43: eth0@if44: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 22:4a:3d:eb:1e:7f brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.244.65.2/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fd00:10:244::128/128 scope global nodad
       valid_lft forever preferred_lft forever
    inet6 fe80::204a:3dff:feeb:1e7f/64 scope link
       valid_lft forever preferred_lft forever

@cyclinder (Collaborator)

This may be a cilium bug, and until we confirm this issue, I recommend setting testIPv6 to false.

cilium/cilium#26733

@ty-dc (Collaborator) commented Jul 10, 2023

This may be a cilium bug, and until we confirm this issue, I recommend setting testIPv6 to false.

#2015

@weizhoublue (Collaborator, Author)

from julianwiedmann:
that sounds like a feature (pod looping back to itself through a Service) that is currently only supported for IPv4.
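
To make that failure mode concrete, here is a minimal reproduction sketch of "Pod looping back to itself through a Service" over IPv6 (resource names and the image are placeholders; a dual-stack cluster with Cilium as the default CNI is assumed):

cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hairpin-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hairpin-test
  template:
    metadata:
      labels:
        app: hairpin-test
    spec:
      containers:
      - name: web
        image: nginx   # placeholder image; assumes it serves HTTP on :80 and has curl installed
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: hairpin-test-ipv6
spec:
  ipFamilyPolicy: SingleStack
  ipFamilies: ["IPv6"]
  selector:
    app: hairpin-test
  ports:
  - port: 80
    targetPort: 80
EOF

# curl the IPv6 ClusterIP from the single Pod that backs it; per this issue the connection
# hangs, while the same hairpin test against an IPv4 ClusterIP succeeds
kubectl exec -it deploy/hairpin-test -- \
  curl -g "http://[$(kubectl get svc hairpin-test-ipv6 -o jsonpath='{.spec.clusterIP}')]:80"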

@cyclinder (Collaborator)

  success:
    meanAccessDelayInMs: 15000
    successRate: 1
  target:
    targetAgent:
      testClusterIp: true
      testEndpoint: true
      testIPv4: true
      testIPv6: false
      testIngress: false
      testLoadBalancer: false
      testMultusInterface: true
      testNodePort: true

testMultusInterface works, so we should keep it set to true and only disable testIPv6.
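
If it helps, that change could be applied to an existing task object roughly like this (the CRD kind nethttp, the task name, and the assumption that the keys above live under .spec are all guesses about this test setup; adjust to whatever resource actually holds the target block):

# flip only the IPv6 check off, leaving the Multus-interface check enabled
kubectl patch nethttp <task-name> --type merge \
  -p '{"spec":{"target":{"targetAgent":{"testIPv6":false,"testMultusInterface":true}}}}'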

@ty-dc ty-dc changed the title Night CI 2023-07-06: Failed The Pod loops back to itself through the Service, and cilium currently only supports IPv4 ? Jul 11, 2023
@ty-dc ty-dc changed the title The Pod loops back to itself through the Service, and cilium currently only supports IPv4 ? The Pod loops back to itself through the Service and cilium currently only supports IPv4 ? Jul 11, 2023
@ty-dc ty-dc added enhancement source codes enhancement and removed kind/bug labels Oct 27, 2023
@Icarus9913 Icarus9913 removed their assignment Apr 7, 2024