Avoid SNAT when a pod contacts another pod in host-network #3474
Comments
Thanks for the great analysis. We discussed this in sig-network here: https://groups.google.com/g/kubernetes-sig-network/c/m6lwTjKLV8o/m/lnir_lqECwAJ kindnet is very simple and is an internal detail of kind, so it keeps things simple by skipping masquerading only for the pod subnets, which are the ones we are 100% sure we don't want to masquerade. For the node IPs, yes, it is perfectly fine not to masquerade at all, but why is this a problem for you? We don't expect anybody to build network solutions on top of kindnet, that is why we have the
Hi @aojea, thanks for your answer. Our solution is designed to run on top of the cluster CNI. Long story short, we create a geneve tunnel from each node to a pod called "gateway", which uses wireguard to connect to another cluster's "gateway". To do this, we run a daemonset in host-network on each node; it creates a geneve interface that uses the "gateway" IP as its remote endpoint. In the "gateway" we have one geneve interface per node, each using that node's IP as its remote endpoint. When traffic goes from the "gateway" to a node, the geneve interface on the node receives the encapsulated traffic with the IP of the node where the gateway is scheduled as its source IP. That differs from the remote endpoint configured on the node, which is a pod IP.
Yes, but kindnet is only used in kind; it is not something you are going to run your solution on top of. You are going to run on calico, cilium, ... and you can install those in kind too.
I know we can use other CNIs with kind, but using kindnet would be more convenient for development. However, if you don't think this change would be useful to the community, we can adapt. Otherwise, if it were useful, we would take care of doing the PR with the changes.
It is important that kindnetd remains a very simple and lightweight default. We're only likely to consider the behavior a bug if it doesn't meet Kubernetes conformance requirements, and we're generally not taking feature requests here because, again, it's intended to be extremely simple and lightweight but conformant. There's an external forked copy (well, forked back to where it started: https://github.com/aojea/kindnet), but the OOTB default is not accepting non-critical features. kindnetd is pretty much 1:1 with what we've historically tested Kubernetes on GCE with, so I'm additionally hesitant to alter the behavior without proof that we're violating SIG Network's expectations. Further: KIND is intended to a) help test Kubernetes itself and b) help users test Kubernetes applications. While a) takes priority, this doesn't seem to help with a), and for b) it's detrimental to "help" users depend on non-conformant cluster expectations. To the extent possible, if something works on kind it should work on all conformant clusters.
we can accept patches there , |
The problem
When a pod pings a pod in host-network on another node, the received packet has the node IP as the source IP instead of the pod IP.
Here is an example.
I have a setup with 2 worker nodes and 3 pods:
IMPORTANT: pod1 is in host-network
When pod2 pings pod1 (ping -c1 172.21.0.5), pod1 receives this packet:
The received ICMP request has 10.112.1.8 as the source IP, which is the pod2 IP.
If we repeat the same test with pod3 and pod1, the results are:
The packet source IP is not the pod IP, but the IP of the node where the pod is scheduled:
Why this is a problem
This is a problem whenever two pods (one of them in host-network) need to receive packets whose source IP is the same address they use to contact the other pod.
In particular, I work for the open source project liqo, and we are developing a modular multicluster network solution. We decided to use geneve to create tunnels between nodes (pods in host-network) and a gateway (a common pod used to reach a remote cluster).
Geneve works with all the major CNIs (cilium, calico, flannel), but not with kindnet. The cause is the problem described above.
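The mismatch can be modeled in a few lines of Python. This is only a sketch: it assumes a tunnel endpoint accepts a packet only when the outer source IP equals the remote endpoint it was configured with, and both addresses below are hypothetical values, not taken from a real cluster.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class TunnelEndpoint:
    """Toy model of one side of a point-to-point tunnel."""
    remote: ipaddress.IPv4Address  # peer address configured on the interface

    def accepts(self, outer_src: str) -> bool:
        # Simplifying assumption: only packets whose outer source IP equals
        # the configured remote are accepted by this endpoint.
        return ipaddress.ip_address(outer_src) == self.remote


GATEWAY_POD_IP = "10.112.2.10"  # hypothetical gateway pod IP
GATEWAY_NODE_IP = "172.21.0.4"  # hypothetical IP of the node hosting the gateway

# The host-network pod on a node points its tunnel at the gateway pod IP.
node_side = TunnelEndpoint(remote=ipaddress.ip_address(GATEWAY_POD_IP))

# Without SNAT the outer source is the gateway pod IP and the packet matches.
print(node_side.accepts(GATEWAY_POD_IP))
# With SNAT the outer source becomes the gateway's node IP and no longer matches.
print(node_side.accepts(GATEWAY_NODE_IP))
```

In this model the tunnel only works when the pod IP survives the trip between nodes, which is what the other CNIs listed above appear to preserve.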
A possible solution
I think the cause of the problem is this iptables chain inside the kind node.
Would it be possible to add a RETURN rule for packets with the nodeCIDR as their destination, or would that cause trouble?
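To make the question concrete, here is a rough Python model of the decision such a chain would make for packets that traverse it. It is not the actual kindnet rule set: the pod subnet and node CIDR values are assumptions chosen to match the addresses in the example, and `skip_node_cidr` is a hypothetical switch standing in for the proposed RETURN rule.

```python
import ipaddress

# Assumed values for illustration only; a real cluster will differ.
POD_SUBNETS = [ipaddress.ip_network("10.112.0.0/16")]  # assumption
NODE_CIDR = ipaddress.ip_network("172.21.0.0/16")      # assumption


def verdict(dst: str, skip_node_cidr: bool = False) -> str:
    """Return the action a masquerade-style chain would take for a destination IP."""
    addr = ipaddress.ip_address(dst)
    # Behaviour described in the thread: traffic to pod subnets is never masqueraded.
    if any(addr in net for net in POD_SUBNETS):
        return "RETURN"
    # Proposed extra rule: also leave traffic towards node IPs untouched.
    if skip_node_cidr and addr in NODE_CIDR:
        return "RETURN"
    # Everything else gets its source rewritten to the node IP.
    return "MASQUERADE"


print(verdict("172.21.0.5"))                       # host-network pod, today
print(verdict("172.21.0.5", skip_node_cidr=True))  # host-network pod, with the proposed rule
print(verdict("10.112.1.8"))                       # ordinary pod IP
```

With the extra rule, only destinations outside both the pod subnets and the node CIDR would still be masqueraded in this model.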