proposal: Prevent pods from binding to ztunnel ports #49078
base: master
Conversation
The idea is that once ztunnel binds to its ports in a specific cgroup, from then on only ztunnel will be allowed to bind to any of the ztunnel ports in that cgroup.
Skipping CI for Draft Pull Request.
Broadly LGTM - adding an eBPF loader back in creates complexity, and (light) kernel reqs, but the binding rejection mechanism will be effective short of node compromise, so sounds good.
We are unlikely to conflict with other eBPF CNIs on this, as well, so I'm not too worried about that.
Seems a good approach. Requested some reviews from the Intel team as well.
    if (port != ztunnel_dns_port) {
        return 1;
    }
}
Should we also return 1 for other ctx->type values?
is_ztunnel = (sk->mark & ztunnel_mask) == ztunnel_mark;
Another option: we could use ztunnel's pid or process name to identify it. Though a sock mark should be good enough to differentiate it.
The problem is that the CNI agent that installs this doesn't know ztunnel's pid, and if ztunnel restarts, the pid changes.
nit: we may want to enforce a socket option like reuseport inside the bpf hook for the ztunnel upgrade case.
I don't think we want to enforce these here; I think we want the flexibility to set or not set them. Because we control ztunnel, I don't think that's a problem.
This code exists because we don't control the application pod's behaviour.
IIRC, in the current implementation, users can no longer use the ztunnel ports within ambient. From the user's viewpoint, would it be possible to use dynamic ztunnel ports to avoid such conflicts?
In the current implementation, the user loses the ability to use the ztunnel ports only in pods in which ztunnel operates; pods outside the mesh and sidecar pods won't be affected.
Another problem with using dynamic ports is that currently ztunnel doesn't set the iptables rules. Moreover, the iptables rules are set only once, when the pod is created, and stay in place if ztunnel restarts.
Using dynamic ports would mean that ztunnel would need to manage the iptables rules itself, and that a new ztunnel would need to atomically replace the old ztunnel's rules. While possible, I'm not sure we want to take on that complexity right now.
Also, IIRC this was and is the status quo with sidecars and Envoy's well-known ports.
The changes look good. However, I am wondering how likely it is that a pod binds to a ztunnel port, since a ztunnel restart should be quick, and if ztunnel is removed, the related sockets have already been removed. The good thing is that this is optional for users to enable.
Probably not much. What this does do, though, is make it easy to prove we can prevent permissioned pods from rewriting iptables rules and silently egressing traffic on other ports, in combination with NetworkPolicy. If we can reliably and provably guarantee that the well-known ports belong to ztunnel and nothing else inside the pod (this PR), and we can block all egress not on the ztunnel well-known ports with standard host-netns-enforced netpol, we end up with a demonstrably better security model than we had with sidecars, at least. Even if a privileged container makes an absolute mess of the pod-internal iptables, a simple egress netpol can mitigate the security-critical effects of that. This approach won't help with plaintext egress, or with ingress, but those are less important to have guarantees around than secured egress.
I'm marking this staleproof, as we do need a solution for this problem even if for some reason we don't adopt this PR's implementation.
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
The idea is that once ztunnel binds to its ports in a specific cgroup, from then on only ztunnel (identified by its socket mark) will be allowed to bind to any of the ztunnel ports in that cgroup.
This way, if ztunnel ever restarts, a pod won't be able to bind to its ports and hijack egress traffic.
This is a draft: tests haven't been written yet (though I did manually test this), and there's some more clean-up to do, but I wanted to get this early draft out for feedback before continuing.
The core logic is in "cni/pkg/ebpf/block_ztunnel_ports.bpf.c"