Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: Prevent pods form binding to ztunnel ports #49078

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

yuval-k
Copy link
Contributor

@yuval-k yuval-k commented Jan 30, 2024

The idea is that if the ztunnel bind to its ports on a specific cgroup (identitied by its mark) from now own only the ztunnel will be allowd to bind to any of the ztunnel ports in the cgroup.

This way if the ztunnel ever restarts, a pod wouldn't be able to bind to its ports and hijack egress traffic.

This is a draft as tests were not written (though i did manually test this), and there's some more clean-up to do, but wanted to have this early draft out to get feedback before continuing.

The core logic is in "cni/pkg/ebpf/block_ztunnel_ports.bpf.c"

The idea is that if the ztunnel bind to its ports on a specific cgroup
from now own only the ztunnel will be allowd to bind to any of the
ztunnel ports in the cgroup.
@istio-testing istio-testing added the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Jan 30, 2024
@istio-testing
Copy link
Collaborator

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@istio-policy-bot istio-policy-bot added the area/ambient Issues related to ambient mesh label Jan 30, 2024
@istio-testing istio-testing added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jan 30, 2024
@yuval-k
Copy link
Contributor Author

yuval-k commented Jan 30, 2024

Copy link
Contributor

@bleggett bleggett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Broadly LGTM - adding an eBPF loader back in creates complexity, and (light) kernel reqs, but the binding rejection mechanism will be effective short of node compromise, so sounds good.

We are unlikely to conflict with other eBPF CNIs on this, as well, so I'm not too worried about that.

@linsun
Copy link
Member

linsun commented Jan 30, 2024

Seems a good approach. Requested some reviews from Intel team as well

if (port != ztunnel_dns_port) {
return 1;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should also return 1 for other ctx->type?

}
}

is_ztunnel = (sk->mark & ztunnel_mask) == ztunnel_mark;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another option: we may use pid of ztunnel or process name to identify if it's ztunnel? Though a sock mark should be good enough to differentiate it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that the cni agent that installs this, doesn't know the ztunnel's pid, and if ztunnel restarts, the pid changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we may enforce socket option like reuseport inside the bpf hook for ztunnel upgrade case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think we want to enforce these here; I think we want the flexibility to set or not set them. because we control the ztunnel I don't think that's a problem.

This code exist because we don't control the application pod's behaviour.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, in current implementation, users could not use ztunnel ports anymore within ambient. From user's viewpoint, is it possible to use dynamic ztunnel ports to avoid such conflicts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the current impl, the user won't be able to use the ztunnel ports only in pods in which ztunnel operates. pods outside the mesh and sidecars won't be effected

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another problem with using dynamic ports, is that current the ztunnel doesn't set the iptables rules. more so, the iptable rules are only set once when the pod is created, and stay there if ztunnel restarts.

Using dynamic ports will mean that the ztunnel will need to manage iptables rules and that a new ztunnel would need to replace the rules with the ports of the old ztunnel atomically. While possible, I'm not sure we want to take on this complexity right now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also - IIRC this was and is the status-quo with sidecars and envoy well-known ports.

Copy link
Member

@irisdingbj irisdingbj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes looks good. However I am wondering how much chances will the pod binds to ztunnel port since ztunnel restart should be quick. And if ztunnel is removed, the related socket has been removed already. Good thing here is this is optional for users to enable though.

@bleggett
Copy link
Contributor

bleggett commented Jan 31, 2024

Changes looks good. However I am wondering how much chances will the pod binds to ztunnel port since ztunnel restart should be quick. And if ztunnel is removed, the related socket has been removed already. Good thing here is this is optional for users to enable though.

Probably not much. The thing this does do, though, is make it easy to prove we can prevent permissioned pods from rewriting iptables rules and silently egressing traffic on other ports with NetworkPolicy - if we can reliably/provably guarantee the well-known ports belong to ztunnel, and nothing else inside the pod (this PR), and we can block all egress not on ztunnel well-known ports with standard host-netns-enforced netpol, we end up with a demonstrably better security model than we had in sidecars, at least - even if we have a privileged container that makes an absolute mess of the pod-internal iptables, a simple egress netpol can mitigate the security-critical effects of that.

That approach won't help with plaintext egress, or ingress, but those are less important to have guarantees around than secured egress.

@istio-policy-bot istio-policy-bot added the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Mar 2, 2024
@bleggett bleggett added lifecycle/staleproof Indicates a PR or issue has been deemed to be immune from becoming stale and/or automatically closed and removed lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while labels Mar 4, 2024
@bleggett
Copy link
Contributor

bleggett commented Mar 4, 2024

I'm marking this staleproof as we do need a solution for this problem, even if for some reason we don't adopt this PR impl

@istio-testing istio-testing added the needs-rebase Indicates a PR needs to be rebased before being merged label Mar 6, 2024
@istio-testing
Copy link
Collaborator

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ambient Issues related to ambient mesh do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. lifecycle/staleproof Indicates a PR or issue has been deemed to be immune from becoming stale and/or automatically closed needs-rebase Indicates a PR needs to be rebased before being merged size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants