Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

datapath,endpoint: explicitly remove TC filters during endpoint teardown #32167

Merged
merged 1 commit into from Apr 30, 2024

Conversation

ti-mo
Copy link
Contributor

@ti-mo ti-mo commented Apr 24, 2024

Prior to this commit, we left it up to the kernel to clean up tc attachments when the CNI finally removes the veth when a Pod goes away. This leaves a window of time where an endpoint's tc programs can potentially be invoked after the endpoint's internal tail call maps have already been cleared and the endpoint has been removed from the endpoint map and ipcache, resulting in undefined behaviour.

This patch clearly defines the endpoint teardown sequence as follows:

  • remove (endpoint) routes
  • detach tc(x) hooks
  • remove endpoint from endpoint map
  • remove endpoint policy program(s)
  • delete conntrack map pins
  • remove policy prog array map pin
  • remove internal tail call map pin
  • remove custom calls map pin

This puts the agent more in control of the teardown sequence and will allow us to reason better about failures related to missed tail calls and other flakes.

Accurately manage the teardown sequence of an Endpoint's BPF resources

cc @julianwiedmann

@ti-mo ti-mo added sig/loader Impacts the loading of BPF programs into the kernel. area/CI-improvement Topic or proposal to improve the Continuous Integration workflow release-note/misc This PR makes changes that have no direct user impact. labels Apr 24, 2024
@ti-mo ti-mo requested review from a team as code owners April 24, 2024 13:49
@ti-mo ti-mo requested review from aditighag and rgo3 April 24, 2024 13:49
@ti-mo
Copy link
Contributor Author

ti-mo commented Apr 24, 2024

/test

Copy link
Member

@aditighag aditighag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

explicitly remove TC filters during endpoint teardown

Makes sense along with the inline comments. However, consider defining a tear down flow in the code. One option could be defining states for steps that need to happen in a certain order + unit test.

pkg/endpoint/bpf.go Show resolved Hide resolved
pkg/endpoint/bpf.go Show resolved Hide resolved
Copy link
Member

@aditighag aditighag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Remove the policy tail call entry for the endpoint. This will disable
// policy evaluation for the endpoint and will result in missing tail calls if
// e.g. bpf_host or bpf_overlay call into the endpoint's policy program.

Thinking about DROP_MISSED_TAIL_CALL errors for in-flight packets, I wonder if there is a way to differentiate the race conditions during tear down from genuine error conditions. Before sending the drop notification with DROP_MISSED_TAIL_CALL reason, what if we were to do another lookup in the ipcache map to identify the cases we can ignore (and potentially send a different reason DROP_EP_DELETED)? (This is assuming that ipcache entry is removed prior to maps clean-up for an ep.)

(Meant to request changes while posting previous review comments.)

@ti-mo
Copy link
Contributor Author

ti-mo commented Apr 25, 2024

Thinking about DROP_MISSED_TAIL_CALL errors for in-flight packets, I wonder if there is a way to differentiate the race conditions during tear down from genuine error conditions.

@aditighag See #32151, that does what you describe. Doing subsequent lookups in the lxcmap/ipcache to decide how to handle missed tail calls is all subject to TOCTOU problems, so the best thing we can do if it happens is flag it and drop the packet.

@ti-mo ti-mo added backport/author The backport will be carried out by the author of the PR. needs-backport/1.15 This PR / issue needs backporting to the v1.15 branch labels Apr 25, 2024
@ti-mo ti-mo force-pushed the tb/define-endpoint-teardown branch from 26e218a to 7da33fd Compare April 25, 2024 11:49
@ti-mo
Copy link
Contributor Author

ti-mo commented Apr 25, 2024

/test

@julianwiedmann
Copy link
Member

Personally I'd split this into three patches (link-down, remove TC filters, consolidate BPF map management), they all feel fairly separate and have their own commit description. But otherwise this seems like a great improvement 👍.

And with the hard link-down, I'm also much less concerned about pinning down the "perfect" order in which we need to tear down the BPF map entries. If there are any subtle dependencies between the different maps, we'll find out eventually.

@ti-mo
Copy link
Contributor Author

ti-mo commented Apr 26, 2024

Personally I'd split this into three patches (link-down, remove TC filters, consolidate BPF map management), they all feel fairly separate and have their own commit description. But otherwise this seems like a great improvement 👍.

Technically yes, but it doesn't make sense to backport them partially, for example. And if we run into conflicts cherry-picking one of the commits, it's more difficult to track what the overall intention was in the upstream patch.

And with the hard link-down, I'm also much less concerned about pinning down the "perfect" order in which we need to tear down the BPF map entries. If there are any subtle dependencies between the different maps, we'll find out eventually.

Careful, this still matters to some extent. If host or overlay find an entry in the endpoint map, they'll still try to jump into the endpoint's policy program. (ignoring TOCTOU here) With #32151 this is now less of an issue, and at least after issuing an ifdown, we don't need to worry about the tc entrypoints potentially firing anymore. We should still try to tear things down gracefully to reduce the likelihood of undefined behaviour.

pkg/endpoint/endpoint.go Outdated Show resolved Hide resolved
pkg/datapath/loader/loader.go Show resolved Hide resolved
pkg/datapath/loader/netlink.go Show resolved Hide resolved
pkg/endpoint/bpf.go Show resolved Hide resolved
pkg/endpoint/bpf.go Show resolved Hide resolved
pkg/endpoint/bpf.go Outdated Show resolved Hide resolved
pkg/endpoint/bpf.go Outdated Show resolved Hide resolved
Prior to this commit, we left it up to the kernel to clean up tc attachments
when the CNI finally removes the veth when a Pod goes away. This leaves a window
of time where an endpoint's tc programs can potentially be invoked after
the endpoint's internal tail call maps have already been cleared and the
endpoint has been removed from the endpoint map and ipcache, resulting in
undefined behaviour.

This patch clearly defines the endpoint teardown sequence as follows:
- remove (endpoint) routes
- set the interface down
- detach tc(x) hooks
- remove endpoint from endpoint map
- remove endpoint policy program(s)
- delete conntrack map pins
- remove policy prog array map pin
- remove internal tail call map pin
- remove custom calls map pin

This puts the agent more in control of the teardown sequence and will allow us
to reason better about failures related to missed tail calls and other flakes.

Signed-off-by: Timo Beckers <timo@isovalent.com>
@ti-mo ti-mo force-pushed the tb/define-endpoint-teardown branch from 7da33fd to 6d80a75 Compare April 29, 2024 14:21
@ti-mo
Copy link
Contributor Author

ti-mo commented Apr 29, 2024

/test

@ti-mo ti-mo requested a review from lmb April 29, 2024 14:21
pkg/datapath/loader/netlink.go Show resolved Hide resolved
pkg/endpoint/endpoint.go Show resolved Hide resolved
Copy link
Contributor

@lmb lmb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Misread the doc for errors.As.

@ti-mo ti-mo dismissed aditighag’s stale review April 30, 2024 10:50

Unclear what to change

@ti-mo ti-mo added this pull request to the merge queue Apr 30, 2024
Merged via the queue into cilium:main with commit 6633ca8 Apr 30, 2024
63 of 64 checks passed
@ti-mo ti-mo deleted the tb/define-endpoint-teardown branch April 30, 2024 11:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/CI-improvement Topic or proposal to improve the Continuous Integration workflow backport/author The backport will be carried out by the author of the PR. needs-backport/1.15 This PR / issue needs backporting to the v1.15 branch release-note/misc This PR makes changes that have no direct user impact. sig/loader Impacts the loading of BPF programs into the kernel.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants