New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Respect controllers on PVCs for retention policy #122499
base: master
Are you sure you want to change the base?
Conversation
/sig apps |
/triage accepted |
/priority important-soon |
/lgtm |
LGTM label has been added. Git tree hash: 3af13a5ff2cc1cd46ea27a3174fd1f2f9597b125
|
/assign @soltysh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mattcary I found couple of problems with the current implementation
I feel that hasOwnerRef
should have a bool checkController
parameter. And we should not check the controller flag if we want to delete the owner ref, but we should check it if we want to introduce it / overwrite it.
I think this goes for all the cases in both claimOwnerMatchesSetAndPod
and updateClaimOwnerRefForSetAndPod
functions.
+ would be good to document why we are doing it
@@ -305,6 +333,7 @@ func setOwnerRef(target, owner metav1.Object, ownerType *metav1.TypeMeta) bool { | |||
Kind: ownerType.Kind, | |||
Name: owner.GetName(), | |||
UID: owner.GetUID(), | |||
Controller: ptr.To(true), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is already existing owner reference, this append will duplicate it. So there will be one reference without a controller and one with. Can we deduplicate it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hasOwnerRef checks that there is not an owner with the same UID, regardless of the value of the controller. Oops wait I did add that check to controller in there. Thx
Although, it occurs to me, if the StatefulSet is updated, the UID will change. So we probably want hasOwnerRef to check on api version + kind + name + namespace, which will be unique?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by updated? Normally the UID shouldn't change and should be present and is used by the garbage collector. And if a person deletes the StatefulSet using the orphan strategy it should also remove the owner reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I didn't know the ownerref was removed with the orphan strategy.
Mostly I think I got confused.
I guess the problem could be a pvc that's slow to delete, and a statefulset that's deleted and recreated (the same case the stale pod claim check deals with). In that case though a new pod won't bind to a deleted PVC, and so it should eventually delete and the statefulset controller will recreate a new PVC.
So yeah, we should just look at UIDs (except for the stale pod case).
@@ -305,6 +333,7 @@ func setOwnerRef(target, owner metav1.Object, ownerType *metav1.TypeMeta) bool { | |||
Kind: ownerType.Kind, | |||
Name: owner.GetName(), | |||
UID: owner.GetUID(), | |||
Controller: ptr.To(true), | |||
}) | |||
target.SetOwnerReferences(ownerRefs) | |||
return true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also removeOwnerRef
seems it will not work correctly if there is an owner reference in the old format because hasOwnerRef
is more strict now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 this will be fixed
case hasUnexpectedController(claim, set, pod): | ||
// Do not set any ownerRefs if there is an unexpected controller. | ||
msg := fmt.Sprintf("Claim %s has a conflicting controller, the retention policy is ignored for this claim", claimName) | ||
spc.recorder.Event(set, v1.EventTypeWarning, "ConflictingController", msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also this part from the KEP seems not to be satisfied and should apply for all policy combinations:
If there is another controller, the existing non-controller StatefulSet or Pod reference will be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking a look at this! You make some good points.
Let's resolve the conversation on the KEP first about what the proper semantics should be, then I'll work through your comments here -- you raise some excellent points.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
argh, you are correct. Thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change is in progress. It was better to make a little more complete re-vamp of the logic to keep it from getting too complicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking at that. And yeah, there are a lot of cases that need to be covered. It would be nice to have them all covered by the tests.
6494d39
to
d1cff32
Compare
Uff, that was tricky. I didn't realize that the stale owner ref would trigger the garbage collector on the PVC. When filtering the e2e run to just the single test, the GC doesn't get called, but when running in the presubmit it gets called much more frequently. So the behavior was different locally than in the presubmit. I used a finalizer to pin the PVC with the stale ref so we can test the update behavior. If the tests pass this should be good to review. |
/retest pull-kubernetes-conformance-kind-ipv6-parallel Flake in the n/w test. |
@mattcary: The
The following commands are available to trigger optional jobs:
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/test pull-kubernetes-conformance-kind-ipv6-parallel |
fyi since the 1.30 freeze has started, I'll wait until the thaw next month this, then submit this bug fix & backport to 1.30, unless there are objections. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GJ on rewriting the code and the tests. I think the readability is good, but unfortunately it still takes a bit of time to understand the code, but this is due to the problem domain...
// nonController returns true if the pod or set is an owner but not controller of the claim. | ||
func nonController(claim *v1.PersistentVolumeClaim, set *apps.StatefulSet, pod *v1.Pod) bool { | ||
for _, ownerRef := range claim.GetOwnerReferences() { | ||
if ownerRef.UID == set.GetUID() || ownerRef.UID == pod.GetUID() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my original thought was:
Suppose you have a list of owner references:
[{Kind: "Pod", Controller: nil}, {Kind: "StatefulSet", Controller: ptr.To(true)}]
If both of these match our set and pod UIDs, then the function will return early, even though we are the controller.
hm maybe I am confused by the name of the function and the fact there cannot be two controllers and it checks only some refs.
No code change necessary, but maybe we could just name it better? What about isMissingControllerFlag
?
} | ||
|
||
if hasNonController(claim, set, pod) { | ||
return false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return false | |
// Need to adopt the claim as a controller. | |
return false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some other resource besides our set or pod has an owner ref
It seems that other resources do not necessarily have to be registered as owners in the references to get to this line. That is, it can be just one owner reference (e.g. our set or pod).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
claim := v1.PersistentVolumeClaim{} | ||
claim.SetOwnerReferences(tc.originalRefs) | ||
updateClaimOwnerRefForSetAndPod(logger, &claim, &set, &pod) | ||
if ownerRefsChanged(tc.expectedRefs, claim.GetOwnerReferences()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it won't, but the order should be predictable so it should be fine in our test (it is used in other places in the codebase. I think it is preferable from a maintainability perspective, rather than adding another helper function.
c1d3b18
to
a5d3cae
Compare
@atiratree , I've responded and/or implemented all of your comments as far as I can tell, let me know what you think. Thanks for your review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mattcary there are still some nits and testing considerations
// nonController returns true if the pod or set is an owner but not controller of the claim. | ||
func nonController(claim *v1.PersistentVolumeClaim, set *apps.StatefulSet, pod *v1.Pod) bool { | ||
for _, ownerRef := range claim.GetOwnerReferences() { | ||
if ownerRef.UID == set.GetUID() || ownerRef.UID == pod.GetUID() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good, although it would be better to have it reflected in the function name as well
claim := v1.PersistentVolumeClaim{} | ||
claim.SetOwnerReferences(tc.originalRefs) | ||
updateClaimOwnerRefForSetAndPod(logger, &claim, &set, &pod) | ||
if ownerRefsChanged(tc.expectedRefs, claim.GetOwnerReferences()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. In that case, I think it would be appropriate to have a unit test for the new function.
} | ||
|
||
if hasNonController(claim, set, pod) { | ||
return false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some other resource besides our set or pod has an owner ref
It seems that other resources do not necessarily have to be registered as owners in the references to get to this line. That is, it can be just one owner reference (e.g. our set or pod).
test/e2e/apps/statefulset.go
Outdated
BlockOwnerDeletion: ptr.To(true), | ||
}, | ||
} | ||
ginkgo.By("Expect claims 0 and 2 to have ownerRefs to the statefulset, and 3 to have a stale reference") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TY. Let's also change the description:
ginkgo.By("Expect claims 0 and 2 to have ownerRefs to the statefulset, and 3 to have a stale reference") | |
ginkgo.By("Expect claims 0,1 and 2 to have ownerRefs to the statefulset, and 3 to have a stale reference") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, thx, done.
fc472e6
to
8400017
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are 2 nits, but the PR lgtm.
Nevertheless, it would be good to get another set of eyes on it.
cc @soltysh
unit test flaking due to #124526 |
/kind bug
Fixes #122400
/assign @msau42
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: