Deflake watchcache tests #124610
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: wojtek-t
ResourceVersion: "0",
// Limit is ignored when ResourceVersion is set to 0.
// Set it to consistent read.
ResourceVersion: "",
The failures from before this PR could be easily triggered by adding time.Sleep(time.Second)
in appropriate tests (cacher_test.go) after creating watchcache but before starting the test.
Why does adding a sleep cause the test to fail?
By changing the RV to consistent read, the list call will be delegated to the underlying storage. I assume this was what this test intended to do, right?
Without the sleep, the list happens before the watchcache is initialized, so it is delegated to the underlying storage.
If the watchcache is initialized, this test fails, because the watchcache ignores Limit for RV=0.
We know that for RV=0 the limit is ignored when the request is served by the watchcache. So yes, we want to test other RVs here; it doesn't matter whether the call is delegated or not, the result is what matters (but yes, it's delegated).
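To make the race concrete, here is a minimal sketch of the behavior described in this comment. It is a toy model, not the real cacher: `toyCacher`, `listOptions`, and the `ready` flag are hypothetical names, and the only rules it encodes are the ones stated in this thread (RV=0 is served from the watchcache once it is initialized and ignores Limit; everything else is delegated to storage, which honors Limit).

```go
package main

import "fmt"

// listOptions mirrors only the two fields discussed in this thread.
// It is a simplified stand-in, not the real storage.ListOptions.
type listOptions struct {
	ResourceVersion string
	Limit           int
}

// toyCacher is a hypothetical model of a cache sitting in front of storage.
type toyCacher struct {
	ready bool     // whether the cache has finished initializing
	items []string // objects known to both the cache and storage
}

// list returns the items for opts and reports which layer served them.
func (c *toyCacher) list(opts listOptions) ([]string, string) {
	// Assumption from the thread: RV="0" is served from the cache once
	// it is initialized, and the cache ignores Limit for such requests.
	if opts.ResourceVersion == "0" && c.ready {
		return append([]string(nil), c.items...), "cache"
	}
	// Everything else is delegated to storage, which honors Limit.
	n := len(c.items)
	if opts.Limit > 0 && opts.Limit < n {
		n = opts.Limit
	}
	return append([]string(nil), c.items[:n]...), "storage"
}

func main() {
	c := &toyCacher{items: []string{"a", "b", "c"}}
	rvZero := listOptions{ResourceVersion: "0", Limit: 1}

	// Cache not initialized yet: delegated, so Limit is honored.
	got, by := c.list(rvZero)
	fmt.Println(len(got), by) // 1 storage

	// Cache initialized (what the added sleep makes likely): Limit is ignored.
	c.ready = true
	got, by = c.list(rvZero)
	fmt.Println(len(got), by) // 3 cache

	// Consistent read (RV=""): delegated regardless of cache state.
	got, by = c.list(listOptions{ResourceVersion: "", Limit: 1})
	fmt.Println(len(got), by) // 1 storage
}
```

In this model, the same RV=0 request returns a different number of items depending on timing, which is the flake; with RV="" the result no longer depends on whether the cache is ready.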
Do we have a test for RV=0 that checks whether the limit is ignored by the watchcache?
I'm sure we had one but I can't find any now. We should add that as a follow-up (though I wouldn't block this PR on it as it's not regressing our coverage).
okay, I can add the new test if you don't have time. Just let me know.
So I found what I was looking for:
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/storage/testing/store_tests.go#L862-L866
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/storage/testing/store_tests.go#L881-L885
We are now ignoring it; it might be better to update them slightly to check that we return everything in those cases.
If you have time to take it, it would be great.
sure, i will have a look.
ac3e675 to 6d9edcc
@@ -2407,7 +2415,7 @@ func RunTestGuaranteedUpdateWithSuggestionAndConflict(ctx context.Context, t *te
err := store.GuaranteedUpdate(ctx, key, updatedPod, false, nil,
storage.SimpleUpdate(func(obj runtime.Object) (runtime.Object, error) {
pod := obj.(*example.Pod)
pod.Name = "foo-2"
pod.Generation = 2
These tests were somewhat broken for two reasons:
- changing the name is unrealistic (forbidden) and confuses the watchcache - switched to use Generation instead
- the suggestion can be ignored by the implementation - relaxed the validation check below
was RunTestGuaranteedUpdateWithSuggestionAndConflict flaky or did you change it to match reality?
both:
- due to changing names, the watchcache wasn't behaving as it does in reality (it caches by the object's namespace/name, not by the given key, so the cached entry was changing in the meantime)
- the last check was incorrect, and you can trigger the failure by adding a sleep
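The first point can be illustrated with a small sketch. This is a toy model only, assuming (as stated above) that storage is addressed by the key passed to the call while the watchcache indexes objects by their own namespace/name; `toyStore`, `put`, and the key `test-ns/foo` are hypothetical.

```go
package main

import "fmt"

// pod is a simplified stand-in for example.Pod.
type pod struct {
	Namespace  string
	Name       string
	Generation int64
}

// toyStore is a hypothetical model: storage is indexed by the caller's key,
// while the cache is indexed by the object's own namespace/name.
type toyStore struct {
	storage map[string]pod
	cache   map[string]pod
}

func newToyStore() *toyStore {
	return &toyStore{storage: map[string]pod{}, cache: map[string]pod{}}
}

// put writes obj under key in storage and mirrors it into the cache
// under the identity derived from the object itself.
func (s *toyStore) put(key string, obj pod) {
	s.storage[key] = obj
	s.cache[obj.Namespace+"/"+obj.Name] = obj
}

func main() {
	const key = "test-ns/foo" // hypothetical storage key

	// Renaming the object under the same key: cache and storage diverge.
	renamed := newToyStore()
	renamed.put(key, pod{Namespace: "test-ns", Name: "foo", Generation: 1})
	renamed.put(key, pod{Namespace: "test-ns", Name: "foo-2", Generation: 1})
	fmt.Println(renamed.storage[key].Name, renamed.cache[key].Name, len(renamed.cache))
	// foo-2 foo 2

	// Changing Generation keeps the identity stable: one entry, in sync.
	bumped := newToyStore()
	bumped.put(key, pod{Namespace: "test-ns", Name: "foo", Generation: 1})
	bumped.put(key, pod{Namespace: "test-ns", Name: "foo", Generation: 2})
	fmt.Println(bumped.storage[key].Generation, bumped.cache[key].Generation, len(bumped.cache))
	// 2 2 1
}
```

In this model, the rename leaves a stale entry under the old identity and a second entry under the new one, whereas updating Generation touches a single entry that stays consistent with storage.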
since we are changing the test to match reality, would it make sense to add a new annotation instead of messing with the Generation field, which is set by the system?
actually, the Generation will be updated during update before we call this function so it is okay to mess with that field.
generation isn't handled automatically by the system - we're handling it manually at strategy level for individual resources. So no - I think this is good.
/assign @p0lyn0mial
@@ -1654,7 +1656,8 @@ func RunTestListContinuation(ctx context.Context, t *testing.T, store storage.In
// no limit, should get two items
out = &example.PodList{}
options = storage.ListOptions{
ResourceVersion: "0",
interesting, setting the continuation token and an RV doesn't yield an error, is that correct ?
ok, thx.
/lgtm
LGTM label has been added. Git tree hash: 40c0154dc0f651e815ea8da46471bac6ad77c859
/triage accepted
Found when working on kubernetes/enhancements#4568
/kind flake
/priority important-longterm
/sig api-machinery