Deflake watchcache tests #124610

Merged
merged 1 commit into from
Apr 30, 2024
46 changes: 25 additions & 21 deletions staging/src/k8s.io/apiserver/pkg/storage/testing/store_tests.go
@@ -1631,7 +1631,9 @@ func RunTestListContinuation(ctx context.Context, t *testing.T, store storage.In
}
}
options := storage.ListOptions{
- ResourceVersion: "0",
+ // Limit is ignored when ResourceVersion is set to 0.
+ // Set it to consistent read.
+ ResourceVersion: "",
Member Author

The failures from before this PR could be easily triggered by adding time.Sleep(time.Second) in appropriate tests (cacher_test.go) after creating watchcache but before starting the test.

Contributor

Why does adding a sleep cause the test to fail?
By changing the RV to consistent read, the list call will be delegated to the underlying storage. I assume this was what this test intended to do, right?

Member Author

Without the sleep, the list happens before the watchcache is initialized, so it is delegated to the underlying storage.
If the watchcache is already initialized, this test fails, because the watchcache ignores Limit for RV=0.

We know that for RV=0 the limit is ignored when the list is served by the watchcache. So yes, we want to test other RVs, where it doesn't matter whether the call is delegated or not; the result is what matters (but yes, it's delegated).
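The behavior described in this comment can be illustrated with a small self-contained model. This is only a sketch under stated assumptions: `toyCache`, `listOptions`, and `paginate` are hypothetical names invented for the illustration and are not the real cacher or `storage.ListOptions` types. It shows why the old test passed only while the cache was still uninitialized.

```go
package main

import "fmt"

// listOptions is a hypothetical, trimmed-down stand-in for storage.ListOptions.
type listOptions struct {
	ResourceVersion string
	Limit           int
}

// toyCache is a hypothetical model of a cache in front of a storage backend.
// For simplicity the cache and the storage hold the same items.
type toyCache struct {
	ready bool     // whether the cache has finished initializing
	items []string // stored objects
}

// paginate returns at most limit items; limit <= 0 means "no limit".
func paginate(items []string, limit int) []string {
	if limit <= 0 || limit >= len(items) {
		return items
	}
	return items[:limit]
}

// list models the behavior described in the thread: an initialized cache
// serves RV=0 itself and ignores Limit; everything else is delegated to
// the storage, which honors Limit.
func (c *toyCache) list(opts listOptions) []string {
	if c.ready && opts.ResourceVersion == "0" {
		return c.items // served from cache, Limit ignored
	}
	return paginate(c.items, opts.Limit) // delegated, Limit honored
}

func main() {
	c := &toyCache{items: []string{"a", "b", "c"}}
	rv0 := listOptions{ResourceVersion: "0", Limit: 1}
	consistent := listOptions{ResourceVersion: "", Limit: 1}

	// Cache not initialized yet: both requests are delegated.
	fmt.Println(len(c.list(rv0)), len(c.list(consistent))) // 1 1

	// Cache initialized (what the added sleep effectively guarantees).
	c.ready = true
	fmt.Println(len(c.list(rv0)), len(c.list(consistent))) // 3 1
}
```

In this model, a test that asserts one item per page with RV=0 passes or fails depending on timing, while the same assertion with an empty ResourceVersion is stable.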

Contributor

Do we have a test for RV=0 that checks whether the limit is ignored by the watchcache?

Member Author

I'm sure we had one but I can't find any now. We should add that as a follow-up (though I wouldn't block this PR on it as it's not regressing our coverage).

Contributor

okay, I can add the new test if you don't have time. Just let me know.

Member Author

So I found what I was looking for:
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/storage/testing/store_tests.go#L862-L866
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/storage/testing/store_tests.go#L881-L885

We are now ignoring it; it might be better to slightly update them to check that we return everything in those cases.
If you have time to take it, that would be great.

Contributor

sure, i will have a look.

Predicate: pred(1, ""),
Recursive: true,
}
@@ -1654,7 +1656,8 @@ func RunTestListContinuation(ctx context.Context, t *testing.T, store storage.In
// no limit, should get two items
out = &example.PodList{}
options = storage.ListOptions{
- ResourceVersion: "0",
Contributor

Interesting: setting both the continuation token and an RV doesn't yield an error. Is that correct?

Member Author

Contributor

ok, thx.

+ // ResourceVersion should be unset when setting continuation token.
+ ResourceVersion: "",
Predicate: pred(0, continueFromSecondItem),
Recursive: true,
}
@@ -1677,7 +1680,8 @@ func RunTestListContinuation(ctx context.Context, t *testing.T, store storage.In
// limit, should get two more pages
out = &example.PodList{}
options = storage.ListOptions{
- ResourceVersion: "0",
+ // ResourceVersion should be unset when setting continuation token.
+ ResourceVersion: "",
Predicate: pred(1, continueFromSecondItem),
Recursive: true,
}
@@ -1699,7 +1703,8 @@ func RunTestListContinuation(ctx context.Context, t *testing.T, store storage.In

out = &example.PodList{}
options = storage.ListOptions{
- ResourceVersion: "0",
+ // ResourceVersion should be unset when setting continuation token.
+ ResourceVersion: "",
Predicate: pred(1, continueFromThirdItem),
Recursive: true,
}
@@ -1815,7 +1820,9 @@ func RunTestListContinuationWithFilter(ctx context.Context, t *testing.T, store
}
}
options := storage.ListOptions{
- ResourceVersion: "0",
+ // Limit is ignored when ResourceVersion is set to 0.
+ // Set it to consistent read.
+ ResourceVersion: "",
Predicate: pred(2, ""),
Recursive: true,
}
@@ -1845,7 +1852,8 @@ func RunTestListContinuationWithFilter(ctx context.Context, t *testing.T, store
// both read counters should be incremented for the singular calls they make in this case
out = &example.PodList{}
options = storage.ListOptions{
- ResourceVersion: "0",
+ // ResourceVersion should be unset when setting continuation token.
+ ResourceVersion: "",
Predicate: pred(2, cont),
Recursive: true,
}
@@ -2407,7 +2415,7 @@ func RunTestGuaranteedUpdateWithSuggestionAndConflict(ctx context.Context, t *te
err := store.GuaranteedUpdate(ctx, key, updatedPod, false, nil,
storage.SimpleUpdate(func(obj runtime.Object) (runtime.Object, error) {
pod := obj.(*example.Pod)
- pod.Name = "foo-2"
+ pod.Generation = 2
Member Author

These tests were somewhat broken for two reasons:

  • changing the name is unrealistic (forbidden) and the watchcache loses track of the object, so the test now uses Generation instead
  • the suggestion can be ignored by the implementation, so the validation check below is relaxed

Contributor

Was RunTestGuaranteedUpdateWithSuggestionAndConflict flaky, or did you change it to match reality?

Member Author

Both:

  • due to the changing names, the watchcache wasn't behaving as it does in reality (it caches by the object's namespace/name, not by the given key, so the cached entry was changing in the meantime)
  • the last check was incorrect, and you can trigger the failure by adding a sleep
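The first point can be pictured with a toy model. This is a hedged sketch only: `toyStore`, `cacheKey`, and the map layout are hypothetical and do not reflect the real watchcache internals. The only assumption taken from the comment above is that the cache indexes objects by namespace/name while the storage indexes them by the key passed to the call.

```go
package main

import "fmt"

// pod is a hypothetical, minimal object with the fields relevant here.
type pod struct {
	Namespace, Name string
	Generation      int64
}

// toyStore keeps two indexes: storage is keyed by the caller-provided key,
// cache is keyed by the object's namespace/name (assumption for this sketch).
type toyStore struct {
	storage map[string]pod
	cache   map[string]pod
}

func newToyStore() *toyStore {
	return &toyStore{storage: map[string]pod{}, cache: map[string]pod{}}
}

// cacheKey derives the cache index from the object itself, not from the key.
func cacheKey(p pod) string { return p.Namespace + "/" + p.Name }

// update writes the object under the given storage key and refreshes the cache.
func (s *toyStore) update(key string, p pod) {
	s.storage[key] = p
	s.cache[cacheKey(p)] = p
}

func main() {
	const key = "/pods/test-ns/foo"

	// Renaming the object under the same storage key leaves a stale
	// cache entry behind: one stored object, two cached ones.
	renamed := newToyStore()
	renamed.update(key, pod{Namespace: "test-ns", Name: "foo", Generation: 1})
	renamed.update(key, pod{Namespace: "test-ns", Name: "foo-2", Generation: 1})
	fmt.Println(len(renamed.storage), len(renamed.cache)) // 1 2

	// Changing Generation instead keeps both indexes in agreement.
	bumped := newToyStore()
	bumped.update(key, pod{Namespace: "test-ns", Name: "foo", Generation: 1})
	bumped.update(key, pod{Namespace: "test-ns", Name: "foo", Generation: 2})
	fmt.Println(len(bumped.storage), len(bumped.cache)) // 1 1
}
```

Under this model, mutating a field other than the name keeps the test's view of the cache consistent with what a real client could produce.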

Contributor

Since we are changing the test to match reality, would it make sense to add a new annotation instead of messing with the Generation field, which is set by the system?

Contributor

actually, the Generation will be updated during update before we call this function so it is okay to mess with that field.

Member Author

Generation isn't handled automatically by the system; we're handling it manually at the strategy level for individual resources. So no, I think this is good.
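A minimal sketch of what "handled at the strategy level" can look like, assuming a per-resource hook that runs before the object reaches the storage layer. The names `toyPod` and `prepareForUpdate` are hypothetical and the spec comparison is simplified to a string; real strategies differ per resource.

```go
package main

import "fmt"

// toyPod is a hypothetical object with a spec and a generation counter.
type toyPod struct {
	Spec       string
	Generation int64
}

// prepareForUpdate is a hypothetical per-resource strategy hook: it discards
// the generation sent by the client and bumps it only when the spec changed.
// The storage layer below it would persist whatever generation it receives.
func prepareForUpdate(newObj, oldObj *toyPod) {
	newObj.Generation = oldObj.Generation
	if newObj.Spec != oldObj.Spec {
		newObj.Generation = oldObj.Generation + 1
	}
}

func main() {
	old := &toyPod{Spec: "image:v1", Generation: 1}

	changed := &toyPod{Spec: "image:v2", Generation: 42}
	prepareForUpdate(changed, old)
	fmt.Println(changed.Generation) // 2

	unchanged := &toyPod{Spec: "image:v1", Generation: 42}
	prepareForUpdate(unchanged, old)
	fmt.Println(unchanged.Generation) // 1
}
```

Because a storage-level test bypasses any such hook, setting Generation directly in the update function is just a convenient way to make the object differ between attempts.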

return pod, nil
}),
nil,
@@ -2424,24 +2432,24 @@ func RunTestGuaranteedUpdateWithSuggestionAndConflict(ctx context.Context, t *te
err = store.GuaranteedUpdate(ctx, key, updatedPod2, false, nil,
storage.SimpleUpdate(func(obj runtime.Object) (runtime.Object, error) {
pod := obj.(*example.Pod)
- if pod.Name != "foo-2" {
+ if pod.Generation != 2 {
if sawConflict {
t.Fatalf("unexpected second conflict")
}
sawConflict = true
// simulated stale object - return a conflict
return nil, apierrors.NewConflict(example.SchemeGroupVersion.WithResource("pods").GroupResource(), "name", errors.New("foo"))
}
- pod.Name = "foo-3"
+ pod.Generation = 3
return pod, nil
}),
originalPod,
)
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
- if updatedPod2.Name != "foo-3" {
- t.Errorf("unexpected pod name: %q", updatedPod2.Name)
+ if updatedPod2.Generation != 3 {
+ t.Errorf("unexpected pod generation: %q", updatedPod2.Generation)
}

// Third, update using a current version as the suggestion.
@@ -2452,14 +2460,8 @@ func RunTestGuaranteedUpdateWithSuggestionAndConflict(ctx context.Context, t *te
err = store.GuaranteedUpdate(ctx, key, updatedPod3, false, nil,
storage.SimpleUpdate(func(obj runtime.Object) (runtime.Object, error) {
pod := obj.(*example.Pod)
- if pod.Name != updatedPod2.Name || pod.ResourceVersion != updatedPod2.ResourceVersion {
- t.Errorf(
- "unexpected live object (name=%s, rv=%s), expected name=%s, rv=%s",
- pod.Name,
- pod.ResourceVersion,
- updatedPod2.Name,
- updatedPod2.ResourceVersion,
- )
+ if pod.Generation != updatedPod2.Generation || pod.ResourceVersion != updatedPod2.ResourceVersion {
+ t.Logf("stale object (rv=%s), expected rv=%s", pod.ResourceVersion, updatedPod2.ResourceVersion)
}
attempts++
return nil, fmt.Errorf("validation or admission error")
@@ -2469,8 +2471,10 @@ func RunTestGuaranteedUpdateWithSuggestionAndConflict(ctx context.Context, t *te
if err == nil {
t.Fatalf("expected error, got none")
}
- if attempts != 1 {
- t.Errorf("expected 1 attempt, got %d", attempts)
+ // Implementations of the storage interface are allowed to ignore the suggestion,
+ // in which case two attempts are possible.
+ if attempts > 2 {
+ t.Errorf("update function should have been called at most twice, called %d", attempts)
}
}
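
The relaxed check on `attempts` can be motivated with a small model of an update loop. This is a sketch under assumptions, not the real GuaranteedUpdate: `toyStore`, `object`, and the integer resource version are hypothetical, and the store is assumed to be free to start from its own cached copy instead of the caller's suggestion, retrying once against the live object if that starting point turns out to be stale.

```go
package main

import (
	"errors"
	"fmt"
)

// object is a hypothetical stored object with an integer resource version.
type object struct {
	ResourceVersion int
	Generation      int64
}

// toyStore holds the live object and, optionally, its own cached copy.
// When cached is non-nil the store uses it and ignores the suggestion.
type toyStore struct {
	live   object
	cached *object
}

var errRejected = errors.New("validation or admission error")

// guaranteedUpdate calls tryUpdate on a starting object. If that object was
// stale, the outcome is discarded and tryUpdate runs once more on the live
// object, so tryUpdate is invoked at most twice.
func (s *toyStore) guaranteedUpdate(tryUpdate func(object) (object, error), suggestion *object) error {
	current := s.live
	switch {
	case s.cached != nil:
		current = *s.cached
	case suggestion != nil:
		current = *suggestion
	}
	for {
		updated, err := tryUpdate(current)
		if current.ResourceVersion != s.live.ResourceVersion {
			current = s.live // stale input: retry against the live object
			continue
		}
		if err != nil {
			return err
		}
		updated.ResourceVersion = s.live.ResourceVersion + 1
		s.live = updated
		return nil
	}
}

// countAttempts runs an always-failing update and reports how many times
// the update function was called.
func countAttempts(s *toyStore, suggestion *object) int {
	attempts := 0
	_ = s.guaranteedUpdate(func(o object) (object, error) {
		attempts++
		return object{}, errRejected
	}, suggestion)
	return attempts
}

func main() {
	live := object{ResourceVersion: 2, Generation: 3}
	stale := object{ResourceVersion: 1, Generation: 2}

	fmt.Println(countAttempts(&toyStore{live: live}, &live))                 // 1
	fmt.Println(countAttempts(&toyStore{live: live, cached: &stale}, &live)) // 2
}
```

Both outcomes are legitimate in this model, which is why asserting exactly one attempt would be too strict while an upper bound of two still catches runaway retries.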
