Replace ApplyResourceWithRetry with ApplyResource #751

cumirror · 2023-10-18T13:07:41Z

What happened:

Carefully analyzing the ApplyResourceWithRetry method, it seems that there is no need for retry, so i wrote the ApplyResource function to replace ApplyResourceWithRetry.

Modifications:

use wait.ExponentialBackoffWithContext to prevent conflicts in ApplyResourceWithRetry.

I think this is not reasonable because the controller-manager has already performed the master selection operation, there will be no much conflict, and the controller will retry again if failure, it's not necessary to retry here.

In the processing, create resource first, regardless of whether or not it exists, and update if an error is returned indicating that it exists.

I think it's not very reasonable, in most scenarios, resource updates are more frequent than creations, which means that checking for existence and then create or update based on the result is more reasonable.

Is there a parameter passing error on Line594? It should be passing resourceCopy instead of resource.

What you expected to happen:

use ApplyResource instead of ApplyResourceWithRetry

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

dixudx · 2023-10-18T14:54:25Z

I think this is not reasonable because the controller-manager has already performed the master selection operation, there will be no much conflict, and the controller will retry again if failure, it's not necessary to retry here.

All the retries here have nothing to do with the leader election.

For any failures, it is true that controller will retry again. But with retries, we make this method more resilient to any conflicts. And most of the time, these conflicts are easy to solve with a simple update or field convergence in a very short period.

As you may know, the resources in the cluster may get varied all the time. It is very common to see conflicts when we create or update a resource. By using retries, it is definitely far more faster to deploy the resources successfully to clusters than letting controller resync in the next round, where we may have conflicts as well, especially when we have a large queue.

What's more important here is that we may not apply back all the immutable value in a single ApplyResource operation, even we've got a list of status causes. But they may not be completed. Or there may be other potential conflicts to solve. If we are not adopting retries here, we may fall into the same situation in the next round. It is low-efficient.

Thus, I'd prefer this kind of resilient way. And it is more efficient.

I think it's not very reasonable, in most scenarios, resource updates are more frequent than creations, which means that checking for existence and then create or update based on the result is more reasonable.

It is hard to say "updates are more frequent than creations". That depends. Maybe you're right.

For GET-First-Then-CREATE-UPDATE step, it takes 2 steps to finish a normal CREATE operation. While for CREATE-or-UPDATE step, it only takes 1 step to finishing the creation. No matter which way you choose, it is a preference.

Furthermore, you could find this kind of CREATE-or-UPDATE preference eveywhere in Kubernetes repo.

Is there a parameter passing error on Line594? It should be passing resourceCopy instead of resource.

Thanks for pointing out. You're right. It should be resourceCopy here.

cumirror · 2023-10-19T04:37:26Z

@dixudx Thank you for your patient explanation.

about retry

I understand that retry are more important for solving the problem of immutable values.

For other errors during processing, I still believe that retries have limited effects. In the process of applying resources, I have encountered two types of failures: inability to connect to the member cluster and request throttling. However, these scenarios cannot be resolved through retries within the function. These failure cases can be better addressed through the controller's delay queue. For other failure cases, there may be issues such as processing timeouts, insufficient access permissions, and other problems that cannot be resolved through retries within the function.

Therefore, retry only need for the problem of immutable values, and I have made some modifications, as shown here: retry for doApplyPatch.

about the CREATE-or-UPDATE preference

In the Kubernetes repository, I have found some usage, such as in the kubeadm apiclient.CreateOrUpdateConfigMap function on Line284.

However, there is little such processing in the pkg/controller directory, and I have located it by searching for IsAlreadyExists.

$ grep -r "IsAlreadyExists" pkg/controller/
pkg/controller//statefulset/stateful_pod_control.go:	if apierrors.IsAlreadyExists(err) {
pkg/controller//statefulset/stateful_pod_control.go:			if err == nil || !apierrors.IsAlreadyExists(err) {
pkg/controller//statefulset/stateful_pod_control_test.go:	if err := control.CreateStatefulPod(context.TODO(), set, pod); !apierrors.IsAlreadyExists(err) {
pkg/controller//serviceaccount/serviceaccounts_controller.go:		if _, err := c.client.CoreV1().ServiceAccounts(ns.Name).Create(ctx, &sa, metav1.CreateOptions{}); err != nil && !apierrors.IsAlreadyExists(err) {
pkg/controller//cronjob/cronjob_controllerv2.go:	case errors.IsAlreadyExists(err):
pkg/controller//daemon/update.go:	if outerErr := err; errors.IsAlreadyExists(outerErr) {
pkg/controller//deployment/sync.go:	case errors.IsAlreadyExists(err):
pkg/controller//history/controller_history.go:		if errors.IsAlreadyExists(err) {
pkg/controller//history/controller_history.go:		if errors.IsAlreadyExists(err) {
pkg/controller//volume/persistentvolume/pv_controller.go:		if newVol, err = ctrl.kubeClient.CoreV1().PersistentVolumes().Create(ctx, volume, metav1.CreateOptions{}); err == nil || apierrors.IsAlreadyExists(err) {
pkg/controller//volume/persistentvolume/pv_controller.go:		case goroutinemap.IsAlreadyExists(err):

I have read part of the code above and found that most of the logic is still created when it does not exist, so I understand that CREATE-or-UPDATE is not a general method. I am not sure if you have any suitable examples, so I will do some more reading.

cumirror added the kind/bug Something isn't working label Oct 18, 2023

cumirror mentioned this issue Oct 18, 2023

Replace ApplyResourceWithRetry with ApplyResource #752

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace ApplyResourceWithRetry with ApplyResource #751

Replace ApplyResourceWithRetry with ApplyResource #751

cumirror commented Oct 18, 2023

dixudx commented Oct 18, 2023

cumirror commented Oct 19, 2023

Replace ApplyResourceWithRetry with ApplyResource #751

Replace ApplyResourceWithRetry with ApplyResource #751

Comments

cumirror commented Oct 18, 2023

What happened:

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

dixudx commented Oct 18, 2023

cumirror commented Oct 19, 2023

about retry

about the CREATE-or-UPDATE preference