Right now, when an API call fails (e.g. when creating or updating a resource on the cluster), the operator requeues the object and tries again later (respecting API Priority and Fairness (APF) responses, if present).
This is generally the right thing to do, but it can hide non-transient errors (like RBAC problems).
We could spend time sorting through which errors are transient and which are not, but I think a more general approach would be:
Any time we need to requeue, we should attempt to record the reason for it in the object's status.
The only exception would be when the operator can't update the status to record the reason for the requeue.
This should result in an operator that never requires reading logs to diagnose unusual situations, unless you can see that it has been wedged somehow (which should be evident from a stuck observedGeneration on the status).