Report API failures in SpiceDBCluster status #79

ecordell · 2022-09-06T21:43:32Z

Right now, when an API call fails (e.g. when creating or updating a resource on the cluster), the operator requeues the object and tries again later (respecting API Priority and Fairness (APF) responses if present).

This is generally the right thing to do, but it can hide non-transient errors (like RBAC problems).

We could spend time sorting through which errors are transient and which are not, but I think a more general approach would be:

Any time we need to requeue, we should attempt to record the reason for it in the object's status.
The only exception would be if the operator can't update the status to record the reason for the requeue.
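
To make the proposal concrete, here is a minimal sketch of what "record the reason, then requeue" could look like. Everything in it is an assumption for illustration: the `Condition` and `Status` types, the `Requeued` condition type, the `APIError` reason, the `updateStatus` callback, and the example error are hypothetical and are not the operator's actual types or API.

```go
package main

import (
	"errors"
	"fmt"
)

// errExample is a made-up, non-transient error (think RBAC denial).
var errExample = errors.New("forbidden: example RBAC error")

// Condition is a hypothetical, simplified stand-in for a Kubernetes-style
// status condition; it is not the operator's real type.
type Condition struct {
	Type    string
	Reason  string
	Message string
}

// Status is a hypothetical, simplified SpiceDBCluster status.
type Status struct {
	ObservedGeneration int64
	Conditions         []Condition
}

// setCondition replaces an existing condition of the same Type, or appends
// the condition if none exists yet.
func setCondition(s *Status, c Condition) {
	for i := range s.Conditions {
		if s.Conditions[i].Type == c.Type {
			s.Conditions[i] = c
			return
		}
	}
	s.Conditions = append(s.Conditions, c)
}

// requeueWithReason records why the object is being requeued and reports
// whether the reason was persisted. updateStatus is a hypothetical callback
// standing in for the API call that writes the status.
func requeueWithReason(s *Status, apiErr error, updateStatus func(Status) error) bool {
	setCondition(s, Condition{
		Type:    "Requeued", // hypothetical condition type
		Reason:  "APIError", // hypothetical reason
		Message: apiErr.Error(),
	})
	// The one exception: if the status write itself fails, there is nowhere
	// left to record the reason, so the caller simply requeues.
	return updateStatus(*s) == nil
}

func main() {
	s := &Status{}
	recorded := requeueWithReason(s, errExample, func(Status) error { return nil })
	fmt.Println(recorded, s.Conditions[0].Reason, s.Conditions[0].Message)
}
```

Note that the sketch does not try to classify errors as transient or not; it records every requeue reason, which is the point of the more general approach.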

This should result in an operator that never requires reading logs for unusual situations, unless you can see that it has been wedged somehow (which should be evident from a stuck observedGeneration on the status).
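
As a rough illustration of the "stuck observedGeneration" signal, the check below compares an object's generation with the generation last observed in its status. The `isStale` helper and its parameters are hypothetical; a status that stays stale for a long time, rather than briefly during a normal reconcile, is what would suggest a wedged operator.

```go
package main

import "fmt"

// isStale reports whether the status lags behind the spec: the operator has
// not yet observed the latest generation of the object. The helper name is
// hypothetical and exists only for this sketch.
func isStale(generation, observedGeneration int64) bool {
	return observedGeneration < generation
}

func main() {
	// Spec is at generation 4, but the status last observed generation 2.
	fmt.Println(isStale(4, 2))
	// Status has caught up with the spec.
	fmt.Println(isStale(4, 4))
}
```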

ecordell added the priority/2 medium (This needs to be done) label on Sep 6, 2022
