Report API failures in SpiceDBCluster status #79

Open
ecordell opened this issue Sep 6, 2022 · 0 comments
Labels
priority/2 medium This needs to be done

Comments

@ecordell
Contributor

ecordell commented Sep 6, 2022

Right now, when an API call fails (e.g. when creating or updating a resource on the cluster), the operator requeues the object and tries again later, respecting API Priority and Fairness (APF) responses if present.

This is generally the right thing to do, but it can hide non-transient errors (like RBAC problems).

We could spend time sorting through which errors are transient and which are not, but I think a more general approach would be:

  • Any time we need to requeue, we should attempt to record the reason for it in the object's status.
  • The only exception would be if the operator can't update the status to record the reason for the requeue.

This should result in an operator that never requires reading logs for unusual situations, unless you can see that it has been wedged somehow (which should be evident from a stuck observedGeneration on the status).
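
One possible way to spot a stuck observedGeneration from the outside, sketched in Go under assumptions: `ObjectView` is a hypothetical struct holding the two fields of interest, and "wedged" is approximated as two snapshots, taken some time apart, that both lag behind the spec with no progress in between. How long to wait between snapshots is left open.

```go
package main

import "fmt"

// ObjectView is a hypothetical snapshot of the two fields of interest.
type ObjectView struct {
	Generation         int64 // metadata.generation
	ObservedGeneration int64 // status.observedGeneration
}

// Lagging reports whether the status has not caught up with the spec.
func (o ObjectView) Lagging() bool {
	return o.ObservedGeneration < o.Generation
}

// Wedged reports whether the object was lagging in both snapshots and
// observedGeneration did not advance between them.
func Wedged(earlier, later ObjectView) bool {
	return earlier.Lagging() && later.Lagging() &&
		later.ObservedGeneration <= earlier.ObservedGeneration
}

func main() {
	stuck := Wedged(ObjectView{4, 2}, ObjectView{4, 2})
	progressing := Wedged(ObjectView{4, 2}, ObjectView{5, 4})
	fmt.Println(stuck, progressing)
}
```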

@ecordell ecordell added the priority/2 medium This needs to be done label Sep 6, 2022