upgrade: retry if default DSCI creation fails #1008
lgtm
/retest
/retest
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: AjayJagan, zdtsw
/retest
After removing leader election, operator fails to start if it is instructed to create default DSCI. Looks like webhook is not ready by the time:

```
create default DSCI CR.
{"level":"error","ts":"2024-05-13T09:25:58Z","logger":"setup","msg":"unable to create initial setup for the operator","error":"Internal error occurred: failed calling webhook \"operator.opendatahub.io\": failed to call webhook: Post \"https://opendatahub-operator-controller-manager-service.oo-2ts9m.svc:443/validate-opendatahub-io-v1?timeout=10s\": no endpoints available for service \"opendatahub-operator-controller-manager-service\"","stacktrace":"main.main.func1\n\t/workspace/main.go:200\nsigs.k8s.io/controller-runtime/pkg/manager.RunnableFunc.Start\n\t/remote-source/operator/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/manager/manager.go:336\nsigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1\n\t/remote-source/operator/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/manager/runnable_group.go:219"}
```

Leader election added some delay. The problem does not happen in default configuration since it explicitly disables DSCI creation in the manifests:

```yaml
containers:
  - command:
      - /manager
    env:
      - name: DISABLE_DSC_CONFIG
        value: 'true'
    args:
      - --operator-name=opendatahub
    image: controller:latest
```

Make a wrapper function cluster.CreateWithRetry for client.Object creation with timeout. Use hardcoded 5s interval, just seems reasonable, and timeout in minutes as the parameter.

It requires disable linter nilerr since for the polling function error in creation is a valid condition, something the function wait to disappear.

Fixes: 3610b0b ("feat: remove leader election for operator (opendatahub-io#1000)")

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
a7889fb to a892b10
Actually, a generic is overthinking it; taking an interface here is enough.
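The interface-versus-generic remark can be illustrated with a small sketch. It assumes a hypothetical `Object` interface standing in for controller-runtime's `client.Object`, and the `dsci`, `createGeneric`, and `createIface` names are illustrative only. Because the helper body only calls methods declared on the interface, a type parameter buys nothing over a plain interface parameter.

```go
package main

// Object is a hypothetical stand-in for controller-runtime's client.Object;
// only the single method this sketch needs is modeled.
type Object interface {
	GetName() string
}

// dsci is a toy type playing the role of the default DSCInitialization CR.
type dsci struct{ name string }

func (d *dsci) GetName() string { return d.name }

// createGeneric is the "overthought" variant: T is constrained to Object,
// and the body only uses Object's methods, so the type parameter adds nothing.
func createGeneric[T Object](obj T) string {
	return "created " + obj.GetName()
}

// createIface takes the interface directly, which is all the helper needs.
func createIface(obj Object) string {
	return "created " + obj.GetName()
}
```

Both variants accept the same arguments and behave identically; the interface form is simpler to read and to call.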
/retest
/retest-required
/lgtm
e26100e into opendatahub-io:incubation
* Update version to v2.12.0 (#1007)
* upgrade: retry if default DSCI creation fails (#1008)

---------

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
Co-authored-by: Yauheni Kaliuta <ykaliuta@redhat.com>
Description
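After leader election was removed, the operator fails to start if it is instructed to create the default DSCI. It looks like the webhook is not ready by that time: the create call fails with `no endpoints available for service "opendatahub-operator-controller-manager-service"`.
Leader election used to add some startup delay, which masked the problem.
The problem does not happen in the default configuration, since the manifests explicitly disable DSCI creation by setting `DISABLE_DSC_CONFIG` to `'true'`.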
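A minimal sketch of how a startup path might gate default DSCI creation on the `DISABLE_DSC_CONFIG` variable from the manifests. The helper names (`shouldCreateDefaultDSCI`, `fromProcessEnv`) and the parsing rule (an unset or unparsable value means creation is attempted) are assumptions for illustration, not the operator's actual code.

```go
package main

import (
	"os"
	"strconv"
)

// disableDSCConfigEnv is the variable the default manifests set to 'true'.
const disableDSCConfigEnv = "DISABLE_DSC_CONFIG"

// shouldCreateDefaultDSCI reports whether the default DSCI should be created.
// Assumption (hypothetical): unset or unparsable values do not disable it.
func shouldCreateDefaultDSCI(lookup func(string) (string, bool)) bool {
	v, ok := lookup(disableDSCConfigEnv)
	if !ok {
		return true
	}
	disabled, err := strconv.ParseBool(v)
	if err != nil {
		return true
	}
	return !disabled
}

// fromProcessEnv wires the helper to the real process environment.
func fromProcessEnv() bool {
	return shouldCreateDefaultDSCI(os.LookupEnv)
}
```

Taking the lookup function as a parameter keeps the decision testable without mutating the process environment; with the manifest value `'true'` the creation path, and therefore the webhook race, is never reached.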
Add a wrapper function, `cluster.CreateWithRetry`, that creates a `client.Object` with a timeout. It uses a hardcoded 5s polling interval, which seems reasonable, and takes the timeout in minutes as a parameter.
This requires disabling the `nilerr` linter: inside the polling function, a creation error is a valid condition, the very thing the function waits to disappear.
Fixes: 3610b0b ("feat: remove leader election for operator (#1000)")
How Has This Been Tested?
Merge criteria: