
Handle GroupDiscoveryFailedError in proxy.go (#5596) #6222

Conversation

@gabe-l-hart commented Dec 15, 2022

Description of the change:

Similar to the fix in Helm (helm/helm#6361), this change allows a GroupDiscoveryFailedError to be tolerated rather than aborting the process of managing apiserver resource types.

Motivation for the change:

The fix proposed in #5596 requires repairing the cluster's state (upgrading or cleaning up the metrics-server apiserver resource). While that is correct, it is not a fix the author of an operator can make, since it requires the user of the operator to fix their cluster. Handling the error on the client side (in the operator itself) will save headaches and support tickets in the future.
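For context, here is a minimal Go sketch of the pattern this change follows (this is not the actual proxy.go diff; the kubeconfig loading and logging below are placeholders). client-go's discovery can return partial results together with an ErrGroupDiscoveryFailed, and discovery.IsGroupDiscoveryFailedError lets the caller log the unreachable groups and continue, which is essentially what the Helm fix does:

```go
package main

import (
	"fmt"
	"log"

	"k8s.io/client-go/discovery"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Placeholder kubeconfig loading; proxy.go obtains its rest.Config differently.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}

	dc, err := discovery.NewDiscoveryClientForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// ServerGroupsAndResources can return partial results together with an
	// ErrGroupDiscoveryFailed when an aggregated apiserver (e.g. an orphaned
	// metrics-server APIService) is unreachable.
	_, resources, err := dc.ServerGroupsAndResources()
	if err != nil {
		if discovery.IsGroupDiscoveryFailedError(err) {
			// Tolerate the failure: skip the broken group(s) and keep
			// working with the resource types that were discovered.
			log.Printf("WARNING: skipping unreachable API groups: %v", err)
		} else {
			log.Fatal(err)
		}
	}

	for _, list := range resources {
		fmt.Println(list.GroupVersion)
	}
}
```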

Checklist

If the pull request includes user-facing changes, extra documentation is required:

…work#5596)

Similar to the fix in helm (helm/helm#6361), this
fix allows GroupDiscoveryFailedError to not error out the process of
managing apiserver resource types.

Fixes operator-framework#5596

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
@openshift-ci bot added the do-not-merge/work-in-progress label Dec 15, 2022
Add a changelog fragment for the merge automation

operator-framework#5596

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
@gabe-l-hart (Author)

Hi maintainers! I think this fix should address the linked issue, but I'm still working on getting a valid repro case so I can fully validate it. I also plan to look into a unit test that exercises the corner case before moving to full review.

@gabe-l-hart (Author)

I've verified that make test-e2e-ansible passes cleanly. I'm still learning my way around the test framework, but it looks like the most appropriate place to add tests would be in cluster_tests.go. It's not quite clear to me how we would simulate this kind of apiserver failure in those tests, so I'm open to suggestions!

@gabe-l-hart marked this pull request as ready for review December 15, 2022 21:16
@openshift-ci bot removed the do-not-merge/work-in-progress label Dec 15, 2022
@everettraven (Contributor)

@gabe-l-hart Thanks for taking the time to make this contribution!

I looked through the changes and they seem relatively straightforward. I would definitely like to find a way to add a test case for this PR before merging, though, to ensure the change works as expected. cluster_test.go looks to be the right place to add one.

Regarding the implementation of the test case itself, this comment on the Helm issue you linked makes me think that, as part of the test case, we would need to install an APIService onto the test cluster and then intentionally orphan it somehow (I'm honestly not sure how to do this - I'm not very familiar with aggregated apiservers).

@gabe-l-hart (Author)

Thanks for the pointer @everettraven! I'll look into adding a test and see how far I get. I also don't know yet how to simulate an orphaned apiserver (we just saw this come up on a customer cluster, but I have no idea how it got in that state!) but I'm sure it should be doable somehow, if not in a real cluster at least in some kind of a mock.

@everettraven (Contributor)

> I also don't know yet how to simulate an orphaned apiserver (we just saw this come up on a customer cluster, but I have no idea how it got in that state!) but I'm sure it should be doable somehow, if not in a real cluster at least in some kind of a mock.

@gabe-l-hart I was doing some pretty rudimentary digging around and I think a good starting point for understanding how the apiservers work could be: https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/apiserver-aggregation/

I also saw this: https://kubernetes.io/docs/reference/kubernetes-api/cluster-resources/api-service-v1/ - it covers the APIService resource, and I wonder if we could use it to make the test cluster believe there is an apiserver running for a specific group/version without an actual backend implemented (maybe this would cause the error to happen?).
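A rough sketch of that idea, assuming an e2e helper written in Go with controller-runtime and the kube-aggregator types (the group, version, namespace, and Service name below are made up for illustration): registering an APIService whose backing Service never exists should leave that group/version permanently unavailable and trigger the discovery failure.

```go
package e2e

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	apiregistrationv1 "k8s.io/kube-aggregator/pkg/apis/apiregistration/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// createOrphanedAPIService registers an aggregated API with no backend so
// that group discovery for fake.example.com/v1alpha1 fails on the cluster.
func createOrphanedAPIService(ctx context.Context) error {
	scheme := runtime.NewScheme()
	if err := apiregistrationv1.AddToScheme(scheme); err != nil {
		return err
	}

	c, err := client.New(ctrl.GetConfigOrDie(), client.Options{Scheme: scheme})
	if err != nil {
		return err
	}

	port := int32(443)
	apiSvc := &apiregistrationv1.APIService{
		ObjectMeta: metav1.ObjectMeta{Name: "v1alpha1.fake.example.com"},
		Spec: apiregistrationv1.APIServiceSpec{
			Group:                 "fake.example.com",
			Version:               "v1alpha1",
			InsecureSkipTLSVerify: true,
			// Points at a Service that is never created, so the aggregated
			// API stays unavailable and discovery of this group fails.
			Service: &apiregistrationv1.ServiceReference{
				Namespace: "default",
				Name:      "does-not-exist",
				Port:      &port,
			},
			GroupPriorityMinimum: 100,
			VersionPriority:      100,
		},
	}
	return c.Create(ctx, apiSvc)
}
```

Deleting the APIService at the end of the test should restore normal discovery on the cluster.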

Just some more thoughts from doing some brief digging around about aggregated apiservers - I'm looking forward to seeing what you come up with (and feel free to ping me on here if you just want to bounce ideas around :))

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci bot added the lifecycle/stale label Mar 22, 2023
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci bot added the lifecycle/rotten label and removed the lifecycle/stale label Apr 21, 2023
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci bot closed this May 22, 2023
@openshift-ci bot commented May 22, 2023

@openshift-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
