Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: MON-3513: add resource metrics api availability test #28699

Closed
wants to merge 1 commit into from

Conversation

slashpai
Copy link
Member

openshift/origin e2e tests don't explicitly exercise the metrics.k8s.io API group. We should add tests under the test/extended/prometheus proving that the API works before/after upgrade.

It will help showing off that the prometheus-adpater -> metrics-server migration happens seamlessly when we turn the feature gate on by default.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 10, 2024
@openshift-ci-robot
Copy link

openshift-ci-robot commented Apr 10, 2024

@slashpai: This pull request references MON-3513 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.16.0" version, but no target version was set.

In response to this:

openshift/origin e2e tests don't explicitly exercise the metrics.k8s.io API group. We should add tests under the test/extended/prometheus proving that the API works before/after upgrade.

It will help showing off that the prometheus-adpater -> metrics-server migration happens seamlessly when we turn the feature gate on by default.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 10, 2024
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 10, 2024
@slashpai
Copy link
Member Author

/retest

@slashpai
Copy link
Member Author

/test verify

Signed-off-by: Jayapriya Pai <janantha@redhat.com>
Copy link
Contributor

openshift-ci bot commented Apr 11, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: slashpai
Once this PR has been reviewed and has the lgtm label, please assign deads2k for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 11, 2024
g.By("verifying whether the nodes.metrics command works after the upgrade")
postUpgradeResponse, err := oc.Run("get").Args("nodes.metrics").Output()
o.Expect(err).NotTo(o.HaveOccurred())
o.Expect(postUpgradeResponse).NotTo(o.BeEmpty())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if there are any cases where non-emptiness could equate to an unexpected output (without any errors), but if so, would it be worth checking the contents of the output as well, just to be safe?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that checking for the correctness should happen on oc and/or CMO sides, IIUC the test is only to show that the API continues to work (is responsive/available) after the p-a->m-server switch (which will happen during an upgrade).
We can even get rid of the test in future versions.

If not already done, JP could provoke a failure (putting some nonexisting stuff in Args()) of the test so we get an idea about how it'd look like and to test the test :)

I know the release folks are migrating to the new monitortests framework for upgrade tests: you can see how we use it here https://github.com/openshift/origin/pull/28361/files. But for a temp test, I'm ok with keeping it like this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(If you agree with me that it's a temp test, it'd be great to add that as a comment in the test and to have a ticket that will revert it in the future)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea to have the test is to make sure resource metrics api is always available. We didn't have test for that but it was getting tested indirect way for HPA. The test will be still useful even full migration (PA->MS) happened as well. But for now ya we needed a test to prove it will always be functional

We do have other tests as well so would that mean we will need to move all tests using new framework?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's not a temp test, I'd go for the monitortests framework (I could help if you have questions.)
But let's see what the others think.
(I'd be interested in a commit that provokes a failure in both cases though)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good remark on monitortests, I'm definitely not up to date with the origin testing 😅
It looks like a good option and the statefulsets example is a good starting point from what I can tell.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed with JP, we'll go with #28737

@slashpai
Copy link
Member Author

/retest-required

Copy link
Contributor

openshift-ci bot commented Apr 11, 2024

@slashpai: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-ovn-single-node-upgrade b28c376 link false /test e2e-aws-ovn-single-node-upgrade
ci/prow/e2e-metal-ipi-ovn-ipv6 b28c376 link true /test e2e-metal-ipi-ovn-ipv6
ci/prow/e2e-gcp-csi b28c376 link false /test e2e-gcp-csi
ci/prow/e2e-metal-ipi-sdn b28c376 link false /test e2e-metal-ipi-sdn
ci/prow/e2e-aws-ovn-single-node-serial b28c376 link false /test e2e-aws-ovn-single-node-serial

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-trt-bot
Copy link

Job Failure Risk Analysis for sha: b28c376

Job Name Failure Risk
pull-ci-openshift-origin-master-e2e-metal-ipi-ovn-ipv6 IncompleteTests
Tests for this run (23) are below the historical average (811): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

@slashpai
Copy link
Member Author

Closing in favour of #28737

@slashpai slashpai closed this Apr 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants