Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add e2e tests for metrics #11307

Merged
merged 1 commit into from Apr 4, 2024

Conversation

machadovilaca
Copy link
Member

What this PR does

Before this PR:

KubeVirt had no e2e tests validating that metrics were being scrapped by Prometheus

After this PR:

KubeVirt has e2e tests validating that metrics from all components are being scrapped by Prometheus

jira-ticket: https://issues.redhat.com/browse/CNV-29328

Fixes #

Why we need it and why it was done in this way

The following tradeoffs were made:

The following alternatives were considered:

Links to places where the discussion took place:

Special notes for your reviewer

Checklist

This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.

Release note

Add e2e tests for metrics

@kubevirt-bot kubevirt-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. size/L labels Feb 20, 2024
@machadovilaca
Copy link
Member Author

/cc @sradco @enp0s3

@machadovilaca
Copy link
Member Author

/retest

@machadovilaca
Copy link
Member Author

/hold
needs #11264

@kubevirt-bot kubevirt-bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Feb 21, 2024
@kubevirt-bot kubevirt-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 26, 2024
@machadovilaca
Copy link
Member Author

/unhold

@kubevirt-bot kubevirt-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 26, 2024
@machadovilaca
Copy link
Member Author

/test pull-kubevirt-e2e-k8s-1.29-sig-monitoring

@machadovilaca
Copy link
Member Author

/retest

Copy link
Contributor

@enp0s3 enp0s3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@machadovilaca Hi, thank you for the PR!
can you please check the file tests/infrastructure/prometheus.go and see if there are no duplicates? if you can arrange all the monitoring e2e under one directory it could be great.

@machadovilaca
Copy link
Member Author

@machadovilaca Hi, thank you for the PR! can you please check the file tests/infrastructure/prometheus.go and see if there are no duplicates? if you can arrange all the monitoring e2e under one directory it could be great.

@enp0s3 These tests are a bit different, on the tests/infrastructure/prometheus.go they are testing 2 things:

  • Prometheus endpoints are working and throttling requests
  • Each virt-* component, is correctly exposing metrics through their internal endpoints

This new test added validates that:

  • All metrics registered in the virt-* components are present in Prometheus

They seem to intent to test different things

Copy link
Contributor

@enp0s3 enp0s3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@machadovilaca Hi. TBH I think we can cover these scenarios with unit tests. Please correct me if I am wrong.

tests/monitoring/metrics.go Outdated Show resolved Hide resolved
tests/monitoring/metrics.go Outdated Show resolved Hide resolved
tests/monitoring/metrics.go Outdated Show resolved Hide resolved
}
})

Context("perfscale histogram metrics", func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we cover it with unit test instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we would only test that they are registered in the component, not that they are correctly set up, and being exposed correctly
and since we already fetched the list of KubeVirt metrics, its just a simple validation

@machadovilaca
Copy link
Member Author

/test pull-kubevirt-e2e-k8s-1.29-sig-monitoring

@machadovilaca
Copy link
Member Author

/retest

tests/monitoring/metrics.go Outdated Show resolved Hide resolved
_, err := virtClient.VirtualMachine(vm.Namespace).Create(context.Background(), vm)
Expect(err).ToNot(HaveOccurred())

Eventually(func() error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need a VM in this test? why not VMI?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to ensure for example kubevirt_vm_created_total exists

tests/monitoring/metrics.go Outdated Show resolved Hide resolved
tests/libmonitoring/metric_matcher.go Outdated Show resolved Hide resolved
tests/libmonitoring/metric_matcher.go Outdated Show resolved Hide resolved
@machadovilaca
Copy link
Member Author

/test pull-kubevirt-e2e-k8s-1.29-sig-monitoring

Copy link
Contributor

@enp0s3 enp0s3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@machadovilaca Maybe I am missing something here but the only added value of testing these metrics in e2e is because we bring up a real Prometheus endpoint and query it with web client. But why do we need to test the registration and the query? these are generic Prometheus functionalities IMO.

I think we can test the metrics in unit tests, by creating a fake VM objs, load them into informers and call the relevant functions to observe a particular metric value.
It will be even better since monitoring e2e is only running upon monitoring related files change, it won't run when someone changed VM lifecycle code that can affect metrics.

tests/monitoring/metrics.go Show resolved Hide resolved
@machadovilaca
Copy link
Member Author

@machadovilaca Maybe I am missing something here but the only added value of testing these metrics in e2e is because we bring up a real Prometheus endpoint and query it with web client. But why do we need to test the registration and the query? these are generic Prometheus functionalities IMO.

I think we can test the metrics in unit tests, by creating a fake VM objs, load them into informers and call the relevant functions to observe a particular metric value. It will be even better since monitoring e2e is only running upon monitoring related files change, it won't run when someone changed VM lifecycle code that can affect metrics.

this tests that the metrics are being properly registered by the components
yes, prometheus client tests that if they are registered they will be served
but this ensures that the kubevirt components are actually registering those metrics
and in the case of collectors, that those collectors are being used and pushing all metrics
because of the way collectors work, by pushing a "custom metric", those metrics are not initialized by default and may happen that some of them are not used

@kubevirt-bot kubevirt-bot added the sig/buildsystem Denotes an issue or PR that relates to changes in the build system. label Mar 18, 2024
@machadovilaca
Copy link
Member Author

/test pull-kubevirt-e2e-k8s-1.29-sig-monitoring

return msg
}

func (matcher *MetricMatcher) NegatedFailureMessage(actual interface{}) (message string) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why we need both functions? FailureMessage and NegatedFailureMessage?
seems like they do the same thing except the msg

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a implementation of a gomega interface, so it's mandatory to have both, to handle 2 scenarios

FailureMessage is used when you expect to find something and it is not present, for example, expected [1,2] to contain 3

NegatedFailureMessage is used when you expect to not find something, but it is there, for example, expected [A,B] not to contain A

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 thanks!

"strconv"
"time"

. "github.com/onsi/ginkgo/v2"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please sort the import to groups, I see onsi here and in the second group as well

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here it makes sense to keep them apart, these 2 are dot imports, and we have this structure on all tests, and are always splitted from the rest

even github.com/onsi/gomega/types being from the same origin I would keep it as is, since is a qualified identifier and is just used for the return type of the matcher

Copy link
Contributor

@fossedihelm fossedihelm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @machadovilaca!
Some comments below!

@@ -79,7 +79,7 @@ func WaitForMetricValueWithLabels(client kubecli.KubevirtClient, metric string,
return -1
}
return i
}, 3*time.Minute, 1*time.Second).Should(BeNumerically("==", expectedValue))
}, 5*time.Minute, 1*time.Second).Should(BeNumerically("==", expectedValue))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why this increase? Have we seen some flakiness?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure, reverted

Comment on lines 60 to 62
if matcher.Labels == nil {
return true, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can drop this, since iterate over nil map is a non-op

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed 👌


nameToMatch := matcher.Metric.GetOpts().Name
if matcher.Metric.GetType() == operatormetrics.HistogramType || matcher.Metric.GetType() == operatormetrics.HistogramVecType {
nameToMatch = nameToMatch + prometheusHistogramBucketPrefix
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is prometheusHistogramBucketPrefix a prefix or a suffix?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suffix yes, updated the name

@@ -0,0 +1,139 @@
package monitoring
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing header

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

@@ -0,0 +1,75 @@
package libmonitoring
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing header

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

)

var _ = Describe("[sig-monitoring]Metrics", decorators.SigMonitoring, func() {
var err error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please avoid using global err variable. It can generate flakiness. Furthermore, it is never cleaned after each test execution, so the risk is real.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, removed

}

func createRunningVM(virtClient kubecli.KubevirtClient) *v1.VirtualMachine {
vmi := libvmifact.NewGuestless(libvmi.WithNamespace(testsuite.GetTestNamespace(nil)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need to specify libvmi.WithNamespace since the the Create already has its own

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

 [FAILED] Unexpected error:
      <*errors.errorString | 0xc002857c80>: 
      an empty namespace may not be set when a resource name is provided
      {
          s: "an empty namespace may not be set when a resource name is provided",
      }
  occurred
  In [BeforeEach] at: tests/libwait/wait.go:76 @ 03/26/24 11:04:46.857

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, the libvmi.wait needs it.
Thank you!

Comment on lines +127 to +147
Eventually(func() bool {
vm, err := virtClient.VirtualMachine(testsuite.GetTestNamespace(vm)).Get(context.Background(), vm.Name, &metav1.GetOptions{})
Expect(err).ToNot(HaveOccurred())
return vm.Status.Ready
}, 300*time.Second, 1*time.Second).Should(BeTrue())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the libwait.WaitForSuccessfulVMIStart(vmi) enough?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and perform a vm.Get after it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The VMI still takes a few seconds to be created after you create the VM. And WaitForSuccessfulVMIStart just waits for the status, but if it does not find the VMI it exits with an error

Copy link
Contributor

@fossedihelm fossedihelm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @machadovilaca Can you please squash the commits into a single one?
Also, the headers are not compliant with the others.
Thank you!

Comment on lines 1 to 19
/*
Copyright The KubeVirt Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/*
Copyright The KubeVirt Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
/*
* This file is part of the KubeVirt project
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Copyright The Kubevirt Authors
*
*/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I use the template from hack/boilerplate/boilerplate.go.txt
not supposed to use that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird, in all the code we are using a different format for the non-generated code.
I would use my suggestion while we will decide something.
Thank you

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds fine by me, updated!

@fossedihelm
Copy link
Contributor

@enp0s3 any chance to get another review here? Thanks

}

It("should contain virt components metrics", func() {
metrics = fetchPrometheusMetrics(virtClient)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for splitted review! Do we need this? this is already performed in BeforeEach block

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no worries
nice catch, I think its a leftover of before I added the 2nd test

Signed-off-by: João Vilaça <jvilaca@redhat.com>
@fossedihelm
Copy link
Contributor

Nothing else from my side. Thank you for your patience!
/lgtm

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Mar 26, 2024
Copy link
Contributor

@enp0s3 enp0s3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@kubevirt-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enp0s3

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 3, 2024
@kubevirt-commenter-bot
Copy link

Required labels detected, running phase 2 presubmits:
/test pull-kubevirt-e2e-windows2016
/test pull-kubevirt-e2e-kind-1.27-vgpu
/test pull-kubevirt-e2e-kind-sriov
/test pull-kubevirt-e2e-k8s-1.29-ipv6-sig-network
/test pull-kubevirt-e2e-k8s-1.27-sig-network
/test pull-kubevirt-e2e-k8s-1.27-sig-storage
/test pull-kubevirt-e2e-k8s-1.27-sig-compute
/test pull-kubevirt-e2e-k8s-1.27-sig-operator
/test pull-kubevirt-e2e-k8s-1.28-sig-network
/test pull-kubevirt-e2e-k8s-1.28-sig-storage
/test pull-kubevirt-e2e-k8s-1.28-sig-compute
/test pull-kubevirt-e2e-k8s-1.28-sig-operator

@kubevirt-commenter-bot
Copy link

/retest-required
This bot automatically retries required jobs that failed/flaked on approved PRs.
Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

1 similar comment
@kubevirt-commenter-bot
Copy link

/retest-required
This bot automatically retries required jobs that failed/flaked on approved PRs.
Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@kubevirt-bot
Copy link
Contributor

kubevirt-bot commented Apr 4, 2024

@machadovilaca: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubevirt-e2e-arm64 719c9f6 link false /test pull-kubevirt-e2e-arm64
pull-kubevirt-conformance-arm64 719c9f6 link false /test pull-kubevirt-conformance-arm64

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@kubevirt-commenter-bot
Copy link

/retest-required
This bot automatically retries required jobs that failed/flaked on approved PRs.
Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@kubevirt-bot kubevirt-bot merged commit f4aad6e into kubevirt:main Apr 4, 2024
37 of 39 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/monitoring dco-signoff: yes Indicates the PR's author has DCO signed all their commits. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/buildsystem Denotes an issue or PR that relates to changes in the build system. size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants