Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CNV-40881: kubevirt, e2e, add test for advanced multinet #3902

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

qinqon
Copy link
Contributor

@qinqon qinqon commented Apr 17, 2024

What this PR does / why we need it:
This change add a test to start a kubevirt hosted cluster with just secondary interface as primary network and start a companion dnsmasq pod attached to that network that acts as dhcp server, gateway and masquerade. The test only has one worker so this works as expected.

Which issue(s) this PR fixes (optional, use fixes #<issue_number>(, fixes #<issue_number>, ...) format, where issue_number might be a GitHub issue, or a Jira story:
Fixes #

Checklist

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Copy link

netlify bot commented Apr 17, 2024

Deploy Preview for hypershift-docs ready!

Name Link
🔨 Latest commit 7e8e3f3
🔍 Latest deploy log https://app.netlify.com/sites/hypershift-docs/deploys/6645c05f9d7fe7000856d60c
😎 Deploy Preview https://deploy-preview-3902--hypershift-docs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@qinqon
Copy link
Contributor Author

qinqon commented Apr 17, 2024

/area test

@qinqon
Copy link
Contributor Author

qinqon commented Apr 17, 2024

/area e2e

Copy link
Contributor

openshift-ci bot commented Apr 17, 2024

@qinqon: The label(s) area/e2e cannot be applied, because the repository doesn't have them.

In response to this:

/area e2e

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copy link
Contributor

openshift-ci bot commented Apr 17, 2024

@qinqon: The label(s) area/test cannot be applied, because the repository doesn't have them.

In response to this:

/area test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot requested review from enxebre and hasueki April 17, 2024 13:30
@qinqon
Copy link
Contributor Author

qinqon commented Apr 17, 2024

error performing canary route check	{"error": "error sending canary HTTP Request: Timeout: Get \"https://canary-openshift-ingress-canary.apps.example-sr92d.apps.cluster-1713337633-ellorent-test.devcluster.openshift.com\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}

@openshift-ci openshift-ci bot added area/testing Indicates the PR includes changes for e2e testing and removed do-not-merge/needs-area labels Apr 17, 2024
Copy link
Contributor

openshift-ci bot commented Apr 17, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: qinqon
Once this PR has been reviewed and has the lgtm label, please assign csrwng for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Contributor

@davidvossel davidvossel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the approach makes sense to me. i left one minor comment

test/e2e/nodepool_kv_advanced_multinet_test.go Outdated Show resolved Hide resolved
@qinqon qinqon changed the title kubevirt, e2e: Add test for advanced multinet CNV-40881: kubevirt, e2e, add test for advanced multinet Apr 18, 2024
@openshift-ci-robot
Copy link

openshift-ci-robot commented Apr 18, 2024

@qinqon: This pull request references CNV-40881 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

In response to this:

What this PR does / why we need it:
This change add a test to start a kubevirt hosted cluster with just secondary interface as primary network and start a companion dnsmasq pod attached to that network that acts as dhcp server, gateway and masquerade. The test only has one worker so this works as expected.

Which issue(s) this PR fixes (optional, use fixes #<issue_number>(, fixes #<issue_number>, ...) format, where issue_number might be a GitHub issue, or a Jira story:
Fixes #

Checklist

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 18, 2024
@qinqon
Copy link
Contributor Author

qinqon commented Apr 18, 2024

/jira refresh

@openshift-ci-robot
Copy link

openshift-ci-robot commented Apr 18, 2024

@qinqon: This pull request references CNV-40881 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target either version "4.16." or "openshift-4.16.", but it targets "cnv-4.16" instead.

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Copy link

openshift-ci-robot commented Apr 18, 2024

@qinqon: This pull request references CNV-40881 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target either version "4.16." or "openshift-4.16.", but it targets "cnv-4.16" instead.

In response to this:

What this PR does / why we need it:
This change add a test to start a kubevirt hosted cluster with just secondary interface as primary network and start a companion dnsmasq pod attached to that network that acts as dhcp server, gateway and masquerade. The test only has one worker so this works as expected.

TODO:

  • The default passthrough ingress service is pointing to the VM but default network cannot access that IP, either make it go over dnsmasq and redirect the https traffic to the VM there or connect the VM to the underlay (not sure if this would work though).

Which issue(s) this PR fixes (optional, use fixes #<issue_number>(, fixes #<issue_number>, ...) format, where issue_number might be a GitHub issue, or a Jira story:
Fixes #

Checklist

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@qinqon qinqon force-pushed the kubevirt-advanced-multus-e2e-tests branch 2 times, most recently from 7ebe007 to 46d8373 Compare April 22, 2024 14:55
@openshift-ci openshift-ci bot added the area/cli Indicates the PR includes changes for CLI label Apr 22, 2024
@qinqon qinqon force-pushed the kubevirt-advanced-multus-e2e-tests branch 8 times, most recently from d3a3ca7 to cc1a5f6 Compare April 24, 2024 09:55
@qinqon
Copy link
Contributor Author

qinqon commented May 7, 2024

Before last changes we have a green lane now multiple nodepool tests are failing

 --- FAIL: TestNodePool/Main (0.02s)
        --- SKIP: TestNodePool/Main/TestRollingUpgrade (0.00s)
        --- SKIP: TestNodePool/Main/TestKMSRootVolumeEncryption (0.00s)
        --- SKIP: TestNodePool/Main/TestNodepoolMachineconfigGetsRolledout (0.00s)
        --- SKIP: TestNodePool/Main/TestNodePoolAutoRepair (0.00s)
        --- SKIP: TestNodePool/Main/TestNTOMachineConfigGetsRolledOut (0.00s)
        --- SKIP: TestNodePool/Main/TestNTOMachineConfigAppliedInPlace (0.00s)
        --- FAIL: TestNodePool/Main/KubeVirtNodeAdvancedMultinetTest (2814.02s)
        --- FAIL: TestNodePool/Main/TestNodePoolReplaceUpgrade (2815.09s)
        --- FAIL: TestNodePool/Main/KubeKubeVirtJsonPatchTest (2815.08s)
        --- FAIL: TestNodePool/Main/KubeVirtCacheTest (2820.09s)
        --- FAIL: TestNodePool/Main/TestNTOPerformanceProfile (2820.09s)
        --- FAIL: TestNodePool/Main/TestNodePoolInPlaceUpgrade (2835.08s)
        --- PASS: TestNodePool/Main/KubeVirtNodeSelectorTest (1845.14s)
        --- PASS: TestNodePool/Main/KubeVirtQoSClassGuaranteedTest (730.10s)
        --- PASS: TestNodePool/Main/KubeVirtNodeMultinetTest (485.11s)

hosted cluster is in partial state

=== NAME  TestNodePool/Main/KubeVirtNodeAdvancedMultinetTest
    util.go:340: 
        failed waiting for hostedcluster image rollout: status.version.history[0].state is "Partial", but we want "Completed"
        Unexpected error:
            <wait.errInterrupted>: 
            timed out waiting for the condition
            {
                cause: <*errors.errorString | 0xc000a10a90>{
                    s: "timed out waiting for the condition",
                },
            }
        occurred

@qinqon qinqon force-pushed the kubevirt-advanced-multus-e2e-tests branch from c1a6409 to dbcd2de Compare May 7, 2024 07:29
@qinqon
Copy link
Contributor Author

qinqon commented May 7, 2024

Passing locally and isolated

=== NAME  TestNodePool
    hypershift_framework.go:132: skipping teardown, already called
--- PASS: TestNodePool (688.64s)
    --- PASS: TestNodePool/Main (65.00s)
        --- PASS: TestNodePool/Main/KubeVirtNodeAdvancedMultinetTest (617.71s)
PASS
+ popd
~/Documents/cnv/sandbox/aws

@qinqon
Copy link
Contributor Author

qinqon commented May 7, 2024

Running all Kubevirt nodepool in parallel we get the console operator stuck

console                                    4.16.0-0.nightly-2024-05-05-102537   False       False         True       7m      RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.example-tc26l.apps.cluster-1715061648-ellorent-test.devcluster.openshift.com): Get "https://console-openshift-console.apps.example-tc26l.apps.cluster-1715061648-ellorent-test.devcluster.openshift.com": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

@qinqon
Copy link
Contributor Author

qinqon commented May 7, 2024

It takes 20 seconds for the route to respond

time curl -Lk https://console-openshift-console.apps.example-tc26l.apps.cluster-1715061648-ellorent-test.devcluster.openshift.com
...
real	0m20.591s
user	0m0.020s
sys	0m0.011s

Even removing the endpointslices from from advanced multus we get that latency

@qinqon qinqon force-pushed the kubevirt-advanced-multus-e2e-tests branch from dbcd2de to 0f74aa5 Compare May 7, 2024 09:07
@qinqon
Copy link
Contributor Author

qinqon commented May 7, 2024

/retest

@qinqon qinqon force-pushed the kubevirt-advanced-multus-e2e-tests branch 3 times, most recently from eb5e0ca to da6ca75 Compare May 8, 2024 10:27
@qinqon
Copy link
Contributor Author

qinqon commented May 8, 2024

Now is possible to run different nodepools at different hosted cluster and those are run in parallel too.

--- PASS: TestNodePool (0.00s)
    --- PASS: TestNodePool/HostedCluster1 (1123.09s)
        --- PASS: TestNodePool/HostedCluster1/Main (45.95s)
            --- PASS: TestNodePool/HostedCluster1/Main/KubeVirtNodeAdvancedMultinetTest (1066.60s)
    --- PASS: TestNodePool/HostedCluster0 (1163.22s)
        --- PASS: TestNodePool/HostedCluster0/Main (48.97s)
            --- PASS: TestNodePool/HostedCluster0/Main/KubeVirtNodeSelectorTest (1097.61s)
            --- PASS: TestNodePool/HostedCluster0/Main/KubeKubeVirtJsonPatchTest (1101.79s)
            --- PASS: TestNodePool/HostedCluster0/Main/KubeVirtNodeMultinetTest (1101.97s)
            --- PASS: TestNodePool/HostedCluster0/Main/KubeVirtQoSClassGuaranteedTest (1101.98s)
            --- PASS: TestNodePool/HostedCluster0/Main/KubeVirtCacheTest (1106.72s)

test/e2e/nodepool_test.go Outdated Show resolved Hide resolved
test/e2e/nodepool_kv_advanced_multinet_test.go Outdated Show resolved Hide resolved
@openshift-ci-robot
Copy link

openshift-ci-robot commented May 8, 2024

@qinqon: This pull request references CNV-40881 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target either version "4.16." or "openshift-4.16.", but it targets "cnv-4.16" instead.

In response to this:

What this PR does / why we need it:
This change add a test to start a kubevirt hosted cluster with just secondary interface as primary network and start a companion dnsmasq pod attached to that network that acts as dhcp server, gateway and masquerade. The test only has one worker so this works as expected.

TODO:

  • Replace quay.io/coreos/dnsmaq
  • Skip HostedCluster1 is this is not kubevirt (aws lane should not run it)
  • Move dnsmasq and nad back to the hosted cluster namespace and make it privileged.

Which issue(s) this PR fixes (optional, use fixes #<issue_number>(, fixes #<issue_number>, ...) format, where issue_number might be a GitHub issue, or a Jira story:
Fixes #

Checklist

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Copy link

openshift-ci-robot commented May 9, 2024

@qinqon: This pull request references CNV-40881 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target either version "4.16." or "openshift-4.16.", but it targets "cnv-4.16" instead.

In response to this:

What this PR does / why we need it:
This change add a test to start a kubevirt hosted cluster with just secondary interface as primary network and start a companion dnsmasq pod attached to that network that acts as dhcp server, gateway and masquerade. The test only has one worker so this works as expected.

TODO:

  • Replace quay.io/coreos/dnsmaq
  • Don't start HotedCluster1 for test not running there.
  • Use nodepool as owner of dnmasq and nad at default namespace.

Which issue(s) this PR fixes (optional, use fixes #<issue_number>(, fixes #<issue_number>, ...) format, where issue_number might be a GitHub issue, or a Jira story:
Fixes #

Checklist

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@qinqon qinqon force-pushed the kubevirt-advanced-multus-e2e-tests branch 4 times, most recently from 8994f68 to cf9e835 Compare May 9, 2024 09:11
@openshift-ci-robot
Copy link

openshift-ci-robot commented May 9, 2024

@qinqon: This pull request references CNV-40881 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target either version "4.16." or "openshift-4.16.", but it targets "cnv-4.16" instead.

In response to this:

What this PR does / why we need it:
This change add a test to start a kubevirt hosted cluster with just secondary interface as primary network and start a companion dnsmasq pod attached to that network that acts as dhcp server, gateway and masquerade. The test only has one worker so this works as expected.

Which issue(s) this PR fixes (optional, use fixes #<issue_number>(, fixes #<issue_number>, ...) format, where issue_number might be a GitHub issue, or a Jira story:
Fixes #

Checklist

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@qinqon qinqon force-pushed the kubevirt-advanced-multus-e2e-tests branch from cf9e835 to 5ce890d Compare May 9, 2024 09:31
@qinqon qinqon requested a review from davidvossel May 9, 2024 09:32
@qinqon qinqon force-pushed the kubevirt-advanced-multus-e2e-tests branch 2 times, most recently from 8417f1c to d6258c1 Compare May 9, 2024 12:30
@stevekuznetsov
Copy link
Contributor

stevekuznetsov commented May 9, 2024

@qinqon re #3902 (comment) ... it's two fields (Compute, RootVolume) to filter, which you can do with a one-liner in cmpopts.IgnoreFields(hyperv1.KubevirtNodePoolPlatform{}, "Compute", "RootVolume"). Did you check what the output looks like when the test fails for real? That is the real issue - when the test fails in a way we expect, the output of Ginkgo is very, very hard to read. The structured diff out of cmp is simple.

I won't force you to but I really think you would benefit from comparing the output of the test when it fails the way you expect between the two approaches.

@qinqon
Copy link
Contributor Author

qinqon commented May 9, 2024

cmpopts.IgnoreFields(hyperv1.KubevirtNodePoolPlatform{}, "Compute", "RootVolume")

This means we have to maintain this in case new fields are added, I think is not worth it.

@qinqon qinqon force-pushed the kubevirt-advanced-multus-e2e-tests branch from d6258c1 to 7e8e3f3 Compare May 16, 2024 08:14
@qinqon
Copy link
Contributor Author

qinqon commented May 16, 2024

/test e2e-kubevirt-aws-ovn

Not related

--- FAIL: TestCreateCluster/Teardown (33.16s)
        --- FAIL: TestCreateCluster/Teardown/ValidateMetricsAreExposed (0.04s)

This change add a test to start a kubevirt hosted cluster with just
secondary interface as primary network and start a companion dnsmasq pod
attached to that network that acts as dhcp server, gateway and masquerade.
The test only has one worker so this works as expected.

Signed-off-by: Enrique Llorente <ellorent@redhat.com>
@qinqon qinqon force-pushed the kubevirt-advanced-multus-e2e-tests branch from 7e8e3f3 to 57f6fcc Compare May 20, 2024 07:44
Copy link
Contributor

openshift-ci bot commented May 20, 2024

@qinqon: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-azure 57f6fcc link false /test e2e-azure
ci/prow/e2e-kubevirt-aws-ovn 57f6fcc link true /test e2e-kubevirt-aws-ovn

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/cli Indicates the PR includes changes for CLI area/testing Indicates the PR includes changes for e2e testing jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants