
Upgrade to Cloud Hypervisor v37.0 (LTS) #8695

Open
wants to merge 2 commits into main
Conversation

likebreath
Contributor

This release has been tracked in our roadmap project as iteration
v37.0. The following user-visible changes have been made:

Long Term Support (LTS) Release

This release is an LTS release. Point releases for bug fixes will be made
for the next 18 months; live migration and live upgrade will be
supported between the point releases of the LTS.

Multiple PCI segments Support for 32-bit VFIO Devices

Now VFIO devices with 32-bit memory BARs can be attached to non-zero PCI
segments on the guest, allowing users to have more 32-bit devices and
assign such devices to appropriate NUMA nodes for better performance.

Configurable Named TAP Devices

Named TAP devices now accept network configuration from users, such as
IP and MAC addresses, as long as the named TAP device is created by
Cloud Hypervisor (i.e. not a pre-existing TAP device).

TTY Output from Both Serial Device and Virtio Console

The legacy serial device and the virtio console can now both be set to
TTY mode at the same time. This allows users to capture early boot logs
with the legacy serial device without losing the performance benefits of
virtio-console, when an appropriate kernel configuration is used (such as
the kernel command line console=hvc0 earlyprintk=ttyS0 on x86).

Faster VM Restoration from Snapshots

The speed of VM restoration from snapshots has been improved with a
better implementation of JSON file deserialization.

Notable Bug Fixes

  • Fix aio backend behavior for block devices when the writeback cache
    is disabled
  • Fix PvPanic device PCI BAR alignment
  • Bug fix to OpenAPI specification file
  • Error out early for live migration when TDX is enabled

Fixes: #8694

@katacontainersbot katacontainersbot added the size/large Task of significant size label Dec 18, 2023
@likebreath
Contributor Author

Given it is an LTS release, please let me know if you want me to back-port it to our stable branch. Thanks.

@likebreath
Contributor Author

/test

@amshinde
Member

@likebreath Unless there are some critical security fixes that have gone in as well, I would not backport a version update to a past release. If you feel there are some security fixes that are worth backporting, then let us know.

@skaegi
Contributor

skaegi commented Dec 18, 2023

We would very much also like this back-ported to 3.2.x, as we are already in the process of doing this in our own tree and intend to follow the LTS release.

@likebreath
Contributor Author

@amshinde @skaegi Thanks for the quick response. Sounds like backporting only to stable-3.2 will be the best way to go. We can do that together after landing this one.

@likebreath
Contributor Author

likebreath commented Dec 18, 2023

I believe there are some unrelated failures (say clh/qemu-tracing/metrics, etc.), while the error from the worker run-nerdctl-tests (cloud-hypervisor) [1] looks to be a real catch (and might also be the reason for the other worker failures). The reported error is:

time="2023-12-18T22:31:35Z" level=warning msg="cannot set cgroup manager to \"systemd\" for runtime \"io.containerd.kata-cloud-hypervisor.v2\""
time="2023-12-18T22:31:36Z" level=fatal msg="failed to create shim task: Others(\"failed to handle message try init runtime instance\\n\\nCaused by:\\n    0: init runtime handler\\n    1: start sandbox\\n    2: set up device after start vm\\n    3: failed to set up network\\n    4: setup network\\n    5: attach\\n    6: do handle network Veth endpoint device failed.\\n    7: failed to add deivce\\n    8: add network device.\\n    9: Server responded with an error: InternalServerError: ApiError(VmAddNet(DeviceManager(CreateVirtioNet(OpenTap(TapSetNetmask(IoctlError(35100, Os { code: 99, kind: AddrNotAvailable, message: \\\"Address not available\\\" })))))))\"): unknown"

It could be related to cloud-hypervisor/cloud-hypervisor#5924, which changed the behavior of using a named tap device with Cloud Hypervisor. For example, with --net tap=newTap,ip=,mac=,mask=, the new release v37.0 of Cloud Hypervisor will create the newTap TAP device on the host and configure it with a default IP (192.168.249.1), MAC (random value), and netmask (255.255.255.0), while the original behavior was to leave the TAP device unconfigured.
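The behavior change can be illustrated with a small Go sketch (the NetConfig type and applyV37Defaults function are hypothetical illustrations, not Cloud Hypervisor code, which is Rust; the default values are the ones quoted above):

```go
package main

import "fmt"

// NetConfig is a hypothetical mirror of Cloud Hypervisor's --net parameters.
type NetConfig struct {
	Tap  string
	IP   string
	Mask string
}

// applyV37Defaults sketches the new v37.0 behavior: empty fields on a
// named TAP device are filled with defaults instead of being left alone.
func applyV37Defaults(c NetConfig) NetConfig {
	if c.IP == "" {
		c.IP = "192.168.249.1" // default IP quoted above
	}
	if c.Mask == "" {
		c.Mask = "255.255.255.0" // default netmask quoted above
	}
	return c
}

func main() {
	// Before v37.0, "--net tap=newTap,ip=,mask=" left the device unconfigured;
	// with v37.0, empty fields are replaced by defaults.
	cfg := applyV37Defaults(NetConfig{Tap: "newTap"})
	fmt.Println(cfg.IP, cfg.Mask) // prints "192.168.249.1 255.255.255.0"
}
```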

Will such behavior change from Cloud Hypervisor cause the test failure above?

@amshinde
Member

@likebreath That's the failure seen with the rust runtime. Looks like tests are failing with the go runtime as well: https://github.com/kata-containers/kata-containers/actions/runs/7252876624/job/19796037426?pr=8695

time="2023-12-18T22:31:10Z" level=fatal msg="failed to create shim task: \"update interface: Link not found (Address: 1e:f3:1f:68:b7:4c)\": unknown"

With the go runtime, we open the tap interface and pass the file descriptor to clh, as seen here:
https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/clh.go#L184
With the rust runtime, we pass the tap interface name instead. (There are plans to add multi-queue support and pass the fd eventually, but we will get to this in the future.)
Will clh try to create another tap if it already exists in that case? We also rely on the hypervisor not changing the MAC address, as the kata-agent uses the MAC address to identify the network device inside the guest and configure its name and IP address later.

@likebreath
Contributor Author

@likebreath That's the failure seen with the rust runtime. Looks like tests are failing with the go runtime as well: https://github.com/kata-containers/kata-containers/actions/runs/7252876624/job/19796037426?pr=8695

time="2023-12-18T22:31:10Z" level=fatal msg="failed to create shim task: \"update interface: Link not found (Address: 1e:f3:1f:68:b7:4c)\": unknown"

With the go runtime, we open the tap interface and pass the file descriptor to clh, as seen here: https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/clh.go#L184 With the rust runtime, we pass the tap interface name instead. (There are plans to add multi-queue support and pass the fd eventually, but we will get to this in the future.)

Thank you for the context. For the case where an fd is used for creating a virtio-net device, there is no behavior change on the Cloud Hypervisor side. So for the case with the go runtime, the error is caused by something else. Do you have any thoughts?

Will clh try to create another tap if it already exists in that case? We also rely on the hypervisor not changing the MAC address, as the kata-agent uses the MAC address to identify the network device inside the guest and configure its name and IP address later.

No, Cloud Hypervisor won't create another tap device if the given tap device already exists, based on the input tap device name; see the code here [1]. Does kata create the tap device and configure its MAC for runtime-rs? If that's the case, runtime-rs should not see any behavior change from Cloud Hypervisor either.

[1] https://github.com/cloud-hypervisor/cloud-hypervisor/blob/24f384d2397a93ca32b7efcda2105e67bdac7b3c/net_util/src/open_tap.rs#L76-L78
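The existence check referenced in [1] can be approximated with Go's standard library (a runtime-side sketch; the actual Cloud Hypervisor code does this in Rust, and the interface name below is hypothetical):

```go
package main

import (
	"fmt"
	"net"
)

// tapExists reports whether a network interface with the given name is
// already present on the host. A hypervisor following the logic in [1]
// would only create the TAP device when this returns false, and would
// reuse the existing device otherwise.
func tapExists(name string) bool {
	_, err := net.InterfaceByName(name)
	return err == nil
}

func main() {
	// A name longer than Linux's 15-character IFNAMSIZ limit cannot exist,
	// so this lookup always fails.
	fmt.Println(tapExists("surely-missing-tap0")) // prints "false"
}
```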

@amshinde
Member

Does kata create the tap device and configure its MAC for runtime-rs?

Yes, Kata creates the tap device as seen here:
https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/network_linux.go#L829

It then configures the mac address as seen here:
https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/network_linux.go#L858

I ran the CI again to see if this was a one-off error due to a race condition, but the CI seems to fail consistently on not being able to find the MAC address. So maybe clh is updating the MAC address at some point.
Will have to reproduce this locally to see if that's the case.

@likebreath
Contributor Author

Does kata create the tap device and configure its MAC for runtime-rs?

Yes, Kata creates the tap device as seen here: https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/network_linux.go#L829

It then configures the mac address as seen here: https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/network_linux.go#L858

I ran the CI again to see if this was a one-off error due to a race condition, but the CI seems to fail consistently on not being able to find the MAC address. So maybe clh is updating the MAC address at some point. Will have to reproduce this locally to see if that's the case.

Agree. Would you please help set up an environment to reproduce the error locally so that we can look into it in detail? Thanks a lot. @amshinde

@likebreath
Contributor Author

/test

@likebreath
Contributor Author

For future reference, @amshinde and I have been following this issue recently, and here is a summary of this long-standing PR. There are essentially two different issues involved:

  1. Issue runtime-rs: ch: runtime crashes with Docker when creating network tap device with newer Cloud Hypervisor versions #9254, which is related to this PR and impacts only the rust runtime (fixed via runtime-rs: ch: Provide valid default value for NetConfig #9295);
  2. Issue nerdctl tests not working with cloud hypervisor runtime-rs #8831, which is a totally separate issue (nothing to do with the changes here) that impacts both the golang and rust runtimes.

With that, this PR is ready for review and to be landed. Note that the following CI failures (non-required jobs) are not related to the changes here:

 run-k8s-tests (qemu, kubeadm)
 run-k8s-tests-on-sev (qemu-sev, nydus, guest-pull)
 run-k8s-tests-on-tdx (qemu-tdx, nydus, guest-pull)
 run-k8s-tests-sev-snp (qemu-snp, nydus, guest-pull)
 run-monitor (qemu, containerd)

@likebreath
Contributor Author

likebreath commented Apr 24, 2024

@GabyCT @amshinde Not sure why the Jenkins-based CI workers are not running after being triggered for 2 hours. Would you please help and take a look? Thanks.

Looks like it is a matter of availability of the underlying aarch64 system, which was backlogged. I guess it just needs that much time to pick up the job and run it.

@skaegi
Contributor

skaegi commented Apr 24, 2024

We've been using this in our kata release since December. Would suggest bumping this to 37.1 though for a few fixes.

@likebreath
Contributor Author

We've been using this in our kata release since December. Would suggest bumping this to 37.1 though for a few fixes.

That's the plan, but we will upgrade to the latest release v38.0 (instead of v37.1). I will follow up with another PR for that.

@likebreath
Contributor Author

Rebased the PR to have newly added tests on runtime-rs + CH: #9525

/cc @jodh-intel

/test

Details of this release can be found in our roadmap project as iteration
v37.0: https://github.com/orgs/cloud-hypervisor/projects/6.

Fixes: kata-containers#8694

Signed-off-by: Bo Chen <chen.bo@intel.com>
This patch re-generates the client code for Cloud Hypervisor v37.0.
Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by openapi-generator.

Fixes: kata-containers#8694

Signed-off-by: Bo Chen <chen.bo@intel.com>
@likebreath
Contributor Author

Rebased to include #9562 for fixing unrelated CI worker failures.

@likebreath
Contributor Author

/test
