Fedora VM hangs on boot #6439

Open
maburlik opened this issue May 3, 2024 · 1 comment

maburlik commented May 3, 2024

Describe the bug

I have been hitting sporadic boot hang issues for a while when launching back-to-back cloud-hypervisor VMs. This mostly happens with the Fedora 37 base image, and can happen on a fresh boot or reboot.

I've hit this with cloud-hypervisor v35.0, and continue to hit it since recently upgrading to v38.0.

Fresh boot:

cloud-hypervisor logs:
chv.1582604111319321.log

stdout log:
chv.stdout.1582604111319321_2024-04-28T09-33-36.496.log

To Reproduce
Steps to reproduce the behaviour:

Launch 4 VMs, reboot them, tear them down, and recreate them back to back. Repeat for 100 iterations.

Version

Output of cloud-hypervisor --version:

cloud-hypervisor v38.0

Did you build from source, if so build command line (e.g. features):
Yes, built from source from tag v38.0

VM configuration

What command line did you run (or JSON config data):

JSON config:

{
  "cpus": {"boot_vcpus": 2, "max_vcpus": 2},
  "memory": {"size": 1073741824, "shared": true},
  "payload": {"kernel": "/home/proj/myagent/_work/_temp/out/release/proj/bin/serv.code/OVMF-CLOUDHV-DEBUG.fd"},
  "disks": [
    {"path": "/home/proj/myagent/_work/_temp/out/src/test/e2e_test/test-data/vm_ssh_test_2_fedora/vm_ssh_test_2/fedora_reboot.vm_ssh_test_2.raw"},
    {"path": "/home/proj/myagent/_work/_temp/out/src/test/e2e_test/test-data/vm_ssh_test_2_fedora/vm_ssh_test_2/cloud-init/nocloud_ds.iso"}
  ],
  "net": [{"tap": "proj_ptap_6b60", "mac": "52:69:6b:5f:d8:ac"}],
  "fs": [
    {"tag": "/secrets", "socket": "/tmp/serv_sockets/virtiofsd-socket-484"},
    {"tag": "proj_mount_tag_242", "socket": "/tmp/serv_sockets/virtiofsd-socket-485"},
    {"tag": "proj_mount_tag_243", "socket": "/tmp/serv_sockets/virtiofsd-socket-486"},
    {"tag": "proj_vm_diagnostics", "socket": "/tmp/serv_sockets/virtiofsd-socket-487"}
  ],
  "serial": {"mode": "Tty"},
  "console": {"mode": "Off"},
  "vsock": {"cid": 124, "socket": "/tmp/serv_sockets/proj-vm-guest-socket-124.vsock"},
  "watchdog": true
}
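
Since the config points at four virtio-fs backend sockets, one cheap sanity check before each launch is to confirm that every `fs` socket path exists as a Unix socket file. The sketch below is only an illustration: the helper name `missing_fs_sockets` is hypothetical and not part of cloud-hypervisor, and it deliberately only stats the path rather than connecting, on the assumption that a probe connection could interfere with a backend that serves a single client.

```python
import os
import stat


def missing_fs_sockets(config):
    """Return the virtio-fs socket paths from a VM config dict that are
    not present on disk as Unix socket files.

    Hypothetical helper for illustration; `config` is the parsed JSON
    config shown above.
    """
    missing = []
    for fs in config.get("fs", []):
        path = fs["socket"]
        try:
            mode = os.stat(path).st_mode
        except OSError:
            # Path does not exist or cannot be accessed.
            missing.append(path)
            continue
        if not stat.S_ISSOCK(mode):
            # Something is there, but it is not a socket file.
            missing.append(path)
    return missing
```

With the config loaded via `json.load`, an empty result means every socket file is present. It does not prove a daemon is listening, since a stale socket file left by a dead process would still pass this check and then fail on connect.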

Guest OS version details:

Fedora 37: https://mirrors.rit.edu/fedora/fedora/linux/releases/37/Cloud/x86_64/images/Fedora-Cloud-Base-37-1.7.x86_64.raw.xz

Host OS version details:

$ cat /etc/os-release 
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Logs

Shown above.

@rbradford
Member

Hi, thanks for opening this issue - this looks like the virtiofs daemon you are using is not accepting the connection:

cloud-hypervisor: 125.332215s: <_fs6> ERROR:virtio-devices/src/thread_helper.rs:55 -- Error running worker: HandleEvent(failed to reconnect vhost-user backend: IoError(Custom { kind: Other, error: "failed reconnecting vhost-user backend: VhostUserGetFeatures(VhostUserProtocol(SocketBroken(Os { code: 104, kind: ConnectionReset, message: \"Connection reset by peer\" })))" }))
cloud-hypervisor: 125.332844s: <vmm> INFO:vmm/src/lib.rs:1125 -- VM exit event
cloud-hypervisor: 125.332898s: <vmm> INFO:virtio-devices/src/device.rs:334 -- Resuming virtio-fs
cloud-hypervisor: 125.332707s: <_fs4> ERROR:virtio-devices/src/thread_helper.rs:55 -- Error running worker: HandleEvent(failed to reconnect vhost-user backend: IoError(Custom { kind: Other, error: "failed reconnecting vhost-user backend: VhostUserGetFeatures(VhostUserProtocol(SocketBroken(Os { code: 104, kind: ConnectionReset, message: \"Connection reset by peer\" })))" }))
cloud-hypervisor: 185.441257s: <_fs3> ERROR:virtio-devices/src/vhost_user/vu_common_ctrl.rs:411 -- Failed connecting the backend after trying for 1 minute: VhostUserProtocol(SocketConnect(Os { code: 111, kind: ConnectionRefused, message: "Connection refused" }))
cloud-hypervisor: 185.441584s: <_fs3> ERROR:virtio-devices/src/thread_helper.rs:55 -- Error running worker: HandleEvent(failed to reconnect vhost-user backend: IoError(Custom { kind: Other, error: "failed connecting vhost-user backend VhostUserConnect" }))
cloud-hypervisor: 185.449764s: <_fs5> ERROR:virtio-devices/src/vhost_user/vu_common_ctrl.rs:411 -- Failed connecting the backend after trying for 1 minute: VhostUserProtocol(SocketConnect(Os { code: 111, kind: ConnectionRefused, message: "Connection refused" }))
cloud-hypervisor: 185.450342s: <_fs5> ERROR:virtio-devices/src/thread_helper.rs:55 -- Error running worker: HandleEvent(failed to reconnect vhost-user backend: IoError(Custom { kind: Other, error: "failed connecting vhost-user backend VhostUserConnect" }))

Tagging @slp as he knows about virtiofs
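
To see at a glance which virtio-fs worker threads failed and with which OS error, log lines in the format quoted above can be grouped per thread. This is a rough sketch that assumes every line follows the `cloud-hypervisor: <secs>s: <thread> LEVEL:source -- message` layout seen in the excerpt; the function name `fs_errors` and the regular expressions are illustrative, not part of any cloud-hypervisor tooling.

```python
import re
from collections import defaultdict

# Matches the line layout seen in the excerpt above (an assumption about
# the log format, derived only from those lines).
LINE_RE = re.compile(
    r"^cloud-hypervisor: (?P<secs>\d+\.\d+)s: <(?P<thread>[^>]+)> "
    r"(?P<level>[A-Z]+):(?P<src>\S+) -- (?P<msg>.*)$"
)
# Pulls the OS-level error kind, e.g. ConnectionReset or ConnectionRefused.
OS_ERR_RE = re.compile(r"Os \{ code: (\d+), kind: (\w+)")


def fs_errors(lines):
    """Group ERROR lines by thread name.

    Returns {thread: [(seconds, os_error_kind_or_None), ...]}.
    Lines that do not match the expected layout are skipped.
    """
    errors = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None or m.group("level") != "ERROR":
            continue
        os_err = OS_ERR_RE.search(m.group("msg"))
        kind = os_err.group(2) if os_err else None
        errors[m.group("thread")].append((float(m.group("secs")), kind))
    return dict(errors)
```

Run over the excerpt, this would separate the `_fs6` and `_fs4` threads, which report `ConnectionReset` at about 125 s, from `_fs3` and `_fs5`, which report `ConnectionRefused` after the one-minute retry window at about 185 s.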
