[MSHV] [LIVE-MIGRATION] Live migration fails on MSHV with VirtIO watchdog #6201

Open
russell-islam opened this issue Feb 15, 2024 · 0 comments
Assignees
Labels
mshv Affects mshv only

Comments

@russell-islam
Contributor

Describe the bug
After a live migration with the watchdog enabled, the virtio devices start resuming on the destination. Resuming virtio-net takes a long time while it processes everything, so the watchdog times out and a reset event is triggered. As a result, the test case shows the destination VM timing out and going through one extra reboot.
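
The failure mode can be illustrated with a toy model of a periodic watchdog check. This is only a sketch, not the code in `watchdog.rs`: the function name `first_watchdog_trigger` and the 20 second timeout are hypothetical. The logs in this report only show that the timer fires every 15 seconds, that 15 seconds without a ping did not trigger a reset, and that 30 seconds did. Times are in seconds relative to the last ping before the stall.

```python
def first_watchdog_trigger(last_ping, tick_interval, timeout, ping_times, horizon):
    """Return the first timer tick at which the watchdog would fire, or None.

    The timer ticks at last_ping + k * tick_interval. At each tick the
    watchdog fires if more than `timeout` seconds have passed since the most
    recent ping seen so far. This is a simplified model, not the real device.
    """
    pings = sorted(ping_times)
    tick = last_ping + tick_interval
    while tick <= horizon:
        # Most recent ping at or before this tick (falls back to last_ping).
        latest = max([last_ping] + [p for p in pings if p <= tick])
        if tick - latest > timeout:
            return tick
        tick += tick_interval
    return None


# Guest cannot ping while the resume is stalled: fires on the second tick.
stalled = first_watchdog_trigger(0, 15, 20, [], 120)

# Guest pings every 10 seconds (hypothetical rate): never fires.
healthy = first_watchdog_trigger(0, 15, 20, range(10, 121, 10), 120)

print(stalled, healthy)
```

With no pings arriving, the model fires on the second 15 second tick, which is in line with the "30 seconds since last ping" message in the logs.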

To Reproduce
Run the live migration test case with the watchdog filter

Version

v37.0
cargo build --release --bin cloud-hypervisor --no-default-features --features "mshv"

VM configuration

sudo -E scripts/dev_cli.sh tests --hypervisor mshv --integration-live-migration -- --test-filter test_live_migration_watchdog

Guest OS version details:

Host OS version details:

Logs

cloud-hypervisor: 48.217084s: INFO:virtio-devices/src/device.rs:334 -- Resuming virtio-net
cloud-hypervisor: 48.217212s: <net123_qp0> DEBUG:virtio-devices/src/net.rs:322 -- MUISLAM: NetEpollHandler event type: 18
cloud-hypervisor: 48.217319s: <net123_qp0> DEBUG:net_util/src/queue_pair.rs:408 -- process_rx: start
cloud-hypervisor: 48.217508s: <net123_qp0> DEBUG:net_util/src/queue_pair.rs:443 -- process_rx: start end
cloud-hypervisor: 48.217579s: <net123_qp0> DEBUG:virtio-devices/src/net.rs:263 -- handle_rx_tap_event
cloud-hypervisor: 48.217699s: <net123_qp0> DEBUG:virtio-devices/src/net.rs:265 -- Signalling RX queue
cloud-hypervisor: 48.219351s: DEBUG:hypervisor/src/arch/x86/emulator/mod.rs:289 -- Register write: 0x86 to AX
cloud-hypervisor: 48.220109s: DEBUG:hypervisor/src/arch/x86/emulator/mod.rs:289 -- Register write: 0x25 to AX
cloud-hypervisor: 50.166669s: <net123_qp0> DEBUG:virtio-devices/src/net.rs:322 -- MUISLAM: NetEpollHandler event type: 18
cloud-hypervisor: 50.166752s: <net123_qp0> DEBUG:net_util/src/queue_pair.rs:408 -- process_rx: start
cloud-hypervisor: 50.166818s: <net123_qp0> DEBUG:net_util/src/queue_pair.rs:443 -- process_rx: start end
cloud-hypervisor: 50.166862s: <net123_qp0> DEBUG:virtio-devices/src/net.rs:263 -- handle_rx_tap_event
cloud-hypervisor: 50.166960s: <net123_qp0> DEBUG:virtio-devices/src/net.rs:265 -- Signalling RX queue
cloud-hypervisor: 58.358688s: <net123_qp0> DEBUG:virtio-devices/src/net.rs:322 -- MUISLAM: NetEpollHandler event type: 18
cloud-hypervisor: 58.358777s: <net123_qp0> DEBUG:net_util/src/queue_pair.rs:408 -- process_rx: start
cloud-hypervisor: 58.358843s: <net123_qp0> DEBUG:net_util/src/queue_pair.rs:443 -- process_rx: start end
cloud-hypervisor: 58.358887s: <net123_qp0> DEBUG:virtio-devices/src/net.rs:263 -- handle_rx_tap_event
cloud-hypervisor: 58.358986s: <net123_qp0> DEBUG:virtio-devices/src/net.rs:265 -- Signalling RX queue
cloud-hypervisor: 63.217011s: <__watchdog> DEBUG:virtio-devices/src/watchdog.rs:143 -- MUISLAM: WatchdogEpollHandler event type: 17
cloud-hypervisor: 75.766692s: <net123_qp0> DEBUG:virtio-devices/src/net.rs:322 -- MUISLAM: NetEpollHandler event type: 18
cloud-hypervisor: 75.766782s: <net123_qp0> DEBUG:net_util/src/queue_pair.rs:408 -- process_rx: start
cloud-hypervisor: 75.766867s: <net123_qp0> DEBUG:net_util/src/queue_pair.rs:443 -- process_rx: start end
cloud-hypervisor: 75.766911s: <net123_qp0> DEBUG:virtio-devices/src/net.rs:263 -- handle_rx_tap_event
cloud-hypervisor: 75.767015s: <net123_qp0> DEBUG:virtio-devices/src/net.rs:265 -- Signalling RX queue
cloud-hypervisor: 78.216938s: <__watchdog> DEBUG:virtio-devices/src/watchdog.rs:143 -- MUISLAM: WatchdogEpollHandler event type: 17
cloud-hypervisor: 78.217068s: <__watchdog> ERROR:virtio-devices/src/watchdog.rs:174 -- Watchdog triggered: 30 seconds since last ping
cloud-hypervisor: 78.217281s: INFO:vmm/src/lib.rs:1133 -- VM reset event
cloud-hypervisor: 78.217385s: INFO:virtio-devices/src/device.rs:334 -- Resuming virtio-rng
cloud-hypervisor: 78.217464s: INFO:virtio-devices/src/device.rs:334 -- Resuming virtio-block
cloud-hypervisor: 78.217540s: INFO:virtio-devices/src/device.rs:334 -- Resuming virtio-pmem
cloud-hypervisor: 78.217616s: INFO:virtio-devices/src/watchdog.rs:406 -- Watchdog resumed - enabling timer (every 15 seconds)
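
To make the timeline in the log easier to read, the following sketch parses the timestamps and computes the relevant gaps. It uses only lines copied from the log above; the regular expression and the helper names `parse` and `first_time` are my own and assume the `cloud-hypervisor: <seconds>s: <message>` layout seen here.

```python
import re

# Excerpt copied from the log above (only the lines needed for the timeline).
LOG = """\
cloud-hypervisor: 48.217084s: INFO:virtio-devices/src/device.rs:334 -- Resuming virtio-net
cloud-hypervisor: 63.217011s: <__watchdog> DEBUG:virtio-devices/src/watchdog.rs:143 -- MUISLAM: WatchdogEpollHandler event type: 17
cloud-hypervisor: 78.216938s: <__watchdog> DEBUG:virtio-devices/src/watchdog.rs:143 -- MUISLAM: WatchdogEpollHandler event type: 17
cloud-hypervisor: 78.217068s: <__watchdog> ERROR:virtio-devices/src/watchdog.rs:174 -- Watchdog triggered: 30 seconds since last ping
cloud-hypervisor: 78.217385s: INFO:virtio-devices/src/device.rs:334 -- Resuming virtio-rng
"""

# Assumed line layout: "cloud-hypervisor: <seconds>s: <message>"
LINE_RE = re.compile(r"^cloud-hypervisor: (\d+\.\d+)s: (.*)$")


def parse(log):
    """Return (timestamp_seconds, message) pairs for each matching line."""
    entries = []
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if m:
            entries.append((float(m.group(1)), m.group(2)))
    return entries


def first_time(entries, needle):
    """Timestamp of the first entry whose message contains `needle`."""
    return next(t for t, msg in entries if needle in msg)


entries = parse(LOG)
net_resume = first_time(entries, "Resuming virtio-net")
next_resume = first_time(entries, "Resuming virtio-rng")
trigger = first_time(entries, "Watchdog triggered")
ticks = [t for t, msg in entries if "WatchdogEpollHandler" in msg]

print(f"virtio-net resume to next device resume: {next_resume - net_resume:.3f}s")
print(f"virtio-net resume to watchdog trigger:   {trigger - net_resume:.3f}s")
print(f"interval between watchdog timer events:  {ticks[1] - ticks[0]:.3f}s")
```

The gaps come out at roughly 30 seconds between "Resuming virtio-net" and the next "Resuming" line, and roughly 15 seconds between the two watchdog timer events. In other words, the remaining devices only resume after the watchdog has already fired.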

Linux kernel output:

@russell-islam russell-islam self-assigned this Feb 15, 2024
@russell-islam russell-islam added the mshv Affects mshv only label Feb 16, 2024