New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vmm: Support passing Net FDs to Restore #6402
Conversation
@pupacha Can you please include an integration test to test this feature? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This definitely needs an integration test!
b74a48d
to
62d0f4b
Compare
@jinankjain @rbradford Added integration test. @rbradford could you please approve the PR workflow/checks. I dont see an option to trigger them. |
62d0f4b
to
115ed20
Compare
@pupacha I think some of the commits are missing your SoB. |
95cff5e
to
c5053d0
Compare
My name is too long and |
d6cacc4
to
237d19d
Compare
The commit message length check is advisory only - don't worry about that - we regularly exceed it with long names or copy and pasted errors but it's there to prevent commits from where the body is not wrapped. |
45d4fc5
to
f40c6ff
Compare
Some integration tests fail with "No space left on device (os error 28)". Could anyone please help me with retriggering this failed job https://github.com/cloud-hypervisor/cloud-hypervisor/actions/runs/8802672511/job/24159343723?pr=6402 ? |
f40c6ff
to
b6b0edd
Compare
It seems that the CI is a bit flaky. The integration tests failed a few times, but in different places which don't seem to be related to the changes here. |
Yes, I see different tests failing for different runs. But most often I see |
73d4b69
to
e04b41a
Compare
a549322
to
c15c796
Compare
9ddd3d6
to
7a0b3ac
Compare
I think it should just be one commit moving the tests. |
I commented about it somewhere up in the conversations. Copying it here as well. I went ahead and moved all the snapshot/restore tests to common_sequential. No flakiness is noticed during the test runs. But git does not show the diff as "code chunk removed + code chunk added", it would mess up the diff and blame, see 9ddd3d6
Please let me know if you have any other suggestions on this, @rbradford |
Adds ~ 4 minutes to the run - but maybe we can shave the time off elsewhere (and ultimately if it's more reliable then we won't have to more builds to deal with flaky tests.) As you probably realised we're right at the limit so adding a new test can push the load/resource consumption up such than an unrelated test will start to fail! |
'NetConfig' FDs, when explicitly passed via SCM_RIGHTS during VM creation, are marked as invalid during snapshot. See: cloud-hypervisor#6332. So, Restore should support input for the new net FDs. This patch adds new field 'net_fds' to 'RestoreConfig'. The FDs passed using this new field are replaced into the 'fds' field of NetConfig appropriately. The 'validate()' function ensures all net devices from 'VmConfig' backed by FDs have a corresponding 'RestoreNetConfig' with a matched 'id' and expected number of FDs. The unit tests provide different inputs to parse and validate functions to make sure parsing and error handling is as per expectation. Fixes cloud-hypervisor#6286 Signed-off-by: Purna Pavan Chandra <paekkaladevi@linux.microsoft.com> Co-authored-by: Bo Chen <chen.bo@intel.com>
Consume FDs passed via SCM_RIGHTs to VmRestore API and assign them appropriately to RestoredNetConfig's fds field. Signed-off-by: Purna Pavan Chandra <paekkaladevi@linux.microsoft.com>
Enable restore command the ability to send file descriptors along with HTTP request. This is useful when restoring a VM with explicit FDs passed to NetConfig(s). Signed-off-by: Purna Pavan Chandra <paekkaladevi@linux.microsoft.com>
Add a section about restoring VM with new Net FDs explicitly passed to ch-remote via 'net_fds' parameter Signed-off-by: Purna Pavan Chandra <paekkaladevi@linux.microsoft.com>
VM is created with FDs explicitly passed to CH via --net parameter and snapshotted. New net FDs are passed in turn during restore. Boilerplate code from _test_snapshot_restore(). Signed-off-by: Purna Pavan Chandra <paekkaladevi@linux.microsoft.com>
test_snapshot_restore_* tests often have transient failures and add to overall flakiness of the integration testsuite. Hence, remove them from common_parallel. However, these tests need to be added back to common_sequential Signed-off-by: Purna Pavan Chandra <paekkaladevi@linux.microsoft.com>
tests_snapshot_restore* have been earlier removed from common_parallel due to the falkiness they add testsuite. Running them sequentially would eliminate the flakiness. Hence, add the tests back to testsuite but into common_sequential module. Signed-off-by: Purna Pavan Chandra <paekkaladevi@linux.microsoft.com>
@rbradford Just rebased and pushed. |
Thanks @rbradford and @likebreath for the constant feedback and helping in landing these changes. |
'NetConfig' FDs, when explicitly passed via SCM_RIGHTS during VM creation, are marked as invalid during snapshot. See: #6332. So, Restore should support input for the new net FDs. This patch adds two new fields to 'RestoreConfig' - 1.net_ids 2.net_fds. 'net_ids' is a list of NetConfig id. 'net_fds' is a list of FDs for required NetConfigs. These fds are replaced into the fds field of NetConfig appropriately.
Implement 'validate' for RestoreConfig
Allow net FDs to be sent along with 'restore' in ch-remote
Add new macro 'vm_action_put_handler_body_with_fds'; to be used by VmRestore and VmAddNet
tests: Add test_snapshot_restore_with_fd to integration tests
tests: Reduce Guest memory in _test_snapshot_restore
vmm: Add unit tests for restore
Fixes: #6286