Restoring VM fails when using explicitly passed FDs #6286
Comments
Most likely the FD will be invalid when you restore the guest, right? |
Yes, the FDs would be invalid during restore. So we might have to teach CH to accept FDs during the restore operation as well.
It doesn't quite make sense to me yet. How is the guest created? Did you pass a list of fds while it was created? |
Yes, a list of FDs is passed during guest creation. The following is the sequence of steps.
Based on discussion with @jinankjain and inputs from him, here's a proposal to fix the issue. The idea is to introduce two new things:
1. Adding the --restore-net flag
2. Introducing the vm.restore-net API
This API should be invoked after a vm.restore API call with False passed to --restore-net in order to restore Net devices from the saved device state. Tap FDs can also be passed via SCM_RIGHTS to this API. The actions performed by the vm.restore-net API are as follows:
The following sequence of operations can be performed to achieve VM restore with FDs passed explicitly by the user.
@jinankjain @liuw @rbradford @likebreath Could you please review this and provide your inputs?
@pupacha Thank you for reporting the issue and looking into potential solutions. I agree with the general direction of the proposal, but I don't think we need so many changes to the APIs. I think it would be enough to extend the With that, using invalid FDs is wrong and a bug. It impacts both snapshot and live-upgrade/migration, and needs to be fixed.
Thanks @likebreath for reviewing the proposal. I have explored a solution along the same lines as you suggested, i.e., to pass FDs somehow to
|
Can we look into if it's viable to pause the VM, remove the network devices, snapshot and then in reverse restore, hotplug with new FDs and then resume - provided e.g the ordering is preserved along with the MAC address this should result in correct behaviour. If this is viable in libvirt then we can disable restoring VMs that have FD backed TAPs. |
@rbradford From what I understand, if we remove the network devices, the devices' state is lost, and hotplugging them later would make the VM treat them as new devices. The network devices' state should also be preserved and restored, but with the new FDs.
Yes, the kernel won't be happy with that because it won't reinitialise the devices - I think adding FDs to vm.restore is the way to go. |
@rbradford Adding FDs directly to |
I think this is a fine approach:
Unfortunately, this may not work. Passing FDs using SCM_RIGHTS will not preserve the FD #. Following the above example, if the restored cloud-hypervisor already has an FD with ID We can use For this to work, we need to maintain a 1-to-1 mapping between interfaces in Libvirt and cloud-hypervisor. This should be possible if we preserve some private information per domain in Libvirt. |
@pupacha What you propose is what I mean by extending Note that you won't need to specify any net devices that are not backed directly by donated FDs (e.g. the TAP devices associated with these net devices will be opened based on the TAP name).
@praveen-pk We don't need to preserve FD #. What we need is to replace the invalid FDs (deserialized from the VmConfig file) with valid FDs (donated via SCM_RIGHTS). A single |
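To illustrate the point about FD numbers, here is a minimal Python sketch (not cloud-hypervisor code). It donates one FD across a Unix socket pair with `SCM_RIGHTS` and shows that the receiver gets a different FD number that still refers to the same open file. The helper name `donate_fd_roundtrip` is hypothetical, and `socket.send_fds`/`socket.recv_fds` require Python 3.9 or later.

```python
import os
import socket
import tempfile


def donate_fd_roundtrip():
    """Send one FD over a Unix socket pair; return (sent_fd, received_fd, same_file)."""
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        with tempfile.TemporaryFile() as backing:
            sent_fd = backing.fileno()
            # SCM_RIGHTS ancillary data must travel with at least one payload byte.
            socket.send_fds(sender, [b"x"], [sent_fd])
            _data, fds, _flags, _addr = socket.recv_fds(receiver, 1, 1)
            received_fd = fds[0]
            try:
                # Compare device and inode of both descriptors.
                same_file = os.path.sameopenfile(sent_fd, received_fd)
            finally:
                os.close(received_fd)
    finally:
        sender.close()
        receiver.close()
    return sent_fd, received_fd, same_file


if __name__ == "__main__":
    sent, received, same = donate_fd_roundtrip()
    print(f"sent fd={sent}, received fd={received}, same file={same}")
```

The receiver can only rely on the order in which FDs arrive, not on their numeric values, which is why the restore path needs a way to associate each donated FD with a net device.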
@likebreath, thanks for the clarification. I was misled by references to FD#s above. Even in the proposed update: is there a need to pass following the above example, of all the FDs received by cloud-hypervisor, first |
@likebreath Right, we can have validation and enforce from CH such that only (and all of) the net devs having backing FDs are passed again to restore. |
I believe you do. Otherwise how would you support sending FDs via CLI of Maybe you are thinking of using a utility program to send the serialization data and FDs directly with SCM_RIGHTS. That would work without an explicit |
@likebreath The CLI for
Yes, for this fix, the plan is to support FDs directly with SCM_RIGHTS without the |
I think you are mistaken. Please check the CLI parser here [1]. Also our doc has an example of how it is being used here [2]. [1] https://github.com/cloud-hypervisor/cloud-hypervisor/blob/main/vmm/src/config.rs#L1312-L1315
I still believe you will need the |
The 'NetConfig' may contain FDs which can't be serialized correctly, as FDs can only be donated from another process via a Unix domain socket with `SCM_RIGHTS`. To avoid false use of the serialized FDs, this patch explicitly set 'NetConfig' FDs as invalid for (de)serialization. See: cloud-hypervisor#6286 Signed-off-by: Bo Chen <chen.bo@intel.com>
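A rough Python sketch of the idea in this commit message follows. It is not the actual Rust implementation: the class `NetConfigSketch`, its JSON layout, and the choice of `-1` as the invalid sentinel are assumptions made for illustration. The point is that FD numbers never survive a round trip through serialization, while the number of FDs per device is kept so that a later restore can check how many FDs it expects.

```python
import json
from dataclasses import dataclass
from typing import List, Optional

INVALID_FD = -1  # assumed sentinel; an FD number can never be negative


@dataclass
class NetConfigSketch:
    """Hypothetical stand-in for a net device config that may carry donated FDs."""

    id: str
    fds: Optional[List[int]] = None

    def to_json(self) -> str:
        # Never write real FD numbers: they are meaningless outside this process.
        fds = None if self.fds is None else [INVALID_FD] * len(self.fds)
        return json.dumps({"id": self.id, "fds": fds})

    @classmethod
    def from_json(cls, text: str) -> "NetConfigSketch":
        raw = json.loads(text)
        fds = raw.get("fds")
        if fds is not None:
            # Ignore whatever numbers were stored; keep only the count.
            fds = [INVALID_FD] * len(fds)
        return cls(id=raw["id"], fds=fds)


if __name__ == "__main__":
    print(NetConfigSketch("net0", [5, 6]).to_json())
```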
@likebreath Cloud-hypervisor has moved away from taking FDs as input to any APIs. Even when [1] #5522
From what I understand, ch-remote does not support |
ch-remote does support this - it uses SCM_RIGHTS under the hood of course: https://github.com/cloud-hypervisor/cloud-hypervisor/blob/main/src/bin/ch-remote.rs#L392-L402 I think some of the confusion comes from that we're thinking that this only would be supported with |
I wasn't aware of this under the hood SCM_RIGHTS usage from |
@pupacha Glad my message finally got across. To be clear, as I mentioned earlier, we need to the |
Thanks @likebreath for the PR on setting FDs as invalid for serialization & deserialization. |
'NetConfig' FDs, when explicitly passed via SCM_RIGHTS during VM creation, are marked as invalid during snapshot operation. See: cloud-hypervisor#6332. So, Restore should support input for the new net FDs. Hence, added 2 new parameters 1. net_ids 2. net_fds to 'RestoreConfig'. Fixes: cloud-hypervisor#6286 Signed-off-by: Purna Pavan Chandra Aekkaladevi <paekkaladevi@linux.microsoft.com>
'NetConfig' FDs, when explicitly passed via SCM_RIGHTS during VM creation, are marked as invalid during snapshot. See: cloud-hypervisor#6332. So, Restore should support input for the new net FDs. This patch adds two new fields to 'RestoreConfig' - 1.net_ids 2.net_fds. 'net_ids' is a list of NetConfig id. 'net_fds' is a list of FDs for required NetConfigs. These fds are replaced into the fds field of NetConfig appropriately. Implement 'validate' for RestoreConfig Implement exclusive HTTP PutHandler for VmRestore. Allow net FDs to be sent along with 'restore' in ch-remote Fixes: cloud-hypervisor#6286 Signed-off-by: Purna Pavan Chandra Aekkaladevi <paekkaladevi@linux.microsoft.com>
'NetConfig' FDs, when explicitly passed via SCM_RIGHTS during VM creation, are marked as invalid during snapshot. See: cloud-hypervisor#6332. So, Restore should support input for the new net FDs. This patch adds two new fields to 'RestoreConfig' - 1.net_ids 2.net_fds. 'net_ids' is a list of NetConfig id. 'net_fds' is a list of FDs for required NetConfigs. These fds are replaced into the fds field of NetConfig appropriately. Implement 'validate' for RestoreConfig Use vm_action_put_handler_body_with_fds for VmRestore http handler Allow net FDs to be sent along with 'restore' in ch-remote Fixes: cloud-hypervisor#6286 Signed-off-by: Purna Pavan Chandra <paekkaladevi@linux.microsoft.com>
'NetConfig' FDs, when explicitly passed via SCM_RIGHTS during VM creation, are marked as invalid during snapshot. See: cloud-hypervisor#6332. So, Restore should support input for the new net FDs. This patch adds new field 'net_fds' to 'RestoreConfig'. The FDs passed using this new field are replaced into the 'fds' field of NetConfig appropriately. Also, implement 'validate' fn for RestoreConfig Fixes: cloud-hypervisor#6286 Signed-off-by: Purna Pavan Chandra <paekkaladevi@linux.microsoft.com>
'NetConfig' FDs, when explicitly passed via SCM_RIGHTS during VM creation, are marked as invalid during snapshot. See: cloud-hypervisor#6332. So, Restore should support input for the new net FDs. This patch adds new field 'net_fds' to 'RestoreConfig'. The FDs passed using this new field are replaced into the 'fds' field of NetConfig appropriately. The 'validate()' function ensures all net devices from 'VmConfig' backed by FDs have a corresponding 'RestoreNetConfig' with a matched 'id' and expected number of FDs. The unit tests provide different inputs to parse and validate functions to make sure parsing and error handling is as per expectation. Fixes: cloud-hypervisor#6286 Signed-off-by: Purna Pavan Chandra <paekkaladevi@linux.microsoft.com> Co-authored-by: Bo Chen <chen.bo@intel.com>
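The validation and replacement described in this commit message can be sketched roughly as below. This is a Python illustration, not the Rust code from the patch: the function name `validate_and_apply` and the dict-based shapes of the saved and restore-time configs are assumptions. It checks that every FD-backed net device from the snapshot has exactly one restore entry with a matching `id` and the expected number of FDs, then swaps the donated FDs into place.

```python
from typing import Dict, List


def validate_and_apply(saved_nets: List[dict], restore_nets: List[dict]) -> List[dict]:
    """Return a copy of saved_nets with donated FDs substituted in.

    saved_nets:   [{"id": str, "fds": list[int] | None}, ...] read from the snapshot
    restore_nets: [{"id": str, "fds": list[int]}, ...] supplied with the restore request
    Raises ValueError if the two do not line up.
    """
    donated: Dict[str, List[int]] = {}
    for entry in restore_nets:
        if entry["id"] in donated:
            raise ValueError(f"duplicate restore entry for net device {entry['id']!r}")
        donated[entry["id"]] = entry["fds"]

    # Only devices that were created from donated FDs need new FDs at restore.
    fd_backed = {net["id"]: net for net in saved_nets if net.get("fds")}

    unknown = sorted(set(donated) - set(fd_backed))
    if unknown:
        raise ValueError(f"restore entries for non FD-backed or unknown devices: {unknown}")

    for net_id, net in fd_backed.items():
        if net_id not in donated:
            raise ValueError(f"net device {net_id!r} is FD-backed but got no FDs")
        if len(donated[net_id]) != len(net["fds"]):
            raise ValueError(
                f"net device {net_id!r} expects {len(net['fds'])} FDs, "
                f"got {len(donated[net_id])}"
            )

    # Replace the invalid placeholders with the freshly donated FDs.
    return [
        {**net, "fds": list(donated[net["id"]])} if net["id"] in fd_backed else dict(net)
        for net in saved_nets
    ]
```

For example, a snapshot with `net0` holding two placeholder FDs and `net1` opened by TAP name would accept a restore request that supplies exactly two FDs for `net0` and nothing for `net1`.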
When a cloud-hypervisor VM that was created with tap FDs explicitly passed over a socket as control messages (SCM_RIGHTS) is snapshotted and then restored, the restore fails.
This is because the config that vm.snapshot generates (config.json) contains the FD list. When vm.restore is called (with the same directory path that was used for vm.snapshot), CH reads this config.json and fails with the following error message, as those FDs no longer exist.
Snapshot's config.json ("net" content)
This was discovered when I was implementing libvirt's save/restore features for ch driver at https://github.com/pupacha/libvirt/tree/ch/save_restore_basic
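The failure mode described above can be reproduced in miniature with the Python sketch below. It is only an analogy, not cloud-hypervisor code: the JSON layout, the `net0` id, and the helper name `stale_fd_errno` are hypothetical. An FD number is written to a config, the FD is closed (standing in for the original VMM process going away), and using the saved number afterwards fails with `EBADF`.

```python
import errno
import json
import os
import tempfile


def stale_fd_errno():
    """Save an FD number to JSON, close the FD, then try to use the saved number."""
    with tempfile.NamedTemporaryFile() as tmp:
        fd = os.open(tmp.name, os.O_RDONLY)
        # The number is only meaningful while this descriptor stays open.
        saved = json.dumps({"net": [{"id": "net0", "fds": [fd]}]})
        os.close(fd)

    restored_fd = json.loads(saved)["net"][0]["fds"][0]
    try:
        os.fstat(restored_fd)
    except OSError as exc:
        return exc.errno
    return None


if __name__ == "__main__":
    print(errno.errorcode.get(stale_fd_errno()))
```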