Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement hypervisor specific pause/resume API #6441

Merged

Conversation

russell-islam
Copy link
Contributor

No description provided.

@russell-islam russell-islam requested a review from a team as a code owner May 3, 2024 21:01
@russell-islam russell-islam requested review from jinankjain, NunoDasNeves and liuw and removed request for a team May 3, 2024 21:01
@@ -390,6 +395,17 @@ pub trait Vm: Send + Sync + Any {
) -> Result<()> {
unimplemented!()
}

/// Freeze the Partiton/VM
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer reuse the pause / resume terminology. There is not need for the _vm suffix as this is on the Vm trait.

Please also make it unconditional at build time and simply provide an empty implementation in the trait.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you refering to rename the function name or implementing the pausable trait? I skipped implementing the pauasable trait it it requires mutable reference to vm. I can easily rename the functions and make it unconditional. Let me know. Thanks

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just the name - I don't expect you to implement the trait.

@russell-islam russell-islam changed the title freeze/unfreeze vm during pause/resume for MSHV Implement hypervisor specific pause/resume API May 7, 2024
vmm/src/vm.rs Outdated
// Unfreeze the vm for MSHV
#[cfg(feature = "mshv")]
if current_state == VmState::Created {
self.vm.resume().map_err(Error::UnfreezeVm)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be MigratableError::Resume ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

boot() does not return MigratableError so we can't return.

vmm/src/vm.rs Outdated
@@ -319,6 +319,9 @@ pub enum Error {

#[error("Error injecting NMI")]
ErrorNmi,
#[cfg(feature = "mshv")]
#[error("Error unfreezing the VM: {0}")]
UnfreezeVm(#[source] hypervisor::HypervisorVmError),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you renamed freeze/unfreeze to pause/resume maybe we don't need this anymore

/// Failed to set partition property
///
#[error("Failed to set partition property: {0}")]
SetPartitionProperty(#[source] anyhow::Error),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect we don't really want to define an error type for all the lower-level things that could go wrong here in the generic code.

IMO, only the mshv hypervisor-related code should know about "partition"s, etc... The generic CLH code can just see this error as 'failed to pause/resume', and the wrapped inner error will contain the additional details from the mshv code - I believe this is the main point of using the anyhow crate.

Also worth noting: we already have a SetPartitionProperty error in hypervisor/src/hypervisor.rs, which is clearly in the wrong place. Maybe we could fix just that, for a start, and clean up any other errors that shouldn't be in the generic code later on in some cleanup patch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can cleanup in another PR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please don't have hypervisor specific errors - if you can avoid it at the top level.

Copy link
Contributor

@NunoDasNeves NunoDasNeves left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM


/// Pause the VM
#[cfg(feature = "mshv")]
fn pause(&self) -> vm::Result<()> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in the mshv file - is this #[cfg(feature = "mshv")] needed?

/// Failed to set partition property
///
#[error("Failed to set partition property: {0}")]
SetPartitionProperty(#[source] anyhow::Error),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please don't have hypervisor specific errors - if you can avoid it at the top level.

vmm/src/vm.rs Outdated
#[cfg(feature = "mshv")]
if current_state == VmState::Paused {
self.vm.resume().map_err(|e| {
MigratableError::Resume(anyhow!("Could not unfreeze the VM: {}", e))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Old terminology - unfreeze?

vmm/src/vm.rs Outdated
#[cfg(feature = "mshv")]
self.vm
.pause()
.map_err(|e| MigratableError::Pause(anyhow!("Could not freeze the VM: {}", e)))?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto - old terminology.

vmm/src/vm.rs Outdated
@@ -2487,6 +2495,11 @@ impl Pausable for Vm {
self.cpu_manager.lock().unwrap().pause()?;
self.device_manager.lock().unwrap().pause()?;

#[cfg(feature = "mshv")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there is a default implementation - you don't need to make this conditional - applies to the other changes in this file too.

Add Pause/Resume functions for VM trait. For KVM it
will be empty implementations. For MSHV it needs to freeze
and unfreeze the partition.

Signed-off-by: Muminul Islam <muislam@microsoft.com>
Implementing pause/Resume API for MSHV.
Here we set/reset the partition property(TIME_FREEZE)

Signed-off-by: Muminul Islam <muislam@microsoft.com>
Create a partition frozen always, then unfreeze the partition
during boot phase or resume phase. We also freeze the
partition during pause event. Time is freeze during the
time between freeze and unfreeze.

Signed-off-by: Muminul Islam <muislam@microsoft.com>
For MSHV we always create frozen partition, so we
resume the VM during boot. Also during pause and resume
VM events we call hypervisor specific API.

Signed-off-by: Muminul Islam <muislam@microsoft.com>
@russell-islam
Copy link
Contributor Author

@rbradford PTAL

Copy link
Member

@jinankjain jinankjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -2495,6 +2507,7 @@ impl Pausable for Vm {

fn resume(&mut self) -> std::result::Result<(), MigratableError> {
event!("vm", "resuming");
let current_state = self.get_state().unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can cause a panic - please propagate properly using ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I look at it - handling of this seems inconsistent in this file...

@rbradford rbradford added this pull request to the merge queue May 16, 2024
Merged via the queue into cloud-hypervisor:main with commit 860939d May 16, 2024
32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants