While starting VM with 'considerlasthost' enabled, don't load host tags/details for the last host when it doesn't exist #9037

Draft
wants to merge 1 commit into
base: 4.18

Conversation

sureshanaparti
Contributor

@sureshanaparti sureshanaparti commented May 3, 2024

Description

When starting a VM with 'considerlasthost' enabled, this PR skips loading host tags/details for the last host if that host no longer exists.

Fixes #9033

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

How did you try to break this feature and the system with this change?

…/details for the last host when it doesn't exist
@sureshanaparti sureshanaparti changed the title from "While starting VM with considerlasthost enabled, don't load host tags/details for the last host when it doesn't exist" to "While starting VM with 'considerlasthost' enabled, don't load host tags/details for the last host when it doesn't exist" on May 3, 2024
@sureshanaparti
Contributor Author

@blueorangutan package

@blueorangutan

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@weizhouapache
Member

@sureshanaparti
#7214 has refactored this code snippet in the main/4.20 branch.
We could consider backporting it to 4.18/4.19 and then fixing the issue like this:

     private DeployDestination deployInVmLastHost(VirtualMachineProfile vmProfile, DeploymentPlan plan, ExcludeList avoids,
             DeploymentPlanner planner, VirtualMachine vm, DataCenter dc, ServiceOffering offering, int cpuRequested, long ramRequested,
             boolean volumesRequireEncryption) throws InsufficientServerCapacityException {
         HostVO host = _hostDao.findById(vm.getLastHostId());
+        if (host == null) {
+            logger.warn(String.format("The last host [id=%s] does not exist, it may have been removed.", vm.getLastHostId()));
+            return null;
+        }
         _hostDao.loadHostTags(host);
         _hostDao.loadDetails(host);
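
For readers who want to try the guard in isolation, here is a minimal, self-contained sketch of the same early-return pattern. It is not CloudStack code: `Host`, `HostDao`, and `resolveLastHost` are hypothetical stand-ins for `HostVO`, `_hostDao`, and the start of `deployInVmLastHost`, and logging is replaced by `System.out`.

```java
import java.util.HashMap;
import java.util.Map;

class LastHostGuardSketch {
    // Hypothetical stand-in for HostVO; NOT the CloudStack class.
    static final class Host {
        final long id;
        boolean tagsLoaded;
        boolean detailsLoaded;

        Host(long id) {
            this.id = id;
        }
    }

    // Hypothetical stand-in for the host DAO, backed by an in-memory map.
    static final class HostDao {
        private final Map<Long, Host> hosts = new HashMap<>();

        void add(Host host) {
            hosts.put(host.id, host);
        }

        // Returns null when the host is unknown, e.g. because it was removed.
        Host findById(Long id) {
            return id == null ? null : hosts.get(id);
        }

        // Both loaders dereference the host, so a null argument would throw.
        void loadHostTags(Host host) {
            host.tagsLoaded = true;
        }

        void loadDetails(Host host) {
            host.detailsLoaded = true;
        }
    }

    // Returns the last host with tags/details loaded, or null if it no longer exists.
    static Host resolveLastHost(HostDao dao, Long lastHostId) {
        Host host = dao.findById(lastHostId);
        if (host == null) {
            System.out.println(String.format(
                    "The last host [id=%s] does not exist, it may have been removed.", lastHostId));
            return null;
        }
        dao.loadHostTags(host);
        dao.loadDetails(host);
        return host;
    }

    public static void main(String[] args) {
        HostDao dao = new HostDao();
        dao.add(new Host(1L));
        System.out.println("host 1 usable: " + (resolveLastHost(dao, 1L) != null));
        System.out.println("host 2 usable: " + (resolveLastHost(dao, 2L) != null));
    }
}
```

Returning early keeps the two load calls unconditional for the non-null path, which matches the shape of the diff above; the caller is then assumed to fall back to regular host selection when it receives null.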

@codecov-commenter

Codecov Report

Attention: Patch coverage is 54.09836%, with 28 lines in your changes missing coverage. Please review.

Project coverage is 12.24%. Comparing base (5c9d79e) to head (d242123).

Files | Patch % | Lines
...om/cloud/deploy/DeploymentPlanningManagerImpl.java | 54.09% | 13 Missing and 15 partials ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##               4.18    #9037     +/-   ##
===========================================
  Coverage     12.24%   12.24%             
- Complexity     9291     9293      +2     
===========================================
  Files          4698     4698             
  Lines        414259   414259             
  Branches      52267    50777   -1490     
===========================================
+ Hits          50707    50719     +12     
+ Misses       357251   357237     -14     
- Partials       6301     6303      +2     
Flag | Coverage Δ
unittests | 12.24% <54.09%> (+<0.01%) ⬆️


@blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 9526

Comment on lines +422 to +440
    _hostDao.loadHostTags(host);
    _hostDao.loadDetails(host);
    if (avoids.shouldAvoid(host)) {
        s_logger.debug("The last host of this VM is in avoid set");
    } else if (plan.getClusterId() != null && host.getClusterId() != null
            && !plan.getClusterId().equals(host.getClusterId())) {
        s_logger.debug("The last host of this VM cannot be picked as the plan specifies different clusterId: "
                + plan.getClusterId());
    } else if (_capacityMgr.checkIfHostReachMaxGuestLimit(host)) {
        s_logger.debug("The last Host, hostId: " + host.getId() +
                " already has max Running VMs(count includes system VMs), skipping this and trying other available hosts");
    } else if ((offeringDetails = _serviceOfferingDetailsDao.findDetail(offering.getId(), GPU.Keys.vgpuType.toString())) != null) {
        ServiceOfferingDetailsVO groupName = _serviceOfferingDetailsDao.findDetail(offering.getId(), GPU.Keys.pciDevice.toString());
        if (!_resourceMgr.isGPUDeviceAvailable(host.getId(), groupName.getValue(), offeringDetails.getValue())) {
            s_logger.debug("The last host of this VM does not have required GPU devices available");
        }
    } else if (volumesRequireEncryption && !Boolean.parseBoolean(host.getDetail(Host.HOST_VOLUME_ENCRYPTION))) {
        s_logger.warn(String.format("The last host of this VM %s does not support volume encryption, which is required by this VM.", host));
    } else {
Contributor


Makes sense to load tags/details only for a non-null host. Should we also consider refactoring some of the checks into separate methods?

Contributor Author


@shwstppr some refactoring was already done on this code in PR #7214. I will see whether any further refactoring is required (if so, it will be targeted to main, in a separate PR).
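
As an illustration of the refactoring raised in this thread, the following self-contained sketch pulls a few of the last-host checks out of the `else if` chain into small methods. It is only one possible shape under simplifying assumptions: `Host`, `isInDifferentCluster`, `lacksRequiredEncryption`, and `reasonToSkip` are hypothetical names, the avoid set is reduced to a set of host IDs, and the GPU check is left out. It does not reflect what PR #7214 actually did.

```java
import java.util.Collections;
import java.util.Optional;
import java.util.Set;

class LastHostChecksSketch {
    // Hypothetical, simplified host model; NOT CloudStack's HostVO.
    static final class Host {
        final long id;
        final Long clusterId;
        final boolean atMaxGuestLimit;
        final boolean supportsVolumeEncryption;

        Host(long id, Long clusterId, boolean atMaxGuestLimit, boolean supportsVolumeEncryption) {
            this.id = id;
            this.clusterId = clusterId;
            this.atMaxGuestLimit = atMaxGuestLimit;
            this.supportsVolumeEncryption = supportsVolumeEncryption;
        }
    }

    // True when the plan pins a cluster and the host belongs to a different one.
    static boolean isInDifferentCluster(Host host, Long planClusterId) {
        return planClusterId != null && host.clusterId != null
                && !planClusterId.equals(host.clusterId);
    }

    // True when encryption is required but the host does not support it.
    static boolean lacksRequiredEncryption(Host host, boolean volumesRequireEncryption) {
        return volumesRequireEncryption && !host.supportsVolumeEncryption;
    }

    // Returns the reason the last host cannot be used, or empty if it stays a candidate.
    static Optional<String> reasonToSkip(Host host, Set<Long> avoidHostIds,
                                         Long planClusterId, boolean volumesRequireEncryption) {
        if (host == null) {
            return Optional.of("The last host of this VM does not exist");
        }
        if (avoidHostIds.contains(host.id)) {
            return Optional.of("The last host of this VM is in avoid set");
        }
        if (isInDifferentCluster(host, planClusterId)) {
            return Optional.of("The plan specifies different clusterId: " + planClusterId);
        }
        if (host.atMaxGuestLimit) {
            return Optional.of("The last host already has max running VMs");
        }
        if (lacksRequiredEncryption(host, volumesRequireEncryption)) {
            return Optional.of("The last host does not support volume encryption");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Host host = new Host(7L, 1L, false, true);
        Set<Long> noAvoids = Collections.<Long>emptySet();
        System.out.println(reasonToSkip(host, noAvoids, 2L, false).orElse("usable"));
        System.out.println(reasonToSkip(host, noAvoids, 1L, true).orElse("usable"));
    }
}
```

Putting the null check first in `reasonToSkip` means every later check can assume a non-null host, which is the same ordering this PR relies on before calling `loadHostTags` and `loadDetails`.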

@sureshanaparti
Contributor Author

@weizhouapache I think it is better to fix this issue directly (without backporting that refactored code). PR #7214 also doesn't fix the load tags issue, which needs to be fixed here: https://github.com/apache/cloudstack/pull/7214/files#r1593867688

@sureshanaparti
Contributor Author

@blueorangutan package

@blueorangutan

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@weizhouapache
Member

@sureshanaparti
Right, #7214 does not fix any issue.
Without the backport (followed by the fix), this PR will only be applicable to 4.18/4.19, so we need a separate PR for 4.20/main.

@sureshanaparti
Contributor Author

Correct @weizhouapache, a separate PR is needed for 4.20/main (I will create one).

@weizhouapache
Member

OK, it should work @sureshanaparti

@blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 9575

@sureshanaparti
Contributor Author

PR for 4.20/main here: #9063

@sureshanaparti
Contributor Author

@blueorangutan test

@blueorangutan

@sureshanaparti a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-10196)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 40429 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9037-t10196-kvm-centos7.zip
Smoke tests completed. 110 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

Contributor

@DaanHoogland DaanHoogland left a comment


clgtm

@sureshanaparti sureshanaparti added this to the 4.19.1.0 milestone May 23, 2024
Labels
None yet
Projects
Status: In Progress

6 participants