
scheduler: add new pod estimate with loadaware plugin #1992

Status: Open · wants to merge 1 commit into main
Conversation

zwForrest
Contributor

Ⅰ. Describe what this PR does

When scheduling a new pod, the loadaware plugin needs to account for the new pod's estimated resource utilization, so that node utilization does not become too high after the pod is scheduled.

This prevents nodes that would become over-utilized from entering the score phase.

The new pod's resource utilization is calculated with the existing estimation method.
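The idea can be sketched as a small threshold check. This is a simplified illustration, not the plugin's actual API; function and parameter names are hypothetical:

```go
package main

import "fmt"

// exceedsThreshold reports whether placing the new pod would push the node's
// projected utilization above the configured percentage threshold.
// All resource values are in milli-units (e.g. millicores).
func exceedsThreshold(nodeUsed, podEstimated, allocatable, thresholdPercent int64) bool {
	if allocatable <= 0 {
		return false
	}
	projected := (nodeUsed + podEstimated) * 100 / allocatable
	return projected > thresholdPercent
}

func main() {
	// Node currently using 6 of 10 cores; the new pod is estimated at 3 cores.
	// With an 80% threshold the node is filtered out (projected 90%).
	fmt.Println(exceedsThreshold(6000, 3000, 10000, 80)) // prints "true"
}
```

Without counting the new pod's estimate, the same node would pass the filter at 60% and only be found overloaded after the pod landed.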

Ⅱ. Does this pull request fix one issue?

Ⅲ. Describe how to verify it

Ⅳ. Special notes for reviews

V. Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in make test

@koordinator-bot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign eahydra after the PR has been reviewed.
You can assign the PR to them by writing /assign @eahydra in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


codecov bot commented Apr 8, 2024

Codecov Report

Attention: Patch coverage is 78.26087%, with 10 lines in your changes missing coverage. Please review.

Project coverage is 68.75%. Comparing base (2dc8735) to head (8943b2d).
Report is 44 commits behind head on main.

Files Patch % Lines
pkg/scheduler/plugins/loadaware/load_aware.go 78.26% 5 Missing and 5 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1992      +/-   ##
==========================================
+ Coverage   67.50%   68.75%   +1.25%     
==========================================
  Files         421      423       +2     
  Lines       47100    39174    -7926     
==========================================
- Hits        31795    26936    -4859     
+ Misses      12990     9909    -3081     
- Partials     2315     2329      +14     
Flag Coverage Δ
unittests 68.75% <78.26%> (+1.25%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.


@zwForrest zwForrest force-pushed the feat/updateloadaware branch 2 times, most recently from 84026c9 to 8e7711f Compare April 8, 2024 13:02
@hormes
Member

hormes commented Apr 28, 2024

What is the status of this issue?

@zwForrest
Contributor Author

What is the status of this issue?

The corresponding changes have been updated.

pkg/scheduler/plugins/loadaware/load_aware.go: 3 outdated review threads (resolved)
@eahydra
Member

eahydra commented May 7, 2024

I'm sorry I've been away from X for a while.

I reviewed this PR along with the implementation of the loadaware plugin. I noticed there are many areas for improvement in the code I implemented in the past. In particular, in this PR you can see that the logic of filter and score is very similar. We should calculate the usage first, and then decide whether to score or filter. When calculating usage, we can decide whether to use the estimate.

These codes should be wrapped as a function that can be used in filter and score stage.

prodPod := extension.GetPodPriorityClassWithDefault(pod) == extension.PriorityProd && p.args.ScoreAccordingProdUsage
podMetrics := buildPodMetricMap(p.podLister, nodeMetric, prodPod)
estimatedUsed, err := p.estimator.EstimatePod(pod)
if err != nil {
	return 0, nil
}
assignedPodEstimatedUsed, estimatedPods := p.estimatedAssignedPodUsed(nodeName, nodeMetric, podMetrics, prodPod)
for resourceName, value := range assignedPodEstimatedUsed {
	estimatedUsed[resourceName] += value
}
podActualUsages, estimatedPodActualUsages := sumPodUsages(podMetrics, estimatedPods)
if prodPod {
	for resourceName, quantity := range podActualUsages {
		estimatedUsed[resourceName] += getResourceValue(resourceName, quantity)
	}
} else {
	if nodeMetric.Status.NodeMetric != nil {
		var nodeUsage *slov1alpha1.ResourceMap
		if scoreWithAggregation(p.args.Aggregated) {
			nodeUsage = getTargetAggregatedUsage(nodeMetric, &p.args.Aggregated.ScoreAggregatedDuration, p.args.Aggregated.ScoreAggregationType)
		} else {
			nodeUsage = &nodeMetric.Status.NodeMetric.NodeUsage
		}
		if nodeUsage != nil {
			for resourceName, quantity := range nodeUsage.ResourceList {
				if q := estimatedPodActualUsages[resourceName]; !q.IsZero() {
					quantity = quantity.DeepCopy()
					if quantity.Cmp(q) >= 0 {
						quantity.Sub(q)
					}
				}
				estimatedUsed[resourceName] += getResourceValue(resourceName, quantity)
			}
		}
	}
}
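A minimal sketch of the suggested refactor, using simplified stand-in types instead of the plugin's real ones (resource.Quantity, the NodeMetric CRD), so both Filter and Score would consume one merged map:

```go
package main

import "fmt"

// estimatedUsed merges the three inputs the Filter and Score paths both need:
// the new pod's estimate, the estimates for already-assigned pods, and the
// node's reported actual usage. Both stages can then share this helper instead
// of duplicating the accumulation loop. Types are simplified stand-ins.
func estimatedUsed(podEstimate, assignedEstimate, nodeActual map[string]int64) map[string]int64 {
	used := make(map[string]int64)
	for _, m := range []map[string]int64{podEstimate, assignedEstimate, nodeActual} {
		for name, v := range m {
			used[name] += v
		}
	}
	return used
}

func main() {
	got := estimatedUsed(
		map[string]int64{"cpu": 2000}, // new pod estimate
		map[string]int64{"cpu": 1000}, // assigned pods' estimates
		map[string]int64{"cpu": 5000}, // node actual usage
	)
	fmt.Println(got["cpu"]) // prints 8000
}
```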

@zwForrest
Contributor Author

These codes should be wrapped as a function that can be used in filter and score stage.

Yes, I noticed this issue while making the changes. I will further modify this part.

@saintube
Member

@zwForrest Thanks for your contributions! Please resolve the conflicts and we will continue the review.

@codecov-commenter

codecov-commenter commented May 24, 2024

Codecov Report

Attention: Patch coverage is 78.26087%, with 10 lines in your changes missing coverage. Please review.

Project coverage is 68.99%. Comparing base (2dc8735) to head (0a6249c).
Report is 56 commits behind head on main.

Current head 0a6249c differs from pull request most recent head c77ed01

Please upload reports for the commit c77ed01 to get more accurate results.

Files Patch % Lines
pkg/scheduler/plugins/loadaware/load_aware.go 78.26% 5 Missing and 5 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1992      +/-   ##
==========================================
+ Coverage   67.50%   68.99%   +1.49%     
==========================================
  Files         421      430       +9     
  Lines       47100    40048    -7052     
==========================================
- Hits        31795    27633    -4162     
+ Misses      12990    10056    -2934     
- Partials     2315     2359      +44     
Flag Coverage Δ
unittests 68.99% <78.26%> (+1.49%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.


@zwForrest
Contributor Author

zwForrest commented May 27, 2024

@zwForrest Thanks for your contributions! Please resolve the conflicts and we will continue the review.

Done. The CI error is caused by other code. @saintube

Signed-off-by: zwForrest <756495135@qq.com>
@saintube
Member

saintube commented Jun 7, 2024

(screenshot: failing unit test output)
@zwForrest Please check your logic to pass the UTs.

@@ -140,117 +140,47 @@ func (p *Plugin) Filter(ctx context.Context, state *framework.CycleState, pod *c
}
return framework.NewStatus(framework.Error, err.Error())
}
if nodeMetric.Status.NodeMetric == nil {
klog.Warningf("nodeMetrics(%s) should not be nil.\n", node.Name)
Member

klog / glog does not need a trailing \n.
Same below.
continue
}
usage := int64(math.Round(float64(estimatedUsed[resourceName]) / float64(total) * 100))
if usage > value {
Member

To reduce the cyclomatic complexity and improve readability, consider this pattern:

if usage <= value {
  continue
}
// ...
