
[question] batch resource calculation fluctuation #1906

Open
j4ckstraw opened this issue Feb 20, 2024 · 4 comments
Labels: area/koord-manager, area/koordlet, kind/question (Support request or question relating to Koordinator)
Milestone: v1.6

Comments

@j4ckstraw (Contributor)

> Why use the Request if there is no metric? How about skipping it?

Originally posted by @j4ckstraw in #1559 (comment)

@j4ckstraw (Contributor, Author)

We observed a steep drop in batch-cpu allocatable.

Metric: `koordlet_node_resource_allocatable{resource="kubernetes.io/batch-cpu",node=~"$node"}/1000`

[screenshot: batch-cpu allocatable dropping sharply at the time of the incident]

A pod requesting 10 cores of normal CPU was scheduled onto the node at the time of the problem, while batch-cpu usage showed no significant change. After investigation, I believe this is related to the batch resource allocatable calculation.
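
For context, the drop is consistent with the Batch allocatable formula counting a metric-less HP pod at its full request, as discussed below. A minimal sketch of that behavior (illustrative Go; `hpPod` and all names here are hypothetical, not the actual koordinator source):

```go
package main

import "fmt"

// hpPod is a hypothetical stand-in for a high-priority (HP) pod's CPU data.
type hpPod struct {
	requestCores float64
	usageCores   *float64 // nil when no metric has been reported yet
}

// batchAllocatableCores sketches the usage-policy formula:
//
//	batchAllocatable = nodeAllocatable - nodeReserved - systemUsage - hpUsed
//
// where an HP pod with no metric contributes its full request to hpUsed.
func batchAllocatableCores(nodeAllocatable, nodeReserved, systemUsage float64, pods []hpPod) float64 {
	hpUsed := 0.0
	for _, p := range pods {
		if p.usageCores != nil {
			hpUsed += *p.usageCores
		} else {
			hpUsed += p.requestCores // fallback under discussion: no metric -> count the request
		}
	}
	return nodeAllocatable - nodeReserved - systemUsage - hpUsed
}

func main() {
	two := 2.0
	existing := hpPod{requestCores: 8, usageCores: &two}
	newPod := hpPod{requestCores: 10} // just scheduled, no metric yet

	before := batchAllocatableCores(64, 4, 3, []hpPod{existing})
	after := batchAllocatableCores(64, 4, 3, []hpPod{existing, newPod})
	fmt.Printf("before: %.0f cores, after: %.0f cores\n", before, after) // before: 55, after: 45
}
```

Under these assumed numbers, scheduling the 10-core pod removes its full request from Batch allocatable even though real usage is unchanged, matching the observed steep drop.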

@j4ckstraw (Contributor, Author)

Here's my question: why add the pod's request to HPUsed when no metric is found? How about just skipping it?

@saintube (Member)

> Here's my question: why add the pod's request to HPUsed when no metric is found? How about just skipping it?

@j4ckstraw If the HP pod has no metric but does show in the PodList (e.g. the pod is newly created), it should not cause a steep drop in batch allocatable, since HPRequest also increases by the request of that HP pod. The drop could instead be due to an HP pod that has a metric but does not show in the PodList (e.g. the pod was deleted). We could skip such a pod's request if it is deleted, but we cannot be sure whether it is dangling and still running on the node.
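
A sketch of that trade-off (hypothetical helper, not the actual koordinator source; plain maps stand in for the PodList and the metric store):

```go
package main

import "fmt"

// sumHPUsed sketches the "skip deleted pods" idea from this thread.
// podRequests holds pods currently in the PodList (uid -> CPU request);
// podUsages holds reported metrics (uid -> CPU usage), which may still
// contain pods already deleted from the PodList.
func sumHPUsed(podRequests, podUsages map[string]float64) float64 {
	used := 0.0
	for uid, request := range podRequests {
		if usage, ok := podUsages[uid]; ok {
			used += usage // normal case: the pod has a metric
		} else {
			used += request // newly created pod: fall back to its request
		}
	}
	// Usages whose uid is absent from podRequests (deleted pods) are
	// skipped; the caveat raised above is that a dangling pod still
	// running on the node would then be under-counted.
	return used
}

func main() {
	requests := map[string]float64{"pod-a": 8, "pod-b": 10} // pod-b: newly created, no metric
	usages := map[string]float64{"pod-a": 2, "pod-c": 5}    // pod-c: deleted (or dangling?)
	fmt.Println(sumHPUsed(requests, usages))                // 2 + 10 = 12; pod-c's 5 cores are skipped
}
```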

@saintube added the area/koord-manager and kind/question labels on Feb 20, 2024
@saintube (Member) commented Feb 21, 2024

As discussed with @j4ckstraw offline, the current calculation formula does not consider the HP Request when calculatePolicy="usage", so the steep-drop issue does exist.
Furthermore, this fluctuation can cause unexpected evictions when the BECPUEvict strategy is also in use with policy="evictByAllocatable"; that is @j4ckstraw's real concern.
However, IMO, the decrease in batch allocatable when a new HP pod is created could help mitigate the problem of too many batch pods being scheduled at that moment. So the issues can be resolved separately as follows:

  1. To reduce the unexpected evictions in the BECPUEvict strategy, which is the real problem: add the batch allocatable calculation logic to the koordlet (refer to BECPUSuppress), and let the QoS plugins (e.g. BECPUEvict) retrieve this real-time result instead of looking up node.status.allocatable, which always lags.
  2. To smooth the batch allocatable calculation in the slo-controller: add parameters podWarmupDurationSeconds and podWarmupReclaimPercent to the ColocationStrategy for the pod warm-up/cold-start cases. These adjust the weight given to a pod that has no reported metric, or that is just starting with inaccurate metrics, as opposed to pods with long-running metrics; e.g. podWarmupReclaimPercent=0 ignores missing-metric pods entirely (see the sketch below).
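
A minimal sketch of how those proposed knobs might plug into the HPUsed accounting (the field names come from the proposal above; the struct shape and weighting logic are my assumptions, not a settled design):

```go
package main

import "fmt"

// ColocationStrategy here carries only the two proposed warm-up knobs;
// the real struct in the slo-controller has many more fields.
type ColocationStrategy struct {
	PodWarmupDurationSeconds int64 // how long after creation a pod counts as warming up
	PodWarmupReclaimPercent  int64 // 0-100 weight applied to a warming-up pod's request
}

// warmupHPUsed sketches the proposed weighting: a pod that has no metric
// (or is younger than the warm-up window) contributes only
// PodWarmupReclaimPercent% of its request; e.g. 0 ignores it entirely.
func warmupHPUsed(requestCores float64, usageCores *float64, podAgeSeconds int64, s ColocationStrategy) float64 {
	if usageCores != nil && podAgeSeconds >= s.PodWarmupDurationSeconds {
		return *usageCores // steady state: trust the reported metric
	}
	return requestCores * float64(s.PodWarmupReclaimPercent) / 100
}

func main() {
	s := ColocationStrategy{PodWarmupDurationSeconds: 180, PodWarmupReclaimPercent: 0}
	// Newly created 10-core pod with no metric is counted as 0 cores,
	// so batch allocatable no longer drops steeply on scheduling.
	fmt.Println(warmupHPUsed(10, nil, 30, s)) // 0
	three := 3.0
	fmt.Println(warmupHPUsed(10, &three, 600, s)) // 3
}
```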

@saintube changed the title from [question] batch resource calculation to [question] batch resource calculation fluctuation on Feb 21, 2024
@ZiMengSheng added this to the v1.6 milestone on May 7, 2024