This issue has been automatically marked as stale because it has not had recent activity.
This bot triages issues and PRs according to the following rules:
After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, the issue is closed
You can:
Mark this issue or PR as fresh with /remove-lifecycle stale
Close this issue or PR with /close
Thank you for your contributions.
What happened:
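While running, offline pods drive up the CPU load of the whole node, which degrades the quality of online services.
Scenario and steps to reproduce:
1. A nodeManager (NM) runs inside the offline pod.
2. The NM runs the offline tasks that are scheduled onto it.
3. Because the expected resource usage of offline tasks is underestimated, the CPU actually available to the NM they land on does not meet the tasks' CPU demand. Alternatively, the offline task itself consumes a large amount of CPU that the allocated batch_cpu cannot cover. Either way, the node's CPU load keeps rising and eventually affects online service quality.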
What you expected to happen:
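We would like offline pods to be evicted based on their CPU load.
Condition 1: when the node's CPU load exceeds the number of CPUs (or a configured percentage of it), scan the CPU load of the offline pods.
Condition 2: if an offline pod's CPU load exceeds its allocated batch-cpu limit, evict that offline pod.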
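To make the two conditions concrete, here is a minimal sketch in Go of the proposed decision logic. It is not an existing API: the `PodLoad` type, its fields, the `ratio` threshold parameter, and both function names are hypothetical and chosen only for illustration. How the node load and per-pod load would actually be collected is left open.

```go
package main

import "fmt"

// PodLoad is a hypothetical per-pod sample; the field names are assumptions.
type PodLoad struct {
	Name          string
	CPULoad       float64 // CPU load attributed to the offline pod, in cores
	BatchCPULimit float64 // batch-cpu limit allocated to the pod, in cores
}

// nodeOverloaded covers condition 1: the node's CPU load exceeds
// cpuCount scaled by ratio (ratio 1.0 means "more load than CPUs").
func nodeOverloaded(nodeLoad float64, cpuCount int, ratio float64) bool {
	return nodeLoad > float64(cpuCount)*ratio
}

// podsToEvict covers condition 2. Offline pods are scanned only when
// condition 1 holds; a pod is selected when its CPU load exceeds its
// batch-cpu limit.
func podsToEvict(nodeLoad float64, cpuCount int, ratio float64, pods []PodLoad) []string {
	if !nodeOverloaded(nodeLoad, cpuCount, ratio) {
		return nil
	}
	var victims []string
	for _, p := range pods {
		if p.CPULoad > p.BatchCPULimit {
			victims = append(victims, p.Name)
		}
	}
	return victims
}

func main() {
	// Made-up sample values, for illustration only.
	pods := []PodLoad{
		{Name: "nm-a", CPULoad: 3, BatchCPULimit: 4},
		{Name: "nm-b", CPULoad: 9, BatchCPULimit: 4},
	}
	fmt.Println(podsToEvict(40, 32, 1.0, pods))
}
```

Gating the per-pod scan on the node-level check keeps a pod that briefly exceeds its limit on an otherwise idle node from being evicted.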
Environment:
Anything else we need to know: