[question] Offline Pods increase node cpuload, which affects online Pods #1903

Open
yangchuan37326 opened this issue Feb 19, 2024 · 2 comments
Labels
kind/question Support request or question relating to Koordinator lifecycle/stale

Comments

@yangchuan37326

What happened:
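While offline Pods are running, they drive up the node-wide CPU load, which degrades the quality of online services.
Scenario and steps to reproduce:
1. A NodeManager (NM) runs inside the offline Pod.
2. The NM runs the offline tasks that are scheduled to it.
3. Because the expected resource demand of the offline tasks is estimated too low, the CPU actually available to the NM they land on does not cover the CPU they need. Alternatively, the offline tasks themselves consume a lot of CPU and the allocated batch_cpu is not enough. Either way, the node's CPU load keeps climbing and ends up affecting online service quality.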

What you expected to happen:
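I would like offline Pods to be evicted based on their CPU load.
Condition 1: when the node's CPU load exceeds the number of CPUs (or some percentage of it), scan the CPU load of the offline Pods.
Condition 2: if an offline Pod's CPU load exceeds its allocated batch-cpu limit, evict that offline Pod.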
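
To make the two conditions above concrete, here is a minimal sketch of the requested check. It is not Koordinator code: `BatchPod`, `pods_to_evict`, the `load_ratio` parameter and the per-Pod `cpu_load` metric are all hypothetical names chosen for illustration, and ordering the candidates by how far they overshoot their limit is an added assumption, not something the request specifies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BatchPod:
    """Hypothetical view of an offline (batch) Pod; not a Koordinator type."""
    name: str
    cpu_load: float         # CPU load attributed to the Pod (assumed metric)
    batch_cpu_limit: float  # allocated batch-cpu limit, in cores


def pods_to_evict(node_load: float, node_cpus: int,
                  pods: List[BatchPod], load_ratio: float = 1.0) -> List[str]:
    """Return names of offline Pods that would be evicted under the two conditions.

    load_ratio is the assumed "percentage" knob: 1.0 means the node load
    must exceed the CPU count, 1.5 means it must exceed 150% of it.
    """
    # Condition 1: only scan offline Pods when the node-wide load is too high.
    if node_load <= node_cpus * load_ratio:
        return []

    # Condition 2: a Pod is a candidate when its load exceeds its batch-cpu limit.
    over_limit = [p for p in pods if p.cpu_load > p.batch_cpu_limit]

    # Assumption: handle the worst offenders first.
    over_limit.sort(key=lambda p: p.cpu_load - p.batch_cpu_limit, reverse=True)
    return [p.name for p in over_limit]
```

With a 32-core node at load 40 and three offline Pods limited to 4 cores each, only the Pods whose load is above 4 are returned; at load 20 the scan is skipped entirely.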

Environment:

  • Koordinator version: - v1.3
  • Kubernetes version (use kubectl version): v1.24.13
  • docker/containerd version: containerd 1.622
  • OS (e.g: cat /etc/os-release): Anolis OS 8.2
  • Kernel (e.g. uname -a): 4.19.91-27.2.an8.x86_64

Anything else we need to know:

@yangchuan37326 yangchuan37326 added the kind/question Support request or question relating to Koordinator label Feb 19, 2024
@zwzhang0107
Contributor

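High load is a result, not a cause; when load is high there may or may not be resource contention and interference.
For resource contention at the CPU level, take a look at the CPU QoS and CPU suppress strategies. Both have also been adapted to the best practice for the YARN scenario: #1727
As for eviction, the strategies currently consider memory watermark and CPU satisfaction. Fine-grained eviction of YARN tasks is not supported yet; contributions are welcome.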
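
For readers unfamiliar with the CPU suppress strategy mentioned above, the sketch below shows the idea as I understand it from the Koordinator documentation: the CPU left for best-effort (offline) Pods is the node total scaled by a threshold percentage, minus what latency-sensitive Pods and the system already use. The function name, parameter names and the clamp to zero are assumptions made for illustration; this is not the koordlet implementation and the real policy may apply additional rules.

```python
def be_cpu_suppress_budget(node_total_cores: float,
                           threshold_percent: float,
                           ls_used_cores: float,
                           system_used_cores: float) -> float:
    """Cores that offline (BE) Pods may use under a suppress-style policy.

    Assumed formula: total * threshold% - LS usage - system usage.
    """
    budget = (node_total_cores * threshold_percent / 100.0
              - ls_used_cores - system_used_cores)
    # Assumption: never report a negative budget.
    return max(budget, 0.0)
```

For example, on a 32-core node with a 65% threshold, 10 cores used by online Pods and 2 by the system, offline Pods would be held to roughly 8.8 cores; as online usage grows, that budget shrinks, which is how contention is limited without looking at load directly.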

stale bot commented May 20, 2024

This issue has been automatically marked as stale because it has not had recent activity.
This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Close this issue or PR with /close

Thank you for your contributions.
