[feature request] Potential problems when the BroadcastJob completion policy is Never #1581

Open
yeahx opened this issue Apr 16, 2024 · 1 comment

yeahx commented Apr 16, 2024

These questions came up while studying the bcj (BroadcastJob) implementation; they are not tied to a specific version.

What would you like to be added:
1. For long-running bcj jobs, could a GC mechanism be added for Pods in the Succeeded state?
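To make the request concrete, here is a minimal sketch of one possible retention rule: keep the most recent `keep` Succeeded Pods and select the older ones for deletion. The `Pod` type, its `finished_at` field, and `select_for_gc` are hypothetical names used for illustration only; nothing here is part of Kruise or the Kubernetes API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pod:
    name: str
    phase: str          # "Running", "Succeeded", or "Failed"
    finished_at: float  # hypothetical completion timestamp (seconds)


def select_for_gc(pods, keep):
    """Return the oldest Succeeded Pods beyond the `keep` most recent ones."""
    succeeded = sorted(
        (p for p in pods if p.phase == "Succeeded"),
        key=lambda p: p.finished_at,
    )
    excess = len(succeeded) - keep
    return succeeded[:excess] if excess > 0 else []
```

A rule like this only covers choosing what to delete. It does not address how the controller remembers that the deleted Pods' nodes already completed the job.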

Why is this needed:
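1. When CompletionPolicy is Never, a large cluster accumulates many Succeeded Pods that stay around for a long time. Could this degrade control-plane performance?
2. If the number of Succeeded Pods exceeds the limit set by terminated-pod-gc-threshold, will Kubernetes' own garbage collection affect the bcj itself? For example, once those Pods are collected, will reconciliation be triggered and create Pods again to reach the desired count?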
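The second concern can be illustrated with a toy model. It assumes a reconcile rule that creates a Pod on every node that currently has none. That rule is an assumption made for this sketch, not a statement about how the bcj controller actually behaves, and `Pod` and `nodes_needing_pod` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pod:
    name: str
    node: str
    phase: str  # "Running", "Succeeded", or "Failed"


def nodes_needing_pod(nodes, pods):
    """Assumed rule: a node needs a Pod if no Pod exists for it at all."""
    covered = {p.node for p in pods}
    return [n for n in nodes if n not in covered]


nodes = ["node-a", "node-b", "node-c"]
pods = [Pod(f"bcj-{n}", n, "Succeeded") for n in nodes]

# Steady state: every node already ran the job, so nothing is created.
before_gc = nodes_needing_pod(nodes, pods)

# An external GC removes the Succeeded Pod on node-b.
pods_after_gc = [p for p in pods if p.node != "node-b"]

# Under the assumed rule, node-b now looks like it never ran the job.
after_gc = nodes_needing_pod(nodes, pods_after_gc)
```

If the controller infers completion only from the Pods that still exist, removing a Succeeded Pod erases the evidence that its node is done, and the job would run there again.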

Member

ls-2018 commented Apr 16, 2024

I have some thoughts:
If the record of a succeeded Pod is deleted, where should that record be kept: in memory, or in Kubernetes?
If it is kept in Kubernetes, for example in a ConfigMap, there will be many updates to a single object, which would likely consume more CPU than keeping all the Pod records.
I think it would be better to keep it in memory and use some kind of checkpoint mechanism.
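As a rough sketch of the "in memory plus checkpoint" idea, the following keeps the set of completed nodes in memory and periodically writes it to a file. The `CompletionRecord` class and the choice of a local JSON file are assumptions for illustration; the comment above does not say where a checkpoint would be stored or in what format.

```python
import json
import os
import tempfile


class CompletionRecord:
    """In-memory set of nodes that finished the job, with file checkpoints."""

    def __init__(self, path):
        self.path = path   # hypothetical checkpoint location
        self.done = set()

    def mark_done(self, node):
        self.done.add(node)

    def checkpoint(self):
        # Write to a temp file in the same directory, then rename it over
        # the target so a crash cannot leave a half-written checkpoint.
        directory = os.path.dirname(self.path) or "."
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(sorted(self.done), f)
        os.replace(tmp, self.path)

    @classmethod
    def restore(cls, path):
        # Start empty when no checkpoint exists yet.
        record = cls(path)
        if os.path.exists(path):
            with open(path) as f:
                record.done = set(json.load(f))
        return record
```

Anything marked done after the last checkpoint is lost on restart, so the checkpoint interval bounds how many nodes might rerun the job after a crash.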
