[feature request] Potential issues with BroadcastJob when the completion policy is Never #1581

yeahx · 2024-04-16T10:43:07Z

Some questions that came up while studying the BroadcastJob (bcj) implementation; not tied to any specific version.

What would you like to be added:
1. For long-running (resident) bcj jobs, could a GC mechanism be added for Pods in the Succeeded phase?

Why is this needed:
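1. When CompletionPolicy is Never, a large cluster will accumulate a large number of Succeeded Pods that persist for a long time. Could these Pods degrade control-plane performance?
2. If the number of Succeeded Pods exceeds the threshold defined by terminated-pod-gc-threshold, will Kubernetes' own garbage collection affect the bcj itself? For example, once those Pods are collected, would reconciliation be triggered and create Pods again to get back to the desired Pod count?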
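To make the second concern concrete: the PodGC controller in kube-controller-manager deletes terminated Pods once their count exceeds `--terminated-pod-gc-threshold`. If a controller decides which nodes still need a Pod purely from the Pods that currently exist, a node whose Succeeded Pod was collected would look unscheduled again. The sketch below is a simplified, hypothetical model of that reasoning, not the actual OpenKruise controller code; `Pod`, `nodes_needing_pods`, and the `completed_nodes` record are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pod:
    node: str
    phase: str  # e.g. "Running", "Succeeded", "Failed"


def nodes_needing_pods(eligible_nodes, existing_pods, completed_nodes=frozenset()):
    """Return eligible nodes that have neither an existing pod nor a recorded completion.

    Any existing pod, whatever its phase, counts as covering its node.
    completed_nodes is a hypothetical, separately stored record of nodes whose
    pod already succeeded; without it, only existing pods are considered.
    """
    covered = {p.node for p in existing_pods}
    return sorted(set(eligible_nodes) - covered - set(completed_nodes))


nodes = ["node-a", "node-b", "node-c"]
pods = [Pod("node-a", "Succeeded"), Pod("node-b", "Succeeded"), Pod("node-c", "Running")]
print(nodes_needing_pods(nodes, pods))  # []

# Simulate pod GC deleting the Succeeded pod on node-a.
after_gc = [p for p in pods if p.node != "node-a"]
print(nodes_needing_pods(nodes, after_gc))  # ['node-a'] -> would be re-created

# With a separate completion record, node-a stays covered after GC.
print(nodes_needing_pods(nodes, after_gc, completed_nodes={"node-a"}))  # []
```

Under this assumption, any Succeeded-Pod GC for Never jobs only stays safe if the fact "this node already ran to completion" survives somewhere other than the Pod object itself, which is what the reply below addresses.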

ls-2018 · 2024-04-16T12:25:15Z

I have some ideas:
If the Succeeded pod records are deleted, where would that completion record be kept instead? In memory or in k8s?
If it is kept in k8s, for example in a ConfigMap, there will be many single-point updates, which would likely consume more CPU than simply keeping all the pod records.
I think it would be better to keep it in memory and use some kind of checkpoint mechanism.
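A minimal sketch of the in-memory-plus-checkpoint idea, assuming completions are tracked per node name. `CompletionRecord` is hypothetical, and a local JSON file stands in for whatever durable store a real controller would use. Marking a node is a cheap in-memory update, while `checkpoint()` persists the whole set in one write, so many completions cost one write per checkpoint rather than one write each.

```python
import json
import os
import tempfile


class CompletionRecord:
    """Hypothetical in-memory set of nodes whose broadcast pod has succeeded."""

    def __init__(self, path):
        self.path = path
        self.nodes = set()
        self.dirty = False  # True when there are changes not yet checkpointed
        self.writes = 0     # number of persisted checkpoints, for illustration

    def mark_succeeded(self, node):
        # In-memory only; no write to the backing store here.
        if node not in self.nodes:
            self.nodes.add(node)
            self.dirty = True

    def checkpoint(self):
        # Persist the full set in a single write, skipping it if nothing changed.
        if not self.dirty:
            return
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(sorted(self.nodes), f)
        os.replace(tmp, self.path)  # atomic rename, so readers never see a partial file
        self.dirty = False
        self.writes += 1

    @classmethod
    def restore(cls, path):
        # Rebuild the record from the last checkpoint, e.g. after a restart.
        rec = cls(path)
        if os.path.exists(path):
            with open(path) as f:
                rec.nodes = set(json.load(f))
        return rec


path = os.path.join(tempfile.mkdtemp(), "bcj-completions.json")
rec = CompletionRecord(path)
for i in range(1000):
    rec.mark_succeeded(f"node-{i}")
rec.checkpoint()
print(rec.writes)  # 1

restored = CompletionRecord.restore(path)
print(len(restored.nodes))  # 1000
```

The trade-off is the checkpoint interval: completions marked after the last checkpoint are lost if the controller crashes, so on restart those nodes could look as if they still need a Pod unless their Succeeded Pods have not yet been collected.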

yeahx added the kind/feature-request label Apr 16, 2024

yeahx assigned FillZpp Apr 16, 2024
