Scheduler: Observe/Improve event handling throughput #124566

sanposhiho · 2024-04-26T13:57:44Z

/kind feature
/assign
/sig scheduling

We currently have no observability into event handling (= requeueing) throughput. Adding a metric for it would let us detect degradation in this area.
Based on that metric, we can also work out how fast event handling needs to be at a given large scale, in the same way that we have a 300 pods/s target for scheduling throughput. We then may or may not need to improve event handling throughput accordingly.

1. Add observability for event handling throughput.
2. Benchmark and decide on the ideal throughput.
3. Improve the throughput based on (2). (Possible actions: simplify slower QHint(s), etc.)
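
To make steps 1 and 2 concrete, here is a minimal sketch of what "event handling throughput" could mean as a number. It is not the scheduler's actual metric or API: `EventThroughput`, `FakeClock`, and the event labels are hypothetical names, and a real implementation would presumably expose the counter through the scheduler's metrics rather than keep it in a struct. The sketch counts handled events per label and compares the average rate against a target.

```go
package main

import (
	"sync"
	"time"
)

// FakeClock is a hypothetical, manually advanced clock so the rate
// calculation can be exercised deterministically.
type FakeClock struct {
	mu sync.Mutex
	t  time.Time
}

// Now returns the current fake time.
func (c *FakeClock) Now() time.Time {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.t
}

// AdvanceSeconds moves the fake time forward by s seconds.
func (c *FakeClock) AdvanceSeconds(s int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.t = c.t.Add(time.Duration(s) * time.Second)
}

// EventThroughput is a hypothetical helper (not a kube-scheduler type).
// It counts handled events per event label and derives an average rate.
type EventThroughput struct {
	mu     sync.Mutex
	counts map[string]int64
	start  time.Time
	now    func() time.Time
}

// NewEventThroughput starts measuring at now(); pass time.Now in real use.
func NewEventThroughput(now func() time.Time) *EventThroughput {
	return &EventThroughput{counts: map[string]int64{}, start: now(), now: now}
}

// Observe records that n events with the given label were handled.
func (e *EventThroughput) Observe(event string, n int64) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.counts[event] += n
}

// Total returns how many events with the given label were handled so far.
func (e *EventThroughput) Total(event string) int64 {
	e.mu.Lock()
	defer e.mu.Unlock()
	return e.counts[event]
}

// PerSecond returns the average number of events handled per second,
// across all labels, since measurement started. It returns 0 if no time
// has elapsed yet.
func (e *EventThroughput) PerSecond() float64 {
	e.mu.Lock()
	defer e.mu.Unlock()
	elapsed := e.now().Sub(e.start).Seconds()
	if elapsed <= 0 {
		return 0
	}
	var sum int64
	for _, n := range e.counts {
		sum += n
	}
	return float64(sum) / elapsed
}

// MeetsTarget reports whether the average rate reaches the target
// (in events per second) chosen from benchmarking.
func (e *EventThroughput) MeetsTarget(target float64) bool {
	return e.PerSecond() >= target
}
```

In a benchmark, something like `Observe` would be called from the event handling path and `MeetsTarget` checked at the end of the run. The target value itself is exactly what step 2 is meant to determine, so none is assumed here.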

Reference

Spawned from the discussion in Scheduler throughput reduced when many gated pods #124384
General QHint issue: [Umbrella] Implement QueueingHintFn in in-tree plugins #118893

k8s-ci-robot · 2024-04-26T13:57:51Z

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

sanposhiho · 2024-04-26T13:58:19Z

Registered as Beta requirement: QueueingHint.

utam0k · 2024-04-28T05:52:34Z

I'd like to take part in this issue if possible. Are there any good tasks for newcomers?

sanposhiho · 2024-04-29T10:16:21Z

@utam0k

Let me dig in first.
Later I may be able to split this into tasks that can be done in parallel; if so, I'll let you know here.

utam0k · 2024-04-29T10:29:23Z

Sure!

k8s-ci-robot assigned sanposhiho Apr 26, 2024

k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. labels Apr 26, 2024

k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Apr 26, 2024

sanposhiho mentioned this issue Apr 26, 2024

Beta requirement: QueueingHint #122597

Open

This was referenced Apr 26, 2024

Scheduler throughput reduced when many gated pods #124384

Closed

[Umbrella] Implement QueueingHintFn in in-tree plugins #118893

Open
