Network Topology Aware Plugin #3388

Open
wants to merge 2 commits into
base: master

lowang-bh
Member

@lowang-bh lowang-bh commented Apr 6, 2024

/kind feature
fixes #2984
fixes #447
fixes #3317
Several issues request this feature, such as #447, #2984, and #3317.

Motivation

We aim to make the scheduler network-topology aware in order to achieve the following:

  • Make a best effort to schedule the tasks of a job onto devices in the same topology domain, such as the same IDC.

Goals

  • Support single-key topology configuration: try to schedule all of a job's tasks onto nodes that have the same value for that key.
  • Support multiple-key topology policies, where keys earlier in the list receive a higher score.

Non-Goals

  • Finding a globally optimal placement among nodes across all values of that key is out of scope.

@lowang-bh
Member Author

/assign @Monokaix @hwdef @william-wang

@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign william-wang
You can assign the PR to them by writing /assign @william-wang in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot volcano-sh-bot added the kind/feature Categorizes issue or PR as related to a new feature. label Apr 7, 2024
@hwdef
Member

hwdef commented Apr 8, 2024

We should discuss this in the weekly meeting.

@Thor-wl
Member

Thor-wl commented Apr 11, 2024

Sounds interesting! But it may be more complex than the given design. For example, network delay varies between different nodes, and it also varies over time for the same node. Taking network performance metrics into account for this feature may be a good choice. I think we should discuss this in the community and complete the design first.

@Thor-wl Thor-wl requested review from william-wang and hwdef and removed request for hudson741 and archlitchi April 11, 2024 02:31
@Monokaix
Member

@lowang-bh
Member Author

Maybe considering with network performance metrics for this feature will be a good choice.

I would recommend treating network performance metrics as a kind of load, so that would be more like load-aware scheduling.

network delay varies between different nodes

That is not considered for now. This plugin only considers the physical differences in topology.

@lowang-bh
Member Author

lowang-bh commented Apr 11, 2024

https://github.com/kubernetes-sigs/scheduler-plugins/tree/master/pkg/networkaware?

I think this is a lite version of that plugin: it only considers a few physical topology levels, such as IDC, rack, and switch, and depends on the corresponding labels on nodes.

Advantage: simpler to use, since it relies only on node labels.
Shortcoming: no bandwidth, no latency, etc.

@Monokaix
Member

https://github.com/kubernetes-sigs/scheduler-plugins/tree/master/pkg/networkaware?

I think this is a lite version of that plugin: it only considers a few physical topology levels, such as IDC, rack, and switch, and depends on the corresponding labels on nodes.

Advantage: simpler to use, since it relies only on node labels.
Shortcoming: no bandwidth, no latency, etc.

We should collect more use cases :)

3. If a node has multiple keys that appear in the configured list, the first configured key that matches gets the higher score

```go
nodeOrderFn := func(task *api.TaskInfo, node *api.NodeInfo) (float64, error) {
```
Member

@Monokaix Monokaix Apr 11, 2024

According to the original user demand, a job should occupy nodes exclusively, so a nodeOrder function alone does not seem to satisfy the original use case.

Member Author

Do you mean a predicateFn is needed?

@lowang-bh lowang-bh closed this Apr 11, 2024
@lowang-bh lowang-bh reopened this Apr 11, 2024
@volcano-sh-bot volcano-sh-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 11, 2024
@volcano-sh-bot volcano-sh-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 13, 2024
Signed-off-by: lowang-bh <lhui_wang@163.com>
Signed-off-by: lowang-bh <lhui_wang@163.com>
Labels
kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
6 participants