put back the queue to priority queue after job's resource allocating … #3413

panoswoo · 2024-04-15T03:10:44Z

Resolves #3407

Put the queue back into the priority queue only after the job's resource allocation has finished, so that the queue's priority is calculated from the latest resource allocation.

panoswoo · 2024-04-15T03:35:39Z

/retest

volcano-sh-bot · 2024-04-15T03:35:56Z

@panoswoo: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

lowang-bh · 2024-04-15T03:36:51Z

Hi, please add some unit tests (UT) to cover it. Thanks.

panoswoo · 2024-04-15T09:00:59Z

> Hi, please add some unit tests (UT) to cover it. Thanks.

Done, but I haven't found a simple way to check the allocation order of tasks, so I extended TestCommonStruct to record the actual allocation order. This change is compatible with the existing cases.
Looking forward to your suggestions.

lowang-bh · 2024-04-15T09:47:52Z

> > Hi, please add some unit tests (UT) to cover it. Thanks.
>
> Done, but I haven't found a simple way to check the allocation order of tasks, so I extended TestCommonStruct to record the actual allocation order. This change is compatible with the existing cases. Looking forward to your suggestions.

There's no need to do that. Just make the total resources sufficient for only the first few jobs, then check that the remaining jobs are pending.

panoswoo · 2024-04-15T11:00:13Z

> > > Hi, please add some unit tests (UT) to cover it. Thanks.
> >
> > Done, but I haven't found a simple way to check the allocation order of tasks, so I extended TestCommonStruct to record the actual allocation order. This change is compatible with the existing cases. Looking forward to your suggestions.
>
> There's no need to do that. Just make the total resources sufficient for only the first few jobs, then check that the remaining jobs are pending.

I considered doing this before, but because both queues have the same weight, when I try to fill up all resources with tasks, they are rejected by the proportion plugin during the Overused and Allocatable checks (with only two queues, filling up all resources requires one queue to be allocated more than half of them).
But I just realized that I may be able to disable the Overused and Allocatable checks directly in the plugin's config.

lowang-bh · 2024-04-16T02:34:26Z

> There's no need to do that. Just make the total resources sufficient for only the first few jobs, then check that the remaining jobs are pending.

Add two cases:

Total 5 CPUs: pod-small-1 uses 1 CPU, then pod-large-2 uses 3 CPUs, and pod-large-1 stays pending, which indicates that q-2 is allocated first.
Total 5 CPUs: pod-small-1 uses 1 CPU, then pod-large-2 uses 2 CPUs, then pod-large-1 uses 2 CPUs, and pod-small-2 stays pending, which indicates that q-2 is allocated first and q-1 after it.

lowang-bh · 2024-04-16T02:41:49Z

Wait for #3408 to fix the failed UT.

lowang-bh · 2024-04-16T05:11:24Z

Maybe we need to adjust the e2e test case.

panoswoo · 2024-04-16T06:19:07Z

> Maybe we need to adjust the e2e test case.

Let me take a look.

panoswoo · 2024-04-16T07:32:29Z

@lowang-bh Hi, I seem to have found some issues while looking for the Volcano component logs generated by the e2e tests.

When an e2e test fails, we back up the Volcano component logs with:

volcano/hack/run-e2e-kind.sh

Lines 69 to 74 in c414e56

    
    function generate-log {
        echo "Generating volcano log files"
        kubectl logs deployment/${CLUSTER_NAME}-admission -n kube-system > volcano-admission.log
        kubectl logs deployment/${CLUSTER_NAME}-controllers -n kube-system > volcano-controller.log
        kubectl logs deployment/${CLUSTER_NAME}-scheduler -n kube-system > volcano-scheduler.log
    }

We try to fetch the logs from the kube-system namespace, but those components are actually installed in volcano-system:

volcano/hack/run-e2e-kind.sh

Line 57 in c414e56

    
    helm install ${CLUSTER_NAME} installer/helm/chart/volcano --namespace volcano-system --kubeconfig ${KUBECONFIG} \

It also seems that we don't upload the log files at the end of the workflow:

volcano/.github/workflows/e2e_scheduling_basic.yaml

Lines 39 to 41 in c414e56

    
    - name: Run E2E Tests
      run: |
        make e2e-test-schedulingbase CC=/usr/local/musl/bin/musl-gcc

lowang-bh · 2024-04-16T10:04:33Z

> We try to fetch the logs from the kube-system namespace, but those components are actually installed in volcano-system

That is an issue; we now use volcano-system. You can file another PR and merge it first. @Monokaix

lowang-bh · 2024-04-19T13:08:14Z

/retest

panoswoo · 2024-04-20T13:29:16Z

> Maybe we need to adjust the e2e test case.

I triggered the test again and it passed.
I haven't made any modifications. Have we recently merged any fixes?

lowang-bh · 2024-04-20T13:41:15Z

> > Maybe we need to adjust the e2e test case.
>
> I triggered the test again and it passed. I haven't made any modifications. Have we recently merged any fixes?

No updates. Maybe it failed randomly. We'd better check it.

panoswoo · 2024-04-20T14:59:41Z

> > > Maybe we need to adjust the e2e test case.
> >
> > I triggered the test again and it passed. I haven't made any modifications. Have we recently merged any fixes?
>
> No updates. Maybe it failed randomly. We'd better check it.

I am unable to reproduce the problem now.
I don't have detailed logs from the earlier failures, and the error is a timeout in a large process involving multiple components. Without detailed logs I can't determine which part went wrong :(

lowang-bh · 2024-04-22T02:12:13Z

hack/run-e2e-kind.sh

 if [[ $? -ne 0 ]]; then
-  generate-log
+


Will generate-log be put here?

panoswoo · 2024-04-22

The modifications made in run-e2e-kind.sh are temporary and were used for debugging.
I will revert these changes in this PR and create a new one.

volcano-sh-bot · 2024-04-22T02:33:47Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign shinytang6
You can assign the PR to them by writing /assign @shinytang6 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

pkg/scheduler/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

lowang-bh · 2024-04-27T09:40:03Z

@panoswoo please get the CI to pass so that we can merge this PR.

panoswoo · 2024-04-27T09:49:05Z

/retest

volcano-sh-bot · 2024-04-27T09:49:23Z

@panoswoo: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

panoswoo · 2024-04-27T09:50:28Z

> @panoswoo please get the CI to pass so that we can merge this PR.

I will fix it ASAP.

lowang-bh · 2024-04-27T11:03:43Z

/ok-to-test

lowang-bh · 2024-04-27T11:04:02Z

/retest

lowang-bh · 2024-04-27T11:05:56Z

/assign @Monokaix @hwdef
Please also help to review it. Thanks.

panoswoo · 2024-04-27T11:13:49Z

/retest

hwdef · 2024-04-28

/lgtm

lowang-bh · 2024-05-02T13:33:02Z

Hi @panoswoo, you can close and reopen it to trigger the CI.

volcano-sh-bot · 2024-05-06T01:58:14Z

New changes are detected. LGTM label has been removed.

volcano-sh-bot requested review from lowang-bh and merryzhou April 15, 2024 03:10

volcano-sh-bot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Apr 15, 2024

panoswoo force-pushed the fix/3407 branch from c00530f to 4abb29b Compare April 15, 2024 08:49

volcano-sh-bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Apr 15, 2024

panoswoo force-pushed the fix/3407 branch from 4abb29b to 6c1b982 Compare April 16, 2024 02:29

panoswoo force-pushed the fix/3407 branch from 6c1b982 to a803f35 Compare April 16, 2024 02:54

panoswoo force-pushed the fix/3407 branch 2 times, most recently from cfb0334 to 4624db7 Compare April 20, 2024 13:01

panoswoo force-pushed the fix/3407 branch from 4624db7 to 772dba4 Compare April 20, 2024 13:57

lowang-bh reviewed Apr 22, 2024

panoswoo force-pushed the fix/3407 branch from 772dba4 to a9004ad Compare April 22, 2024 02:33

lowang-bh mentioned this pull request Apr 22, 2024

queue is put back before a job's resource allocating #3407

Open

volcano-sh-bot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Apr 27, 2024

hwdef reviewed Apr 28, 2024

volcano-sh-bot assigned hwdef Apr 28, 2024

volcano-sh-bot added the lgtm Indicates that a PR is ready to be merged. label Apr 28, 2024

put back the queue to priority queue after job's resource allocating …

5b84e1e

…finished Signed-off-by: Panos Woo <panoswoo@outlook.com>

panoswoo force-pushed the fix/3407 branch from a9004ad to 5b84e1e Compare May 6, 2024 01:58

volcano-sh-bot removed the lgtm Indicates that a PR is ready to be merged. label May 6, 2024

panoswoo force-pushed the fix/3407 branch from 280be8c to abf7472 Compare May 6, 2024 03:09

Merge branch 'master' into fix/3407

4493453

panoswoo force-pushed the fix/3407 branch from abf7472 to 4493453 Compare May 6, 2024 03:31
