
[Flaking Test][sig-scheduling] SchedulerPredicates [Serial] validates resource limits of pods that are allowed to run #122283

Closed
pacoxu opened this issue Dec 13, 2023 · 7 comments
Labels: kind/flake, needs-priority, needs-triage, sig/scheduling

Comments

pacoxu commented Dec 13, 2023

Failure cluster ea66cea699bee5bc4084

https://storage.googleapis.com/k8s-triage/index.html?test=validates%20resource%20limits%20of%20pods%20that%20are%20allowed%20to%20run

Error text:
[FAILED] context deadline exceeded
In [BeforeEach] at: test/e2e/framework/framework.go:263 @ 12/11/23 16:53:47.996

STEP: Starting Pods to consume most of the cluster CPU. - test/e2e/scheduling/predicates.go:379 @ 12/12/23 04:33:47.465
Dec 12 04:33:47.465: INFO: Creating a pod which consumes cpu=5530m on Node kind-worker
Dec 12 04:33:47.475: INFO: Creating a pod which consumes cpu=5530m on Node kind-worker2
E1212 04:33:48.368351   69964 retrywatcher.go:129] "Watch failed" err="context canceled"
E1212 04:33:49.368742   69964 retrywatcher.go:129] "Watch failed" err="context canceled"
Dec 12 04:33:49.513: INFO: Failed inside E2E framework:
    k8s.io/kubernetes/test/e2e/framework/pod.WaitTimeoutForPodRunningInNamespace({0x7fd3ec76a0f0, 0xc004185200}, {0x78e4b70?, 0xc003da7380?}, {0xc0045b2480, 0x2f}, {0xc0039b0de0, 0xf}, 0x0?)
    	test/e2e/framework/pod/wait.go:459 +0x2ed
    k8s.io/kubernetes/test/e2e/framework/pod.WaitForPodRunningInNamespace(...)
    	test/e2e/framework/pod/wait.go:468
    k8s.io/kubernetes/test/e2e/scheduling.glob..func4.5({0x7fd3ec76a0f0, 0xc004185200})
    	test/e2e/scheduling/predicates.go:416 +0xe6c
STEP: removing the label node off the node kind-worker - test/e2e/framework/node/helper.go:73 @ 12/12/23 04:33:49.514
STEP: verifying the node doesn't have the label node - test/e2e/framework/node/helper.go:76 @ 12/12/23 04:33:49.544
STEP: removing the label node off the node kind-worker2 - test/e2e/framework/node/helper.go:73 @ 12/12/23 04:33:49.547
STEP: verifying the node doesn't have the label node - test/e2e/framework/node/helper.go:76 @ 12/12/23 04:33:49.565
[FAILED] Told to stop trying after 2.009s.
Expected pod to reach phase "Running", got final phase "Failed" instead.
In [It] at: test/e2e/scheduling/predicates.go:416 @ 12/12/23 04:33:49.568
< Exit [It] validates resource limits of pods that are allowed to run [Conformance] - test/e2e/scheduling/predicates.go:334 @ 12/12/23 04:33:49.568 (2.204s)
> Enter [AfterEach] [sig-scheduling] SchedulerPredicates [Serial] - test/e2e/scheduling/predicates.go:91 @ 12/12/23 04:33:49.568
< Exit [AfterEach] [sig-scheduling] SchedulerPredicates [Serial] - test/e2e/scheduling/predicates.go:91 @ 12/12/23 04:33:49.568 (0s)
> Enter [DeferCleanup (Each)] [sig-scheduling] SchedulerPredicates [Serial] - test/e2e/framework/node/init/init.go:34 @ 12/12/23 04:33:49.568
Dec 12 04:33:49.568: INFO: Waiting up to 7m0s for all (but 0) nodes to be ready
< Exit [DeferCleanup (Each)] [sig-scheduling] SchedulerPredicates [Serial] - test/e2e/framework/node/init/init.go:34 @ 12/12/23 04:33:49.573 (5ms)
> Enter [DeferCleanup (Each)] [sig-scheduling] SchedulerPredicates [Serial] - test/e2e/framework/metrics/init/init.go:35 @ 12/12/23 04:33:49.573
< Exit [DeferCleanup (Each)] [sig-scheduling] SchedulerPredicates [Serial] - test/e2e/framework/metrics/init/init.go:35 @ 12/12/23 04:33:49.573 (0s)
> Enter [DeferCleanup (Each)] [sig-scheduling] SchedulerPredicates [Serial] - dump namespaces | framework.go:218 @ 12/12/23 04:33:49.573
STEP: dump namespace information after failure - test/e2e/framework/framework.go:297 @ 12/12/23 04:33:49.573
STEP: Collecting events from namespace "sched-pred-8301". - test/e2e/framework/debug/dump.go:42 @ 12/12/23 04:33:49.573
STEP: Found 3 events. - test/e2e/framework/debug/dump.go:46 @ 12/12/23 04:33:49.577
Dec 12 04:33:49.577: INFO: At 2023-12-12 04:33:47 +0000 UTC - event for filler-pod-36f468c7-08c4-4dd6-b52f-a41567d1b7f3: {default-scheduler } Scheduled: Successfully assigned sched-pred-8301/filler-pod-36f468c7-08c4-4dd6-b52f-a41567d1b7f3 to kind-worker2
Dec 12 04:33:49.577: INFO: At 2023-12-12 04:33:47 +0000 UTC - event for filler-pod-c2ed7ea0-735c-44d4-884b-a37f7bf19cf0: {default-scheduler } Scheduled: Successfully assigned sched-pred-8301/filler-pod-c2ed7ea0-735c-44d4-884b-a37f7bf19cf0 to kind-worker
Dec 12 04:33:49.577: INFO: At 2023-12-12 04:33:47 +0000 UTC - event for filler-pod-c2ed7ea0-735c-44d4-884b-a37f7bf19cf0: {kubelet kind-worker} NodeAffinity: Predicate NodeAffinity failed
Dec 12 04:33:49.581: INFO: POD                                              NODE          PHASE    GRACE  CONDITIONS
Dec 12 04:33:49.581: INFO: filler-pod-36f468c7-08c4-4dd6-b52f-a41567d1b7f3  kind-worker2  Pending         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2023-12-12 04:33:47 +0000 UTC  } {Ready False 0001-01-01 00:00:00 +0000 UTC 2023-12-12 04:33:47 +0000 UTC ContainersNotReady containers with unready status: [filler-pod-36f468c7-08c4-4dd6-b52f-a41567d1b7f3]} {ContainersReady False 0001-01-01 00:00:00 +0000 UTC 2023-12-12 04:33:47 +0000 UTC ContainersNotReady containers with unready status: [filler-pod-36f468c7-08c4-4dd6-b52f-a41567d1b7f3]} {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2023-12-12 04:33:47 +0000 UTC  }]
Dec 12 04:33:49.581: INFO: filler-pod-c2ed7ea0-735c-44d4-884b-a37f7bf19cf0  kind-worker   Failed          []
Dec 12 04:33:49.581: INFO: 
Dec 12 04:33:49.610: INFO: Unable to fetch sched-pred-8301/filler-pod-36f468c7-08c4-4dd6-b52f-a41567d1b7f3/filler-pod-36f468c7-08c4-4dd6-b52f-a41567d1b7f3 logs: the server rejected our request for an unknown reason (get pods filler-pod-36f468c7-08c4-4dd6-b52f-a41567d1b7f3)
Dec 12 04:33:49.661: INFO: Unable to fetch sched-pred-8301/filler-pod-c2ed7ea0-735c-44d4-884b-a37f7bf19cf0/filler-pod-c2ed7ea0-735c-44d4-884b-a37f7bf19cf0 logs: the server rejected our request for an unknown reason (get pods filler-pod-c2ed7ea0-735c-44d4-884b-a37f7bf19cf0)
Dec 12 04:33:49.665: INFO: 
Logging node info for node kind-control-plane

NodeAffinity: Predicate NodeAffinity failed
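That event is the proximate failure: the scheduler had already assigned filler-pod-c2ed7ea0-735c-44d4-884b-a37f7bf19cf0 to kind-worker, but the kubelet re-checks the pod's required node affinity at admission, and a rejection there sends the pod straight to phase "Failed" instead of leaving it Pending. A minimal sketch of that admission-side check, using the public component-helpers package (the pod and node values below are made up for illustration; as I read it, the e2e filler pods pin themselves to a node by name in roughly this way):

```go
// Sketch of the NodeAffinity check the kubelet applies at admission.
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/component-helpers/scheduling/corev1/nodeaffinity"
)

func main() {
	// A filler-style pod that pins itself to one node by name.
	pod := &v1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "filler-pod"},
		Spec: v1.PodSpec{
			Affinity: &v1.Affinity{
				NodeAffinity: &v1.NodeAffinity{
					RequiredDuringSchedulingIgnoredDuringExecution: &v1.NodeSelector{
						NodeSelectorTerms: []v1.NodeSelectorTerm{{
							MatchFields: []v1.NodeSelectorRequirement{{
								Key:      "metadata.name",
								Operator: v1.NodeSelectorOpIn,
								Values:   []string{"kind-worker"},
							}},
						}},
					},
				},
			},
		},
	}
	node := &v1.Node{ObjectMeta: metav1.ObjectMeta{Name: "kind-worker"}}

	// GetRequiredNodeAffinity + Match is (as I understand it) the same
	// helper pair the kubelet uses; Match == false surfaces as the
	// "Predicate NodeAffinity failed" event and the pod ends up Failed.
	match, err := nodeaffinity.GetRequiredNodeAffinity(pod).Match(node)
	fmt.Println(match, err) // true <nil> here; a mismatched node prints false
}
```

So the triage question is why, at admission time, the node the kubelet evaluated did not satisfy the filler pod's required affinity.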

Recent failures:

2023/12/9 21:48:25 e2e-ci-kubernetes-e2e-al2023-aws-conformance-canary
2023/12/2 17:48:19 e2e-ci-kubernetes-e2e-al2023-aws-conformance-canary

/kind flake
/sig scheduling

See it on https://testgrid.k8s.io/sig-release-master-blocking#conformance-ga-only.

k8s-ci-robot added the kind/flake and sig/scheduling labels on Dec 13, 2023
k8s-ci-robot commented:

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot added the needs-triage label on Dec 13, 2023
sanposhiho commented:

I’ve only had a little time to investigate deeply, but I found one critical bug in the NodeAffinity QueueingHint, which was implemented in this release. The failing test may be caused by it:
#122284
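
For context, a QueueingHint is a per-plugin callback, new in this release behind the SchedulerQueueingHints feature gate, that inspects each incoming cluster event and decides whether a currently-unschedulable pod is worth retrying (Queue) or can keep waiting (QueueSkip). A hint that wrongly answers QueueSkip leaves a pod stuck Pending even after the cluster changes in its favor. A toy model of the idea, with invented names and types rather than the upstream scheduler API:

```go
// Toy model of a NodeAffinity-style queueing hint; everything here is
// invented for illustration and is not the upstream scheduler framework.
package main

import "fmt"

type QueueingHint int

const (
	QueueSkip QueueingHint = iota // event cannot help this pod; keep waiting
	Queue                         // event may help; retry scheduling the pod
)

// nodeUpdateHint requeues a pod only when a node-label update could newly
// satisfy the pod's required labels.
func nodeUpdateHint(required, oldLabels, newLabels map[string]string) QueueingHint {
	matches := func(labels map[string]string) bool {
		for k, v := range required {
			if labels[k] != v {
				return false
			}
		}
		return true
	}
	if !matches(oldLabels) && matches(newLabels) {
		return Queue // the update made the node acceptable; retry now
	}
	return QueueSkip // nothing changed for this pod; skipping is an optimization
}

func main() {
	required := map[string]string{"zone": "a"}
	fmt.Println(nodeUpdateHint(required, map[string]string{}, map[string]string{"zone": "a"})) // 1 (Queue)
	fmt.Println(nodeUpdateHint(required, map[string]string{}, map[string]string{"foo": "x"}))  // 0 (QueueSkip)
}
```

The danger, and what makes bugs here surface as flakes, is that QueueSkip is purely an optimization: returning it in a case that actually needed Queue silently delays the pod until some other event (or a timeout) rescues it.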

pacoxu commented Dec 13, 2023

/cc @kubernetes/ci-signal

pacoxu commented Dec 13, 2023

/priority critical-urgent

k8s-ci-robot added the priority/critical-urgent label on Dec 13, 2023
pacoxu commented Dec 13, 2023

/remove-priority critical-urgent

as we disabled QueueingHint by default.
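
(For reference, the gate in question is SchedulerQueueingHints on kube-scheduler; with it off by default in this release, stock clusters don't exercise the buggy hint path, and anyone who wants to reproduce can re-enable it with --feature-gates=SchedulerQueueingHints=true.)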

k8s-ci-robot added the needs-priority label and removed the priority/critical-urgent label on Dec 13, 2023
pacoxu commented Feb 19, 2024

/close

as no flaking can be found with https://storage.googleapis.com/k8s-triage/index.html?test=validates%20resource%20limits%20of%20pods%20that%20are%20allowed%20to%20run

k8s-ci-robot commented:

@pacoxu: Closing this issue.

In response to this:

/close
as no flaking can be found with https://storage.googleapis.com/k8s-triage/index.html?test=validates%20resource%20limits%20of%20pods%20that%20are%20allowed%20to%20run

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
