Troubleshooting pods #1889
[APPROVALNOTIFIER] This PR is APPROVED.
This pull-request has been approved by: alculquicondor
The full list of commands accepted by this bot can be found here. The pull request process is described here.
/hold
✅ Deploy Preview for kubernetes-sigs-kueue canceled.
/lgtm
LGTM label has been added. Git tree hash: 05b69797b574b7192a33f6dac17759416679bdd0
LGTM, just nits
site/content/en/docs/tasks/troubleshooting/troubleshooting_pods.md
Outdated
to accommodate higher priority jobs or reclaim quota. Preemption is implemented via `DELETE` calls,
the standard way of terminating a Pod in Kubernetes.

When using single Pods, Kubernetes will delete the Workload object along with the Pod, as there is
Can we add a sentence on what happens with workload when using Pod groups?
It's mostly the next question that covers it, but I added a note.
LGTM
Basically lgtm
This doc is about troubleshooting pending Pods when directly managed by Kueue, in other words,
Pods that are not managed by Kubernetes Jobs or supported CRDs.
Can we mention which integrations we target in this troubleshooting?
https://kueue.sigs.k8s.io/docs/tasks/run/plain_pods/
I added a link in this paragraph.
Note that the above event might show up for the first Pod that Kueue observes, and it will remain
even if Kueue successfully creates the Workload for the Pod group later.
{{% alert title="Note" color="primary" %}}
The above event might show up for the first Pod that Kueue observes, and it will remain
even if Kueue successfully creates the Workload for the Pod group later.
{{% /alert %}}
Using docsy style notes would be better.
Done
- Read [Troubleshooting Jobs](troubleshooting_jobs) to learn generic troubleshooting steps for jobs
managed by Kueue
This duplicates line 12. Can we keep it in only one of the two places?
I removed it from here and highlighted it on top using a note.
bf0b398 to 6a91858
Rebased on top of #1888
Thank you!
/lgtm
/hold cancel
LGTM label has been added. Git tree hash: 5ae16238d16cc5864bd63f8e0ad0f4fd26b0ca54
/hold
6a91858 to 1b662e4
I just wanted to squash :)
/hold cancel
/cherry-pick website
@alculquicondor: new pull request created: #1906
What type of PR is this?
/kind documentation
What this PR does / why we need it:
Add basic steps to troubleshoot single pods or pod groups
Which issue(s) this PR fixes:
Part of #1410
Special notes for your reviewer:
The links in this change assume that #1888 is merged.
Does this PR introduce a user-facing change?