Troubleshooting pods #1889

Merged

alculquicondor
Contributor

What type of PR is this?

/kind documentation

What this PR does / why we need it:

Add basic steps to troubleshoot single pods or pod groups

Which issue(s) this PR fixes:

Part of #1410

Special notes for your reviewer:

The links in this change assume that #1888 is merged.

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/documentation Categorizes issue or PR as related to documentation. labels Mar 22, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Mar 22, 2024
@alculquicondor
Contributor Author

/hold
for #1888

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 22, 2024

netlify bot commented Mar 22, 2024

Deploy Preview for kubernetes-sigs-kueue canceled.

🔨 Latest commit: 1b662e4
🔍 Latest deploy log: https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/6601d62e6316b300089222b0

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 22, 2024
Contributor

@trasc trasc left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 25, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 05b69797b574b7192a33f6dac17759416679bdd0

Contributor

@mimowo mimowo left a comment

LGTM, just nits

to accommodate higher priority jobs or reclaim quota. Preemption is implemented via `DELETE` calls,
the standard way of terminating a Pod in Kubernetes.

When using single Pods, Kubernetes will delete the Workload object along with the Pod, as there is
Contributor

Can we add a sentence on what happens with workload when using Pod groups?

Contributor Author

It's mostly the next question that covers it, but I added a note.

@PBundyra
Contributor

LGTM

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 25, 2024
@k8s-ci-robot k8s-ci-robot requested a review from trasc March 25, 2024 17:10
Member

@tenzen-y tenzen-y left a comment

Basically lgtm

Comment on lines 9 to 10
This doc is about troubleshooting pending Pods that are managed directly by Kueue; in other words,
Pods that are not managed by Kubernetes Jobs or supported CRDs.
Member

Can we mention which integrations we target in this troubleshooting?
https://kueue.sigs.k8s.io/docs/tasks/run/plain_pods/

Contributor Author

I added a link in this paragraph.

Comment on lines 61 to 65
Note that the above event might show up for the first Pod that Kueue observes, and it will remain
even if Kueue successfully creates the Workload for the Pod group later.
Member

Suggested change
Note that the above event might show up for the first Pod that Kueue observes, and it will remain
even if Kueue successfully creates the Workload for the Pod group later.
{{% alert title="Note" color="primary" %}}
The above event might show up for the first Pod that Kueue observes, and it will remain
even if Kueue successfully creates the Workload for the Pod group later.
{{% /alert %}}

Using docsy style notes would be better.

Contributor Author

Done

Comment on lines 104 to 105
- Read [Troubleshooting Jobs](troubleshooting_jobs) to learn generic troubleshooting steps for jobs
managed by Kueue
Member

This duplicates line 12. Can we keep it in only one of the two places?

Contributor Author

I removed it from here and highlighted it on top using a note.

@alculquicondor
Contributor Author

Rebased on top of #1888

Member

@tenzen-y tenzen-y left a comment

Thank you!
/lgtm
/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 25, 2024
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 25, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 5ae16238d16cc5864bd63f8e0ad0f4fd26b0ca54

@alculquicondor
Contributor Author

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 25, 2024
@alculquicondor
Contributor Author

I just wanted to squash :)

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 25, 2024
@k8s-ci-robot k8s-ci-robot merged commit c326666 into kubernetes-sigs:main Mar 25, 2024
6 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.7 milestone Mar 25, 2024
@alculquicondor
Contributor Author

/cherry-pick website

@k8s-infra-cherrypick-robot

@alculquicondor: new pull request created: #1906

