Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Sidecar Containers] Pods comparison by maxContainerRestarts should account for sidecar containers #124952

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

AxeZhan
Copy link
Member

@AxeZhan AxeZhan commented May 19, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

Today, there are a few uses of the function maxContainerRestarts - mostly to compare pods to decide which one is
better to delete or which logs to get.

The code only look at Container Statuses, but likely need to look at init container statuses as well.
Especially in case of sidecar containers that may behave exactly as regular containers.

We may need to be careful including all init container statuses. If a Pod was failing to start for a while
because of Init container failures and now it is running OK, it is likely not important. However, including
the restartable containers (sidecars) restart count is important.

I think the desireable behavior will be to check regular containers max restart count first. And compare this.
Then compare max restart count for restarteable init containers.

Which issue(s) this PR fixes:

Fixes #124936

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Considering sidecar container restart counts when removing pods by job controller

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

[KEP]: <https://github.com/kubernetes/enhancements/issues/753>

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 19, 2024
@k8s-ci-robot k8s-ci-robot added area/kubectl cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 19, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: AxeZhan
Once this PR has been reviewed and has the lgtm label, please assign deads2k for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/cli Categorizes an issue or PR as relevant to SIG CLI. labels May 19, 2024
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels May 19, 2024
Copy link
Member

@gjkim42 gjkim42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we need to check the feature gate to change the existing behavior.

@@ -190,14 +199,32 @@ func podReadyTime(pod *corev1.Pod) *metav1.Time {
return &metav1.Time{}
}

func maxContainerRestarts(pod *corev1.Pod) int {
func maxContainerRestarts(cs []corev1.ContainerStatus) int {
Copy link
Member

@gjkim42 gjkim42 May 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we get the pod here and take restartable init containers into account inside of this function to remove the redundancy?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean a function like maxContainerRestarts(p1 *corev1.Pod, p2 *corev1.Pod) int ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about the function that returns the maximum container restarts for regular containers AND sidecar containers of the given pod.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a good idea :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the confusion.
I meant one maximum count across all containers including regular containers and sidecar containers.
What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is inconsistent with what we originally designed.

I think the desireable behavior will be to check regular containers max restart count first. And compare this.
Then compare max restart count for restarteable init containers.

/cc @SergeyKanzhelev

@AxeZhan
Copy link
Member Author

AxeZhan commented May 20, 2024

I guess we need to check the feature gate to change the existing behavior.

I'm not sure about this also:

If the pod has no sidecar containers, then existing logic won't be broken here.
If the pod has sidecar containers:

  1. If feature gate is enabled, we compare restart count for sidecar containers, and it's expected.
  2. If feature gate is disabled, then should we compare sidecar containers restarts?

I tend to compare it as long as there are sidecar containers in the pod.
But, I'm not clear how other components handle sidecar containers when feature gate is disabled, so no strong view here.

@gjkim42
Copy link
Member

gjkim42 commented May 20, 2024

I guess we have applied the feature gate since we cannot guarantee that it never break anything. However, I am ok with not applying it as the patch looks simple and predictable.

@AxeZhan
Copy link
Member Author

AxeZhan commented May 20, 2024

I guess we have applied the feature gate since we cannot guarantee that it never break anything. However, I am ok with not applying it as the patch looks simple and predictable.

I have no strong view either, let's wait for other reviewers:)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/kubectl cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/cli Categorizes an issue or PR as relevant to SIG CLI. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
Status: Needs Triage
Status: Needs Triage
Development

Successfully merging this pull request may close these issues.

[Sidecar Containers] Pods comparison by maxContainerRestarts should account for sidecar containers
3 participants