
Add readiness & liveness probes to kube-proxy #75323

Closed
wants to merge 1 commit

Conversation

stafot
Contributor

@stafot stafot commented Mar 13, 2019

Possible mitigation of #75189

What type of PR is this?

/kind bug

What this PR does / why we need it:
Add readiness & liveness probes to kube-proxy daemonset example.
Which issue(s) this PR fixes:

Fixes #75189

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

kube-proxy: Adds readiness and liveness probes.
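
For illustration, a minimal sketch of the probe shape (HTTP checks against kube-proxy's default healthz port 10256; the timing values are placeholders matching those discussed in review, not the exact diff):

    livenessProbe:
      httpGet:
        path: /healthz
        port: 10256          # kube-proxy's default healthz port
      initialDelaySeconds: 15
      timeoutSeconds: 15
      successThreshold: 1
      failureThreshold: 2
    readinessProbe:
      httpGet:
        path: /healthz
        port: 10256
      initialDelaySeconds: 15
      timeoutSeconds: 15
      successThreshold: 1
      failureThreshold: 2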

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 13, 2019
@k8s-ci-robot
Contributor

Hi @stafot. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@stafot
Contributor Author

stafot commented Mar 13, 2019

/sig aws
/sig network

@k8s-ci-robot k8s-ci-robot added sig/aws sig/network Categorizes an issue or PR as relevant to SIG Network. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 13, 2019
@MrHohn
Member

MrHohn commented Mar 13, 2019

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 13, 2019
@MrHohn
Member

MrHohn commented Mar 13, 2019

A previous attempt was #50118. Giving it another thought, it seems entirely reasonable to have liveness/readiness probes on kube-proxy.

It would be great to add this change below as well, for consistency:

@kubernetes/sig-cluster-lifecycle-pr-reviews @kubernetes/sig-network-pr-reviews

@MrHohn
Member

MrHohn commented Mar 13, 2019

One problem, though, is that when kube-apiserver is unavailable (e.g. during a master upgrade), kube-proxy may become unhealthy and get restarted, even though that won't help.

@stafot
Contributor Author

stafot commented Mar 13, 2019

Thanks for the information. I'll update accordingly. Can this failure cause any harm during a master upgrade? I believe not, but if it can and it's a blocking issue, let me know.

@stafot stafot force-pushed the kube_proxy_ds_healthprobes branch from 3a426fe to 22d586b on March 13, 2019 20:30
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. area/kubeadm and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 13, 2019
@stafot
Contributor Author

stafot commented Mar 14, 2019

/assign @jingax10 @luxas

Member

@neolit123 neolit123 left a comment


@stafot
Please add a release note that explains the change in one sentence, instead of NONE.
Also, we are in code freeze until "code thaw" for 1.14: https://github.com/kubernetes/sig-release/tree/master/releases/release-1.14

/priority backlog
/assign @timothysc
@kubernetes/sig-cluster-lifecycle-pr-reviews

@danwinship
Contributor

do we need a KEP?

@dims dims removed their assignment Apr 29, 2020
airshipbot pushed a commit to airshipit/promenade that referenced this pull request May 28, 2020
The existing liveness and readiness probes for kube-proxy are in need of
adjustment. The current implementation is exec-based, which can be a
resource concern, and is tied heavily to iptables, so is incompatible
with ipvs.

This change removes the exec-based liveness and readiness probes from
the kube-proxy daemonset, and replaces them with HTTP probes of the
healthz endpoint, following the direction that kubernetes seems to be
taking.[0][1]

The values.yaml interface to enable and disable the probes and set various
parameters is also modified to use the helm-toolkit standard snippet.[2]
Notably, the settings previously configurable under livenessProbe.config
are now under pod.probes.proxy.proxy.liveness.params.

0: kubernetes/kubernetes#81630
1: kubernetes/kubernetes#75323
2: https://opendev.org/openstack/openstack-helm-infra/src/branch/master/helm-toolkit/templates/snippets/_kubernetes_probes.tpl

Change-Id: I99ccbc2270a1f8a204417aa410868d04788dc60f
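
For context, a hedged sketch of that values.yaml interface under the helm-toolkit snippet (only the pod.probes.proxy.proxy.liveness.params path is quoted from the commit message; the remaining keys and values are illustrative assumptions):

    pod:
      probes:
        proxy:              # daemonset
          proxy:            # container
            liveness:
              enabled: true
              params:       # merged into the generated probe spec
                initialDelaySeconds: 15
                periodSeconds: 30
            readiness:
              enabled: true
              params:
                initialDelaySeconds: 15
                periodSeconds: 30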
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 28, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 27, 2020
@zouyee
Member

zouyee commented Aug 28, 2020

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 28, 2020
@cmluciano

Do you think this is something that we should continue to pursue, @MrHohn?

@MrHohn
Member

MrHohn commented Nov 10, 2020

Sorry for the delay. As @danwinship mentioned, this probably needs a KEP to move forward, given the complications.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 8, 2021
@thockin
Member

thockin commented Feb 19, 2021

Hi all,

This PR is very old - what do we want to do with it? It seems reasonable to have a healthz handler and to use it. The handler for kube-proxy has undergone some changes, and as I look at it now, it seems reasonable, or maybe even too weak. I don't think it will trigger if the apiserver is down (though maybe it should). I also don't think it will trigger if there's a chronic failure to sync rules (e.g. an iptables error).

Do we want to revive this?
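
For reference, the healthz handler in question is the one kube-proxy serves on its healthz address; a minimal configuration sketch (0.0.0.0:10256 is the documented default):

    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    kind: KubeProxyConfiguration
    # serves GET /healthz on this address; what the handler actually
    # checks has changed over time, as noted in the comment above
    healthzBindAddress: 0.0.0.0:10256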

      timeoutSeconds: 15
      successThreshold: 1
      failureThreshold: 2
    readinessProbe:
Member


Is there any need for the readinessProbe to match the livenessProbe exactly?

@aojea
Member

aojea commented Feb 19, 2021

Do we want to revive this?

It's not clear to me what problem this is solving; at least, that's not my understanding from the bug referenced in the description.
So, if kube-proxy has been running without probes for five years or more and we haven't had any issues because of it... is it worth adding the possibility of restarting, or declaring NotReady, the component that configures Services and that almost all Pods use to reach the internal apiserver endpoints?

@neolit123
Member

neolit123 commented Feb 19, 2021

As mentioned by @MrHohn here:
#75323 (comment)
#75323 (comment)
this change has run into a complication and may need a KEP.

If that's no longer the case, I'd defer to him on whether we want to proceed with merging this PR.

We should note that the original issue was closed without sufficient information on why exactly we want to apply probes to kube-proxy:
#75189 (comment)

Thus far I have not seen other requests for this.

Unless someone objects, I propose that we close this PR around the middle of next week.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: MrHohn, ohbus, stafot, timothysc
To complete the pull request process, please assign neolit123 after the PR has been reviewed.
You can assign the PR to them by writing /assign @neolit123 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 4, 2021
@neolit123
Member

Closing until a KEP is written for this change.
Thanks for the discussion.

/close

@k8s-ci-robot
Contributor

@neolit123: Closed this PR.

In response to this:

Closing until a KEP is written for this change.
Thanks for the discussion.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@SataQiu
Member

SataQiu commented Aug 15, 2022

So is it safe to add a readiness probe only, for now? Readiness does not trigger a Pod restart, but it is a good indicator of Pod health, and it can help surface potential kube-proxy problems faster.
cc @neolit123
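
For example, a readiness-only sketch (same assumed healthz default; failures mark the Pod NotReady but never restart the container):

    readinessProbe:
      httpGet:
        path: /healthz
        port: 10256      # assumed default healthz port
      periodSeconds: 10
      failureThreshold: 3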

@neolit123
Member

@SataQiu I'm not sure. Readiness should be fine, but I think we should keep the kubeadm / kube-up kube-proxy addons in sync for better test coverage.

Labels
area/kubeadm
area/provider/aws - Issues or PRs related to aws provider
area/provider/gcp - Issues or PRs related to gcp provider
cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
do-not-merge/hold - Indicates that a PR should not merge because someone has issued a /hold command.
kind/bug - Categorizes issue or PR as related to a bug.
lgtm - "Looks good to me", indicates that a PR is ready to be merged.
lifecycle/stale - Denotes an issue or PR has remained open with no activity and has become stale.
ok-to-test - Indicates a non-member PR verified by an org member that is safe to test.
priority/backlog - Higher priority than priority/awaiting-more-evidence.
release-note - Denotes a PR that will be considered when it comes time to generate release notes.
sig/cluster-lifecycle - Categorizes an issue or PR as relevant to SIG Cluster Lifecycle.
sig/network - Categorizes an issue or PR as relevant to SIG Network.
size/M - Denotes a PR that changes 30-99 lines, ignoring generated files.

Successfully merging this pull request may close these issues.

kube-proxy related dns errors