Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kubeadm: change service activity check logic in ServiceCheck #60304

Closed
wants to merge 1 commit into from

Conversation

mythi
Copy link
Contributor

@mythi mythi commented Feb 23, 2018

What this PR does / why we need it:
The PR brings kubeadm improvements.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:
The PR changes the service activity check logic in ServiceCheck. It is necessary because kubelet may be running (in the standalone mode) and the user doesn't necessarily now why the other checks (especially Port10250) are failing.

Release note:

kubeadm: change service activity check logic in ServiceCheck 

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 23, 2018
@k8s-ci-robot
Copy link
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Feb 23, 2018
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Feb 23, 2018
@mythi
Copy link
Contributor Author

mythi commented Feb 23, 2018

cnfc-cla should be OK now.

@timothysc timothysc removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 23, 2018
@timothysc timothysc self-assigned this Feb 23, 2018
@@ -148,10 +148,16 @@ func (sc ServiceCheck) Check() (warnings, errors []error) {
sc.Service, sc.Service))
}

if sc.CheckIfActive && !initSystem.ServiceIsActive(sc.Service) {
if sc.WantActive != initSystem.ServiceIsActive(sc.Service) {
status := [2]string{"active", "not active"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing some tests will need to be updated for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make check WHAT="./cmd/kubeadm/app/preflight" GOFLAGS="-v" passes with these changes. I'll need to see why the automated PR checks are failing.

@xiangpengzhao
Copy link
Contributor

/ok-to-test

@timothysc FYI: removing needs-ok-to-test manually doesn't trigger tests :)

@xiangpengzhao
Copy link
Contributor

@mythi seems like CLA status is not detected...

@k8s-ci-robot
Copy link
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Feb 26, 2018
@xiangpengzhao
Copy link
Contributor

/retest
Seems like they are test flakes.

@xiangpengzhao
Copy link
Contributor

k8s-ci-robot added the cncf-cla: no label 12 hours ago

@mythi your CLA status seems still not correct :)

@xiangpengzhao
Copy link
Contributor

      diff -u ./cmd/kubeadm/app/preflight/checks.go.orig ./cmd/kubeadm/app/preflight/checks.go
--- ./cmd/kubeadm/app/preflight/checks.go.orig	2018-02-24 03:22:20.945495368 +0000
+++ ./cmd/kubeadm/app/preflight/checks.go	2018-02-24 03:22:20.945495368 +0000
@@ -115,9 +115,9 @@
 // detect a supported init system however, all checks are skipped and a warning is
 // returned.
 type ServiceCheck struct {
-	Service       string
+	Service    string
 	WantActive bool
-	Label         string
+	Label      string
 }
 
 // Name returns label for ServiceCheck. If not provided, will return based on the service parameter
@@ -149,10 +149,10 @@
 	}
 
 	if sc.WantActive != initSystem.ServiceIsActive(sc.Service) {
-		status  := [2]string{"active", "not active"}
+		status := [2]string{"active", "not active"}
 		command := [2]string{"stop", "start"}
-		index   := 0
-		if (sc.WantActive) {
+		index := 0
+		if sc.WantActive {
 			index = 1
 		}
 		errors = append(errors,

Run ./hack/update-gofmt.sh

@mythi
Copy link
Contributor Author

mythi commented Feb 27, 2018 via email

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 27, 2018
@mythi
Copy link
Contributor Author

mythi commented Feb 27, 2018 via email

@mythi
Copy link
Contributor Author

mythi commented Feb 28, 2018

/test cla/linuxfoundation

@mythi
Copy link
Contributor Author

mythi commented Mar 1, 2018

/retest

@BenTheElder
Copy link
Member

Hmm is that the entire output? The last person I talked to with kazel failing just had kubernetes checked out to the wrong location (should be $GOPATH/src/k8s.io/kubernetes)

@mythi
Copy link
Contributor Author

mythi commented Mar 8, 2018

@BenTheElder That was it!

@mythi
Copy link
Contributor Author

mythi commented Mar 8, 2018

/cc @kad

@k8s-ci-robot k8s-ci-robot requested a review from kad March 8, 2018 07:08
@mythi
Copy link
Contributor Author

mythi commented Mar 13, 2018

retriggering cla check

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Mar 13, 2018
@xiangpengzhao
Copy link
Contributor

retriggering cla check

It works at last! 💯

Copy link
Member

@timothysc timothysc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also please squash commits.

if sc.CheckIfActive && !initSystem.ServiceIsActive(sc.Service) {
if sc.WantActive != initSystem.ServiceIsActive(sc.Service) {
status := [2]string{"active", "not active"}
command := [2]string{"stop", "start"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This parsing of intent from the return is odd to me. I'd think that having a utility function, or augmenting the return arg would be cleaner.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed this.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 14, 2018
@dims
Copy link
Member

dims commented Apr 30, 2018

/uncc @dims

@k8s-ci-robot k8s-ci-robot removed the request for review from dims April 30, 2018 12:33
Currently, the service activity checks logic is either

  a) I care about the active status and it needs to be active or
  b) I don't care about the active status

This limits the usage and won't allow to check

  c) I care about the status and it needs to be NOT active

The additional check is necessary to be control kubelet is started
right time in the bootstrap process. The change in the preflight checks
now errors if kubelet is running when kubeadm is called.

Additionally, we change to start kubelet later in the bootstrap
process. Currently, kubelet is started right after the preflight
checks but before the configs are created. At that point, kubeadm
has the assumption that kubelet goes in a crash-loop and starts OK
after the configs are created.

It may be that the crash-loop does not happen but kubelet runs
in a standalone mode (e.g., in case if kubelet is started without
--bootstrap-kubeconfig). This makes kubeadm to fail due to a
timeout in markmaster when running 'kubeadm init'.
@k8s-ci-robot k8s-ci-robot added area/kubeadm sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels May 15, 2018
@mythi
Copy link
Contributor Author

mythi commented May 15, 2018

rebased + addressed the review feedback.

@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented May 15, 2018

@mythi: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-verify 0ffaf2a link /test pull-kubernetes-verify
pull-kubernetes-e2e-gce 0ffaf2a link /test pull-kubernetes-e2e-gce

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@timothysc
Copy link
Member

/assign @detiber @stealthybox

@@ -987,7 +990,7 @@ func addCommonChecks(execer utilsexec.Interface, cfg kubeadmapi.CommonConfigurat
IsPrivilegedUserCheck{},
HostnameCheck{nodeName: cfg.GetNodeName()},
KubeletVersionCheck{KubernetesVersion: cfg.GetKubernetesVersion(), exec: execer},
ServiceCheck{Service: "kubelet", CheckIfActive: false},
ServiceCheck{Service: "kubelet", WantActive: false},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to change the behavior of the kubelet check significantly here... It looks like previously CheckIfActive would skip the active check completely, but now with WantActive this is testing specifically for the kublet not being active.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's what I'm suggesting. The reasons are explained in the commit message.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but since we are checking for active or activating here wouldn't this preflight check fail by default on any system that has the kubelet enabled/started before running kubeadm (which is the default case on Ubuntu/Debian systems)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's my observation: if kubelet is active, kubeadm init would fail in markmaster timeout (I was hit by this). If kubelet is activating, it's actually crash-looping and can only start/recover once the configurations are created by kubeadm.

So I thought to make it more bullet proof and make sure we start with kubelet not running at all. That's the state where kubeadm reset leaves it too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mythi that behavior makes perfect sense to me and is the default behavior when installing from the rpms. However the expected behavior with debs is that the service is enabled and started after installation, which is also the behavior with the current debs that we ship.

While I personally don't agree with the default behavior of enabling/starting daemons on package install, it is the expected behavior on Debian/Ubuntu systems.

That said, I believe the only reason that we are currently checking for active or activating is for the check in kubeadm.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@detiber I'm not sure I got your point wrt. to my change proposal. I don't mind to keep kubelet running as it is today (and I don't mind if this PR is not OK) as long as the behaviour is consistent. Right now you have to have it crash-looping in order to get kubeadm init to succeed. For newcomers like me it'll take some time to realize why active is not sufficient.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mythi maybe I'm not fully grokking these changes, but it appears to me that the preflight check would currently fail on an Ubuntu host because we are testing that initSystem.ServiceIsActive() returns false for the kubelet, since ServiceIsActive() is checking for 'active' or 'activating' and the kubelet service on Ubuntu will be in 'activating' by default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@detiber Currently the preflight check for kubelet ignores the activity status (CheckIfActive is set to false).

@k8s-ci-robot
Copy link
Contributor

@mythi: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 19, 2018
@mythi
Copy link
Contributor Author

mythi commented May 22, 2018

I don't have the urgent need to get this merged since I know the workaround needed to address the failures I'm seeing. I'm closing this and look into alternatives to improve documentation to ensure people don't hit the same problem.

@mythi mythi closed this May 22, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/kubeadm cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants