-
Notifications
You must be signed in to change notification settings - Fork 38.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kubeadm: change service activity check logic in ServiceCheck #60304
Conversation
Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA. It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
cnfc-cla should be OK now. |
cmd/kubeadm/app/preflight/checks.go
Outdated
@@ -148,10 +148,16 @@ func (sc ServiceCheck) Check() (warnings, errors []error) { | |||
sc.Service, sc.Service)) | |||
} | |||
|
|||
if sc.CheckIfActive && !initSystem.ServiceIsActive(sc.Service) { | |||
if sc.WantActive != initSystem.ServiceIsActive(sc.Service) { | |||
status := [2]string{"active", "not active"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing some tests will need to be updated for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make check WHAT="./cmd/kubeadm/app/preflight" GOFLAGS="-v"
passes with these changes. I'll need to see why the automated PR checks are failing.
/ok-to-test @timothysc FYI: removing |
@mythi seems like CLA status is not detected... |
Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA. It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/retest |
k8s-ci-robot added the cncf-cla: no label 12 hours ago @mythi your CLA status seems still not correct :) |
|
On Tue, Feb 27, 2018 at 9:08 AM, Peter (XiangPeng) Zhao wrote:
k8s-ci-robot added the cncf-cla: no label 12 hours ago
@mythi <https://github.com/mythi> your CLA status seems still not correct
:)
I noticed this and I'm currently checking why the robot thinks it's not OK.
I have it signed.
|
On Tue, Feb 27, 2018 at 9:10 AM, Peter (XiangPeng) Zhao wrote:
Run ./hack/update-gofmt.sh
Thanks!
|
/test cla/linuxfoundation |
/retest |
Hmm is that the entire output? The last person I talked to with kazel failing just had kubernetes checked out to the wrong location (should be |
@BenTheElder That was it! |
/cc @kad |
retriggering cla check |
It works at last! 💯 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also please squash commits.
cmd/kubeadm/app/preflight/checks.go
Outdated
if sc.CheckIfActive && !initSystem.ServiceIsActive(sc.Service) { | ||
if sc.WantActive != initSystem.ServiceIsActive(sc.Service) { | ||
status := [2]string{"active", "not active"} | ||
command := [2]string{"stop", "start"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This parsing of intent from the return is odd to me. I'd think that having a utility function, or augmenting the return arg would be cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed this.
/uncc @dims |
Currently, the service activity checks logic is either a) I care about the active status and it needs to be active or b) I don't care about the active status This limits the usage and won't allow to check c) I care about the status and it needs to be NOT active The additional check is necessary to be control kubelet is started right time in the bootstrap process. The change in the preflight checks now errors if kubelet is running when kubeadm is called. Additionally, we change to start kubelet later in the bootstrap process. Currently, kubelet is started right after the preflight checks but before the configs are created. At that point, kubeadm has the assumption that kubelet goes in a crash-loop and starts OK after the configs are created. It may be that the crash-loop does not happen but kubelet runs in a standalone mode (e.g., in case if kubelet is started without --bootstrap-kubeconfig). This makes kubeadm to fail due to a timeout in markmaster when running 'kubeadm init'.
rebased + addressed the review feedback. |
@mythi: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/assign @detiber @stealthybox |
@@ -987,7 +990,7 @@ func addCommonChecks(execer utilsexec.Interface, cfg kubeadmapi.CommonConfigurat | |||
IsPrivilegedUserCheck{}, | |||
HostnameCheck{nodeName: cfg.GetNodeName()}, | |||
KubeletVersionCheck{KubernetesVersion: cfg.GetKubernetesVersion(), exec: execer}, | |||
ServiceCheck{Service: "kubelet", CheckIfActive: false}, | |||
ServiceCheck{Service: "kubelet", WantActive: false}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to change the behavior of the kubelet check significantly here... It looks like previously CheckIfActive would skip the active check completely, but now with WantActive this is testing specifically for the kublet not being active.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's what I'm suggesting. The reasons are explained in the commit message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but since we are checking for active
or activating
here wouldn't this preflight check fail by default on any system that has the kubelet enabled/started before running kubeadm (which is the default case on Ubuntu/Debian systems)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's my observation: if kubelet is active
, kubeadm init
would fail in markmaster timeout (I was hit by this). If kubelet is activating
, it's actually crash-looping and can only start/recover once the configurations are created by kubeadm
.
So I thought to make it more bullet proof and make sure we start with kubelet not running at all. That's the state where kubeadm reset
leaves it too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mythi that behavior makes perfect sense to me and is the default behavior when installing from the rpms. However the expected behavior with debs is that the service is enabled and started after installation, which is also the behavior with the current debs that we ship.
While I personally don't agree with the default behavior of enabling/starting daemons on package install, it is the expected behavior on Debian/Ubuntu systems.
That said, I believe the only reason that we are currently checking for active
or activating
is for the check in kubeadm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@detiber I'm not sure I got your point wrt. to my change proposal. I don't mind to keep kubelet running as it is today (and I don't mind if this PR is not OK) as long as the behaviour is consistent. Right now you have to have it crash-looping in order to get kubeadm init
to succeed. For newcomers like me it'll take some time to realize why active
is not sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mythi maybe I'm not fully grokking these changes, but it appears to me that the preflight check would currently fail on an Ubuntu host because we are testing that initSystem.ServiceIsActive() returns false for the kubelet, since ServiceIsActive() is checking for 'active' or 'activating' and the kubelet service on Ubuntu will be in 'activating' by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@detiber Currently the preflight check for kubelet ignores the activity status (CheckIfActive
is set to false
).
@mythi: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
I don't have the urgent need to get this merged since I know the workaround needed to address the failures I'm seeing. I'm closing this and look into alternatives to improve documentation to ensure people don't hit the same problem. |
What this PR does / why we need it:
The PR brings kubeadm improvements.
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #
Special notes for your reviewer:
The PR changes the service activity check logic in ServiceCheck. It is necessary because kubelet may be running (in the standalone mode) and the user doesn't necessarily now why the other checks (especially Port10250) are failing.
Release note: