kubeadm-1.15.3: pod-eviction-timeout is ignored #2

Closed
kskmori opened this issue Aug 29, 2019 · 2 comments


kskmori commented Aug 29, 2019

revision: 1d9a9b6 2019-08-29 Update versions to kubernetes 1.15.3 and the latest documents

After a node failure, it takes 5 minutes for the pods to be evicted, regardless of the kubeadm init config below. The same config worked as expected with kubeadm-1.11.3.

controllerManager:
  extraArgs:
    node-monitor-grace-period: "20s"
    pod-eviction-timeout: "40s"

Diagnosis:

As of 1.13, Taint Based Evictions is enabled by default, so the eviction delay is determined by the pods' NoExecute tolerations and must be configured there instead of through pod-eviction-timeout.

[root@master osc2018tk-demo]# kubectl describe pod postgres-0 | grep Tolerations: -A 1
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
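
The 300s values above are what the DefaultTolerationSeconds admission plugin adds when a pod declares no such tolerations itself. As a rough sketch, a pod can also carry its own tolerations to shorten the delay for that workload only. The fragment below is hypothetical and not taken from this repository, and the 20-second value is an assumption chosen to match the config above:

```yaml
# Hypothetical pod template fragment (not from this repository).
# tolerationSeconds is how long the pod stays bound to a node after
# the node receives the matching NoExecute taint; 20 is an assumed value.
spec:
  tolerations:
  - key: node.kubernetes.io/not-ready
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 20
  - key: node.kubernetes.io/unreachable
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 20
```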

Ref.

kskmori added a commit that referenced this issue Aug 29, 2019
…ion-timeout (fixes #2)

NOTE:
  The Taint Based Evictions timeout starts when the node status changes to NotReady (or Unreachable),
  so eviction takes 40s in total from the time of the actual failure:
    40s = node-monitor-grace-period (20s) + default-not-ready-toleration-seconds (20s)
  which is equal to pod-eviction-timeout=40s.

kskmori commented Aug 29, 2019

fixed in 26076cb

[root@master osc2018tk-demo]# kubectl describe pod postgres-0 | grep Tolerations: -A 1
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 20s
                 node.kubernetes.io/unreachable:NoExecute for 20s
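
For reference, a minimal sketch of the kind of kubeadm init config that yields these 20s tolerations. It assumes the cluster-wide defaults are set through the kube-apiserver flags default-not-ready-toleration-seconds and default-unreachable-toleration-seconds; the actual change is the one in 26076cb and may differ from this:

```yaml
# Sketch only; see commit 26076cb for the real change.
apiServer:
  extraArgs:
    # Integer seconds (no unit suffix). These become the default
    # tolerationSeconds added to pods that declare none themselves.
    default-not-ready-toleration-seconds: "20"
    default-unreachable-toleration-seconds: "20"
controllerManager:
  extraArgs:
    # Time without a status update before the node is marked NotReady.
    node-monitor-grace-period: "20s"
```

Pods created before the change keep the tolerations they were admitted with, so they would need to be recreated to pick up new defaults.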


kskmori commented Aug 29, 2019

Another note: the pod status is now shown as "Terminating", whereas 1.11 showed "Unknown". Service availability is the same, though: the service still cannot fail over in the event of a node failure.

[root@master osc2018tk-demo]# kubectl get pods -o wide
NAME                     READY   STATUS        RESTARTS   AGE     IP           NODE      NOMINATED NODE   READINESS GATES
httpd-84b6977f6d-dhkrn   1/1     Running       0          3m46s   10.244.1.3   worker2   <none>           <none>
httpd-84b6977f6d-fc5rd   1/1     Terminating   0          9m40s   10.244.2.3   worker1   <none>           <none>
httpd-84b6977f6d-pfwjt   1/1     Running       0          9m40s   10.244.1.2   worker2   <none>           <none>
postgres-0               1/1     Terminating   0          9m43s   10.244.2.2   worker1   <none>           <none>

kskmori closed this as completed Aug 29, 2019