"Error: Transport is closing" message when attempting to install #3409

Closed
huang-jy opened this issue Jan 27, 2018 · 63 comments · Fixed by #3482

Comments

@huang-jy

I am trying to install spinnaker from the kubeapps hub using:

helm install stable/spinnaker --name spinnaker -f values.yaml

This gave me no output, but I could see the pods being created, and then later:

Error: transport is closing

Using --wait and --timeout didn't help.

Spinnaker seems to have spun up successfully, but since helm didn't finish registering the installation as complete, helm is stuck in the "PENDING_INSTALL" state for spinnaker, meaning I can't update or upgrade it later.

Any ideas what might be happening?
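
For anyone else who ends up in this state, something along these lines should show the stuck release and then clear it so the install can be retried (the release name spinnaker is assumed here):

helm ls --all spinnaker        # shows the release stuck in PENDING_INSTALL
helm delete --purge spinnaker  # removes the stuck release so a fresh install can be attempted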

@Nowaker

Nowaker commented Jan 31, 2018

I've been experiencing the same problem. The deployment is "visibly" successful - yet it fails due to "Error: transport is closing" after around three minutes of waiting. This is happening whether I add --wait --timeout 600 or not. Moreover, the release looks just fine:

NAME                    	REVISION	UPDATED                 	STATUS  	CHART            	NAMESPACE
review-feature-ld-s9rxem	1       	Tue Jan 30 18:55:42 2018	DEPLOYED	lde-nginx-941.0.0	default

@huang-jy
Author

In addition, setting the ELB idle timeout to 3600 still led to the install failing within 3.5 minutes:

$ time helm install stable/spinnaker --name spinnaker -f values-personal.yaml --wait --timeout 3600 --debug
[debug] Created tunnel using local port: '37517'

[debug] SERVER: "127.0.0.1:37517"

[debug] Original chart version: ""
[debug] Fetched stable/spinnaker to /home/jjyooi/.helm/cache/archive/spinnaker-0.3.12.tgz

[debug] CHART PATH: /home/user/.helm/cache/archive/spinnaker-0.3.12.tgz

Error: transport is closing

real	3m31.836s
user	0m0.432s
sys	0m0.034s

@benlangfeld
Contributor

benlangfeld commented Feb 2, 2018

I am also seeing timeouts right at 210s (3.5 minutes), regardless of the --timeout flag, with no LBs timing out in the middle. Indeed I see a FIN being sent from the client on two open sockets to the kube-apiserver. This happens while waiting on a post-install hook to execute to completion, and doesn't require passing --wait.
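
(If anyone wants to check for the same behaviour on their side, a packet capture along these lines should show the client-initiated FIN; the apiserver address is a placeholder:)

sudo tcpdump -ni any 'host <kube-apiserver-ip> and tcp[tcpflags] & tcp-fin != 0'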

@huang-jy
Author

huang-jy commented Feb 2, 2018

Managed to watch the tiller logs as I did another install. No major errors here, so something is cutting out in between. The only thing I can think of is the cluster itself: helm appears to respect the timeout, and the ELB is not timing out with its timeout of 3600 secs, so could the cluster itself be cutting the connection?
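
(If anyone wants to follow along, the logs can be tailed with something like the following, assuming tiller was installed into kube-system with the default labels:)

kubectl -n kube-system get pods -l app=helm,name=tiller   # find the tiller pod
kubectl -n kube-system logs -f <tiller-pod-name>          # follow its logs during the install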

[tiller] 2018/02/02 19:51:46 uninstall: Release not loaded: spinnaker-blenderfox
[tiller] 2018/02/02 19:52:02 preparing install for spinnaker-blenderfox
[storage] 2018/02/02 19:52:02 getting release history for "spinnaker-blenderfox"
[tiller] 2018/02/02 19:52:03 rendering spinnaker chart using values
2018/02/02 19:52:07 info: manifest "spinnaker/charts/jenkins/templates/rbac.yaml" is empty. Skipping.
2018/02/02 19:52:08 info: manifest "spinnaker/charts/minio/templates/minio_statefulset.yaml" is empty. Skipping.
2018/02/02 19:52:08 info: manifest "spinnaker/templates/secrets/gcs.yaml" is empty. Skipping.
2018/02/02 19:52:08 info: manifest "spinnaker/charts/jenkins/templates/config.yaml" is empty. Skipping.
2018/02/02 19:52:08 info: manifest "spinnaker/charts/jenkins/templates/service-account.yaml" is empty. Skipping.
2018/02/02 19:52:11 info: manifest "spinnaker/charts/jenkins/templates/jenkins-master-networkpolicy.yaml" is empty. Skipping.
2018/02/02 19:52:11 info: manifest "spinnaker/charts/minio/templates/post-install-create-bucket-pod.yaml" is empty. Skipping.
2018/02/02 19:52:11 info: manifest "spinnaker/charts/jenkins/templates/jenkins-master-ingress.yaml" is empty. Skipping.
2018/02/02 19:52:11 info: manifest "spinnaker/charts/redis/templates/networkpolicy.yaml" is empty. Skipping.
2018/02/02 19:52:11 info: manifest "spinnaker/charts/minio/templates/minio_networkpolicy.yaml" is empty. Skipping.
2018/02/02 19:52:11 info: manifest "spinnaker/templates/ingress/deck.yaml" is empty. Skipping.
[tiller] 2018/02/02 19:52:16 performing install for spinnaker-blenderfox
[tiller] 2018/02/02 19:52:16 executing 7 pre-install hooks for spinnaker-blenderfox
[tiller] 2018/02/02 19:52:16 hooks complete for pre-install spinnaker-blenderfox
[storage] 2018/02/02 19:52:16 getting release history for "spinnaker-blenderfox"
[storage] 2018/02/02 19:52:16 creating release "spinnaker-blenderfox.v1"
[kube] 2018/02/02 19:52:23 building resources from manifest
[kube] 2018/02/02 19:52:28 creating 37 resource(s)
[kube] 2018/02/02 19:52:39 beginning wait for 37 resources with timeout of 1h0m0s
[kube] 2018/02/02 19:52:53 Deployment is not ready: default/spinnaker-blenderfox-spi-clouddriver
[kube] 2018/02/02 19:53:06 Deployment is not ready: default/spinnaker-blenderfox-spi-clouddriver
[kube] 2018/02/02 19:53:19 Deployment is not ready: default/spinnaker-blenderfox-spi-clouddriver
[kube] 2018/02/02 19:53:28 Deployment is not ready: default/spinnaker-blenderfox-spi-clouddriver
[kube] 2018/02/02 19:53:36 Deployment is not ready: default/spinnaker-blenderfox-spi-clouddriver
[kube] 2018/02/02 19:53:47 Deployment is not ready: default/spinnaker-blenderfox-spi-clouddriver
[kube] 2018/02/02 19:54:02 Deployment is not ready: default/spinnaker-blenderfox-spi-clouddriver
[kube] 2018/02/02 19:54:15 Deployment is not ready: default/spinnaker-blenderfox-spi-clouddriver
[kube] 2018/02/02 19:54:30 Deployment is not ready: default/spinnaker-blenderfox-spi-clouddriver
[kube] 2018/02/02 19:54:36 Deployment is not ready: default/spinnaker-blenderfox-spi-clouddriver
[kube] 2018/02/02 19:54:51 Deployment is not ready: default/spinnaker-blenderfox-spi-clouddriver
[tiller] 2018/02/02 19:55:01 executing 7 post-install hooks for spinnaker-blenderfox
[kube] 2018/02/02 19:55:01 building resources from manifest
[kube] 2018/02/02 19:55:01 creating 1 resource(s)
[kube] 2018/02/02 19:55:04 Watching for changes to Job spinnaker-blenderfox-create-bucket with timeout of 1h0m0s
[kube] 2018/02/02 19:55:04 Add/Modify event for spinnaker-blenderfox-create-bucket: ADDED
[kube] 2018/02/02 19:55:04 spinnaker-blenderfox-create-bucket: Jobs active: 1, jobs failed: 0, jobs succeeded: 0

@benlangfeld
Contributor

@huang-jy In my case, it looks a lot like the helm CLI is timing out, since it's sending a FIN packet.

@huang-jy
Author

huang-jy commented Feb 2, 2018

One thing I was going to try was an older version of helm.

@benlangfeld
Contributor

benlangfeld commented Feb 2, 2018

I suspect my problem is happening @ https://github.com/grpc/grpc-go/blob/424e3e9894f9206fca433fb4ba66f639be56e325/stream.go#L299-L300. I have this problem with both 2.7.2 and 2.8.0. Will try next with master.

@benlangfeld
Contributor

Well, master is a no-go because I'd have to upgrade Tiller in my production kube install.

@huang-jy
Author

huang-jy commented Feb 2, 2018

helm 2.6.1 was used in a Udemy course and that installed spinnaker successfully (though I'm not sure which spinnaker version it picked up), so maybe try that too (I'll do the same).

@huang-jy
Author

huang-jy commented Feb 3, 2018

I used 2.6.0 and it didn't time out on me (spinnaker 0.3.12 was used). It waited properly. My spinnaker install didn't succeed (some of the containers were stuck in CrashLoopBackOff).

@huang-jy
Author

huang-jy commented Feb 3, 2018

So, I used 2.8.0 and the latest spinnaker chart. It timed out on me halfway through, but from the tiller logs, the install still continued.

When I used 2.6.0 and the latest spinnaker chart, it waited for the resources, although they never seemed to become ready (possibly a spinnaker issue rather than a helm one).

@bacongobbler
Member

bacongobbler commented Feb 5, 2018

Just curious, but has anyone tried looking at the FAQ to see if that resolves it for them? https://github.com/kubernetes/helm/blob/29358ef9cef85c8467434008a42bc07e5a0d2a85/docs/install_faq.md#getting-started
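
(For context, the entry there covers the case where socat is missing on the nodes, which shows up as the "socat not found" port-forwarding error quoted below; on a Debian-based node image the fix would be roughly:)

sudo apt-get update && sudo apt-get install -y socat   # run on each node that is missing socat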

@bacongobbler
Member

Interesting feedback, @huang-jy.

@Nowaker

Nowaker commented Feb 5, 2018

@bacongobbler I'll have a look. That being said, I don't see the first two lines from the quoted FAQ. All I see is Error: transport is closing. But maybe the "missing" lines come from different sources? What sources?

E1014 02:26:32.885226   16143 portforward.go:329] an error occurred forwarding 37008 -> 44134: error forwarding port 44134 to pod tiller-deploy-2117266891-e4lev_kube-system, uid : unable to do port forwarding: socat not found.
2016/10/14 02:26:32 transport: http2Client.notifyError got notified that the client transport was broken EOF.
Error: transport is closing

@benlangfeld
Contributor

benlangfeld commented Feb 5, 2018

@bacongobbler @Nowaker I also don't see those first two lines, and the install works just fine if I don't include my post-install hook (and correspondingly remove the container health-checks, which fail until the hook has executed). The install actually completes despite the timeout (and the app is functional), but the deployment is never marked complete, and so a subsequent upgrade fails. So it's not an outright functional failure; the problem is a premature timeout.

@Nowaker

Nowaker commented Feb 5, 2018

Seconded @benlangfeld. Release ends up as DEPLOYED even though Error: transport is closing hits me after a couple minutes.

@huang-jy
Author

huang-jy commented Feb 6, 2018

@bacongobbler in my case, one of three things happened:

  1. Transport is closing happened on 2.8.0 and 2.8.1 during install, but the install continued to happen behind the scenes even after the error.
  2. Versions prior to 2.8.0 (checked with 2.7 and 2.6) didn't produce this message and continued to wait.
  3. During the install, Tiller got killed by the cluster (not evicted).

@benlangfeld
Contributor

Note that @huang-jy appears to be the only one with a problem in Tiller (eviction), and in only 1/3 of his cases. This issue normally has nothing to do with Tiller, and appears very much to be a client-side timeout. In my case, Tiller has never ceased to operate, and this is purely a client disconnection.

@huang-jy
Author

huang-jy commented Feb 7, 2018

@benlangfeld Yes, it's a client disconnection, and I think it's something within the 2.8.0 version. When I use a version below 2.8.0 on this cluster, I don't get a transport closed error. Sure, spinnaker doesn't install properly, but that's probably something with the chart and not helm.

I noticed there was a comment about increasing RAM. I might try sizing up the worker nodes and see if that helps.

@huang-jy
Author

huang-jy commented Feb 7, 2018

Increasing the box to r4.large didn't help, but I noticed from the pod logs that when I thought the pod was evicted, it in fact wasn't; it was killed by the cluster.

Logs from tiller

[tiller] 2018/02/07 09:19:21 preparing install for spinnaker-blenderfox
[storage] 2018/02/07 09:19:21 getting release history for "spinnaker-blenderfox"
[tiller] 2018/02/07 09:19:21 rendering spinnaker chart using values
2018/02/07 09:19:25 info: manifest "spinnaker/charts/minio/templates/post-install-create-bucket-pod.yaml" is empty. Skipping.
2018/02/07 09:19:25 info: manifest "spinnaker/templates/secrets/gcs.yaml" is empty. Skipping.
2018/02/07 09:19:25 info: manifest "spinnaker/templates/ingress/deck.yaml" is empty. Skipping.
2018/02/07 09:19:25 info: manifest "spinnaker/charts/jenkins/templates/rbac.yaml" is empty. Skipping.
2018/02/07 09:19:25 info: manifest "spinnaker/charts/minio/templates/minio_networkpolicy.yaml" is empty. Skipping.
2018/02/07 09:19:25 info: manifest "spinnaker/charts/jenkins/templates/config.yaml" is empty. Skipping.
2018/02/07 09:19:26 info: manifest "spinnaker/charts/redis/templates/networkpolicy.yaml" is empty. Skipping.
2018/02/07 09:19:26 info: manifest "spinnaker/charts/minio/templates/minio_statefulset.yaml" is empty. Skipping.
2018/02/07 09:19:26 info: manifest "spinnaker/charts/jenkins/templates/jenkins-master-ingress.yaml" is empty. Skipping.
2018/02/07 09:19:26 info: manifest "spinnaker/charts/jenkins/templates/service-account.yaml" is empty. Skipping.
2018/02/07 09:19:26 info: manifest "spinnaker/charts/jenkins/templates/jenkins-master-networkpolicy.yaml" is empty. Skipping.
[tiller] 2018/02/07 09:19:31 performing install for spinnaker-blenderfox
[tiller] 2018/02/07 09:19:31 executing 7 pre-install hooks for spinnaker-blenderfox
[tiller] 2018/02/07 09:19:31 hooks complete for pre-install spinnaker-blenderfox
[storage] 2018/02/07 09:19:31 getting release history for "spinnaker-blenderfox"
[storage] 2018/02/07 09:19:31 creating release "spinnaker-blenderfox.v1"
[kube] 2018/02/07 09:19:35 building resources from manifest
[kube] 2018/02/07 09:19:39 creating 37 resource(s)
[kube] 2018/02/07 09:19:44 beginning wait for 37 resources with timeout of 5m0s
[kube] 2018/02/07 09:20:06 Deployment is not ready: spinnaker/spinnaker-blenderfox-spi-clouddriver
[kube] 2018/02/07 09:20:35 Deployment is not ready: spinnaker/spinnaker-blenderfox-spi-clouddriver
[kube] 2018/02/07 09:21:06 Deployment is not ready: spinnaker/spinnaker-blenderfox-spi-clouddriver
[kube] 2018/02/07 09:21:21 Deployment is not ready: spinnaker/spinnaker-blenderfox-spi-clouddriver
[kube] 2018/02/07 09:21:40 Deployment is not ready: spinnaker/spinnaker-blenderfox-spi-clouddriver
[kube] 2018/02/07 09:22:01 Deployment is not ready: spinnaker/spinnaker-blenderfox-spi-clouddriver
[kube] 2018/02/07 09:22:31 Deployment is not ready: spinnaker/spinnaker-blenderfox-spi-clouddriver
[kube] 2018/02/07 09:22:54 Deployment is not ready: spinnaker/spinnaker-blenderfox-spi-clouddriver

>>Container crashed out here, pod restarted<<

[main] 2018/02/07 09:23:08 Starting Tiller v2.8.0 (tls=false)
[main] 2018/02/07 09:23:08 GRPC listening on :44134
[main] 2018/02/07 09:23:08 Probes listening on :44135
[main] 2018/02/07 09:23:08 Storage driver is ConfigMap
[main] 2018/02/07 09:23:08 Max history per release is 0

Pod logs

  Normal   Killing                6m                kubelet, ip-10-10-20-112.eu-west-2.compute.internal  Killing container with id docker://tiller:Container failed liveness probe.. Container will be killed and recreated.
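
(For anyone wanting to check the same thing on their cluster, the restart count and that kill event show up with something like this, assuming tiller is in kube-system with the default labels:)

kubectl -n kube-system get pods -l app=helm,name=tiller        # watch the RESTARTS column
kubectl -n kube-system describe pod -l app=helm,name=tiller    # the Events section shows the liveness-probe kill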

@benlangfeld
Contributor

So, I reproduced this without using hooks, against minikube and with a minimal chart: https://gist.github.com/benlangfeld/005f5d934c074d67a34fe9f881c84e89

While this particular deployment would of course never succeed (because of the impossible healthchecks), I would not expect it to time out at 210s as it does, but rather to continue until the 300-second timeout indicated in the Tiller log, which is the primary contention in this ticket.

@cmdshepard

cmdshepard commented Feb 7, 2018

Having the same issue on 2.8.0. Helm's deployment status is DEPLOYED, yet the Helm client exits with the Error: Transport is closing error. It started happening after I upgraded Tiller & Helm from 2.6.0 to 2.8.0. Any ideas on how to mitigate this? It's quite annoying, especially in a CI environment.

@cmdshepard

cmdshepard commented Feb 7, 2018

This is the output from Tiller when the error occurs:

[kube] 2018/02/07 20:04:33 Watching for changes to Job staging-stored-value-migration-job with timeout of 10m0s
[kube] 2018/02/07 20:04:34 Add/Modify event for staging-stored-value-migration-job: ADDED
[kube] 2018/02/07 20:04:34 staging-stored-value-migration-job: Jobs active: 1, jobs failed: 0, jobs succeeded: 0 

@benlangfeld
Contributor

benlangfeld commented Feb 7, 2018

My reproduction has the expected behaviour on v2.7.2 (both client and Tiller), timing out at 300 seconds. The same is true for a v2.7.2 client against a v2.8.0 Tiller server, so the bug is in the client code somewhere here: v2.7.2...v2.8.0 . I'll see if I can bisect that tomorrow to identify the problem commit. The most suspicious commit is, of course, 838d780.
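
(A sketch of that bisect workflow, assuming the client is rebuilt with make build at each step; the chart path is a placeholder for the gist reproduction:)

cd $GOPATH/src/k8s.io/helm
git bisect start v2.8.0 v2.7.2                         # bad revision first, then known-good
make build                                             # rebuild the client at this revision (bin/helm)
bin/helm install ./minimal-chart --wait --timeout 300  # does it disconnect at ~210s?
git bisect bad                                         # or "git bisect good", then repeat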

@cmdshepard

cmdshepard commented Feb 7, 2018

I downgraded to 2.6.0 and no longer have the issue. I have the timeout set to 10 mins; Tiller honors the timeout, but the Helm client does not.

gerhard added a commit to EngineerBetter/kcf that referenced this issue Jul 5, 2018
Otherwise, the kubelet will be rebooted and any tiller releases will
fail to install: helm/helm#3409 (comment)
@Starefossen

Starefossen commented Aug 8, 2018

This happens when tiller goes away during an install. In our case it was because Azure Container Service decided to downgrade tiller to a previous version... not Helm's fault.

@uxon123

uxon123 commented Aug 24, 2018

I am still encountering this problem. My env is an on-premise k8s cluster, helm client 2.8.2, helm server 2.9.1.
I get the error whenever I have a job in my chart configured as a post-install hook and I try to 'upgrade --install' or 'delete' the chart's release, after exactly 1 minute of the helm client waiting for the install/delete to complete.
If I don't configure the job as a post-install hook, the problem disappears.

@huang-jy
Author

@uxon123 I had that originally -- it turned out tiller was getting booted off the node, and I added an extra node to fix that. I know your env is on-prem, but if you have the resources, can you try adding an extra node?

@uxon123

uxon123 commented Aug 24, 2018

An extra tiller node? Do you mean adding an extra tiller pod to the tiller deployment, or something else?

@huang-jy
Author

No, an extra kubernetes node.

@uxon123

uxon123 commented Aug 24, 2018

Hm, I don't know if I fully understand this. You're saying that the reason for your problem (identical to the one I am encountering) was that there weren't enough resources on k8s?
If that were the case, running the job outside of a post-install hook should not, in my opinion, resolve the problem (it needs the same amount of resources, right?). Tiller responds if I try to install or uninstall other releases in the meantime.
The problem occurs only when I use a post-install job, i.e. when I force the client to wait for an answer. I get the error every time after almost exactly 1 minute of waiting (but tiller continues doing its job on the cluster).

@huang-jy
Author

You can check this by running watch kubectl get pods and keeping an eye on the tiller pod during the helm install.

If it disappears or crashes out, then it's likely tiller is crashing out, as it did in my case.
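
(With the default tiller install that would be something like the following; adjust the namespace if yours differs:)

watch kubectl -n kube-system get pods -l app=helm,name=tiller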

@uxon123

uxon123 commented Aug 24, 2018

Ok, I did what you said: I watched the tiller pod and executed upgrade --install. The tiller pod neither disappeared nor crashed. I also checked tiller's replica set with the describe command, and no pods had failed. So I think I can rule out a problem with tiller. Thanks for your input though :)

@huang-jy
Author

You can also try tailing tiller's logs during the install and see if there are any errors during that time.

@uxon123

uxon123 commented Aug 27, 2018

I checked it again and there are no problems with tiller in my case (logs are clean, no pod restarts, etc.).
It seems the 1-minute timeout is not a coincidence: I found out that all connections to my k8s cluster are routed through a load balancer which is configured with a 60-second timeout, so connections are being killed by the load balancer after 60 seconds of inactivity.
So it looks like there are no keepalives between the tiller client and server. Shouldn't the tiller client be sending keepalives every 30 seconds? (#3183)

@huang-jy
Author

Are you running on AWS? If so, check your API load balancer's connection timeout. If I remember right, it's only 60 secs by default.
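
(For a classic ELB in front of the apiserver, the idle timeout can be checked and raised with something like this; the load balancer name is a placeholder:)

aws elb describe-load-balancer-attributes --load-balancer-name <api-elb-name>
aws elb modify-load-balancer-attributes --load-balancer-name <api-elb-name> \
  --load-balancer-attributes '{"ConnectionSettings":{"IdleTimeout":3600}}'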

@makandas

Add the --tls flag to the helm install command.

@huang-jy
Author

@makandas --tls relates to the connection between helm and tiller, IIRC.

If this error came up without anything being installed at all, I would agree that switch would be an option, but that is not the case here: helm can talk to tiller and initiate the install, but the connection is somehow closed too early. This ticket has already been closed with a solution, which has been verified by several users.

splisson pushed a commit to splisson/helm that referenced this issue Dec 6, 2018
helm#3183 added a 30s keepalive period to the Helm client, but Tiller was never configured to permit this and kept the default minimum keepalive period of 5 minutes, disconnecting any clients that ping more frequently than that.

This commit enforces a minimum lower than what the Helm client is configured for, preventing these disconnections, and thus fixes helm#3409.
@alanwds

alanwds commented Jan 23, 2019

I'm still facing this issue, even on the latest available version, 2.12.3.

I can see that tiller just crashes with the following errors in its log:

[tiller] 2019/01/23 16:25:50 preparing install for my-release
[storage] 2019/01/23 16:25:50 getting release history for "my-release"
[tiller] 2019/01/23 16:25:50 rendering my-release chart using values
panic: should not happen [recovered]
	panic: should not happen

goroutine 29 [running]:
k8s.io/helm/vendor/gopkg.in/yaml%2ev2.handleErr(0xc00078af78)
	/go/src/k8s.io/helm/vendor/gopkg.in/yaml.v2/yaml.go:164 +0x9a
....

I'm using GCP (GKE version: 1.11.5-gke.5).

@AndrewDryga

@alanwds same here.

@bacongobbler
Member

bacongobbler commented Feb 11, 2019

@alanwds from the truncated panic stack trace, I can only see that it stemmed from the yaml parser. What's the output of helm template on that chart? Do you have the full output of that stack trace somewhere?
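(Something along these lines, with your chart path and values file substituted, renders the chart locally without Tiller and may reproduce the parser panic:)

helm template ./my-chart -f values.yaml > rendered.yaml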

@bacongobbler
Member

Same question to you, @AndrewDryga :)

@AndrewDryga

@bacongobbler for us it looks like Tiller is crashing once we deploy; there is no issue in the templates, as they haven't changed since the last deployment (which passed while Tiller was available).

Name:               tiller-deploy-58b6bf5687-2498x
Namespace:          kube-system
Priority:           0
PriorityClassName:  <none>
Node:               gke-staging-default-pool-4c48bc57-n6kx/10.142.0.7
Start Time:         Mon, 11 Feb 2019 22:01:37 +0200
Labels:             app=helm
                    name=tiller
                    pod-template-hash=1462691243
Annotations:        cni.projectcalico.org/podIP: 10.16.0.51/32
Status:             Running
IP:                 10.16.0.51
Controlled By:      ReplicaSet/tiller-deploy-58b6bf5687
Containers:
  tiller:
    Container ID:   docker://c5d2f01465da5f3b309fcc8b2b39c21015de204d125a944ba79333c514649901
    Image:          gcr.io/kubernetes-helm/tiller:v2.12.3
    Image ID:       docker-pullable://gcr.io/kubernetes-helm/tiller@sha256:cab750b402d24dd7b24756858c31eae6a007cd0ee91ea802b3891e2e940d214d
    Ports:          44134/TCP, 44135/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Running
      Started:      Tue, 12 Feb 2019 20:08:52 +0200
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Tue, 12 Feb 2019 20:05:28 +0200
      Finished:     Tue, 12 Feb 2019 20:05:54 +0200
    Ready:          False
    Restart Count:  18
    Liveness:       http-get http://:44135/liveness delay=1s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get http://:44135/readiness delay=1s timeout=1s period=10s #success=1 #failure=3
    Environment:
      TILLER_NAMESPACE:    kube-system
      TILLER_HISTORY_MAX:  0
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from tiller-token-r74zd (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  tiller-token-r74zd:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  tiller-token-r74zd
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                   From                                             Message
  ----     ------     ----                  ----                                             -------
  Warning  Unhealthy  10m (x15 over 20h)    kubelet, gke-staging-default-pool-4c48bc57-n6kx  Liveness probe failed: Get http://10.16.0.51:44135/liveness: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  10m (x28 over 12h)    kubelet, gke-staging-default-pool-4c48bc57-n6kx  Readiness probe failed: Get http://10.16.0.51:44135/readiness: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Normal   Killing    6m51s (x14 over 22h)  kubelet, gke-staging-default-pool-4c48bc57-n6kx  Killing container with id docker://tiller:Container failed liveness probe.. Container will be killed and recreated.
  Warning  BackOff    2m4s (x104 over 22h)  kubelet, gke-staging-default-pool-4c48bc57-n6kx  Back-off restarting failed container

logs -p gives an empty response :(.

jianghang8421 pushed a commit to jianghang8421/helm that referenced this issue Feb 17, 2019
helm#3183 added a 30s keepalive period to the Helm client, but Tiller was never configured to permit this and kept the default minimum keepalive period of 5 minutes, disconnecting any clients that ping more frequently than that.

This commit enforces a minimum lower than what the Helm client is configured for, preventing these disconnections, and thus fixes helm#3409.
@JPWKU

JPWKU commented Apr 22, 2019

I just had this error and it turned out I had too few resources set for the tiller deploy.
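
(In case it helps others, the requests on the default tiller deployment can be bumped with a patch like this; the values are just an example, not a recommendation:)

kubectl -n kube-system patch deployment tiller-deploy --patch \
  '{"spec":{"template":{"spec":{"containers":[{"name":"tiller","resources":{"requests":{"cpu":"100m","memory":"256Mi"}}}]}}}}'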

@phil-lgr

Client: &version.Version{SemVer:"v2.13.1", GitCommit:"618447cbf203d147601b4b9bd7f8c37a5d39fbb4", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.13.1", GitCommit:"618447cbf203d147601b4b9bd7f8c37a5d39fbb4", GitTreeState:"clean"}

helm hangs with the --wait flag

@imriss

imriss commented Apr 22, 2020

@JPWKU, could you please provide some info on how many resources to set for the tiller deploy? Thanks
