"Error: Transport is closing" message when attempting to install #3409
I've been experiencing the same problem. The deployment is "visibly" successful, yet it fails with "Error: transport is closing" after around three minutes of waiting. This is happening whether or not I add `--timeout`.
In addition, raising the ELB idle timeout to 3600 seconds still led to the install failing within 3.5 minutes.
I am also seeing timeouts right at 210s (3.5 minutes), regardless of the `--timeout` value.
Managed to watch the tiller logs as I did another install. No major errors there, so something is cutting out in between. The only thing I can think of is the cluster itself: helm reports that it respects the timeout, and the ELB is not timing out with its timeout set to 3600 secs, so could the cluster itself be cutting the connection?
@huang-jy In my case, it looks a lot like the helm CLI is timing out, since it's sending a FIN packet.
One thing I was going to try was an older version of helm.
I suspect my problem is happening at https://github.com/grpc/grpc-go/blob/424e3e9894f9206fca433fb4ba66f639be56e325/stream.go#L299-L300. I have this problem with both 2.7.2 and 2.8.0. Will try master next.
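For anyone instrumenting this from the client side, here is a minimal gRPC-Go sketch of how such a teardown typically surfaces to callers. The `classifyRPCError` helper is hypothetical, but `status` and `codes` are the standard grpc-go packages, and "transport is closing" is delivered to callers as `codes.Unavailable`:

```go
package main

import (
	"fmt"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// classifyRPCError is a hypothetical helper that distinguishes a
// connection-level teardown from an application-level RPC failure.
func classifyRPCError(err error) string {
	s, ok := status.FromError(err)
	if !ok {
		return "not a gRPC status error"
	}
	switch s.Code() {
	case codes.Unavailable:
		// "transport is closing" lands in this bucket.
		return fmt.Sprintf("connection-level failure: %s", s.Message())
	case codes.DeadlineExceeded:
		return "the RPC's own deadline fired"
	default:
		return fmt.Sprintf("other RPC error: %s", s.Code())
	}
}

func main() {
	// Simulate the error this thread is about; a real client would get it
	// back from an RPC call when the underlying connection is torn down.
	err := status.Error(codes.Unavailable, "transport is closing")
	fmt.Println(classifyRPCError(err)) // connection-level failure: transport is closing
}
```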
Well, master is a no-go because I'd have to upgrade Tiller in my production kube install.
helm 2.6.1 was used in a Udemy course and installed spinnaker successfully (though I'm not sure which spinnaker version it picked up), so maybe try that too (I'll do the same).
I used 2.6.0 and it didn't time out on me (spinnaker 0.3.12 was used); it waited properly. My spinnaker install didn't succeed, though (some of the containers were stuck in CrashLoopBackOff).
So, I used 2.8.0 and the latest spinnaker chart. It timed out on me halfway through, but according to the tiller logs, the install still continued. When I used 2.6.0 and the latest spinnaker chart, it waited for the resources, although they never seemed to come ready (possibly a spinnaker issue rather than a helm one).
Just curious, but has anyone tried looking at the FAQ to see if that resolves it for them? https://github.com/kubernetes/helm/blob/29358ef9cef85c8467434008a42bc07e5a0d2a85/docs/install_faq.md#getting-started
Interesting feedback, @huang-jy.
@bacongobbler I'll have a look. That said, I don't see the first two lines from the quoted FAQ; all I see is:
@bacongobbler @Nowaker I also don't see those first two lines, and the install works just fine if I don't include my post-install hook (and correspondingly remove the container health checks, which fail until the hook is executed). The install actually completes despite the timeout (and the app is functional), but the deployment is never marked complete, so a subsequent upgrade fails. It's not an outright functional issue, but an actual timeout.
Seconded, @benlangfeld. The release ends up as `PENDING_INSTALL`.
@bacongobbler In my case, one of three things happened:
Note that @huang-jy appears to be the only one with a problem in Tiller itself (eviction), and in only a third of his cases. This issue normally has nothing to do with Tiller and appears very much to be a client-side timeout. In my case, Tiller has never ceased to operate, and this is purely a client disconnection.
@benlangfeld Yes, it's a client disconnection, and I think it's something within the 2.8.0 version. When I used a pre-2.8.0 version on this cluster, I didn't get a transport closed error. Sure, spinnaker doesn't install properly, but that's probably something with the chart and not helm. I noticed there was a comment about increasing RAM; I might try sizing up the worker nodes and see if that helps.
Increasing the box to r4.large didn't help, but I noticed in the pod logs that when I thought the pod was evicted, it in fact wasn't; it was killed by the cluster.

Logs from tiller:

Pod logs:
So, I reproduced this without using hooks, against minikube, and with a minimal chart: https://gist.github.com/benlangfeld/005f5d934c074d67a34fe9f881c84e89 While this particular deployment would of course never succeed (because of the impossible health checks), I would not expect it to time out at 210s as it does, but rather to continue until the 300-second timeout indicated in the Tiller log, which is the primary contention in this ticket.
Having the same issue on …
This is the output from Tiller when the error occurs:
My reproduction has the expected behaviour on v2.7.2 (both client and Tiller), timing out at 300 seconds. The same is true for a v2.7.2 client against a v2.8.0 Tiller server, so the bug is in client code somewhere in v2.7.2...v2.8.0. I'll see if I can bisect that tomorrow to identify the problem commit. The most suspicious commit is, of course, 838d780.
I downgraded to v2.7.2.
Otherwise, the kubelet will be rebooted and any tiller releases will fail to install: helm/helm#3409 (comment)
This happens when tiller goes away during an install. In our case it was because Azure Container Service decided to downgrade tiller to a previous version... not Helm's fault.
I am still encountering this problem. My env is on-premise k8s, helm client 2.8.2, helm server 2.9.1.
@uxon123 I had that originally -- it turned out tiller was getting booted off the node, and I added an extra node to fix that. Now, I know your env is on-prem, but if you have the resources, can you try adding an extra node?
An extra tiller node? Do you mean an extra tiller pod in the tiller deployment, or what?
No, an extra kubernetes node.
Hm, I don't know if I fully understand this. You're saying that the reason for your problem (identical to the one I am encountering) was that there were not enough resources on k8s?
You can check this if you do a watch on the tiller pod while the install runs. If it disappears or crashes out, then it's likely tiller is crashing, as it did in my case.
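(If you'd rather script that check than eyeball kubectl, here's a rough client-go sketch that watches the tiller pod during an install. The `app=helm,name=tiller` label selector assumes a default `helm init` deployment, and the Watch signature assumes client-go v0.18+; adjust for your setup.)

```go
package main

import (
	"context"
	"fmt"
	"path/filepath"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/homedir"
)

func main() {
	// Load the local kubeconfig, as kubectl would.
	kubeconfig := filepath.Join(homedir.HomeDir(), ".kube", "config")
	config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Watch tiller pods in kube-system; "app=helm,name=tiller" are the
	// labels a default `helm init` deployment carries (assumption).
	watcher, err := clientset.CoreV1().Pods("kube-system").Watch(context.TODO(),
		metav1.ListOptions{LabelSelector: "app=helm,name=tiller"})
	if err != nil {
		panic(err)
	}
	defer watcher.Stop()

	// Any DELETED event during a `helm install` points at tiller itself
	// (eviction, OOM-kill, node loss), not at the client connection.
	for event := range watcher.ResultChan() {
		fmt.Printf("tiller pod event: %s\n", event.Type)
	}
}
```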
Ok, I did what you said. I watched the tiller pod and executed upgrade --install. The tiller pod neither disappeared nor crashed. I also checked tiller's replica set with the describe command, and no pods had failed, etc. So I think I can rule out a problem with tiller. Thanks for your input though :)
You can also try tailing tiller's logs during the install and see if there are any errors during that time.
I checked it again and there are no problems with tiller in my case (logs are clean, no pod restarts, etc.).
Are you running on AWS? Then you should check your API load balancer's connection timeout. If I remember right, it's only 60 seconds by default.
Add the --tls flag to the helm install command.
@makandas --tls relates to the connection between helm and tiller, IIRC. If this error came up without anything being installed, I would agree that switch would be an option, but that is not the case here. Helm can talk to Tiller and initiate the install, but the connection is somehow closed too early. This ticket has already been closed with a solution, which has been verified by several users.
helm#3183 added a keepalive with a 30s period to the Helm client, while Tiller was never configured to permit this, keeping the default minimum keepalive period of 5 minutes and disconnecting any client that pings more often than that. This commit enforces a minimum lower than what the Helm client is configured for, preventing these disconnections, and thus fixes helm#3409.
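For readers following along, here is a minimal gRPC-Go sketch of the mismatch that commit message describes. The 30s client period matches what the commit message attributes to helm#3183; the concrete server-side numbers below are illustrative assumptions, not Helm's actual source:

```go
package main

import (
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

// Client side (what helm#3183 introduced): ping the server after 30s of
// inactivity. If the server's enforcement policy considers this too
// frequent, it tears the connection down and the client reports
// "transport is closing".
func dialOptions() []grpc.DialOption {
	return []grpc.DialOption{
		grpc.WithKeepaliveParams(keepalive.ClientParameters{
			Time:    30 * time.Second, // ping period
			Timeout: 20 * time.Second, // how long to wait for a ping ack
		}),
	}
}

// Server side (the fix): lower the enforcement minimum below the client's
// ping period. grpc-go's default MinTime is 5 minutes, so a 30s client
// ping would otherwise be treated as abusive.
func serverOptions() []grpc.ServerOption {
	return []grpc.ServerOption{
		grpc.KeepaliveEnforcementPolicy(keepalive.EnforcementPolicy{
			MinTime: 10 * time.Second, // must be <= the client's Time
		}),
	}
}

func main() {
	_ = dialOptions()
	_ = serverOptions()
}
```

When a client pings more often than the server's `MinTime` allows, grpc-go closes the connection with a GOAWAY ("too_many_pings"), which the client surfaces as exactly the "transport is closing" error in this thread.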
I'm still facing this issue, even on the latest available version, 2.12.3. I can see that tiller just crashes with a log full of errors. I'm using GCP (GKE version: 1.11.5-gke.5).
@alanwds Same here.
@alanwds From the truncated panic stack trace, I can only see that it stemmed from the yaml parser. What's the output of …?
Same question to you, @AndrewDryga :)
@bacongobbler For us, it looks like Tiller is crashing once we deploy. There is no issue in the templates, as they have not changed since the last deployment (which passed while Tiller was available).
I just had this error, and it turned out that I had set too few resources for the tiller deploy.
helm hangs with the …
@JPWKU Could you please provide some info on how many resources you set for the tiller deploy? Thanks.
I am trying to install spinnaker from the kubeapps hub using:

```
helm install stable/spinnaker --name spinnaker -f values.yaml
```

This gave me no output, but I could see the pods being created, and then later:

```
Error: transport is closing
```

Using `--wait` and `--timeout` didn't help. Spinnaker seems to have spun up successfully, but since helm didn't finish registering the installation as complete, helm is stuck in the "PENDING_INSTALL" state for spinnaker, meaning I can't update/upgrade later.

Any ideas what might be happening?