grpcurl not closing threads when exiting due to network issues (transport is closing) #382
@deep27ak, I don't really understand what you are showing. From the error message, it looks like either the server or some proxy between grpcurl and the server is closing the connection prematurely. From the client's point of view, there is nothing to close, as the remote side has already closed the socket. Also, even if the process exited uncleanly, the OS should clean up orphaned resources, such as closing any sockets/file descriptors that were left open. Also, what OS is this? Is that OS X or Linux?
grpcurl version: 1.8.7

Our architecture is: Container -> Kubernetes edge node to handle external egress -> Kubernetes pod acting as load balancer -> gRPC server.

Problem 1: On the client side, the actual PID is gone as part of the above issue, but the threads that the PID had opened on the underlying Kubernetes worker nodes are never cleaned up. So the list of open FDs keeps piling up, and we fear that at some point we will run out of open FDs. For example, for one of the PIDs from the previous list, we no longer have any process running.
Problem 2: Every time the client connection is terminated with "transport is closing", stale ESTABLISHED connections remain on the load balancer. When we manually clear these ESTABLISHED connections from the load balancer, the zombie threads are also automatically cleared. But as long as the client is running and keeps failing due to the network issue, it seems to create new ESTABLISHED connections on the load balancer every few minutes.
So... what code changes would be needed in grpcurl to make this go away? Sounds like you are in a lot better shape to debug this than we are.
I am worried there is something weird going on with your Linux distro. As you already stated, the PID for the process goes away. When the process exits, the OS should clean up all open file descriptors and threads. This is surprising behavior that we've not seen before on any platform. The code already closes all connections at the end: https://github.com/fullstorydev/grpcurl/blob/master/cmd/grpcurl/grpcurl.go#L522-L538 When there is an error invoking the RPC, the code closes the connections and then exits. Also, you said you get an error with "transport is closing", yet the actual error message you included reads "connection reset by peer". Can you post the output of the error when it's the former? That error suggests that somehow the code was trying to close the connection (on the client side) before sending the RPC. That doesn't make any sense -- and it makes even less sense if that is the situation in which socket connections to the LB are stuck open.
As far as I can see in my environment, executing grpcurl creates around 15 threads, but I could not find any goroutines in the grpcurl code you shared. Is there some other dependent module that creates these threads? I am wondering if
This will output "exit status 1" but will not execute the defer call. We were getting "transport is closing" errors, but in the last setup where I was trying to reproduce this issue, I got a "connection is reset" error instead. For now I have enhanced our wrapper Go code, which executes grpcurl, to send SIGTERM to the client using a goroutine that watches for any such message in the gRPC stream. But I am worried about race conditions: grpcurl exits immediately after getting such an error, and if we fail to send SIGTERM in that window, we will have zombie threads running on the worker nodes. At least in all my tests, this seems to be working. Even with the solution I added in my wrapper code, the connections on the server (LB) are not cleaned up, which is quite expected, as the connection was broken between the server and client. The client in my case gets a specific message, but the server receives nothing other than the RST signal. This is something we need to check on our server side (not related to grpcurl): when the server receives the RST signal, the LB should clear the respective connection.
Yes. If you look at the code I linked, it does not rely on defer.
We have a grpcurl client connecting to a gRPC server. Due to some network issue in our environment, our clients are terminated with the error "transport is closing".
When we start grpcurl with the -vv option, we receive this message during termination
In such scenarios, the threads that were opened by grpcurl are not closed, and they keep piling up on the node:
Sample lsof output
Is there any way to make sure grpcurl connections are gracefully terminated and all threads are closed before exiting?