Transport endpoint is not connected #4330
Comments
This is a recurring issue; we have faced very similar problems in the past. As the OP mentioned above, my team and I upgraded from 6.x to 8.x, and now with 10.5 we are facing a very similar problem.
Please share the full mount logs from the client machine where you observed this issue.
@aravindavk Hi Sir, thanks for your attention. Please check the following logs and let us know if there is something that can fix this issue. This problem keeps recurring with our client and is getting annoying. These are the logs from the client that had the issue.
@aravindavk Hi Sir, could we loop back on this?
@aravindavk Hi Sir, may we have a follow-up on this?
Any update?
Description of problem
This is an intermittent problem with Gluster client disconnections. We cannot reproduce it reliably; it occurs randomly (we suspect it happens under heavy load on the SDS). We upgraded from Gluster 8.4 through every 10.x release, and even with the latest 10.5 we keep facing the same problem. The mount point suffers a brief disconnection, which is fatal for an SDS providing storage to VMs. This time the mount point recovered automatically, but that brief disconnection was enough to put every VM running on the node into I/O errors.
In this new version of Gluster the problem is limited to the affected volume. Previously we had to reboot the entire node, because the disconnection affected all Gluster mount points on that node. So the underlying problem is the same, but the behavior is now different. I know that a distributed two-way replicated volume is not the best setup, and that with replica 3 I might not hit this problem in the same way because of quorum and its protection against node disconnections, but is there any way to fix this Gluster client disconnection?
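For anyone who wants to detect this condition from a monitoring script: when a FUSE mount loses its connection, system calls on the mount point fail with ENOTCONN, which is the "Transport endpoint is not connected" message in the issue title. The following is a minimal sketch under that assumption, not something taken from the setup described here; the function name `mount_is_connected` is hypothetical.

```python
import errno
import os


def mount_is_connected(path: str) -> bool:
    """Return False if stat() on `path` fails with ENOTCONN, True otherwise.

    Hypothetical helper for illustration. Any other OSError (for example a
    missing path) is re-raised so it is not mistaken for a disconnection.
    """
    try:
        os.stat(path)
    except OSError as exc:
        if exc.errno == errno.ENOTCONN:
            # "Transport endpoint is not connected"
            return False
        raise
    return True
```

A check such as `mount_is_connected("/mnt/example")` (hypothetical path) could be run periodically alongside existing monitoring. It only reports the disconnection; it does not prevent or repair it.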
Expected results
The client does not get disconnected from the rest of the cluster.
Mandatory info:
The output of the gluster volume info command:
The output of the gluster volume status command:
The output of the gluster volume heal command:
At the time of writing there were no entries pending heal, but healing did take place, as reported by our monitoring system (Zabbix) and our custom checks for it:
Provide logs present on following locations of client and server nodes
No error on glusterd:
Is there any crash ? Provide the backtrace and coredump
My node4 is a gluster client and got disconnected from the cluster.
The operating system / glusterfs version
On each node:
On server nodes:
On node4 (client):