[BUG] Longhorn seems not to be able to connect to replica outside of the one which is on the same node affected to the pod #8451
Comments
To recapitulate, the cluster is spread between your home and Oracle Cloud. When the replica is colocated with the pod on Proxmox at your home, it is functional, but replicas on the cloud nodes are not. As far as you know, there are no latency or permission issues between the nodes.
@james-munson It doesn't depend on the pod's location: the pod can be on a node in Oracle Cloud or at my home. But whenever a replica is not on the same node as the pod, it keeps failing and rebuilding over and over again.
Also, I've corrected a mistake in my description: it's master2 that is on my Proxmox. For information, the nodes are connected through Tailscale.
Also, for your analysis, if you need it, I kept a copy of the nodes folder from the support bundle archive.
In the logs there are any number of failures like:
and
and
All of these indicate timeouts between components. Coupled with the fact that replicas behave when local (meaning in the same data center) but not when they cross data center boundaries, the conclusion is clear: latency between sites is too high to run block I/O reliably. This is not a cluster setup that Longhorn would recommend.
With the previous Longhorn version 1.5, kernel 5.5, and a previous version of k3s, it was working perfectly fine on my setup.
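One rough way to sanity-check the latency between sites is to time TCP connections from one node to another over the same network path the replicas use. The sketch below is only an illustration and is not a Longhorn tool: the function names are made up for this example, and the peer address and port in the usage comment are hypothetical placeholders that must be replaced with a real node address and a TCP port known to be open on it. TCP connect time is only a proxy for the round-trip time that block I/O would see.

```python
import socket
import statistics
import time


def measure_tcp_connect_ms(host, port, attempts=5, timeout=2.0):
    """Time TCP connects to host:port; a failed or timed-out attempt is recorded as None."""
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            # create_connection raises OSError (including timeouts) on failure.
            with socket.create_connection((host, port), timeout=timeout):
                samples.append((time.perf_counter() - start) * 1000.0)
        except OSError:
            samples.append(None)
    return samples


def summarize(samples):
    """Summarize connect times in ms, counting None entries as failures."""
    ok = [s for s in samples if s is not None]
    return {
        "attempts": len(samples),
        "failures": len(samples) - len(ok),
        "median_ms": statistics.median(ok) if ok else None,
        "max_ms": max(ok) if ok else None,
    }


# Usage (hypothetical peer address and port, replace with real values):
# print(summarize(measure_tcp_connect_ms("100.64.0.2", 22)))
```

Running this from each node toward the others, and comparing same-site pairs with cross-site pairs, would show whether the cross-site paths have noticeably higher or more variable connect times, or outright failures.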
Describe the bug
When I schedule a docker registry pod with a Longhorn volume attached to it on the master node, the two replicas are placed on master2 and worker1. They always fail, so the docker registry pod never starts.
When I make the node worker1 unschedulable, one of the replicas moves to the master node. That replica then works and the pod starts correctly, but the other replica on master2 is still failing.
To Reproduce
In Longhorn v1.6.1, create a volume with 2 replicas, which I've called registry.
In the UI, I created the PV/PVC inside the namespace docker-registry, named the PVC registry, and chose ext4 as the filesystem format.
Support bundle for troubleshooting
supportbundle_51ccfc95-cc51-470e-8d37-837371d0fb87_2024-04-25T22-10-41Z.zip
I've removed the nodes folder from the original archive because it is larger than 500 MB.
Environment
Additional context