-
Hi, I have some strange behaviour on some pods. I see 4 replicas instead of 3, one of which is "a ghost": the replicas are on nodes 1, 2 and 4. The pod itself is running properly, as if it were using a proper replica. How can I properly debug and solve this problem? Thank you.
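For context on how this can be debugged: Longhorn exposes its state as custom resources, so the replica count the volume wants and the replica objects that actually exist can be compared directly from the CLI. This is only a sketch, assuming a default install in the `longhorn-system` namespace and a placeholder volume name (`pvc-xxxxxxxx`); the `longhornvolume` label is what Longhorn normally puts on replica objects, adjust if yours differs.

```bash
# How many replicas does the volume want (spec.numberOfReplicas), and is it scheduled?
kubectl -n longhorn-system get volumes.longhorn.io pvc-xxxxxxxx -o yaml

# Which replica objects actually exist for that volume, and on which nodes?
kubectl -n longhorn-system get replicas.longhorn.io \
  -l longhornvolume=pvc-xxxxxxxx -o wide
```

Comparing `spec.numberOfReplicas` on the volume with the list of replica objects should make the "ghost" visible as an extra replica CR.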
-
Could you set the `log-level` setting to debug?
-
Hi @mantissahz, how can I check whether the log-level change has actually been applied?
Thank you.
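In case it helps: if the setting in question is the Longhorn `log-level` setting referenced above (assuming your Longhorn version exposes it), its current value can be read back from the corresponding setting CR, and the longhorn-manager pods should start emitting debug-level lines once it takes effect. A sketch, again assuming the default `longhorn-system` namespace:

```bash
# Read back the current value of the log-level setting
kubectl -n longhorn-system get settings.longhorn.io log-level -o jsonpath='{.value}'

# Tail the longhorn-manager logs to confirm debug messages are showing up
kubectl -n longhorn-system logs -l app=longhorn-manager --tail=50
```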
-
Update: I actually have 3 workloads with the "Local Replica Scheduling Failure Error Message: tags not fulfilled" error:
- 1 pod with just a scheduling failure flag, no ghost replica
- 1 pod with the scheduling replica and just a "no data locality" flag
- 1 pod with both the scheduling flag and the data locality flag
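As far as I can tell, these flags map to conditions on the Longhorn volume CR (the scheduling failure in the UI corresponds to the volume's `Scheduled` condition being false), so they can also be checked from the CLI. A rough sketch with a placeholder volume name; note that older Longhorn releases store the conditions as a map rather than a list, so the jsonpath may need adjusting:

```bash
# Print type/status/message for each condition on the volume
kubectl -n longhorn-system get volumes.longhorn.io pvc-xxxxxxxx \
  -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.message}{"\n"}{end}'
```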
-
Hi, Support Bundle sent (as a Gdrive link). PS: the third pod above does not have the same scheduling flag, but it does have the ghost replica.
-
Hi @mantissahz, the logs from the longhorn-manager on the node(s) involved show:
- pod/PVC with both no data locality and not schedulable
- pod/PVC with no data locality
- pod/PVC not schedulable
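For reference, this is roughly how those logs can be pulled per node (namespace and label selector assume a default install; the volume name is a placeholder):

```bash
# longhorn-manager runs as a DaemonSet, one pod per node; find the pod on the affected node
kubectl -n longhorn-system get pods -l app=longhorn-manager -o wide

# Grep that pod's logs for the problematic volume
kubectl -n longhorn-system logs <longhorn-manager-pod-on-that-node> | grep pvc-xxxxxxxx
```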
-
Another insight: if I set replicas=0 on the StatefulSet of a pod with no data locality, the ghost replica disappears.
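A minimal sketch of that scale-down, with placeholder names and the same default namespace as above:

```bash
# Scale the StatefulSet to zero so the Longhorn volume detaches
kubectl scale statefulset my-app --replicas=0

# Once detached, check whether the extra replica object is still listed
kubectl -n longhorn-system get replicas.longhorn.io -l longhornvolume=pvc-xxxxxxxx
```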
-
@mantissahz I found out that, for the "no data locality" problem, I had a label mismatch on a node/disk. I have 6 nodes, each with two disks (a virtual disk and a physical NVMe disk, for a two-tier storage system). The virtual disk on node5 was labeled "nvme"; as soon as I re-labeled it "vdisk", the "no data locality" problem vanished. There are also no more "ghost" volumes; those were probably due to the inability to create proper volumes of tier "vdisk" on node5 while both of its disks were labeled "nvme". I still have the "Scheduling Failure Local Replica Scheduling Failure Error Message: tags not fulfilled" problem and can't find a way to solve it.
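For anyone checking their own setup for the same mismatch: the disk tags live on the Longhorn node CR (under `spec.disks.<disk>.tags`), so they can be reviewed and fixed either from the Longhorn UI or by editing the CR. A sketch, assuming the node name used above and a default install:

```bash
# Dump the Longhorn node object for node5 and inspect spec.disks.<name>.tags
kubectl -n longhorn-system get nodes.longhorn.io node5 -o yaml

# Fix a wrong tag by editing the CR directly (or via the node/disk edit dialog in the UI)
kubectl -n longhorn-system edit nodes.longhorn.io node5
```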
-
Hi @mantissahz, I finally deleted the unschedulable volumes: the "tags not fulfilled" error was probably caused by the node/disk tag mismatch on node5 at the time they were first created, and for some reason their unschedulable status was not cleared after the mismatch was fixed. Thank you very much for your help.
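In case someone needs to do the same cleanup: a rough way to list the volumes that are still flagged as unschedulable before deleting and recreating them (requires `jq`, assumes the default namespace and that conditions are stored as a list, as in recent Longhorn releases):

```bash
# List volumes whose "Scheduled" condition is not True
kubectl -n longhorn-system get volumes.longhorn.io -o json \
  | jq -r '.items[]
      | select(any(.status.conditions[]?; .type == "Scheduled" and .status != "True"))
      | .metadata.name'
```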