Scheduling issue when multiple nodes have different volume groups #909
Comments
Is my understanding correct that this should work, or is there some problem with my usage? I will also try to dig deeper into this problem.

Hi, topolvm-node puts capacity information as an annotation on each node. Would you show me the annotations, like below?
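For reference, the capacity annotations can be listed with something like the sketch below. The annotation key pattern `capacity.topolvm.io/<device-class>` is how TopoLVM reports free bytes per device class; the sample values are made up, and the `kubectl` one-liner assumes cluster access, so the second part demonstrates the key pattern on simulated data only:

```shell
# List TopoLVM capacity annotations on every node (assumes kubectl access):
#   kubectl get nodes -o json | grep -o '"capacity\.topolvm\.io/[^"]*": *"[^"]*"'

# Demonstration on simulated annotation data (values are made up):
annotations='{"capacity.topolvm.io/00default":"53682896896","capacity.topolvm.io/dc1":"21462829056"}'
echo "$annotations" | grep -o 'capacity\.topolvm\.io/[a-z0-9]*'
```

Each node should only carry annotations for the device classes its local lvmd actually serves.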
The VG may not match the one in lvmd.yaml. Would you show me the output?
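For comparison, a multi-device-class lvmd.yaml typically looks like the sketch below. The VG names and spare sizes here are assumptions; the key point is that each `volume-group` must name a VG that actually exists on the node where that lvmd runs:

```yaml
# lvmd.yaml — each device class must map to a VG present on this node
device-classes:
  - name: 00default
    volume-group: myvg1   # assumed VG name; must exist on this node
    default: true
    spare-gb: 10
  - name: dc1
    volume-group: dc1     # only nodes that actually have this VG should list this class
    spare-gb: 10
```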
Thanks for the response. The annotations look similar to the lvmd yamls to me 🤔
The VGs (not sure if these warnings are related):
This behavior may be caused by a configuration mistake and by a limitation of TopoLVM. First, I found a mistake in the storage class. Would you fix your SC like below?

```yaml
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.topolvm.io/node
        values:
          - topolvm-e2e-worker # 👈
```

I would expect to be notified of an error if there is no matching node when allowedTopologies is specified, but I don't know why Pod scheduling continues anyway. This should be common Kubernetes behavior, so if you want to know why, you will have to ask the upstream community. Second, there is a limitation of topolvm-scheduler described in the doc below.
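For context, the allowedTopologies stanza sits in the StorageClass itself; a full manifest might look like the sketch below. The metadata name is taken from the reproduction steps, while the device-class parameter value and the provisioner name are assumptions based on this thread and may differ between TopoLVM releases:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: topolvm-provisioner    # name used in the reproduction steps
provisioner: topolvm.io        # older releases use a different provisioner name
parameters:
  "topolvm.io/device-class": "dc1"   # assumed device class from this thread
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.topolvm.io/node
        values:
          - topolvm-e2e-worker   # must exactly match the node's label value
```

With `WaitForFirstConsumer`, binding is delayed until a pod is scheduled, so the topology constraint and the scheduler's capacity view both matter here.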
Thanks a lot @llamerada-jp for the help!
I'm glad I could help you.
If the topology is not present, pods will be scheduled without considering free space and thus may fail to allocate the volume. This is a limitation. I thought we wrote this in the
I see. I mean, it makes sense that it works like this, but at least to me it was not clear from the
Describe the bug
When a volume group only exists on a single node, the default scheduler might schedule the pod to a node where the volume group does not exist. This causes a failure, and the pod is never retried on the node that has the volume group. It happens even though a topology is defined in the storage class, and it is reproducible with the e2e test-env when the lvm config is modified.
To Reproduce
Steps to reproduce the behavior:
Create e2e test with the following setup:
Add topology to the storage-class
Deploy a pod + PVC with the storage class topolvm-provisioner
See error in controller and node logs
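A minimal PVC + Pod pair for the deployment step could look like this sketch (the object names, image, and size are assumptions):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: topo-pvc              # hypothetical name
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
  storageClassName: topolvm-provisioner
---
apiVersion: v1
kind: Pod
metadata:
  name: topo-pod              # hypothetical name
spec:
  containers:
    - name: app
      image: registry.k8s.io/pause:3.9   # any image works for the repro
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: topo-pvc
```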
Node:
```
topolvm-node-cns85   3/3   Running   0   3d22h   10.244.1.2   topolvm-e2e-worker2
topolvm-node-ngf8m   3/3   Running   0   3d22h   10.244.3.3   topolvm-e2e-worker3
topolvm-node-vmnhh   3/3   Running   0   3d22h   10.244.2.4   topolvm-e2e-worker
```
Controller logs:
Node logs (topolvm-node-cns85):
Node logs (topolvm-node-vmnhh; nothing happening here):
Expected behavior
I would expect the pod to be provisioned on the worker with the volume group dc1, not on a node that does not have it.