Scheduling issue when multiple nodes with different volume groups on nodes #909

Closed
NymanRobin opened this issue May 7, 2024 · 7 comments
Labels
bug Something isn't working


NymanRobin commented May 7, 2024

Describe the bug
When a volume group exists on only a single node, the default scheduler may try to schedule the pod to a node where the volume group does not exist. This causes a failure, and the volume is never retried on the node that has the volume group. This happens even though a topology is defined in the storage class. It is reproducible with the e2e test environment when the lvm config is modified.

To Reproduce
Steps to reproduce the behavior:

  1. Create e2e test with the following setup:

    • node1 (dc1, dc2)
    • node2 (node2-raid1-1, node2-raid1-2)
    • node3 (node3-thin1)
  2. Add topology to the storage-class

allowVolumeExpansion: true
allowedTopologies:
- matchLabelExpressions:
  - key: topology.topolvm.io/node
    values:
    - topology.topolvm.io/node=topolvm-e2e-worker
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    meta.helm.sh/release-name: topolvm
    meta.helm.sh/release-namespace: topolvm-system
  creationTimestamp: "2024-05-03T12:28:40Z"
  labels:
    app.kubernetes.io/instance: topolvm
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: topolvm
    app.kubernetes.io/version: 0.28.0
    helm.sh/chart: topolvm-14.1.1
  name: topolvm-provisioner
  resourceVersion: "798918"
  uid: 87482345-5a11-484f-9f2f-04a37f3bc820
parameters:
  csi.storage.k8s.io/fstype: xfs
  topolvm.io/device-class: dc1
provisioner: topolvm.io
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
  3. Deploy a pod + PVC with storage class topolvm-provisioner

  4. See errors in the controller and node logs
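
A quick way to confirm which node the volume was assigned to is to inspect the TopoLVM LogicalVolume resource (a hedged check; the spec.nodeName field is assumed from the LogicalVolume CRD, and the column names are illustrative):

# Show each LogicalVolume and the node it was scheduled to
kubectl get logicalvolumes.topolvm.io -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName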

topolvm-node pods:
topolvm-node-cns85 3/3 Running 0 3d22h 10.244.1.2 topolvm-e2e-worker2
topolvm-node-ngf8m 3/3 Running 0 3d22h 10.244.3.3 topolvm-e2e-worker3
topolvm-node-vmnhh 3/3 Running 0 3d22h 10.244.2.4 topolvm-e2e-worker

Controller logs:

{"level":"info","ts":"2024-05-06T09:44:38Z","logger":"LogicalVolume","msg":"waiting for setting 'status.volumeID'","name":"pvc-9962a49c-f92d-4b72-91fb-b438d483e9e4"}
{"level":"info","ts":"2024-05-06T09:45:25Z","logger":"driver.controller","msg":"CreateVolume called","name":"pvc-9962a49c-f92d-4b72-91fb-b438d483e9e4","device_class":"dc1","required":134217728,"limit":0,"parameters":{"topolvm.io/device-class":"dc1"},"num_secrets":0,"capabilities":[{"AccessType":{"Mount":{"fs_type":"xfs"}},"access_mode":{"mode":1}}],"content_source":"<nil>","accessibility_requirements":"requisite:<segments:<key:\"topology.topolvm.io/node\" value:\"topolvm-e2e-worker2\" > > requisite:<segments:<key:\"topology.topolvm.io/node\" value:\"topolvm-e2e-worker3\" > > requisite:<segments:<key:\"topology.topolvm.io/node\" value:\"topolvm-e2e-worker\" > > preferred:<segments:<key:\"topology.topolvm.io/node\" value:\"topolvm-e2e-worker2\" > > preferred:<segments:<key:\"topology.topolvm.io/node\" value:\"topolvm-e2e-worker3\" > > preferred:<segments:<key:\"topology.topolvm.io/node\" value:\"topolvm-e2e-worker\" > > "}

Node logs (topolvm-node-cns85):

{"level":"error","ts":"2024-05-06T09:45:25Z","msg":"failed to get list of LV","controller":"logicalvolume","controllerGroup":"topolvm.io","controllerKind":"LogicalVolume","LogicalVolume":{"name":"pvc-9962a49c-f92d-4b72-91fb-b438d483e9e4"},"namespace":"","name":"pvc-9962a49c-f92d-4b72-91fb-b438d483e9e4","reconcileID":"49831b87-6969-41ba-9981-8d4ce0813225","error":"rpc error: code = NotFound desc = device-class not found: dc1","stacktrace":"github.com/topolvm/topolvm/internal/controller.(*LogicalVolumeReconciler).volumeExists\n\t/home/ubuntu/workdir/topolvm/internal/controller/logicalvolume_controller.go:166\ngithub.com/topolvm/topolvm/internal/controller.(*LogicalVolumeReconciler).createLV.func1\n\t/home/ubuntu/workdir/topolvm/internal/controller/logicalvolume_controller.go:190\ngithub.com/topolvm/topolvm/internal/controller.(*LogicalVolumeReconciler).createLV\n\t/home/ubuntu/workdir/topolvm/internal/controller/logicalvolume_controller.go:265\ngithub.com/topolvm/topolvm/internal/controller.(*LogicalVolumeReconciler).Reconcile\n\t/home/ubuntu/workdir/topolvm/internal/controller/logicalvolume_controller.go:100\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/ubuntu/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/ubuntu/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/ubuntu/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/ubuntu/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:227"}
{"level":"error","ts":"2024-05-06T09:45:25Z","msg":"failed to create LV","controller":"logicalvolume","controllerGroup":"topolvm.io","controllerKind":"LogicalVolume","LogicalVolume":{"name":"pvc-9962a49c-f92d-4b72-91fb-b438d483e9e4"},"namespace":"","name":"pvc-9962a49c-f92d-4b72-91fb-b438d483e9e4","reconcileID":"49831b87-6969-41ba-9981-8d4ce0813225","name":"pvc-9962a49c-f92d-4b72-91fb-b438d483e9e4","error":"rpc error: code = NotFound desc = device-class not found: dc1","stacktrace":"github.com/topolvm/topolvm/internal/controller.(*LogicalVolumeReconciler).Reconcile\n\t/home/ubuntu/workdir/topolvm/internal/controller/logicalvolume_controller.go:102\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/ubuntu/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/ubuntu/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/ubuntu/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/ubuntu/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:227"}
{"level":"error","ts":"2024-05-06T09:45:25Z","msg":"Reconciler error","controller":"logicalvolume","controllerGroup":"topolvm.io","controllerKind":"LogicalVolume","LogicalVolume":{"name":"pvc-9962a49c-f92d-4b72-91fb-b438d483e9e4"},"namespace":"","name":"pvc-9962a49c-f92d-4b72-91fb-b438d483e9e4","reconcileID":"49831b87-6969-41ba-9981-8d4ce0813225","error":"rpc error: code = NotFound desc = device-class not found: dc1","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/ubuntu/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/ubuntu/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/ubuntu/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:227"}
{"level":"info","ts":"2024-05-06T09:45:25Z","msg":"start finalizing LogicalVolume","controller":"logicalvolume","controllerGroup":"topolvm.io","controllerKind":"LogicalVolume","LogicalVolume":{"name":"pvc-9962a49c-f92d-4b72-91fb-b438d483e9e4"},"namespace":"","name":"pvc-9962a49c-f92d-4b72-91fb-b438d483e9e4","reconcileID":"f22334ca-6846-4019-a718-f4f0d87eaa75","name":"pvc-9962a49c-f92d-4b72-91fb-b438d483e9e4"}
{"level":"info","ts":"2024-05-06T09:45:25Z","msg":"LV already removed","controller":"logicalvolume","controllerGroup":"topolvm.io","controllerKind":"LogicalVolume","LogicalVolume":{"name":"pvc-9962a49c-f92d-4b72-91fb-b438d483e9e4"},"namespace":"","name":"pvc-9962a49c-f92d-4b72-91fb-b438d483e9e4","reconcileID":"f22334ca-6846-4019-a718-f4f0d87eaa75","name":"pvc-9962a49c-f92d-4b72-91fb-b438d483e9e4","uid":"4b46c52d-ab4f-4a48-92dc-5512b941d3a9"}

Node logs (topolvm-node-vmnhh; nothing happening here):

{"level":"info","ts":"2024-05-03T12:28:44Z","logger":"setup","msg":"starting manager"}
{"level":"info","ts":"2024-05-03T12:28:44Z","logger":"controller-runtime.metrics","msg":"Starting metrics server"}
{"level":"info","ts":"2024-05-03T12:28:44Z","logger":"controller-runtime.metrics","msg":"Serving metrics server","bindAddress":":8080","secure":false}
{"level":"info","ts":"2024-05-03T12:28:44Z","msg":"Starting EventSource","controller":"logicalvolume","controllerGroup":"topolvm.io","controllerKind":"LogicalVolume","source":"kind source: *v1.LogicalVolume"}
{"level":"info","ts":"2024-05-03T12:28:44Z","msg":"Starting Controller","controller":"logicalvolume","controllerGroup":"topolvm.io","controllerKind":"LogicalVolume"}
{"level":"info","ts":"2024-05-03T12:28:44Z","msg":"Starting workers","controller":"logicalvolume","controllerGroup":"topolvm.io","controllerKind":"LogicalVolume","worker count":1}

Expected behavior
I would expect the pod to be provisioned on the worker that has the device class dc1, not scheduled to a node that does not have it.

NymanRobin added the bug label on May 7, 2024
@NymanRobin (Author)

Is my understanding correct that this should work, or is there some problem with my usage? I will also try to dig deeper into this problem.

llamerada-jp (Contributor) commented May 8, 2024

Hi,
Judging by the error in your logs ("device-class not found: dc1"), the lvmd.yaml file may be different from what is expected. Would you check it?

topolvm-node publishes capacity information as annotations on each node. Could you show me the annotations, for example with:

kubectl get node -o json | jq '.items[] | {"name": .metadata.name, "annotations": .metadata.annotations}'

The VGs may not match what is in lvmd.yaml. Could you also show me the output of the vgs command?
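
For reference, each device-class entry in lvmd.yaml must point to a volume group that actually exists on that node. A minimal sketch of what the dc1/dc2 mapping could look like (the VG names here are illustrative placeholders, not taken from your environment):

# lvmd.yaml sketch -- VG names are placeholders
device-classes:
  - name: dc1
    volume-group: myvg1   # must match a VG shown by `vgs` on this node
    default: true
    spare-gb: 10
  - name: dc2
    volume-group: myvg2   # likewise must exist on this node
    spare-gb: 10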

NymanRobin (Author) commented May 8, 2024

Thanks for the response! The annotations look similar to the lvmd yamls to me 🤔
But I am no expert in this area, so I will try to dive deeper into it.

$ kubectl get nodes -o json | jq '.items[] | {"name": .metadata.name, "annotations": .metadata.annotations}'

{
  "name": "topolvm-e2e-control-plane",
  "annotations": {
    "kubeadm.alpha.kubernetes.io/cri-socket": "unix:///run/containerd/containerd.sock",
    "node.alpha.kubernetes.io/ttl": "0",
    "volumes.kubernetes.io/controller-managed-attach-detach": "true"
  }
}
{
  "name": "topolvm-e2e-worker",
  "annotations": {
    "capacity.topolvm.io/00default": "20396900352",
    "capacity.topolvm.io/dc1": "20396900352",
    "capacity.topolvm.io/dc2": "20396900352",
    "csi.volume.kubernetes.io/nodeid": "{\"topolvm.io\":\"topolvm-e2e-worker\"}",
    "kubeadm.alpha.kubernetes.io/cri-socket": "unix:///run/containerd/containerd.sock",
    "node.alpha.kubernetes.io/ttl": "0",
    "volumes.kubernetes.io/controller-managed-attach-detach": "true"
  }
}
{
  "name": "topolvm-e2e-worker2",
  "annotations": {
    "capacity.topolvm.io/00default": "0",
    "capacity.topolvm.io/create-option-raid1": "5360320512",
    "capacity.topolvm.io/option-class-raid1": "5360320512",
    "csi.volume.kubernetes.io/nodeid": "{\"topolvm.io\":\"topolvm-e2e-worker2\"}",
    "kubeadm.alpha.kubernetes.io/cri-socket": "unix:///run/containerd/containerd.sock",
    "node.alpha.kubernetes.io/ttl": "0",
    "volumes.kubernetes.io/controller-managed-attach-detach": "true"
  }
}
{
  "name": "topolvm-e2e-worker3",
  "annotations": {
    "capacity.topolvm.io/00default": "0",
    "capacity.topolvm.io/thin": "21474836480",
    "csi.volume.kubernetes.io/nodeid": "{\"topolvm.io\":\"topolvm-e2e-worker3\"}",
    "kubeadm.alpha.kubernetes.io/cri-socket": "unix:///run/containerd/containerd.sock",
    "node.alpha.kubernetes.io/ttl": "0",
    "volumes.kubernetes.io/controller-managed-attach-detach": "true"
  }
}

The VGs (not sure if these warnings are related):

$ sudo vgs
  WARNING: Not using device /dev/loop30 for PV FtpUj8-r86x-OL7c-hozC-KdqZ-fsOX-3tp42w.
  WARNING: Not using device /dev/loop32 for PV FtpUj8-r86x-OL7c-hozC-KdqZ-fsOX-3tp42w.
  WARNING: PV FtpUj8-r86x-OL7c-hozC-KdqZ-fsOX-3tp42w prefers device /dev/loop5 because device name matches previous.
  WARNING: PV FtpUj8-r86x-OL7c-hozC-KdqZ-fsOX-3tp42w prefers device /dev/loop5 because device name matches previous.
  VG            #PV #LV #SN Attr   VSize   VFree   
  myvg1           1   0   0 wz--n- <20.00g  <20.00g
  myvg2           1   1   0 wz--n- <30.00g  <11.96g
  node1-thick1    1   0   0 wz--n- <20.00g  <20.00g
  node1-thick2    1   0   0 wz--n- <20.00g  <20.00g
  node2-raid1-1   2   0   0 wz--n-   5.99g    5.99g
  node2-raid1-2   2   0   0 wz--n-   5.99g    5.99g
  node3-thin1     1   1   0 wz--n-  <5.00g 1012.00m

@llamerada-jp (Contributor)

This behavior may be caused by a configuration mistake and by a limitation of TopoLVM.

First, I found a mistake in the storage class. Would you fix your SC as shown below?

allowedTopologies:
- matchLabelExpressions:
  - key: topology.topolvm.io/node
    values:
    - topolvm-e2e-worker  #👈

I would expect an error to be reported when allowedTopologies is specified and no node matches, and I don't know why pod scheduling continues regardless. This is common Kubernetes behavior rather than something TopoLVM-specific, so if you want to know why, you would have to ask the upstream community.

Second, there is a limitation of the TopoLVM scheduler, described in the doc below:
https://github.com/topolvm/topolvm/blob/main/docs/limitations.md#pod-without-pvc
topolvm-controller adds annotations to the pod as hints for topolvm-scheduler when a pod using a TopoLVM volume is created. But if the Pod is created before the PVC, even when both are in the same manifest file, topolvm-controller cannot add these annotations, so the Pod and PVC are scheduled without any information about the device class. Could you put the PVC before the Pod in the manifest file, if the Pod currently comes first? (A minimal ordering sketch is shown below.)
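
For example, a minimal ordering sketch with the PVC defined before the Pod (resource names and image are illustrative, not from this issue):

# PVC first, so topolvm-controller can annotate the Pod that references it
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc            # illustrative name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: topolvm-provisioner
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: example-pod            # illustrative name
spec:
  containers:
    - name: app
      image: registry.k8s.io/pause:3.9   # illustrative image
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: example-pvc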

@NymanRobin (Author)

Thanks a lot @llamerada-jp for the help!
With these two changes everything works fine. However, I see that without the topology the scheduling sometimes goes wrong; is this expected / documented?

@llamerada-jp (Contributor)

I'm glad I could help you.

I see that without the topology the scheduling sometimes goes wrong; is this expected / documented?

If the topology is not present, pods will be scheduled without considering free space and thus may fail to allocate the volume. This is a limitation. I thought we wrote this in the limitations.md, but it may be unclear.

@NymanRobin (Author)

I see; it makes sense that it works like this, but at least to me it is not clear from limitations.md, so maybe some extra clarification would be useful for users 🤔
Thanks again for the help @llamerada-jp. I will now close this issue; you can create a new one if you decide to change the docs, since that is not directly related to this issue.
