Scheduling issue when multiple nodes with different volume groups on nodes #909

Closed
NymanRobin opened this issue May 7, 2024 · 7 comments
Labels
bug Something isn't working


NymanRobin commented May 7, 2024

Describe the bug
When a volume group exists on only a single node, the default scheduler may try to schedule the pod to a node where the volume group does not exist. This causes a failure, and the volume is never retried on the node that has the volume group. This happens even though a topology is defined in the storage class. It is reproducible with the e2e test environment when the lvm config is modified.

To Reproduce
Steps to reproduce the behavior:

  1. Create e2e test with the following setup:

    • node1 (dc1, dc2)
    • node2 (node2-raid1-1, node2-raid1-2)
    • node3 (node3-thin1)
  2. Add topology to the storage-class

allowVolumeExpansion: true
allowedTopologies:
- matchLabelExpressions:
  - key: topology.topolvm.io/node
    values:
    - topology.topolvm.io/node=topolvm-e2e-worker
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    meta.helm.sh/release-name: topolvm
    meta.helm.sh/release-namespace: topolvm-system
  creationTimestamp: "2024-05-03T12:28:40Z"
  labels:
    app.kubernetes.io/instance: topolvm
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: topolvm
    app.kubernetes.io/version: 0.28.0
    helm.sh/chart: topolvm-14.1.1
  name: topolvm-provisioner
  resourceVersion: "798918"
  uid: 87482345-5a11-484f-9f2f-04a37f3bc820
parameters:
  csi.storage.k8s.io/fstype: xfs
  topolvm.io/device-class: dc1
provisioner: topolvm.io
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
  3. Deploy a pod + PVC with storage class topolvm-provisioner

  4. See errors in the controller and node logs
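
A quick way to confirm which node the volume was assigned to is to inspect the TopoLVM LogicalVolume resource (a hedged check; the spec.nodeName field is assumed from the LogicalVolume CRD, and the column names are illustrative):

# Show each LogicalVolume and the node it was scheduled to
kubectl get logicalvolumes.topolvm.io -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName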

topolvm-node pods:
topolvm-node-cns85 3/3 Running 0 3d22h 10.244.1.2 topolvm-e2e-worker2
topolvm-node-ngf8m 3/3 Running 0 3d22h 10.244.3.3 topolvm-e2e-worker3
topolvm-node-vmnhh 3/3 Running 0 3d22h 10.244.2.4 topolvm-e2e-worker

Controller logs:

{"level":"info","ts":"2024-05-06T09:44:38Z","logger":"LogicalVolume","msg":"waiting for setting 'status.volumeID'","name":"pvc-9962a49c-f92d-4b72-91fb-b438d483e9e4"}
{"level":"info","ts":"2024-05-06T09:45:25Z","logger":"driver.controller","msg":"CreateVolume called","name":"pvc-9962a49c-f92d-4b72-91fb-b438d483e9e4","device_class":"dc1","required":134217728,"limit":0,"parameters":{"topolvm.io/device-class":"dc1"},"num_secrets":0,"capabilities":[{"AccessType":{"Mount":{"fs_type":"xfs"}},"access_mode":{"mode":1}}],"content_source":"<nil>","accessibility_requirements":"requisite:<segments:<key:\"topology.topolvm.io/node\" value:\"topolvm-e2e-worker2\" > > requisite:<segments:<key:\"topology.topolvm.io/node\" value:\"topolvm-e2e-worker3\" > > requisite:<segments:<key:\"topology.topolvm.io/node\" value:\"topolvm-e2e-worker\" > > preferred:<segments:<key:\"topology.topolvm.io/node\" value:\"topolvm-e2e-worker2\" > > preferred:<segments:<key:\"topology.topolvm.io/node\" value:\"topolvm-e2e-worker3\" > > preferred:<segments:<key:\"topology.topolvm.io/node\" value:\"topolvm-e2e-worker\" > > "}

Node logs (topolvm-node-cns85):

{"level":"error","ts":"2024-05-06T09:45:25Z","msg":"failed to get list of LV","controller":"logicalvolume","controllerGroup":"topolvm.io","controllerKind":"LogicalVolume","LogicalVolume":{"name":"pvc-9962a49c-f92d-4b72-91fb-b438d483e9e4"},"namespace":"","name":"pvc-9962a49c-f92d-4b72-91fb-b438d483e9e4","reconcileID":"49831b87-6969-41ba-9981-8d4ce0813225","error":"rpc error: code = NotFound desc = device-class not found: dc1","stacktrace":"github.com/topolvm/topolvm/internal/controller.(*LogicalVolumeReconciler).volumeExists\n\t/home/ubuntu/workdir/topolvm/internal/controller/logicalvolume_controller.go:166\ngithub.com/topolvm/topolvm/internal/controller.(*LogicalVolumeReconciler).createLV.func1\n\t/home/ubuntu/workdir/topolvm/internal/controller/logicalvolume_controller.go:190\ngithub.com/topolvm/topolvm/internal/controller.(*LogicalVolumeReconciler).createLV\n\t/home/ubuntu/workdir/topolvm/internal/controller/logicalvolume_controller.go:265\ngithub.com/topolvm/topolvm/internal/controller.(*LogicalVolumeReconciler).Reconcile\n\t/home/ubuntu/workdir/topolvm/internal/controller/logicalvolume_controller.go:100\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/ubuntu/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/ubuntu/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/ubuntu/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/ubuntu/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:227"}
{"level":"error","ts":"2024-05-06T09:45:25Z","msg":"failed to create LV","controller":"logicalvolume","controllerGroup":"topolvm.io","controllerKind":"LogicalVolume","LogicalVolume":{"name":"pvc-9962a49c-f92d-4b72-91fb-b438d483e9e4"},"namespace":"","name":"pvc-9962a49c-f92d-4b72-91fb-b438d483e9e4","reconcileID":"49831b87-6969-41ba-9981-8d4ce0813225","name":"pvc-9962a49c-f92d-4b72-91fb-b438d483e9e4","error":"rpc error: code = NotFound desc = device-class not found: dc1","stacktrace":"github.com/topolvm/topolvm/internal/controller.(*LogicalVolumeReconciler).Reconcile\n\t/home/ubuntu/workdir/topolvm/internal/controller/logicalvolume_controller.go:102\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/ubuntu/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/ubuntu/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/ubuntu/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/ubuntu/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:227"}
{"level":"error","ts":"2024-05-06T09:45:25Z","msg":"Reconciler error","controller":"logicalvolume","controllerGroup":"topolvm.io","controllerKind":"LogicalVolume","LogicalVolume":{"name":"pvc-9962a49c-f92d-4b72-91fb-b438d483e9e4"},"namespace":"","name":"pvc-9962a49c-f92d-4b72-91fb-b438d483e9e4","reconcileID":"49831b87-6969-41ba-9981-8d4ce0813225","error":"rpc error: code = NotFound desc = device-class not found: dc1","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/ubuntu/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/ubuntu/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/ubuntu/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:227"}
{"level":"info","ts":"2024-05-06T09:45:25Z","msg":"start finalizing LogicalVolume","controller":"logicalvolume","controllerGroup":"topolvm.io","controllerKind":"LogicalVolume","LogicalVolume":{"name":"pvc-9962a49c-f92d-4b72-91fb-b438d483e9e4"},"namespace":"","name":"pvc-9962a49c-f92d-4b72-91fb-b438d483e9e4","reconcileID":"f22334ca-6846-4019-a718-f4f0d87eaa75","name":"pvc-9962a49c-f92d-4b72-91fb-b438d483e9e4"}
{"level":"info","ts":"2024-05-06T09:45:25Z","msg":"LV already removed","controller":"logicalvolume","controllerGroup":"topolvm.io","controllerKind":"LogicalVolume","LogicalVolume":{"name":"pvc-9962a49c-f92d-4b72-91fb-b438d483e9e4"},"namespace":"","name":"pvc-9962a49c-f92d-4b72-91fb-b438d483e9e4","reconcileID":"f22334ca-6846-4019-a718-f4f0d87eaa75","name":"pvc-9962a49c-f92d-4b72-91fb-b438d483e9e4","uid":"4b46c52d-ab4f-4a48-92dc-5512b941d3a9"}

Node logs (topolvm-node-vmnhh; nothing happening here):

{"level":"info","ts":"2024-05-03T12:28:44Z","logger":"setup","msg":"starting manager"}
{"level":"info","ts":"2024-05-03T12:28:44Z","logger":"controller-runtime.metrics","msg":"Starting metrics server"}
{"level":"info","ts":"2024-05-03T12:28:44Z","logger":"controller-runtime.metrics","msg":"Serving metrics server","bindAddress":":8080","secure":false}
{"level":"info","ts":"2024-05-03T12:28:44Z","msg":"Starting EventSource","controller":"logicalvolume","controllerGroup":"topolvm.io","controllerKind":"LogicalVolume","source":"kind source: *v1.LogicalVolume"}
{"level":"info","ts":"2024-05-03T12:28:44Z","msg":"Starting Controller","controller":"logicalvolume","controllerGroup":"topolvm.io","controllerKind":"LogicalVolume"}
{"level":"info","ts":"2024-05-03T12:28:44Z","msg":"Starting workers","controller":"logicalvolume","controllerGroup":"topolvm.io","controllerKind":"LogicalVolume","worker count":1}

Expected behavior
I would expect the pod to be provisioned on the worker that has the device class dc1, not scheduled to a node that does not have it.

NymanRobin added the bug label on May 7, 2024
@NymanRobin (Author)

Is my understanding correct that this should work, or is there some problem with my usage? I will also try to dig deeper into this problem.

llamerada-jp (Contributor) commented May 8, 2024

Hi,
Judging by the error in your logs ("device-class not found: dc1"), the lvmd.yaml file may be different from what is expected. Would you check it?

topolvm-node publishes capacity information as annotations on each node. Could you show me the annotations, for example with:

kubectl get node -o json | jq '.items[] | {"name": .metadata.name, "annotations": .metadata.annotations}'

The VGs may not match what is in lvmd.yaml. Could you also show me the output of the vgs command?
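
For reference, each device-class entry in lvmd.yaml must point to a volume group that actually exists on that node. A minimal sketch of what the dc1/dc2 mapping could look like (the VG names here are illustrative placeholders, not taken from your environment):

# lvmd.yaml sketch -- VG names are placeholders
device-classes:
  - name: dc1
    volume-group: myvg1   # must match a VG shown by `vgs` on this node
    default: true
    spare-gb: 10
  - name: dc2
    volume-group: myvg2   # likewise must exist on this node
    spare-gb: 10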

NymanRobin (Author) commented May 8, 2024

Thanks for the response! The annotations look similar to the lvmd yamls to me 🤔
But I am no expert in this area, so I will try to dive deeper into it.

$ kubectl get nodes -o json | jq '.items[] | {"name": .metadata.name, "annotations": .metadata.annotations}'

{
  "name": "topolvm-e2e-control-plane",
  "annotations": {
    "kubeadm.alpha.kubernetes.io/cri-socket": "unix:///run/containerd/containerd.sock",
    "node.alpha.kubernetes.io/ttl": "0",
    "volumes.kubernetes.io/controller-managed-attach-detach": "true"
  }
}
{
  "name": "topolvm-e2e-worker",
  "annotations": {
    "capacity.topolvm.io/00default": "20396900352",
    "capacity.topolvm.io/dc1": "20396900352",
    "capacity.topolvm.io/dc2": "20396900352",
    "csi.volume.kubernetes.io/nodeid": "{\"topolvm.io\":\"topolvm-e2e-worker\"}",
    "kubeadm.alpha.kubernetes.io/cri-socket": "unix:///run/containerd/containerd.sock",
    "node.alpha.kubernetes.io/ttl": "0",
    "volumes.kubernetes.io/controller-managed-attach-detach": "true"
  }
}
{
  "name": "topolvm-e2e-worker2",
  "annotations": {
    "capacity.topolvm.io/00default": "0",
    "capacity.topolvm.io/create-option-raid1": "5360320512",
    "capacity.topolvm.io/option-class-raid1": "5360320512",
    "csi.volume.kubernetes.io/nodeid": "{\"topolvm.io\":\"topolvm-e2e-worker2\"}",
    "kubeadm.alpha.kubernetes.io/cri-socket": "unix:///run/containerd/containerd.sock",
    "node.alpha.kubernetes.io/ttl": "0",
    "volumes.kubernetes.io/controller-managed-attach-detach": "true"
  }
}
{
  "name": "topolvm-e2e-worker3",
  "annotations": {
    "capacity.topolvm.io/00default": "0",
    "capacity.topolvm.io/thin": "21474836480",
    "csi.volume.kubernetes.io/nodeid": "{\"topolvm.io\":\"topolvm-e2e-worker3\"}",
    "kubeadm.alpha.kubernetes.io/cri-socket": "unix:///run/containerd/containerd.sock",
    "node.alpha.kubernetes.io/ttl": "0",
    "volumes.kubernetes.io/controller-managed-attach-detach": "true"
  }
}

The VGs (not sure if these warnings are related):

$ sudo vgs
  WARNING: Not using device /dev/loop30 for PV FtpUj8-r86x-OL7c-hozC-KdqZ-fsOX-3tp42w.
  WARNING: Not using device /dev/loop32 for PV FtpUj8-r86x-OL7c-hozC-KdqZ-fsOX-3tp42w.
  WARNING: PV FtpUj8-r86x-OL7c-hozC-KdqZ-fsOX-3tp42w prefers device /dev/loop5 because device name matches previous.
  WARNING: PV FtpUj8-r86x-OL7c-hozC-KdqZ-fsOX-3tp42w prefers device /dev/loop5 because device name matches previous.
  VG            #PV #LV #SN Attr   VSize   VFree   
  myvg1           1   0   0 wz--n- <20.00g  <20.00g
  myvg2           1   1   0 wz--n- <30.00g  <11.96g
  node1-thick1    1   0   0 wz--n- <20.00g  <20.00g
  node1-thick2    1   0   0 wz--n- <20.00g  <20.00g
  node2-raid1-1   2   0   0 wz--n-   5.99g    5.99g
  node2-raid1-2   2   0   0 wz--n-   5.99g    5.99g
  node3-thin1     1   1   0 wz--n-  <5.00g 1012.00m

@llamerada-jp (Contributor)

This behavior may be caused by a configuration mistake and by a limitation of TopoLVM.

First, I found a mistake in the storage class. Would you fix your SC as shown below?

allowedTopologies:
- matchLabelExpressions:
  - key: topology.topolvm.io/node
    values:
    - topolvm-e2e-worker  #👈

I would expect an error to be reported when allowedTopologies is specified and no node matches, and I don't know why pod scheduling continues regardless. This is common Kubernetes behavior rather than something TopoLVM-specific, so if you want to know why, you would have to ask the upstream community.

Second, there is a limitation of the TopoLVM scheduler, described in the doc below:
https://github.com/topolvm/topolvm/blob/main/docs/limitations.md#pod-without-pvc
topolvm-controller adds annotations to the pod as hints for topolvm-scheduler when a pod using a TopoLVM volume is created. But if the Pod is created before the PVC, even when both are in the same manifest file, topolvm-controller cannot add these annotations, so the Pod and PVC are scheduled without any information about the device class. Could you put the PVC before the Pod in the manifest file, if the Pod currently comes first? (A minimal ordering sketch is shown below.)
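
For example, a minimal ordering sketch with the PVC defined before the Pod (resource names and image are illustrative, not from this issue):

# PVC first, so topolvm-controller can annotate the Pod that references it
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc            # illustrative name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: topolvm-provisioner
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: example-pod            # illustrative name
spec:
  containers:
    - name: app
      image: registry.k8s.io/pause:3.9   # illustrative image
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: example-pvc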

@NymanRobin (Author)

Thanks a lot @llamerada-jp for the help!
With these two changes everything works fine. However, I see that without the topology the scheduling sometimes goes wrong; is this expected / documented?

@llamerada-jp (Contributor)

I'm glad I could help you.

I see that without the topology the scheduling sometimes goes wrong; is this expected / documented?

If the topology is not present, pods will be scheduled without considering free space and thus may fail to allocate the volume. This is a limitation. I thought we wrote this in the limitations.md, but it may be unclear.

@NymanRobin (Author)

I see; it makes sense that it works like this, but at least to me it is not clear from limitations.md, so maybe some extra clarification would be useful for users 🤔
Thanks again for the help @llamerada-jp. I will now close this issue; you can create a new one if you decide to change the docs, since that is not directly related to this issue.
