Single-node (WIP) cluster can't schedule controller #654

Open
IngwiePhoenix opened this issue Apr 21, 2024 · 1 comment
@IngwiePhoenix

(Yep, I did read the template; but for some odd reason I am not seeing the signup verification email. I am pretty sure it's a layer 8 problem... so apologies in advance!)

Hello! I am trying to bootstrap the NFS-CSI driver off the Helm chart in a k3s cluster - only one node for now; I intend to grow it to a few more once I have my base config figured out. But this means that this message:

kube-system   0s                     Warning   FailedScheduling                 Pod/csi-nfs-controller-59b87c6c7c-ktfh7    0/1 nodes are available: 1 node(s) didn't match Pod's node affinity/selector. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.

isn't helping a whole lot. I have tried to get rid of it, but no matter what I set controller.tolerations to, I keep getting that warning.
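
Since the warning complains about the Pod's node affinity/selector rather than about taints, one way to narrow it down is to compare the pod's nodeSelector/affinity with the labels actually present on the node (the pod name below is just the one from the event; substitute whatever the current ReplicaSet produced):

kubectl -n kube-system get pod csi-nfs-controller-59b87c6c7c-ktfh7 \
  -o jsonpath='{.spec.nodeSelector}{"\n"}{.spec.affinity}{"\n"}'
kubectl get node routerboi --show-labels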

First, here's my HelmChart and values, as applied to the k3s node with kubectl apply:

apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: nfs-csi-chart
  namespace: kube-system
spec:
  repo: https://raw.githubusercontent.com/kubernetes-csi/csi-driver-nfs/master/charts
  chart: csi-driver-nfs
  #version: latest
  targetNamespace: kube-system
  valuesContent: |-
    serviceAccount:
      create: true # When true, service accounts will be created for you. Set to false if you want to use your own.
      # controller: csi-nfs-controller-sa # Name of Service Account to be created or used
      # node: csi-nfs-node-sa # Name of Service Account to be created or used

    rbac:
      create: true
      name: nfs

    driver:
      name: nfs.csi.k8s.io
      mountPermissions: 0

    feature:
      enableFSGroupPolicy: true
      enableInlineVolume: false
      propagateHostMountOptions: false

    # do I have to change this? k3s lives on /mnt/usb/k3s, but there is no kubelet dir there
    kubeletDir: /var/lib/kubelet

    controller:
      # TODO: do I need to set these to true?
      runOnControlPlane: true
      runOnMaster: true
      logLevel: 5
      workingMountDir: /tmp
      defaultOnDeletePolicy: retain  # available values: delete, retain
      priorityClassName: system-cluster-critical
      # FIXME: better solution???
      tolerations: []
    node:
      name: csi-nfs-node

    # TODO: sync to backup
    externalSnapshotter:
      enabled: false
      name: snapshot-controller
      priorityClassName: system-cluster-critical
      # Create volume snapshot CRDs.
      customResourceDefinitions:
        enabled: true   # if true, the VolumeSnapshot, VolumeSnapshotContent and VolumeSnapshotClass CRDs will be created; set to false if they already exist in the cluster.

---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-bunker
provisioner: nfs.csi.k8s.io
parameters:
  # alt. use tailscale IP
  server: 192.168.1.2
  share: /mnt/vol1/Services/k3s
reclaimPolicy: Retain
volumeBindingMode: Immediate
mountOptions:
  - nfsvers=4.1
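
To double-check what these values actually render to, the chart can also be templated locally with plain helm (repo URL taken from the HelmChart above; grepping for the scheduling bits is just my way of trimming the output, so the exact shape of what comes back is an assumption on my side):

helm repo add csi-driver-nfs https://raw.githubusercontent.com/kubernetes-csi/csi-driver-nfs/master/charts
helm repo update
helm template csi-driver-nfs csi-driver-nfs/csi-driver-nfs \
  --namespace kube-system \
  --set controller.runOnControlPlane=true \
  --set controller.runOnMaster=true \
  | grep -B2 -A6 'nodeSelector'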

When I look at the generated pod that throws the error, I can see the tolerations right there:

  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
  - effect: NoSchedule
    key: node-role.kubernetes.io/controlplane
    operator: Exists
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
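
As far as I understand it, tolerations only matter for taints, while the warning above points at the affinity/selector check, so the taints may not be the blocker at all. This should confirm whether the node carries any taints in the first place (going by the node spec below, I would expect empty output):

kubectl get node routerboi -o jsonpath='{.spec.taints}{"\n"}'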

Is there something I overlooked that would let the controller schedule properly onto my node? Looking at the node itself shows the relevant labels and taints:

Node spec
# kubectl get node/routerboi -o yaml
apiVersion: v1
kind: Node
metadata:
  annotations:
    alpha.kubernetes.io/provided-node-ip: 192.168.1.3
    csi.volume.kubernetes.io/nodeid: '{"nfs.csi.k8s.io":"routerboi"}'
    etcd.k3s.cattle.io/local-snapshots-timestamp: "2024-04-21T04:19:08+02:00"
    etcd.k3s.cattle.io/node-address: 192.168.1.3
    etcd.k3s.cattle.io/node-name: routerboi-a33ea14d
    flannel.alpha.coreos.com/backend-data: '{"VNI":1,"VtepMAC":"de:b0:64:00:55:cf"}'
    flannel.alpha.coreos.com/backend-type: vxlan
    flannel.alpha.coreos.com/kube-subnet-manager: "true"
    flannel.alpha.coreos.com/public-ip: 100.64.0.2
    flannel.alpha.coreos.com/public-ip-overwrite: 100.64.0.2
    k3s.io/encryption-config-hash: start-70fb6f5afe422f096fc74aa91ff0998185377373139914e3aeaa9d20999adf8f
    k3s.io/external-ip: 100.64.0.2
    k3s.io/hostname: cluserboi
    k3s.io/internal-ip: 192.168.1.3
    k3s.io/node-args: '["server","--log","/var/log/k3s.log","--token","********","--write-kubeconfig-mode","600","--cluster-init","true","--cluster-domain","kube.birb.it","--flannel-external-ip","true","--etcd-snapshot-compress","true","--secrets-encryption","true","--data-dir","/mnt/usb/k3s","--node-external-ip","100.64.0.2","--node-label","node-location=home","--node-name","routerboi","--default-local-storage-path","/mnt/usb/k3s-data"]'
    k3s.io/node-config-hash: 7FJHCLEHT5LLPFFY5MHTC4FNIGPUD3EZI2YWWAVNCRX4UCF2TZZA====
    k3s.io/node-env: '{"K3S_DATA_DIR":"/mnt/usb/k3s/data/7ddd49d3724e00d95d2af069d3247eaeb6635abe80397c8d94d4053dd02ab88d"}'
    node.alpha.kubernetes.io/ttl: "0"
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: "2024-04-20T20:07:06Z"
  finalizers:
  - wrangler.cattle.io/node
  - wrangler.cattle.io/managed-etcd-controller
  labels:
    beta.kubernetes.io/arch: arm64
    beta.kubernetes.io/instance-type: k3s
    beta.kubernetes.io/os: linux
    kubernetes.io/arch: arm64
    kubernetes.io/hostname: routerboi
    kubernetes.io/os: linux
    node-location: home
    node-role.kubernetes.io/control-plane: "true"
    node-role.kubernetes.io/etcd: "true"
    node-role.kubernetes.io/master: "true"
    node.kubernetes.io/instance-type: k3s
  name: routerboi
  resourceVersion: "72651"
  uid: b4e6ff71-c631-4f20-a61f-ef578cf2749d
spec:
  podCIDR: 10.42.0.0/24
  podCIDRs:
  - 10.42.0.0/24
  providerID: k3s://routerboi
status:
  addresses:
  - address: 192.168.1.3
    type: InternalIP
  - address: 100.64.0.2
    type: ExternalIP
  - address: cluserboi
    type: Hostname
  allocatable:
    cpu: "8"
    ephemeral-storage: "28447967825"
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    hugepages-32Mi: "0"
    hugepages-64Ki: "0"
    memory: 8131288Ki
    pods: "110"
  capacity:
    cpu: "8"
    ephemeral-storage: 29243388Ki
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    hugepages-32Mi: "0"
    hugepages-64Ki: "0"
    memory: 8131288Ki
    pods: "110"
  conditions:
  - lastHeartbeatTime: "2024-04-21T03:12:33Z"
    lastTransitionTime: "2024-04-20T20:07:16Z"
    message: Node is a voting member of the etcd cluster
    reason: MemberNotLearner
    status: "True"
    type: EtcdIsVoter
  - lastHeartbeatTime: "2024-04-21T03:13:06Z"
    lastTransitionTime: "2024-04-20T20:07:06Z"
    message: kubelet has sufficient memory available
    reason: KubeletHasSufficientMemory
    status: "False"
    type: MemoryPressure
  - lastHeartbeatTime: "2024-04-21T03:13:06Z"
    lastTransitionTime: "2024-04-20T20:07:06Z"
    message: kubelet has no disk pressure
    reason: KubeletHasNoDiskPressure
    status: "False"
    type: DiskPressure
  - lastHeartbeatTime: "2024-04-21T03:13:06Z"
    lastTransitionTime: "2024-04-20T20:07:06Z"
    message: kubelet has sufficient PID available
    reason: KubeletHasSufficientPID
    status: "False"
    type: PIDPressure
  - lastHeartbeatTime: "2024-04-21T03:13:06Z"
    lastTransitionTime: "2024-04-20T22:19:01Z"
    message: kubelet is posting ready status. AppArmor enabled
    reason: KubeletReady
    status: "True"
    type: Ready
  daemonEndpoints:
    kubeletEndpoint:
      Port: 10250
  images:
  - names:
    - docker.io/rancher/klipper-helm@sha256:87db3ad354905e6d31e420476467aefcd8f37d071a8f1c8a904f4743162ae546
    - docker.io/rancher/klipper-helm:v0.8.3-build20240228
    sizeBytes: 84105730
  - names:
    - docker.io/vaultwarden/server@sha256:edb8e2bab9cbca22e555638294db9b3657ffbb6e5d149a29d7ccdb243e3c71e0
    - docker.io/vaultwarden/server:latest
    sizeBytes: 66190948
  - names:
    - registry.k8s.io/sig-storage/nfsplugin@sha256:54b97b7ec30ca185c16e8c40e84fc527a7fc5cc8e9f7ea6b857a7a67655fff54
    - registry.k8s.io/sig-storage/nfsplugin:v4.6.0
    sizeBytes: 63690685
  - names:
    - docker.io/rancher/mirrored-library-traefik@sha256:ca9c8fbe001070c546a75184e3fd7f08c3e47dfc1e89bff6fe2edd302accfaec
    - docker.io/rancher/mirrored-library-traefik:2.10.5
    sizeBytes: 40129288
  - names:
    - docker.io/rancher/mirrored-metrics-server@sha256:20b8b36f8cac9e25aa2a0ff35147b13643bfec603e7e7480886632330a3bbc59
    - docker.io/rancher/mirrored-metrics-server:v0.7.0
    sizeBytes: 17809919
  - names:
    - docker.io/rancher/local-path-provisioner@sha256:aee53cadc62bd023911e7f077877d047c5b3c269f9bba25724d558654f43cea0
    - docker.io/rancher/local-path-provisioner:v0.0.26
    sizeBytes: 15933947
  - names:
    - docker.io/rancher/mirrored-coredns-coredns@sha256:a11fafae1f8037cbbd66c5afa40ba2423936b72b4fd50a7034a7e8b955163594
    - docker.io/rancher/mirrored-coredns-coredns:1.10.1
    sizeBytes: 14556850
  - names:
    - registry.k8s.io/sig-storage/livenessprobe@sha256:5baeb4a6d7d517434292758928bb33efc6397368cbb48c8a4cf29496abf4e987
    - registry.k8s.io/sig-storage/livenessprobe:v2.12.0
    sizeBytes: 12635307
  - names:
    - registry.k8s.io/sig-storage/csi-node-driver-registrar@sha256:c53535af8a7f7e3164609838c4b191b42b2d81238d75c1b2a2b582ada62a9780
    - registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.10.0
    sizeBytes: 10291112
  - names:
    - docker.io/rancher/klipper-lb@sha256:558dcf96bf0800d9977ef46dca18411752618cd9dd06daeb99460c0a301d0a60
    - docker.io/rancher/klipper-lb:v0.4.7
    sizeBytes: 4939041
  - names:
    - docker.io/library/busybox@sha256:c3839dd800b9eb7603340509769c43e146a74c63dca3045a8e7dc8ee07e53966
    - docker.io/rancher/mirrored-library-busybox@sha256:0d2d5aa0a465e06264b1e68a78b6d2af5df564504bde485ae995f8e73430bca2
    - docker.io/library/busybox:latest
    - docker.io/rancher/mirrored-library-busybox:1.36.1
    sizeBytes: 1848702
  - names:
    - docker.io/rancher/mirrored-pause@sha256:74c4244427b7312c5b901fe0f67cbc53683d06f4f24c6faee65d4182bf0fa893
    - docker.io/rancher/mirrored-pause:3.6
    sizeBytes: 253243
  nodeInfo:
    architecture: arm64
    bootID: 198115b5-8292-4d8d-91ef-5faf2ea60504
    containerRuntimeVersion: containerd://1.7.11-k3s2
    kernelVersion: 6.8.7-edge-rockchip-rk3588
    kubeProxyVersion: v1.29.3+k3s1
    kubeletVersion: v1.29.3+k3s1
    machineID: 28b5d8681b21493b87f17ffeb6fcb5b7
    operatingSystem: linux
    osImage: Armbian 24.5.0-trunk.446 bookworm
    systemUUID: 28b5d8681b21493b87f17ffeb6fcb5b7
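
One thing that stands out to me: the node-role labels above all carry the value "true" (that seems to be how k3s sets them). If runOnControlPlane/runOnMaster make the controller Deployment select on those labels with an empty value, roughly like the sketch below (my assumption, not copied from the chart), then the selector would never match this node, since a nodeSelector requires an exact value match:

# assumed nodeSelector on the controller Deployment when runOnControlPlane/runOnMaster are true
nodeSelector:
  kubernetes.io/os: linux
  node-role.kubernetes.io/control-plane: ""
  node-role.kubernetes.io/master: ""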

Do you perhaps see something that I missed?

Thank you and kind regards,
Ingwie

@andyzhangx
Member

Have you resolved this issue?
