Node affinity for snapshots #1019

Open
dhess opened this issue Dec 10, 2023 · 10 comments

@dhess

dhess commented Dec 10, 2023

Hi, thanks for this great project! We just started using it with our Rook/Ceph volumes, and it's working great.

It doesn't work so well with OpenEBS ZFS LocalPV (ZFS-LocalPV) volumes, however. ZFS-LocalPV has first-class support for CSI snapshotting and cloning, but VolSync can't figure out that a ZFS-LocalPV snapshot of a PVC mounted on, e.g., node-a can also only be consumed from node-a. copyMethod: Direct doesn't help here for in-use volumes, because they can't be remounted. (Actually, I seem to recall that ZFS-LocalPV does support simultaneous pod mounts with a bit of extra configuration, but I'd prefer to use snapshots for proper PiT backups anyway.)

Would it be difficult to add first-class support to VolSync for node-local provisioners with snapshotting support, like ZFS-LocalPV? Unless I'm missing something, it seems like it should be possible: since copyMethod: Direct can already determine which node a PVC is mounted on and ensure the sync is performed from that node, it naïvely seems that an additional configuration option could tell VolSync to mount a snapshot and run the sync operation on the same node where the source PVC is mounted.
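To make the idea concrete, here's a purely hypothetical sketch. copyMethod: Snapshot and the restic mover exist today, but a field like scheduleMoverOnSourceNode is invented here purely for illustration and is not part of the current VolSync API:

---
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: example
spec:
  sourcePVC: data-pvc
  restic:
    copyMethod: Snapshot
    repository: restic-config
    # Hypothetical field, not part of VolSync today: run the mover pod on
    # the node where sourcePVC is currently mounted, so that a node-local
    # snapshot (e.g. ZFS-LocalPV) can actually be attached there.
    scheduleMoverOnSourceNode: true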

@tesshuflower
Contributor

@dhess This is an interesting one. I'm not sure the workaround used for Direct mode will work, as it relies on finding another active pod that's currently using the PVC and then scheduling on the same node.

In this case (if I understand correctly), a new PVC from snapshot is created, and the VolSync mover pod should then be the 1st consumer of this PVC. Normally I would have thought the pod should get scheduled automatically in the correct place, but maybe something else is going on.

Does ZFS-LocalPV use the csi topology feature? https://kubernetes-csi.github.io/docs/topology.html

One more question: When you create your original sourcePVC and then run your application pod, do you also need to manually configure that pod to run on a particular node that corresponds to where the PVC was provisioned?

@dhess
Author

dhess commented Dec 12, 2023

Hi @tesshuflower, thanks for the quick response.

Does ZFS-LocalPV use the csi topology feature? https://kubernetes-csi.github.io/docs/topology.html

I'm not familiar with CSI Topology, but from what I can tell, it seems it does:

https://github.com/openebs/zfs-localpv/blob/d646e6b1aa779e986b8cbf5ce65b400f243c557b/deploy/helm/charts/values.yaml#L57

I'm guessing this manifest for the openebs-zfs-localpv-controller also demonstrates that it's using CSI topology:

      - args:
        - --csi-address=$(ADDRESS)
        - --v=5
        - --feature-gates=Topology=true
        - --strict-topology
        - --leader-election
        - --enable-capacity=true
        - --extra-create-metadata=true
        - --default-fstype=ext4
        env:
        - name: ADDRESS
          value: /var/lib/csi/sockets/pluginproxy/csi.sock
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        image: registry.k8s.io/sig-storage/csi-provisioner:v3.5.0
        imagePullPolicy: IfNotPresent
        name: csi-provisioner
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/lib/csi/sockets/pluginproxy/
          name: socket-dir

Are there any particular topology keys I should use for compatibility with VolSync? Is the ZFS-LocalPV Helm chart's default "All" value a valid key?
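For example, the ZFS-LocalPV docs (as far as I can tell) show pinning a storage class to particular nodes via allowedTopologies, roughly like this; the pool and node names below are just placeholders:

---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: zfspv-example
provisioner: zfs.csi.openebs.io
parameters:
  poolname: zfspv-pool   # placeholder pool name
  fstype: zfs
allowedTopologies:
- matchLabelExpressions:
  - key: kubernetes.io/hostname
    values:
    - node-a
    - node-b

Is that the kind of topology constraint VolSync would pick up on?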

One more question: When you create your original sourcePVC and then run your application pod, do you also need to manually configure that pod to run on a particular node that corresponds to where the PVC was provisioned?

I think you're referring to statically provisioned PVCs here? If so, I'm not using those, so I'm not sure. All of the PVCs I'm trying to use as source PVCs for VolSync are dynamically provisioned as part of a StatefulSet or similar, so the volume ends up provisioned on the node where its pod is scheduled.

@tesshuflower
Contributor

@dhess there's nothing specific in VolSync that you should need to do to ensure compatibility. I guess normally I'd expect that the first consumer (the volsync mover pod in this case) of a PVC should get automatically scheduled on a node where that pvc is accessible. It sounds like this is happening with your statefulset for example.

Maybe you could try something to help me understand: if you create a VolumeSnapshot for one of your source PVCs and then create a PVC from that snapshot (or do a clone instead of VolumeSnapshot+PVC if you're using copyMethod: Clone), can you then create a Job or Deployment that mounts this PVC without specifically needing to set affinity to schedule it on a particular node?
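Something along these lines is what I mean; all of the names, sizes, and class names below are placeholders, so substitute whatever matches your setup:

---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: test-snap
spec:
  volumeSnapshotClassName: your-zfspv-snapclass    # placeholder
  source:
    persistentVolumeClaimName: your-source-pvc     # placeholder
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc-from-snap
spec:
  storageClassName: your-zfspv-storageclass        # placeholder
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi          # should match the source PVC size
  dataSource:
    name: test-snap
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
---
apiVersion: batch/v1
kind: Job
metadata:
  name: test-mount
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: reader
        image: busybox
        command: ["ls", "/data"]
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: test-pvc-from-snap

If the Job's pod schedules onto the right node and mounts the volume without any affinity settings, then the topology info is flowing through as expected.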

@dhess
Author

dhess commented Dec 16, 2023

Ahh, I see what you mean now. I'll try an experiment and get back to you.

@danielsand

👋
This issue also happens with democratic-csi local-hostpath when using the VolSync VolumePopulator.

democratic-csi/democratic-csi#329

It seems to be a time-based race condition.

@tesshuflower
Contributor

👋 This issue also happens with democratic-csi local-hostpath when using the VolSync VolumePopulator.

democratic-csi/democratic-csi#329

It seems to be a time-based race condition.

@danielsand I don't think this issue was specifically about the volumepopulator - would you be able to explain the scenario where you're hitting the issue?

@dhess
Author

dhess commented Mar 25, 2024

So since I originally posted this issue, VolSync snapshots with ZFS-LocalPV have been working pretty reliably. However, we just ran into the issue (or at least a similar one) again, and I think it's possible that I misdiagnosed the original problem.

This time what happened is:

  1. We added some new worker nodes to the cluster, and they each have a few ZFS-LocalPV storage classes defined.
  2. Shortly thereafter, one of our VolSync jobs that backs up a ZFS-LocalPV PVC got stuck in scheduling, complaining that 0 nodes were available. This VolSync job had previously been working reliably for about a month.
  3. When I looked more carefully at the root cause, I noticed that while the Clone PVC was correctly created on the same node as the source ZFS-LocalPV PVC, the cache PVC was not — it was being created on one of the new worker nodes. Since ZFS-LocalPV volumes can't be mounted across the network, the ReplicationSource job was getting stuck on the remote ZFS-LocalPV cache PVC.

The ReplicationSource job originally looked like this:

---
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: db-primer-service-0
spec:
  sourcePVC: db-primer-service-0
  trigger:
    # 1 backup per hour
    schedule: "30 * * * *"
  restic:
    cacheStorageClassName: zfspv-pool-0
    copyMethod: Clone
    pruneIntervalDays: 7
    repository: restic-config-db-primer-service-0
    retain:
      hourly: 24
      daily: 7
      weekly: 1
    volumeSnapshotClassName: zfspv-snapclass

where zfspv-pool-0 is the same ZFS-LocalPV storage class as the source volume.

In the last few months we've also added support for Mayastor to our cluster, and those PVCs are not tied to a particular node, so when I changed the cache storage class to Mayastor, the backup job ran and completed successfully:

---
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: db-primer-service-0
spec:
  sourcePVC: db-primer-service-0
  trigger:
    # 1 backup per hour
    schedule: "30 * * * *"
  restic:
    cacheStorageClassName: mayastor-pool-0-repl-1
    copyMethod: Clone
    pruneIntervalDays: 7
    repository: restic-config-db-primer-service-0
    retain:
      hourly: 24
      daily: 7
      weekly: 1
    volumeSnapshotClassName: zfspv-snapclass

So I think that the problem here isn't with the source volume, but with the cache volume. I suspect that in order to reliably use a local PV storage class for cache volumes, there'll need to be some way to specify the topology of that volume.

What's still puzzling is that all of our other cacheStorageClassNames also specify a ZFS-LocalPV storage class, and this is the first time I've seen a stuck job in a while. Why this suddenly popped up again after adding some new nodes is curious. Maybe the scheduler is trying to balance out the number of PVCs across the new nodes?
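For instance, one workaround I can imagine (untested, and all of the names below are placeholders) is a per-node cache storage class pinned with allowedTopologies, so that the cache PVC can only land on the node where the source volume lives:

---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: zfspv-pool-0-node-a     # placeholder, one class per node
provisioner: zfs.csi.openebs.io
parameters:
  poolname: zfspv-pool          # placeholder
  fstype: zfs
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
- matchLabelExpressions:
  - key: kubernetes.io/hostname
    values:
    - node-a                    # the node the source PVC lives on

and then pointing cacheStorageClassName at it, but that obviously doesn't scale well across many nodes.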

@tesshuflower
Contributor

@dhess is your storage class using a volumeBindingMode of WaitForFirstConsumer? VolSync doesn't create the cache PVC until just before creating the job, so normally I'd expect this to get figured out during scheduling. But if you're using a volumeBindingMode of Immediate, the cache PVC could be bound to a node that isn't the same one as your PVC-from-snapshot.
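In other words, does the cache storage class look roughly like the sketch below? The parameters are placeholders; the important part is the volumeBindingMode:

---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: zfspv-pool-0
provisioner: zfs.csi.openebs.io
parameters:
  poolname: zfspv-pool    # placeholder
  fstype: zfs
# WaitForFirstConsumer delays binding until the mover pod is scheduled;
# Immediate can bind the cache PVC to a node before the pod exists.
volumeBindingMode: WaitForFirstConsumer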

@danielsand

👋 This issue also happens with democratic-csi local-hostpath when using the VolSync VolumePopulator.
democratic-csi/democratic-csi#329
It seems to be a time-based race condition.

@danielsand I don't think this issue was specifically about the volumepopulator - would you be able to explain the scenario where you're hitting the issue?

The linked issue wasn't about the VolumePopulator; democratic-csi local-hostpath + volume snapshots + VolSync didn't work for some folks.

It's just a reference to what's currently running on my end and what is working. (CSI and volume snapshots work as they should.)

The VolumePopulator is currently failing at random on my setup: the wrong node gets picked by the volume populator, even though WaitForFirstConsumer is specified.

Will circle back when I push the topic again.

@tesshuflower
Contributor

👋 This issue also happens with democratic-csi local-hostpath when using the VolSync VolumePopulator.
democratic-csi/democratic-csi#329
It seems to be a time-based race condition.

@danielsand I don't think this issue was specifically about the volumepopulator - would you be able to explain the scenario where you're hitting the issue?

The linked issue wasn't about the VolumePopulator; democratic-csi local-hostpath + volume snapshots + VolSync didn't work for some folks.

It's just a reference to what's currently running on my end and what is working. (CSI and volume snapshots work as they should.)

The VolumePopulator is currently failing at random on my setup: the wrong node gets picked by the volume populator, even though WaitForFirstConsumer is specified.

Will circle back when I push the topic again.

@danielsand I've created a separate issue #1255 to track this. I believe both issues are about storage drivers that create VolumeSnapshots/PVCs constrained to specific nodes, but I think your issue is related to using the VolumePopulator, and this one is not.
