Are multiple GRPC /DeleteVolume being issued? CSI container has many errors. #1154

Open
reefland opened this issue Mar 4, 2024 · 2 comments
Labels
bug Something isn't working

Comments

reefland commented Mar 4, 2024

Describe the bug
Using Rook-Ceph v1.13.5, with a storage class dedicated to VolSync usage. While VolSync appears to be working correctly (creating PVCs, snapshots, and restic backups to S3, and pruning old snapshots), I'm getting a lot of error messages in the Ceph csi-rbdplugin-provisioner pod's csi-rbdplugin container, like these:

```
E0304 20:00:19.186514       1 omap.go:79] ID: 284 Req-ID: 0001-0009-rook-ceph-000000000000000d-c093e953-1fe4-4682-a9e7-7ed660c24b8e omap not found (pool="csi-ceph-blockpool", namespace="", name="csi.volume.c093e953-1fe4-4682-a9e7-7ed660c24b8e"): rados: ret=-2, No such file or directory
W0304 20:00:19.186545       1 voljournal.go:729] ID: 284 Req-ID: 0001-0009-rook-ceph-000000000000000d-c093e953-1fe4-4682-a9e7-7ed660c24b8e unable to read omap keys: pool or key missing: key not found: rados: ret=-2, No such file or directory
E0304 20:00:19.190568       1 rbd_journal.go:689] ID: 284 Req-ID: 0001-0009-rook-ceph-000000000000000d-c093e953-1fe4-4682-a9e7-7ed660c24b8e failed to get image id csi-ceph-blockpool/csi-vol-c093e953-1fe4-4682-a9e7-7ed660c24b8e: image not found: RBD image not found
```

I do not see any errors or warnings in the volsync logs.

Steps to reproduce
Watch the csi-rbdplugin container logs when the VolSync schedule is triggered.

Expected behavior
Not expecting error messages in the container logs.

Actual results
Everything appears to be working fine even with the error messages. As a test, I deleted the entire namespace of a test application; ArgoCD was able to rebuild it from GitHub, and VolSync created and populated the PVC. The application came back online with its data.

Additional context
I asked about these messages in the Rook-Ceph discussion area, and they suggested that multiple gRPC /DeleteVolume calls are being issued (by VolSync): rook/rook#13851. Is this expected behavior?

I also asked other VolSync users who run Rook whether they see these error messages logged, and they do. I can ask them to chime in here if that would be helpful.

reefland added the bug label Mar 4, 2024
onedr0p (Contributor) commented Mar 4, 2024

I can also confirm that I see these logs in the csi-rbdplugin-provisioner pod's csi-rbdplugin container.

tesshuflower (Contributor) commented

VolSync doesn't call /DeleteVolume directly; it should be the CSI external provisioner that does that. However, VolSync does create and delete PVC resources, which should then prompt the external provisioner to issue a /DeleteVolume.

VolSync runs a controller-runtime client DeleteAllOf() with label selectors to delete PVCs that were created temporarily, so with caching and multiple reconciles it could potentially invoke multiple deletes on the same resource. Multiple reconciles are pretty normal, and when it gets to the cleanup step you'll see something like this in the logs: deleting temporary objects. I guess I'm not too concerned about this, as we're trying to get to eventual consistency, and the main thing is that the temporary object is in fact deleted.
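For reference, here's a minimal sketch of that DeleteAllOf() pattern with the controller-runtime client (the package name, namespace, and label values are made up for illustration, not VolSync's actual ones):

```go
package cleanup

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// cleanupTempPVCs deletes all temporary PVCs matching a label selector.
// If a second reconcile runs before the client cache reflects the first
// delete, this same call goes out again, so the API server (and ultimately
// the CSI driver) can see repeated deletes for the same volumes.
func cleanupTempPVCs(ctx context.Context, c client.Client, ns string) error {
	return c.DeleteAllOf(ctx, &corev1.PersistentVolumeClaim{},
		client.InNamespace(ns),
		client.MatchingLabels{"app.example/temp": "true"}, // hypothetical label
	)
}
```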

We could possibly look at whether we're requeuing too often, but I don't think we can eliminate this entirely. Another way would be to do lookups and delete PVCs individually so that we're sure the client cache is updated, but this seems like unnecessary overhead.
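That individual-lookup alternative would look roughly like this (same hypothetical names as above; note that List() also reads from the cache, so this narrows the duplicate-delete window rather than closing it):

```go
package cleanup

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// cleanupTempPVCsIndividually lists matching PVCs and deletes them one by
// one, tolerating NotFound so that a repeat reconcile is harmless.
func cleanupTempPVCsIndividually(ctx context.Context, c client.Client, ns string) error {
	var pvcs corev1.PersistentVolumeClaimList
	if err := c.List(ctx, &pvcs,
		client.InNamespace(ns),
		client.MatchingLabels{"app.example/temp": "true"}); err != nil {
		return err
	}
	for i := range pvcs.Items {
		if err := c.Delete(ctx, &pvcs.Items[i]); err != nil && !apierrors.IsNotFound(err) {
			return err
		}
	}
	return nil
}
```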

I'm also not sure what role the external-provisioner plays in all this, and whether it's really due to multiple DeleteAllOf() calls on PVCs or not.

You could check whether you can recreate the same behaviour by doing a single delete on a PVC, or by running kubectl deletes with a label selector multiple times.
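For example (namespace and label values are hypothetical; substitute whatever your temporary PVCs use):

```sh
# Single delete on one PVC:
kubectl -n my-app delete pvc my-temp-pvc

# Repeating a label-selector delete mimics multiple DeleteAllOf() calls:
kubectl -n my-app delete pvc -l app.example/temp=true --ignore-not-found
kubectl -n my-app delete pvc -l app.example/temp=true --ignore-not-found
```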
