freenas-api-iscsi: Concurrently taken snapshots become stuck #332
Comments
So there are 2 different failures?:
Can you send the controller logs showing csi failing to submit?
I don't have the logs from tonight, because I restarted the controller. But I ran them again at 12:40 (UTC). This one is the one that's stuck this time:
The logs of the containers:
Hey, I have a very similar issue with
From these messages, it seems that the driver attempts to create the snapshot but gets locked, and when it comes around a second time to create it, the snapshot already exists but with a different size. Usually, when snapshots are being taken, my TrueNAS server's CPU is 100% loaded. I found that if I delete 4 PVCs at a time that depend on these snapshots, the PVCs proceed and my VolSync snapshots finish. However, with this approach I get dangling snapshots and NFS export entries that do not get cleaned up. I also found that the TrueNAS settings have a limit on concurrent replication tasks (I wonder if it hits this limit). Would it be possible to add a bounded queue of provision requests to the driver, which would limit snapshot concurrency? It would be nice if the size of the queue were configurable too. That way, new requests would not be sent to the TrueNAS server before previous requests clear out of the queue.
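To make the request concrete, here is a minimal sketch of the bounded-queue idea, written in Python purely for illustration. It is not democratic-csi code: `BoundedSnapshotQueue`, `fake_create_snapshot`, and `run_demo` are hypothetical names, and the backend call is a stand-in that only sleeps. The sketch only shows that a semaphore with a configurable size keeps the number of in-flight requests at or below the limit.

```python
import asyncio


class BoundedSnapshotQueue:
    """Hypothetical limiter: at most `max_concurrent` requests are in flight."""

    def __init__(self, max_concurrent: int = 2):
        if max_concurrent < 1:
            raise ValueError("max_concurrent must be at least 1")
        self._slots = asyncio.Semaphore(max_concurrent)
        self.in_flight = 0
        self.peak_in_flight = 0  # recorded only to observe the limit

    async def submit(self, create_snapshot, *args):
        # Wait for a free slot before contacting the storage backend.
        async with self._slots:
            self.in_flight += 1
            self.peak_in_flight = max(self.peak_in_flight, self.in_flight)
            try:
                return await create_snapshot(*args)
            finally:
                self.in_flight -= 1


async def fake_create_snapshot(name: str) -> str:
    # Stand-in for the real backend call (assumption: it is awaitable).
    await asyncio.sleep(0.01)
    return f"{name}@snap"


async def run_demo(requests: int = 12, max_concurrent: int = 2):
    queue = BoundedSnapshotQueue(max_concurrent)
    names = [f"pvc-{i}" for i in range(requests)]
    results = await asyncio.gather(
        *(queue.submit(fake_create_snapshot, name) for name in names)
    )
    return results, queue.peak_in_flight


if __name__ == "__main__":
    results, peak = asyncio.run(run_demo())
    print(len(results), "snapshots requested, peak concurrency:", peak)
```

With a limit like this, requests beyond the configured size simply wait in the driver instead of being submitted to the server all at once. Whether that would avoid the stuck jobs described here is untested.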
Hello, I also seem to have this issue. I have 12 VolSync ReplicationSources that trigger backups at the same time, and their I noticed that after limiting Replication Tasks to 1 in the TrueNAS Advanced settings, most of the PVCs eventually succeed (within 20 min), except exactly 2 (not always the same 2) that remain stuck in Pending indefinitely (I assume; I waited for a few hours). I can also see 2 active jobs in the TrueNAS dashboard ( At this point, if I delete the 2 stuck PVCs, VolSync schedules new ones and those get provisioned. The 2 active jobs remain until TrueNAS is rebooted, however. I can confirm this behavior on both 23.10 and now on 24.04.0 after upgrading this morning.
I'm using freenas-api-iscsi in conjunction with VolSync to take regular backups of my data. VolSync takes a snapshot of the PV, mounts it and copies the data offsite. While this is generally functional, scheduling multiple backups at the same time results in some of them becoming stuck.
Right now I have scheduled five different backups at 2am. Every morning, three have been successfully completed and two are stuck in a pending state. Apparently, the driver couldn't create some of the snapshots and timed out after 90 seconds:
TrueNAS shows the operation as 'in progress':
The actual snapshot has not been created:
Restarting the democratic-csi controller causes it to pick up the failed snapshots and successfully create them (sometimes multiple restarts are necessary because one times out again). TrueNAS keeps displaying the jobs in the UI until I reboot the server or manually restart the middleware.
I can work around the issue by scheduling the backups five minutes apart from each other.
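The staggering workaround can be sketched as a small helper that generates one cron expression per backup, each offset by five minutes. This is only an illustration: it assumes the backups are triggered by standard five-field cron expressions, and `staggered_schedules` and the backup names are hypothetical.

```python
def staggered_schedules(names, start_hour=2, start_minute=0, gap_minutes=5):
    """Return {name: cron expression}, each job offset by gap_minutes."""
    schedules = {}
    for index, name in enumerate(names):
        # Minutes since midnight for this job, wrapped at 24 hours.
        total = start_hour * 60 + start_minute + index * gap_minutes
        hour, minute = divmod(total % (24 * 60), 60)
        schedules[name] = f"{minute} {hour} * * *"
    return schedules


if __name__ == "__main__":
    # Hypothetical backup names, starting at 2am as in the report above.
    for name, cron in staggered_schedules(
        ["app-a", "app-b", "app-c", "app-d", "app-e"]
    ).items():
        print(name, cron)
```

Each resulting expression would then be set as the schedule of the corresponding backup, so that no two snapshots are requested at the same minute.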
The helm release (I'm using Talos):
The installed snapshot-controller is the democratic-csi one:
TrueNAS is running version TrueNAS-SCALE-22.12.3.3.