- Do you mean the I/O error is somehow triggered while rebuilding?
- IIUC, when creating a single-replica volume, the engine and replica might be on different nodes. If the data locality is …
- I am wondering whether best-effort with replica count = 1 is a supported use case and whether it could be improved.
Background:
Our critical VM workloads are deployed with built-in redundancy mechanisms, so every additional replica is just a waste of disk space and bandwidth. While this looks like a good fit for local storage (strict-local), services are still disrupted for a short period when a VM is shut down for node maintenance, and some re-synchronization is required when the VM comes back up. We therefore prefer live migration for node maintenance; unfortunately, strict-local does not allow migration.
Our workaround is using best-effort with replica count = 1, which works quite well but still comes with some issues that keep me wondering whether this is a supported scenario and, if so, whether it could be improved.
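For reference, our setup amounts to a Longhorn StorageClass roughly like this sketch (the class name is made up; `numberOfReplicas` and `dataLocality` are the relevant Longhorn parameters):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-single-replica   # hypothetical name
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "1"           # no Longhorn-level redundancy; the VM workload handles it
  dataLocality: "best-effort"     # Longhorn tries to keep the replica on the engine's node
  staleReplicaTimeout: "30"
```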
The issue: in case a VM gets scheduled (on start) to a node not holding the replica, the VM starts and the replica gets rebuilt on the new node. During this time an I/O error is generated, which freezes the VM for the duration of the rebuild. As soon as the rebuild completes, the VM resumes and the old replica gets deleted.
Could this be improved by:
a) Introducing some mechanism to prefer scheduling the VM onto nodes already hosting a replica? This might be possible by modifying the VM's scheduling preferences after its creation with some policy engine, but that feels a bit hacky (see the sketch after this list). Maybe I am overlooking something here.
b) Most importantly: rebuild the replica in the background and freeze I/O only at completion? This would reduce the time the VM is frozen and mimic the VM BlockMigration feature.
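To illustrate a), here is a rough sketch of the policy-engine idea, assuming some external controller labels nodes that currently hold the volume's replica (Longhorn does not maintain such a node label itself; the label key, VM name, and PVC name below are all hypothetical):

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: critical-vm
spec:
  runStrategy: Always
  template:
    spec:
      affinity:
        nodeAffinity:
          # Soft preference only: the VM can still start elsewhere,
          # in which case the replica rebuild described above kicks in.
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: example.io/holds-replica-of-critical-vm   # hypothetical label kept in sync by a custom controller
                operator: In
                values: ["true"]
      domain:
        devices:
          disks:
          - name: root
            disk:
              bus: virtio
      volumes:
      - name: root
        persistentVolumeClaim:
          claimName: critical-vm-pvc
```

A mutating policy engine (e.g. Kyverno) could patch such a preference onto VMs automatically, but keeping the label in sync with replica movements is exactly the part that feels hacky.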