- Do you mean the I/O error is somehow triggered while rebuilding?
- IIUC, when creating a single-replica volume, the engine and replica might be on different nodes. If the data locality is …
- I am wondering whether best-effort with replica count = 1 is a supported use case and whether it could be improved.
Background:
Our critical VM workloads are deployed with built-in redundancy mechanisms, so every additional replica is just a waste of disk space and bandwidth. While this looks like a good fit for local storage (strict-local), services are still disrupted for a short period when a VM is shut down for node maintenance, and some re-synchronization is required when the VM comes back up. We therefore prefer live migration for node maintenance; unfortunately, strict-local does not allow migration.
Our workaround is using best-effort with replica count = 1, which works quite well but still comes with some issues that keep me wondering whether this is a supported scenario and, if so, whether it could be improved.
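For reference, our setup amounts to a Longhorn StorageClass roughly like this sketch (the class name is made up; `numberOfReplicas` and `dataLocality` are the relevant Longhorn parameters):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-single-replica   # hypothetical name
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "1"           # no Longhorn-level redundancy; the VM workload handles it
  dataLocality: "best-effort"     # Longhorn tries to keep the replica on the engine's node
  staleReplicaTimeout: "30"
```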
The issue: in case a VM gets scheduled (on start) to a node not holding the replica, the VM starts and the replica gets rebuilt on the new node. During this time an I/O error is generated, which freezes the VM for the duration of the rebuild. As soon as the rebuild completes, the VM resumes and the old replica gets deleted.
Could this be improved by:
a) Introducing some mechanism to prefer scheduling the VM onto nodes already hosting a replica? This might be possible by modifying the VM's scheduling preferences after its creation with some policy engine, but that feels a bit hacky (see the sketch after this list). Maybe I am overlooking something here.
b) Most importantly: rebuild the replica in the background and freeze I/O only at completion? This would reduce the time the VM is frozen and mimic the VM BlockMigration feature.
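To illustrate a), here is a rough sketch of the policy-engine idea, assuming some external controller labels nodes that currently hold the volume's replica (Longhorn does not maintain such a node label itself; the label key, VM name, and PVC name below are all hypothetical):

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: critical-vm
spec:
  runStrategy: Always
  template:
    spec:
      affinity:
        nodeAffinity:
          # Soft preference only: the VM can still start elsewhere,
          # in which case the replica rebuild described above kicks in.
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: example.io/holds-replica-of-critical-vm   # hypothetical label kept in sync by a custom controller
                operator: In
                values: ["true"]
      domain:
        devices:
          disks:
          - name: root
            disk:
              bus: virtio
      volumes:
      - name: root
        persistentVolumeClaim:
          claimName: critical-vm-pvc
```

A mutating policy engine (e.g. Kyverno) could patch such a preference onto VMs automatically, but keeping the label in sync with replica movements is exactly the part that feels hacky.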