[IMPROVEMENT] Disable revision counter by default #8563
@shuo-wu @PhanLe1010 @WebberHuang1118 @Vicente-Cheng Any thoughts on this improvement?
We discussed this internally. So far, the reasons we think we can safely disable the RC are:
There is one concern we still need to solve:
And the idea for this one is:
Btw, we will have this in 1.7.0 instead of the stable 1.6.x releases.
The auto salvage feature should pick up only one reusable replica. Considering the concurrent replica write out-of-order issue, this is required even if the revision counter is enabled.
Hi @PhanLe1010 @derekbit @shuo-wu
All replicas should have the same data content.
I am not sure what we have to do if we want to implement
For this idea, @shuo-wu @ejweber @james-munson and I discussed in the US sync and agreed that the Longhorn manager can salvage multiple replicas, and it is the job of the engine to select the best replica and mark the other replicas as err. The reason is that:
Actually, we need to do a few extra things after disabling the revision counter, since most of the logic is already there:
I am walking through the auto-salvage flow for a volume with RC disabled to double-check the logic. From a closer look, it seems that the engine will pick multiple replicas whose last-modification timestamps are within a 5s window of the latest one: https://github.com/longhorn/longhorn-engine/blob/e39b7f0313b22d5c435ce57d1442800999a0f4ac/pkg/controller/control.go#L644-L652

IMO, 5s can be problematic because there might be IO differences between the replicas during that window, which could leave the replicas' data inconsistent. However, if we reduce this value, or even eliminate it (so that the engine always picks only the replica with the latest modification timestamp), we will trigger more replica rebuilding (potentially unnecessary and expensive). So this becomes a trade-off between the risk of data inconsistency and the cost of potentially unnecessary, expensive rebuilding.

I am thinking of reducing this value to 1s as a middle ground between the two options. WDYT? @shuo-wu @derekbit @ejweber @innobead ?
I am leaning towards eliminating it and only salvaging from one replica:
Prefer to salvage one replica only so that the replica consistency can be guaranteed.
I agree that we can't guess how critical the inconsistency is.
I think when a volume doesn't have IO during the incident (like an instance-manager pod crash), the replicas would not have to be rebuilt before this change. After this change, if we only select one replica to use, the other replicas will always have to be rebuilt (because the last-modification timestamps are hardly ever the same between replicas, I think).
Update: I am sorry for the wrong statement above. Looks like the original design already attempted to select only 1 replica and mark the other replicas as err. However, there is a BUG in the implementation. The intended design from the original LEP:
The actual implementation:
We are using direct I/O, but not synchronous I/O. Without synchronous I/O, there is a higher risk of data loss even though the user has issued `sync`.
Update: Proposing this modified logic for the engine to select the replica candidate for auto salvage case longhorn/longhorn-engine#1114 (comment) |
Pre Ready-For-Testing Checklist
Test Plan:

Test setting

Case 1: Longhorn UI
Case 1: Default Longhorn StorageClass
Case 1: A StorageClass without disableRevisionCounter parameter

Test resilience (this should be implemented by an e2e test; if you are testing it manually, you can try 5 times instead of 20 times to save time)
Is your improvement request related to a feature? Please describe (👍 if you like this request)
The purpose of the revision counter is to help choose the replica containing the latest data. However, the mechanism introduces a significant performance drop in Longhorn volumes.
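The performance cost comes from the counter having to be persisted alongside every data write. A toy model of that overhead (the names here are illustrative, not longhorn-engine's actual API):

```go
package main

import "fmt"

// replicaSim models the write amplification the revision counter
// introduces: one extra metadata persist per data write. This is a
// simplified illustration, not the real replica implementation.
type replicaSim struct {
	revisionCounter uint64
	dataWrites      int
	metadataWrites  int
}

func (r *replicaSim) write(rcEnabled bool) {
	r.dataWrites++
	if rcEnabled {
		r.revisionCounter++
		r.metadataWrites++ // counter persisted alongside every write
	}
}

func main() {
	withRC, withoutRC := &replicaSim{}, &replicaSim{}
	for i := 0; i < 1000; i++ {
		withRC.write(true)
		withoutRC.write(false)
	}
	// With RC enabled, every data write carries an extra metadata write.
	fmt.Println(withRC.metadataWrites, withoutRC.metadataWrites) // 1000 0
}
```

In exchange for dropping that per-write overhead, replica selection during salvage must rely on other signals, such as the last-modification timestamps discussed in this thread.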
After revisiting the design, the reasons we think disabling the RC doesn't matter are:

- Without `sync`, the filesystem and block layers don't guarantee data integrity. Therefore, choosing any replica for rebuilding should be fine.
- With `sync`, all IOs are flushed, so all replicas should have identical data.

Describe the solution you'd like
Describe alternatives you've considered
Additional context
https://suse.slack.com/archives/C02EDCA93RA/p1715712709278379?thread_ts=1715659059.436039&cid=C02EDCA93RA