Object loss after recovery cancelled on hetero-disk, auto-vnodes and avoiding-diskfull cluster #363
Comments
Can this problem be reproduced with the fixed vnode mode? I believe the auto vnode feature is completely flawed and should be removed in the future. If this problem is auto-vnode specific, its priority wouldn't be high.
No, this is not reproduced on a fixed-vnodes cluster. As you mentioned, this seems to be an auto-vnode-specific problem. I agree that the auto-vnode feature should be removed. At the very least, auto vnodes should be calculated from node-local disk space, not cluster-wide: for example, give 1 vnode per 1 GiB of disk space, so that a node with a 100-GiB disk gets 100 vnodes.
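For illustration, here is a minimal sketch of that proposed policy. The function name and the `MAX_VNODES` cap are hypothetical stand-ins, not actual sheepdog identifiers:

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Sketch of the policy proposed above: derive a node's vnode count
 * from its own disk size only, at 1 vnode per 1 GiB.  Hypothetical
 * names, not sheepdog code.
 */
#define MAX_VNODES 1024

static int vnodes_from_local_disk(uint64_t disk_bytes)
{
	uint64_t vnodes = disk_bytes >> 30;	/* bytes -> GiB */

	if (vnodes < 1)
		vnodes = 1;			/* even a tiny disk gets one vnode */
	if (vnodes > MAX_VNODES)
		vnodes = MAX_VNODES;
	return (int)vnodes;
}

int main(void)
{
	/* A node with a 100-GiB disk gets 100 vnodes. */
	printf("%d\n", vnodes_from_local_disk(100ULL << 30));
	return 0;
}
```

Under such a policy a node's vnode count would depend only on its own disk, so it would not change just because other nodes join or leave the cluster.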
At the first stage of recovery, each sheep node sends a GET_OBJ_LIST request to all other nodes in the cluster to prepare a list of the objects that should be recovered. The list can be incomplete if any of the requests failed. In such a case, the node should not send the COMPLETE_RECOVERY notification to the cluster, or the cluster can lose some objects once all the nodes have sent that notification. This commit resolves the issue by "cancelling" recovery, instead of "finishing" it, when any of the GET_OBJ_LIST requests failed. Once the recovery in a sheep is cancelled, that sheep never sends COMPLETE_RECOVERY until another epoch-lifting recovery is started and then completed. It also sends a DISABLE_RECOVER operation to the cluster to pause ongoing recovery in the other nodes. This commit also fixes sheepdog#363.

Signed-off-by: Takashi Menjo <menjo.takashi@lab.ntt.co.jp>
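For illustration, a minimal compilable sketch of the control flow the commit message describes. All identifiers (`prepare_object_list`, `run_recovery`, `notify_disable_recover`, and so on) are hypothetical stand-ins, not the actual sheepdog internals:

```c
#include <stdbool.h>
#include <stdio.h>

/* Illustrative stand-ins for the cluster notifications (not real APIs). */
static void notify_disable_recover(void)   { puts("DISABLE_RECOVER sent"); }
static void notify_complete_recovery(void) { puts("COMPLETE_RECOVERY sent"); }

struct recovery_work {
	bool cancelled;	/* set once any GET_OBJ_LIST request has failed */
};

/* Pretend to send GET_OBJ_LIST to every other node; returning false
 * means at least one request failed, so the list may be incomplete. */
static bool prepare_object_list(struct recovery_work *rw)
{
	(void)rw;
	return false;	/* simulate one failed request */
}

static void recover_listed_objects(struct recovery_work *rw) { (void)rw; }

static void run_recovery(struct recovery_work *rw)
{
	if (!prepare_object_list(rw)) {
		/* Cancel, don't finish: a possibly incomplete list must
		 * never lead to COMPLETE_RECOVERY, or the cluster can
		 * lose objects, as described above. */
		rw->cancelled = true;
		notify_disable_recover();	/* pause recovery cluster-wide */
		return;
	}

	recover_listed_objects(rw);

	/* Only a fully prepared, finished recovery reports completion. */
	if (!rw->cancelled)
		notify_complete_recovery();
}

int main(void)
{
	struct recovery_work rw = { .cancelled = false };
	run_recovery(&rw);
	return 0;
}
```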
Reproducibility
100%.
I used Sheepdog 1.0_88_g978eeae; other versions may also reproduce this.
Steps to reproduce
1. Start `sheep` daemons on nodes N0–N3 with heterogeneous disk sizes, and run `dog cluster format` (without `-V`, with `-F`)
2. `dog vdi write` to `alpha` to fill it with some non-zero data
3. `dog node kill N3`
4. …
5. …
6. `dog vdi read alpha`

Expected behavior
At step 6, I can read the same data as was written at step 2. Because I use a 2-replica cluster, up to one node failure should be tolerable.
Actual behavior
At step 6, I cannot read the same data as was written at step 2.

I think this is real object loss. The `dog vdi read` command said "Failed to read object 00ed202b00000026 No object found". I can find that object just after step 3, but not after step 5, on any alive node (i.e. N0–N2), not even in `.stale`.

Full reproduction script and running log