
Object loss after recovery cancelled on hetero-disk, auto-vnodes and avoiding-diskfull cluster #363

Open
tmenjo opened this issue Feb 16, 2017 · 2 comments · May be fixed by #371

Comments

@tmenjo
Contributor

tmenjo commented Feb 16, 2017

Reproducibility

100%.

I used Sheepdog 1.0_88_g978eeae, but other versions may also be affected.

Steps to reproduce

  1. Set up a 2-replica cluster with 4 nodes (say N0-N3) like below:
    • hetero-disk: each of N0-N2 has a 128-MiB disk and N3 has a 256-MiB one
    • auto-vnodes (sheep and dog cluster format without -V)
    • enable the avoiding-diskfull option (dog cluster format with -F)
  2. Create a 256-MiB thin-provisioned VDI alpha
  3. Do dog vdi write alpha to fill it with some non-zero data
  4. Do dog node kill N3
  5. Wait until recovery is cancelled because of diskfull
  6. Do dog vdi read alpha
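The steps above can be condensed into a shell sketch. The exact invocations (the `-c` copy-count flag, the node ID passed to `dog node kill`, the use of stdin/stdout redirection) are assumptions for illustration; only the flags named in the steps (no -V, with -F) come from the report itself.

```shell
# Hypothetical condensation of steps 1-6; exact invocations may differ.
# Assumes sheep daemons are already running on N0-N3 with the disk
# sizes described above (128 MiB on N0-N2, 256 MiB on N3).
dog cluster format -c 2 -F        # 2 replicas, avoid diskfull; no -V => auto-vnodes
dog vdi create alpha 256M         # step 2: thin-provisioned 256-MiB VDI
dog vdi write alpha < data.bin    # step 3: fill with non-zero data
dog node kill 3                   # step 4: kill N3 (the 256-MiB node)
# step 5: wait until recovery is cancelled because of diskfull
dog vdi read alpha > out.bin      # step 6: fails with "No object found"
```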

Expected behavior

At step 6, I can read the same data as was written at step 3. Because I use a 2-replica cluster, up to 1 node failure should be tolerable.

Actual behavior

At step 6, I cannot read the same data as was written at step 3.

I think this is real object loss. dog vdi read said "Failed to read object 00ed202b00000026 No object found". I can find that object just after step 3, but not after step 5, on any alive node (N0-N2), not even in .stale.

Full reproduction script and running log

@mitake
Contributor

mitake commented Feb 17, 2017

Can this problem be reproduced with the fixed-vnode mode? I believe the auto-vnode feature is completely flawed and should be removed in the future. If this problem is specific to auto vnodes, its priority wouldn't be high.

@tmenjo tmenjo changed the title Object loss after recovery canelled on hetero-disk, auto-vnodes and avoiding-diskfull cluster Object loss after recovery cancelled on hetero-disk, auto-vnodes and avoiding-diskfull cluster Feb 17, 2017
@tmenjo
Contributor Author

tmenjo commented Feb 17, 2017

No, this is not reproduced on a fixed-vnodes cluster. As you mentioned, this seems to be an auto-vnode-specific problem.

I agree that the auto-vnode feature should be removed. At the least, vnode counts should be calculated from node-local disk space, not cluster-wide. For example, assign 1 vnode per 1 GiB of disk space; if a node has a 100-GiB disk, set its vnode count to 100.
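The per-node calculation proposed above can be sketched in C. The function name and the 1-GiB-per-vnode granularity are illustrative assumptions, not actual sheepdog code:

```c
#include <stdint.h>

/* Hypothetical sketch of the proposed vnode calculation: assign one
 * vnode per 1 GiB of node-local disk space, independent of the rest
 * of the cluster. Names and granularity are assumptions. */
#define VNODE_UNIT (1024ULL * 1024 * 1024) /* 1 GiB per vnode */

static uint64_t vnodes_for_disk(uint64_t disk_bytes)
{
	uint64_t n = disk_bytes / VNODE_UNIT;
	return n > 0 ? n : 1; /* every node gets at least one vnode */
}
```

With this scheme, a node with a 100-GiB disk gets 100 vnodes, and even a tiny disk still gets one vnode so the node stays addressable in the hash ring.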

tmenjo added a commit to tmenjo/sheepdog that referenced this issue Feb 27, 2017
At the first stage of recovery, each sheep node sends a GET_OBJ_LIST
request to all other nodes in the cluster to prepare a list of the
objects which should be recovered. The list can be incomplete if any
of the requests fails. In such a case, the node should not send a
COMPLETE_RECOVERY notification to the cluster, or the cluster can
lose some objects once all the nodes have sent that notification.

This commit resolves the issue by "cancelling" recovery, instead of
"finishing" it, when any of the GET_OBJ_LIST requests fails. Once
recovery in a sheep is cancelled, that sheep never sends
COMPLETE_RECOVERY until another epoch-lifting recovery is started
and then completed. It also sends a DISABLE_RECOVER operation to the
cluster to pause ongoing recovery on the other nodes.

This commit also fixes sheepdog#363.

Signed-off-by: Takashi Menjo <menjo.takashi@lab.ntt.co.jp>
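The decision described in the commit message can be sketched as follows. All identifiers below are hypothetical, not actual sheepdog source; `get_obj_list` merely simulates a request that fails for a dead node:

```c
#include <stdbool.h>

/* Hypothetical sketch of the recovery decision in the commit message:
 * if any GET_OBJ_LIST request fails, the object list may be incomplete,
 * so the node must cancel recovery rather than report completion.
 * None of these names are actual sheepdog identifiers. */

enum recovery_result { RECOVERY_COMPLETE, RECOVERY_CANCELLED };

/* Simulated request: node 3 is dead, so its request fails. */
static bool get_obj_list(int node) { return node != 3; }

static enum recovery_result prepare_obj_list(int nr_nodes)
{
	for (int i = 0; i < nr_nodes; i++) {
		if (!get_obj_list(i)) {
			/* List may be incomplete: cancel, and do NOT
			 * send COMPLETE_RECOVERY for this epoch. */
			return RECOVERY_CANCELLED;
		}
	}
	return RECOVERY_COMPLETE; /* safe to notify COMPLETE_RECOVERY */
}
```

The key point is that cancellation is sticky: a cancelled sheep stays silent until a fresh epoch-lifting recovery runs to completion, so an incomplete object list can never cause the cluster-wide completion that deletes stale copies.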
@tmenjo tmenjo linked a pull request Feb 27, 2017 that will close this issue