This repository has been archived by the owner on Nov 9, 2022. It is now read-only.

When replicas specified in ml-config, fails wipe if number of hosts is one; allows replication with number of hosts less than 3. #832

Open
jamsilvia opened this issue Aug 16, 2017 · 4 comments

Comments

@jamsilvia
Contributor

jamsilvia commented Aug 16, 2017

Two parts to this problem:

  1. If ml-config.xml specifies a replica but there is only one host, bootstrap succeeds (with no warnings, and no replicas created). However, a subsequent wipe fails with "ERROR: XDMP-DIVBYZERO". The cause is that the check on the number of hosts, which bypasses "reassign-replicas", is done only in the create step and not in the wipe step.
  2. Replication is not usable for failover unless the number of hosts is at least 3 (to ensure a quorum). In general, the host count should be "2n+1", where n is the number of replicas. A warning should be generated when replication is requested for fewer than 3 hosts. A warning COULD also be generated when the "2n+1" rule is not followed.
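
The two host-count rules above can be sketched as a small guard that both the create and the wipe step would consult before touching replicas. This is only an illustrative Python sketch, not Roxy's actual code (Roxy itself is Ruby and XQuery). The function names, the warning texts, and the minimum of 3 hosts are assumptions drawn from the description in this issue.

```python
# Hypothetical sketch of the host-count checks described in this issue.
# None of these names come from Roxy; they are illustrative only.

MIN_HOSTS_FOR_QUORUM = 3  # assumption: smallest cluster that can keep a quorum


def min_hosts_for_failover(replica_count: int) -> int:
    """Return the host count suggested by the "2n+1" rule."""
    return 2 * replica_count + 1


def should_apply_replicas(host_count: int, replica_count: int) -> bool:
    """Decide whether replica handling should run at all.

    The same answer must be used by bootstrap AND wipe, so that wipe
    never tries to reassign replicas that bootstrap skipped.
    """
    return replica_count > 0 and host_count >= MIN_HOSTS_FOR_QUORUM


def replication_warnings(host_count: int, replica_count: int) -> list[str]:
    """Collect warnings for a requested replica configuration."""
    if replica_count <= 0:
        return []
    if host_count < MIN_HOSTS_FOR_QUORUM:
        return [
            f"replication requested with {host_count} host(s); "
            f"at least {MIN_HOSTS_FOR_QUORUM} are needed for failover"
        ]
    needed = min_hosts_for_failover(replica_count)
    if host_count < needed:
        return [
            f"{replica_count} replica(s) on {host_count} hosts does not "
            f"follow the 2n+1 rule ({needed} hosts suggested)"
        ]
    return []


if __name__ == "__main__":
    for hosts, replicas in [(1, 1), (2, 1), (3, 1), (3, 2)]:
        print(hosts, replicas,
              should_apply_replicas(hosts, replicas),
              replication_warnings(hosts, replicas))
```

Keeping the decision in one shared function is the point of the sketch: problem 1 arises because the create step and the wipe step disagree about whether replicas are in play.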

Reproduce:

  1. Problem 1 is reproducible by:
  • Edit a default ml-config.xml to set up replication for a content forest:
    <assignment>
      <forest-name>${content-db}</forest-name>
      <replica-names>
        <replica-name>${content-db}-rep1</replica-name>
      </replica-names>
      @ml.forest-data-dir-xml
    </assignment>
    <assignment>
      <forest-name nr-replicas="1">${content-db}-rep1</forest-name>
      @ml.forest-data-dir-xml
    </assignment>
  • ./ml dev bootstrap
  • ./ml dev wipe
  • Should report the division by zero error.
  2. Problem 2 is reproducible by:
  • Edit a default ml-config.xml to set up replication for a content forest:
    <assignment>
      <forest-name>${content-db}</forest-name>
      <replica-names>
        <replica-name>${content-db}-rep1</replica-name>
      </replica-names>
      @ml.forest-data-dir-xml
    </assignment>
    <assignment>
      <forest-name nr-replicas="1">${content-db}-rep1</forest-name>
      @ml.forest-data-dir-xml
    </assignment>
  • ./ml dev bootstrap
  • Silently ignores replication.
  • Create a 2-node cluster and do the same steps as above.
  • Creates a replica, but failover would be unusable.

Which Operating System are you using?
Linux and Mac OS-X

Which version of MarkLogic are you using?
9.0-2

Which version of Roxy are you using (see version.txt)?
1.7.6 and 1.7.7

I have a fix for this coded already, and we are currently testing it on our project.

@tdiepenbrock

We already have a fix for this; Joe McGroarty has implemented it.

@grtjn
Contributor

grtjn commented Sep 5, 2017

@tdiepenbrock If you could share the changes applied, that would be grand. Offline, PR, or in a comment.

@jamsilvia
Contributor Author

jamsilvia commented Sep 5, 2017 via email

jamsilvia added a commit to jamsilvia/roxy that referenced this issue Sep 6, 2017
…en hosts > 3

Bootstrap originally only checked for hosts >1, and wipe did not check at all.
But both cases should check for number of hosts > 3.
@jamsilvia
Contributor Author

I submitted the pull request with the changes. Let me know if you need anything else.
