CLI tools: check_if_node_is_quorum_critical: reduce response wait time from peers that are stopped, unreachable or down #9755

Open
michaelklishin opened this issue Oct 20, 2023 · 0 comments
@michaelklishin
Member

Previously discussed in #9522.

Currently, rabbitmq-diagnostics check_if_node_is_quorum_critical does the following to determine whether any queue or stream would lose its quorum if the current node were stopped:

  • List QQs with local replicas with minimum quorum
  • List streams with local replicas with minimum quorum
  • Check whether either list is non-empty

To find out if a QQ or stream has "minimum quorum", it contacts all running nodes, where the definition of "running" is that of rabbit_nodes:list_running/0, which itself contacts other nodes with a 10s timeout.
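
A minimal sketch of the check described above, in Python for illustration only (the actual implementation is not shown in this issue). It assumes that "minimum quorum" means the number of online replicas is at or below a simple majority of all replicas; that definition, and every name below, is an assumption made for this example rather than RabbitMQ's API.

```python
from typing import Dict, Iterable, List, Set


def has_minimum_quorum(members: Iterable[str], online: Set[str]) -> bool:
    """True if losing one more online replica would break quorum.

    Assumption: quorum is a simple majority of all replicas, and
    "minimum quorum" means online replicas <= that majority.
    """
    members = list(members)
    online_members = [m for m in members if m in online]
    quorum = len(members) // 2 + 1
    return len(online_members) <= quorum


def quorum_critical_queues(
    local_node: str, queues: Dict[str, List[str]], online: Set[str]
) -> List[str]:
    """Names of queues with a local replica that are at minimum quorum."""
    return sorted(
        name
        for name, members in queues.items()
        if local_node in members and has_minimum_quorum(members, online)
    )


# Hypothetical three-node cluster; qq.c has no replica on n1.
queues = {
    "qq.a": ["n1", "n2", "n3"],
    "qq.b": ["n1", "n2", "n3"],
    "qq.c": ["n2", "n3"],
}

critical_when_n3_down = quorum_critical_queues("n1", queues, {"n1", "n2"})
critical_when_all_up = quorum_critical_queues("n1", queues, {"n1", "n2", "n3"})
```

In this sketch the node counts as quorum critical exactly when the returned list is non-empty. The expensive part in practice is computing the `online` set, which is what the 10s timeout affects.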

By using a local snapshot of cluster members (that is, without checking with other nodes to see if they are online/reachable), the delay that down nodes add to the CLI command's response time should be significantly reduced.
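
The difference between the two approaches can be sketched as follows. This is an illustrative Python model, not RabbitMQ code: `running_by_probe`, `running_from_snapshot` and `FakeProbe` are hypothetical names, and the snapshot is assumed to be a locally held set of members that the local node already considers online. The fake probe records the timeout it would have waited instead of actually sleeping.

```python
from typing import Callable, Iterable, List, Set

DEFAULT_TIMEOUT_S = 10.0  # the 10s timeout mentioned in this issue


def running_by_probe(
    members: Iterable[str],
    probe: Callable[[str, float], bool],
    timeout_s: float = DEFAULT_TIMEOUT_S,
) -> Set[str]:
    """Ask every member whether it is up; a down member costs a timeout."""
    return {m for m in members if probe(m, timeout_s)}


def running_from_snapshot(members: Iterable[str], snapshot: Iterable[str]) -> Set[str]:
    """Intersect with a locally held view; no peer is contacted."""
    return set(members) & set(snapshot)


class FakeProbe:
    """Stand-in for a network call; records waits instead of sleeping."""

    def __init__(self, reachable: Iterable[str]) -> None:
        self.reachable = set(reachable)
        self.waited_s: List[float] = []

    def __call__(self, node: str, timeout_s: float) -> bool:
        if node in self.reachable:
            return True
        # An unreachable node would make the caller wait for the timeout.
        self.waited_s.append(timeout_s)
        return False


members = ["n1", "n2", "n3"]
probe = FakeProbe(reachable={"n1", "n2"})

by_probe = running_by_probe(members, probe)
by_snapshot = running_from_snapshot(members, {"n1", "n2"})
```

Both paths yield the same set here, but only the probing path records a wait for the down node. The likely trade-off is that a local snapshot can be slightly out of date compared to a fresh probe.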

@ikavgo ikavgo self-assigned this Jan 29, 2024