Epic: RAFT Stability and observability #4928

Open
5 tasks
reyreaud-l opened this issue May 15, 2024 · 0 comments
Raft is currently implemented and works, but we lack strong observability into its internals. We need that visibility to run Weaviate in production and to have actionable information during incidents.

Observability front

  • Provide a dedicated set of Raft-related metrics (current leader, time between elections, local log lag, etc.)
  • Provide or improve internal dashboards so that they include these metrics
  • Provide or improve internal alerting based on these metrics (split-brain alert, quorum near loss, etc.)
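
As a rough illustration of how the alert conditions above could be derived from per-node metrics, here is a minimal sketch. It is not Weaviate's implementation: `NodeStatus`, its fields, and every function name are hypothetical, and the only fact it relies on is that a Raft cluster of n voters needs floor(n/2) + 1 of them to make progress.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodeStatus:
    """Hypothetical per-node snapshot; field names are illustrative only."""
    name: str
    reachable: bool
    state: str           # "leader", "follower", or "candidate"
    term: int
    last_log_index: int  # last known index if the node is unreachable


def quorum_size(voters: int) -> int:
    """Smallest majority of a cluster with the given number of voters."""
    return voters // 2 + 1


def spare_voters(nodes: List[NodeStatus]) -> int:
    """Reachable voters beyond the quorum; negative means quorum is lost."""
    healthy = sum(1 for n in nodes if n.reachable)
    return healthy - quorum_size(len(nodes))


def quorum_near_loss(nodes: List[NodeStatus]) -> bool:
    """True when losing one more voter would cost the cluster its quorum."""
    return spare_voters(nodes) == 0


def split_brain_suspected(nodes: List[NodeStatus]) -> bool:
    """True when more than one reachable node reports itself as leader.

    A stale leader from an older term can show up briefly, so a real
    alert would likely require the condition to persist for some time.
    """
    leaders = [n for n in nodes if n.reachable and n.state == "leader"]
    return len(leaders) > 1


def log_lag(nodes: List[NodeStatus]) -> Dict[str, int]:
    """Entries each node is behind the highest log index seen."""
    head = max(n.last_log_index for n in nodes)
    return {n.name: head - n.last_log_index for n in nodes}
```

In a three-node cluster with one unreachable node, `quorum_near_loss` returns true because the two remaining voters are exactly the quorum. Expressing the condition as "spare voters" keeps the same rule valid for any cluster size.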

Operations front

  • Provide a CLI tool to interact with a Weaviate cluster (list current nodes and Raft status, list shard distribution, list backups, etc.)
  • Provide the ability (possibly through the CLI above) to intervene manually on a Weaviate cluster and force a Raft state (force a node to become leader, add or remove nodes manually, etc.)

reyreaud-l self-assigned this May 15, 2024