When the connection to etcd is broken, dfuse search fails to update its connection #165

Open
matthewdarwin opened this issue Oct 6, 2020 · 0 comments

When the connection to etcd is broken and etcd is replaced by a different instance, dfuse search fails to update its connection and stays broken. The health check also keeps reporting "healthy", so it is hard to detect this situation through monitoring.

One possible solution:

Add a mechanism that detects that the gRPC connection to etcd is broken, then simply exits and waits to be restarted by k8s, systemd, or whatever supervises the process.
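
A minimal sketch of such a watchdog, to make the idea concrete. This is not dfuse code: `probeFunc`, `failureCounter`, `watch`, and the thresholds are hypothetical names and values chosen for illustration. The probe is assumed to be any cheap call that fails when etcd is unreachable (for example, a short-timeout read through the etcd client). The sketch only reports the failure; a real service would exit non-zero at that point.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errDown is a stand-in error used by the demo probe below.
var errDown = errors.New("etcd probe failed: connection refused")

// probeFunc is a hypothetical health probe; it returns nil when etcd answers.
type probeFunc func(ctx context.Context) error

// failureCounter trips after max consecutive failed probes.
type failureCounter struct {
	max int // consecutive failures tolerated before tripping
	n   int // current run of consecutive failures
}

// observe records one probe result and reports whether the limit was reached.
// A successful probe resets the run of failures.
func (c *failureCounter) observe(err error) bool {
	if err == nil {
		c.n = 0
		return false
	}
	c.n++
	return c.n >= c.max
}

// watch probes every interval and returns an error once maxFailures
// consecutive probes have failed, or when ctx is cancelled.
func watch(ctx context.Context, probe probeFunc, interval, timeout time.Duration, maxFailures int) error {
	counter := &failureCounter{max: maxFailures}
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
			probeCtx, cancel := context.WithTimeout(ctx, timeout)
			err := probe(probeCtx)
			cancel()
			if counter.observe(err) {
				return fmt.Errorf("etcd unreachable after %d consecutive probes: %w", counter.n, err)
			}
		}
	}
}

func main() {
	// Demo probe that always fails, standing in for a dead etcd cluster.
	probe := func(ctx context.Context) error { return errDown }
	err := watch(context.Background(), probe, 10*time.Millisecond, 5*time.Millisecond, 3)
	fmt.Println("watchdog tripped:", err)
	// A real service would call os.Exit(1) here and let k8s or systemd restart it.
}
```

Requiring several consecutive failures is a design choice, so that a brief network blip does not restart the process, while a replaced etcd cluster still triggers an exit within a few probe intervals.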

The scenario is probably something like this:

  1. Archive A tells etcd that it serves blocks 1000->2000 (BUT THAT ETCD IS GONE, REPLACED BY A NEWLY REBUILT CLUSTER !!!)
  2. The router checks etcd, reads this, and sends archive A a query going down to block 1000 (BUT THAT ETCD IS GONE, SO NO UPDATES !!!)
  3. Archive A says: hey, I don't have block 1000, my lowest block is 1100 ("I TRIED TO TELL YOU VIA ETCD BUT MY UPDATE IS STALLED")
  4. Manually restart the router and the archives.
  5. They connect to the new etcd and everything works again.
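
The stall in step 3 is invisible to monitoring because health still reports "healthy". One way to surface it, sketched here under assumptions, is to tie health to the age of the last successful etcd update. `publishTracker`, its methods, and the 30-second limit are hypothetical and are not part of dfuse; timestamps are plain Unix seconds to keep the example small.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// publishTracker remembers when the last etcd update succeeded.
type publishTracker struct {
	mu         sync.Mutex
	lastOKUnix int64 // Unix seconds of the last successful update (0 = never)
	maxAgeSec  int64 // updates older than this make the service unhealthy
}

// markOK is called after every successful update written to etcd.
func (t *publishTracker) markOK(nowUnix int64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.lastOKUnix = nowUnix
}

// healthy reports whether the last successful update is recent enough.
func (t *publishTracker) healthy(nowUnix int64) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	return nowUnix-t.lastOKUnix <= t.maxAgeSec
}

func main() {
	now := time.Now().Unix()
	tracker := &publishTracker{maxAgeSec: 30}
	tracker.markOK(now)
	fmt.Println("right after an update:", tracker.healthy(now))
	fmt.Println("two minutes without an update:", tracker.healthy(now+120))
}
```

A health endpoint could return the result of `healthy`, so that a stalled archive shows up as unhealthy instead of requiring someone to notice failed queries.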