Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

client stops acknowledging responses #208

Open
1 of 7 tasks
ongardie opened this issue Feb 20, 2016 · 0 comments
Open
1 of 7 tasks

client stops acknowledging responses #208

ongardie opened this issue Feb 20, 2016 · 0 comments
Labels

Comments

@ongardie
Copy link
Member

ClientImpl.cc misses a call to doneWithRPC() for keepalives that have to be retried:
https://github.com/logcabin/logcabin/blob/v1.1.0/Client/ClientImpl.cc#L397
This caused the client's firstOutstandingRPC to get stuck, causing servers to accumulate session state, write a corrupt snapshot (SnapshotStateMachine protobuf got too large), then entering a period of not being able to write snapshots (SnapshotStateMachine protobuf got even larger).

This is a critical bug that caused a production outage, since the snapshot could not be read back in.

There's several things we should improve:

  • Add a doneWithRPC() call to that path
  • Make doneWithRPC() an RAII-style thing that's automatically called
  • Clients should have an upper limit on how many outstanding RPCs they're willing to have
  • State machines should have an upper limit on how many outstanding RPCs they permit each client to have (may have to be conservative for older clients, probably requires a state machine version bump)
  • Add relevant stats to the server side for size of the session responses
  • PANIC() a server immediately when it's attempting to serialize a protobuf that's larger than a few MB in size. The protobuf library (v2.x) seems to be able to write messages that it cannot read when they get too large.
  • For disaster recovery, enhance storage-tool with an option to rewrite a snapshot except with a SnapshotStateMachineHeader that just includes the version history, removing all the sessions.

/cc @nhardt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant