recovery: retry on fetching log spec #307

Open
jgraettinger opened this issue Sep 24, 2021 · 1 comment

Observed from a consumer application on a recent release, where gazette and the consumer were rolling in tandem (the consumer happened to pick a gazette broker that was exiting):

{"err":"beginRecovery: fetching log spec: rpc error: code = Unavailable desc = error reading from server: read tcp 10.0.0.35:51576-\u003e10.3.247.101:8080: read: connection reset by peer","level":"error","msg":"serveStandby failed","shard":"/gazette/consumers/flow/reactor/items/ ... ","time":"2021-09-24T18:58:07Z"}

jgraettinger commented Sep 24, 2021

Digging into this, I'm hesitant to slap a retry on it because we are using FailFast(false), or equivalently WaitForReady(true), on these and other calls, and per the WaitForReady comments gRPC should already be retrying these requests on a transient failure if the server didn't process the request (details).

We do a graceful gRPC stop, and all of the brokers exited cleanly, so I'm unsure why these pods received a TCP RST.

Likely will take no immediate action unless this is a repeated problem.
