New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(pubsub): retry deadline exceeded errors in Acknowledge #3157
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I fully understand what's going on here.
You're using the first 3 min context for sleeping, but it will unpause early if the context transitions to Done, at which point it's unusable anyways.
The second cctx2 context appears to be used for making the Acknowledge calls.
The check for the "context deadline exceeded" is just a matter of probing context.Done(), isn't it? Should this be rewritten as for loop with a select that let's you wait on context expiry or a response from the rpc?
It's entirely possible I'm missing something obvious here like the interaction between the iterator and the acks, given my lack of familiarity with this.
// Use the outer context with timeout here. Errors from gax, including | ||
// context deadline exceeded should be transparent, as unacked messages | ||
// will be redelivered. | ||
if err := gax.Sleep(cctx, bo.Pause()); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so if we got a deadline exceeded, this seems to be a sleep and return, but I'm not seeing the retry. I'd expect a continue if a retry of the Acknowledge() call on line 400 is the intent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the gax.Sleep
call is to briefly pause before retrying the Acknowledge
call. The err checking on this line, and the other instance of gax.Sleep
, is to see if the outer 3 minute context has finished or not. If so, the entire call will terminate since err != nil
and the next line will return nil
. If err == nil
here, Acknowledge
will be called again on the next iteration of the loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was misreading the logic, thanks for the clarification.
The 3m context is for sleeping yes, but the context should only transition to Done after 3 minutes (I don't cancel elsewhere). The idea is that I only want to retry
I believe in strange circumstances, the RPC call can return a |
// Use the outer context with timeout here. Errors from gax, including | ||
// context deadline exceeded should be transparent, as unacked messages | ||
// will be redelivered. | ||
if err := gax.Sleep(cctx, bo.Pause()); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was misreading the logic, thanks for the clarification.
Currently,
DeadlineExceeded
errors in theAcknowledge
RPC can cause theiterator
to fail, and thus causing the entireerrgroup
andReceive
call to shut down. This is a bad user experience as the better solution is to allowDeadlineExceeded
to be transparent and allow messages to be redelivered. This change marks rpcDeadlineExceeded
andcontext deadline exceeded
errors from theAcknowledge
RPC to fail transparently soReceive
does not shut down.In addition, this adds a 3 minute timeout for
DeadlineExceeded
errors to be retried with backoff in the case of network issues. This should not impact most users as the default 60 second timeout for eachAcknowledge
RPC is still intact. We can look into exposing these values in the future.