pubsub: non-retryable rpc errors in subscription Receive() method #4257
Comments
Thanks for reporting. This error is actually being returned by a reverse proxy server rather than the Pub/Sub service itself, and the issue you described affects all of our Pub/Sub libraries. Assuming you're not overriding connections, this is likely just an issue with the server getting hit by a large number of connections. I'll submit a fix later this week so that this is retried automatically.
Hi @hongalex, thank you for this quick update. If I can be of any help, don't hesitate to ask.
Any news here? This is hitting us hard recently.
I did some investigation, since this seems not to happen for some services we have running while it happens continually for others. After some digging I managed to boil it down to one dependency; I have 2 github tags in my service, and when I do One tag is using Maybe it's an issue with a proxy, but it definitely seems to depend on some behaviour of the grpc lib as well.
Apologies for the delay, other issues had crept up. I have a PR open to get this addressed in the short term, and will be following up on another PR to allow custom retries of our methods so workarounds will be available earlier. This fix should be merged/released later today.
The frontend server for Pub/Sub might occasionally emit GOAWAY errors, which are currently not retried. This is not unique to the Go client, though the categorization as `UNKNOWN` vs `UNAVAILABLE` is a golang-GRPC issue. Although `UNKNOWN` [should not generally be retried](https://google.aip.dev/194), this will unblock users of `Receive` until the grpc library can be changed. See internal cl/377393940 for a similar fix in another Go library. Fixes #4257.
Is there any ETA on when this will be released? Seems like it didn't make it into the previous (v1.12) release.
This should be released in v1.12.1 via #4334 shortly. Thanks for your patience.
Unfortunately this still happens. Much less frequently, but still. Now with this error:
The fix doesn't work because it says Should it perhaps retry on a gRPC error with status code unknown instead?
Yeah, that was an oversight on my part. I don't think we want to retry all unknown errors (based on AIP), but I'll fix the error message checking to just check for
🤖 I have created a release \*beep\* \*boop\*

---

### [1.12.2](https://www.github.com/googleapis/google-cloud-go/compare/pubsub/v1.12.1...pubsub/v1.12.2) (2021-07-08)

### Bug Fixes

* **pubsub:** retry all goaway errors ([#4384](https://www.github.com/googleapis/google-cloud-go/issues/4384)) ([1eae86f](https://www.github.com/googleapis/google-cloud-go/commit/1eae86f1882660d901b9fb0e8dab6f138a048dbb)), refs [#4257](https://www.github.com/googleapis/google-cloud-go/issues/4257)

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Client
PubSub
Environment
Ubuntu Focal on GKE
Go Environment
$ go version
go version go1.16.5
Code
We're using the standard way to receive messages from a subscription, with default configurations (similar to the code snippet from doc):
We've managed to use this method for several months in long-living containers, without receiving any (non-retryable) errors.
On June 2, 2021, we started to receive sporadic fatal RPC errors in the following forms:
We've seen these kinds of errors several times per day since then.
Expected behavior
Automatic internal retries on these kinds of RPC errors, so that the Receive() method doesn't error.
I've noticed that there is a piece of retrying logic in pubsub/service.go that detects server shutdowns associated with the `Unavailable` code and the `"Server shutdownNow invoked"` message. Maybe something recently changed in the way the remote server signals that it is shutting down and this logic needs to be updated?
Actual behavior
The Receive() method from the subscription errors.
Additional context
Started on June 2, 2021.
Not long after we upgraded to pubsub v1.11.0, although I don't see how it would be related - so I'm mentioning this mostly for completeness.