GCP pub/sub subscription stops working after error #1250
Comments
Thanks for reporting this. Would you be interested in submitting a PR to fix this?
@yaron2 I would be happy to submit a PR for this. In that case, I wonder if it would be enough to just upgrade to v1.12.2 of the GCP pub/sub library? Or do you think it would still be useful to implement a reconnect loop in the dapr GCP pub/sub component as well (in addition to upgrading the GCP pub/sub library)? I will be testing the fix at work over the next few days and will report here by the end of this week.
Perhaps upgrading the SDK version is enough. It would be great if you can report back and open a PR for this when possible.
@yaron2 testing over the past few days is looking good - it appears that upgrading the GCP pub/sub library to version 1.12.2 solves the problem. I plan to submit a PR for this next week.
Sounds good, please submit the PR when ready.
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as pinned, good first issue, help wanted or triaged/resolved. Thank you for your contributions.
* upgrade cloud.google.com/go/pubsub to v1.12.2
  Signed-off-by: Yordan Pavlov <yordan.pavlov@dunnhumby.com>
* update GCP secret manager test
  Signed-off-by: Yordan Pavlov <yordan.pavlov@dunnhumby.com>
* make modtidy-all
  Signed-off-by: Yordan Pavlov <yordan.pavlov@dunnhumby.com>
Co-authored-by: Yordan Pavlov <yordan.pavlov@dunnhumby.com>
Co-authored-by: Dapr Bot <56698301+dapr-bot@users.noreply.github.com>
Co-authored-by: Yaron Schneider <schneider.yaron@live.com>
Expected Behavior
GCP pub/sub subscription should be resilient to temporary service / network issues.
Actual Behavior
When the `Subscription.Receive` method returns an error, the subscription is not re-established automatically and only starts working again after the dapr process is restarted; see https://github.com/dapr/components-contrib/blob/master/pubsub/gcp/pubsub/pubsub.go#L238
An example error can be:
Subscription closed with error:
rpc error: code = Unknown
desc = closing transport due to:
connection error:
desc = "error reading from server: EOF",
received prior goaway:
code: NO_ERROR,
debug data: server_shutting_down"
Steps to Reproduce the Problem
(1) Start dapr sidecar and application and subscribe to GCP pub/sub
(2) After some amount of time, when the subscription is closed due to an error, new messages will not be received any more
(3) Restart dapr to continue receiving messages
NOTE: since errors that cause the subscription to close are not currently logged, there is no way to know when the issue has occurred; I was only able to get to the error above after adding code to log it just before `return err` in the `handleSubscriptionMessages` method.
I imagine that a reconnect loop could be implemented in the `handleSubscriptionMessages` method to fix this issue and make the dapr GCP pub/sub component more resilient, similar to the dapr Azure service bus component here: https://github.com/dapr/components-contrib/blob/master/pubsub/azure/servicebus/servicebus.go#L342
RELEASE NOTE: