Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #1250 by upgrading cloud.google.com/go/pubsub to v1.12.2 #1285

Merged
merged 5 commits into from Jan 13, 2022

Conversation

yordan-pavlov
Copy link
Contributor

Description

Upgrade cloud.google.com/go/pubsub to v1.12.2 which includes a fix for googleapis/google-cloud-go#4257 where the Subscription::Receive method exits with error but should instead automatically reconnect.

Prior to upgrading the GCP pub/sub library, the dapr sidecar would keep running (without logging any errors or warnings) but the subscription connection would stop working and no further messages would be received until the dapr sidecar is restarted. An example error that could trigger this behavior is:

rpc error: code = Unknown
desc = closing transport due to:
connection error:
desc = "error reading from server: EOF",
received prior goaway:
code: NO_ERROR,
debug data: server_shutting_down

After upgrading cloud.google.com/go/pubsub to v1.12.2, the issue hasn't been observed any more after several days of testing.

Issue reference

Please reference the issue this PR will close: #1250

Checklist

Please make sure you've completed the relevant tasks for this PR, out of the following list:

  • Code compiles correctly
  • not relevant to this change - Created/updated tests
  • not relevant to this change - Extended the documentation / Created issue in the https://github.com/dapr/docs/ repo: dapr/docs#[issue number]

@yordan-pavlov yordan-pavlov requested review from a team as code owners November 9, 2021 14:21
@ghost
Copy link

ghost commented Nov 9, 2021

CLA assistant check
All CLA requirements met.

Copy link
Contributor

@CodeMonkeyLeet CodeMonkeyLeet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for contributing this change, it's much needed! The update to the cloud.google.com libs though also triggers the linter to rebuild and check components that depend on it and it's flagging the cloudstate state store component for its use of the deprecated ptypes package. That component will need to be updated at the same time for the CI pipeline to allow this PR to be merged.

@yordan-pavlov
Copy link
Contributor Author

@CodeMonkeyLeet thank you for reviewing and looking into the linting errors.
I was wondering if something like this would happen.
I am not convinced that updating the cloudstate component should be in this PR - I think it would be cleaner in a separate PR; any thoughts?
Does anyone have capacity to update the affected component?

@CodeMonkeyLeet
Copy link
Contributor

@yordan-pavlov I agree, it probably should be a separate PR, and that it should go in before this one. There's already a bug tracking it at #1261, but I suspect that with the code freeze for v1.5 release in RC, that's probably not going to get done right now. We should circle back to this no later than next week though.

@yordan-pavlov
Copy link
Contributor Author

I have also added a reconnect loop for GCP pub/sub subscriptions here #1293 as a more general solution to this type of problem.

artursouza
artursouza previously approved these changes Nov 19, 2021
@beiwei30
Copy link
Member

@yordan-pavlov would you pls. kindly check my PR (yordan-pavlov#2) to solve the conflict?

@yordan-pavlov
Copy link
Contributor Author

@beiwei30 thank you for looking into this, I will check your PR as soon as I can (should be later today or tomorrow morning)

@beiwei30
Copy link
Member

@yordan-pavlov thanks and no hurry. I made a mistake by accident when I tried to resolve conflict in your PR :)

@CodeMonkeyLeet
Copy link
Contributor

FYI, there's another issue being resolved in #1320 that will probably impact this PR.

@beiwei30
Copy link
Member

FYI, there's another issue being resolved in #1320 that will probably impact this PR.

@CodeMonkeyLeet I change the solution to #1332, pls. take a look.

@beiwei30
Copy link
Member

Running [/home/runner/golangci-lint-1.31.0-linux-amd64/golangci-lint run --out-format=github-actions] in [] ...
level=error msg="Running error: context loading failed: no go files to analyze"

@yordan-pavlov pls. run go mod tidy to pass this error, and then you may then encounter the unit test failure in gcp.

@yaron2
Copy link
Member

yaron2 commented Dec 26, 2021

@yordan-pavlov can you check why the tests are failing? We would like to get this into 1.6 if possible.

@yordan-pavlov
Copy link
Contributor Author

yordan-pavlov commented Dec 26, 2021

@yaron2 looks like the tests are failing because of the following error, not sure what's causing that - looks like tests were passing after my last commit:

--- FAIL: TestInit (0.00s)
    --- FAIL: TestInit/Init_with_Wrong_metadata (0.00s)
        secretmanager_test.go:32: 
            	Error Trace:	secretmanager_test.go:32
            	Error:      	Expected nil, but got: &errors.errorString{s:"failed to setup secretmanager client: google: could not parse key: private key should be a PEM or plain PKCS1 or PKCS8; parse error: asn1: syntax error: truncated tag or length"}
            	Test:       	TestInit/Init_with_Wrong_metadata
FAIL
coverage: 45.1% of statements
FAIL	github.com/dapr/components-contrib/secretstores/gcp/secretmanager	0.005s

@yordan-pavlov
Copy link
Contributor Author

it would be great if #1293 could be be merged for the 1.6 release as well - it adds much needed logging plus a configurable reconnect loop for even more reselience to the GCP pub/sub component

@yaron2
Copy link
Member

yaron2 commented Dec 27, 2021

it would be great if #1293 could be be merged for the 1.6 release as well - it adds much needed logging plus a configurable reconnect loop for even more reselience to the GCP pub/sub component

Sure, will take a look at it too.

@yaron2
Copy link
Member

yaron2 commented Jan 7, 2022

@yordan-pavlov can you run go mod tidy -compat=1.17 and push the changes please?

@yordan-pavlov
Copy link
Contributor Author

yordan-pavlov commented Jan 8, 2022

@yaron2 I did run go mod tidy -compat=1.17 but this didn't generate any changes to commit. I am using a windows machine and I noticed the go mod tidy build check failed for linux, but I don't think it should matter. I wonder what could be causing the build to fail.

@Taction
Copy link
Member

Taction commented Jan 9, 2022

@yaron2 I did run go mod tidy -compat=1.17 but this didn't generate any changes to commit. I am using a windows machine and I noticed the go mod tidy build check failed for linux, but I don't think it should matter. I wonder what could be causing the build to fail.

@yordan-pavlov You can run make modtidy-all and then push the changes. Also please fix DCO issues, you can refer to here about how to fix it.

@yaron2
Copy link
Member

yaron2 commented Jan 9, 2022

@yaron2 I did run go mod tidy -compat=1.17 but this didn't generate any changes to commit. I am using a windows machine and I noticed the go mod tidy build check failed for linux, but I don't think it should matter. I wonder what could be causing the build to fail.

Also, please make sure you have Go 1.17 installed

@yordan-pavlov
Copy link
Contributor Author

@yaron2 I understand now, the certification tests have their own go.mod files which make modtidy-all updates, I have just pushed that change, thanks for your help; my go version is go version go1.17.2 windows/amd64; I will fix the DCO next.

@yordan-pavlov
Copy link
Contributor Author

@yaron2 the github.com/dapr/components-contrib/secretstores/gcp/secretmanager tests are still failing, it appears that this check

// Even if we pass wrong credentials `client.NewClient` dont throw an error
assert.Nil(t, err)

here https://github.com/dapr/components-contrib/blob/master/secretstores/gcp/secretmanager/secretmanager_test.go#L45
is no longer true as this now returns error:

Expected nil, but got: &errors.errorString{s:"failed to setup secretmanager client: 
google: could not parse key: private key should be a PEM or plain PKCS1 or PKCS8; 
parse error: asn1: syntax error: truncated tag or length

I wonder if extra validation was added in one of the upgraded libraries. I still haven't been able to track what exactly is causing this change in behavior.

Yordan Pavlov added 2 commits January 13, 2022 10:00
Signed-off-by: Yordan Pavlov <yordan.pavlov@dunnhumby.com>
Signed-off-by: Yordan Pavlov <yordan.pavlov@dunnhumby.com>
@yordan-pavlov
Copy link
Contributor Author

@yaron2 I have fixed the DCO and updated the failing GCP secret manager test

@codecov
Copy link

codecov bot commented Jan 13, 2022

Codecov Report

Merging #1285 (c48b97d) into master (3af57ca) will decrease coverage by 0.11%.
The diff coverage is 66.66%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1285      +/-   ##
==========================================
- Coverage   35.11%   35.00%   -0.12%     
==========================================
  Files         153      153              
  Lines       13642    13644       +2     
==========================================
- Hits         4791     4776      -15     
- Misses       8345     8365      +20     
+ Partials      506      503       -3     
Impacted Files Coverage Δ
state/azure/blobstorage/blobstorage.go 27.08% <66.66%> (+1.02%) ⬆️
secretstores/gcp/secretmanager/secretmanager.go 38.88% <0.00%> (-23.62%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 44199ca...c48b97d. Read the comment docs.

@yaron2 yaron2 merged commit b8063f2 into dapr:master Jan 13, 2022
@yordan-pavlov yordan-pavlov deleted the upgrade-gcp-pubsub-lib branch February 16, 2022 11:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GCP pub/sub subscription stops working after error
7 participants