Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

http: do not retry connection errors #18412

Draft
wants to merge 1 commit into
base: dev
Choose a base branch
from

Conversation

nvartolomei
Copy link
Contributor

Let's see what CI says.


I spotted that in some rpfixture tests the cluster metadata uploader is running with an incorrect port1. Beside logging lots of connection refused errors, I found it to adding a non-negligible overhead to some variations of the test. The overhead is result of relatively tight loop in the client::get_connected method which I'm trying to remove in this commit.

I don't believe this is necessary. Returning the error to the caller is much better. The callers usually have better retry mechanisms in place with backoff and can log much more useful messages since they have access to more context about the operation.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.1.x
  • v23.3.x
  • v23.2.x

Release Notes

  • none

Footnotes

  1. We need to investigate separately why it doesn't get the updated configuration when we configure cloud storage (you can try this test for example https://github.com/redpanda-data/redpanda/issues/13275) or if it needs to run as well.

I spotted that in some rpfixture tests the cluster metadata uploader is
running with an incorrect port[^1]. Beside logging lots of connection
refused errors, I found it to adding a non-negligible overhead to some
variations of the test. The overhead is result of relatively tight loop
in the `client::get_connected` method which I'm trying to remove in this
commit.

I don't believe this is necessary.  Returning the error to the caller is
much better. The callers usually have better retry mechanisms in place
with backoff and can log much more useful messages since they have
access to more context about the operation.

[^1]: We need to investigate separately why it doesn't get the updated
configuration when we configure cloud storage (you can try this test for
example redpanda-data#13275) or if it
needs to run as well.
@vbotbuildovich
Copy link
Collaborator

vbotbuildovich commented May 11, 2024

new failures in https://buildkite.com/redpanda/redpanda/builds/48975#018f68d6-a47c-4938-bc9d-dd0bfeb2e728:

"rptest.tests.cluster_config_test.ClusterConfigTest.test_valid_settings"

new failures in https://buildkite.com/redpanda/redpanda/builds/48975#018f68d6-a47e-4e6a-b920-32df6730dae7:

"rptest.tests.audit_log_test.AuditLogTestOauth.test_admin_oauth"

new failures in https://buildkite.com/redpanda/redpanda/builds/48975#018f68d6-a483-4f85-9068-9801b0c288c6:

"rptest.tests.audit_log_test.AuditLogTestOauth.test_kafka_oauth.authz_match=AuthorizationMatch.RBAC"

new failures in https://buildkite.com/redpanda/redpanda/builds/48975#018f68d6-a481-4bd0-b5ee-d6abf440a3af:

"rptest.tests.audit_log_test.AuditLogTestOauth.test_kafka_oauth.authz_match=AuthorizationMatch.ACL"

new failures in https://buildkite.com/redpanda/redpanda/builds/48975#018f68df-c1ab-437d-8f4f-b1eaafda6af1:

"rptest.tests.audit_log_test.AuditLogTestOauth.test_admin_oauth"
"rptest.tests.cluster_config_test.ClusterConfigTest.test_valid_settings"

new failures in https://buildkite.com/redpanda/redpanda/builds/48975#018f68df-c1ae-4bf1-8107-77c9356e41d5:

"rptest.tests.audit_log_test.AuditLogTestOauth.test_kafka_oauth.authz_match=AuthorizationMatch.ACL"

new failures in https://buildkite.com/redpanda/redpanda/builds/48975#018f68df-c1b1-4f73-a1c1-6b80279eb782:

"rptest.tests.audit_log_test.AuditLogTestOauth.test_kafka_oauth.authz_match=AuthorizationMatch.RBAC"

@nvartolomei
Copy link
Contributor Author

Blocked on #18422

@nvartolomei nvartolomei marked this pull request as draft May 13, 2024 09:42
current = ss::lowres_clock::now();
// Any TLS error have to be propagated because it's not
// transient. It won't help to try once again.
if (_as != nullptr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

misses the deadline .... if above deadline we should not try, no?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants