-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for broken_promise exception observed on connection shutdown #106
Fix for broken_promise exception observed on connection shutdown #106
Conversation
c3a5aca
to
403239d
Compare
Changes in force-push
|
403239d
to
5c76161
Compare
Changes in force-push
|
d47468a
to
d76a31a
Compare
d76a31a
to
c09cd25
Compare
Changes in force-push
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one clean up suggestion
src/net/ossl.cc
Outdated
static const auto max_len = 4096; | ||
char buffer[max_len]; | ||
memset(buffer, 0, max_len); | ||
BIO_read(out.get(), buffer, max_len - 1); | ||
return sstring(buffer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static const auto max_len = 4096; | |
char buffer[max_len]; | |
memset(buffer, 0, max_len); | |
BIO_read(out.get(), buffer, max_len - 1); | |
return sstring(buffer); | |
char * bio_ptr = nullptr; | |
auto len = BIO_get_mem_data(out.get(), &bio_ptr); | |
return sstring(bio_ptr, len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
char * bio_ptr = nullptr; auto len = BIO_get_mem_data(out.get(), &bio_ptr); return sstring(bio_ptr, len);
BIO_get_mem_data() returns the total number of bytes available on success, 0 if b is NULL, or a negative value in case of other errors.
Looks like the error condition should be checked?
src/net/ossl.cc
Outdated
return ossl_str; | ||
static std::optional<sstring> get_dn_string(X509_NAME* name) { | ||
auto out = bio_ptr(BIO_new(BIO_s_mem())); | ||
if (!X509_NAME_print_ex(out.get(), name, 0, ASN1_STRFLGS_RFC2253 | XN_FLAG_SEP_COMMA_PLUS | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
X509_NAME_print_ex() and X509_NAME_print_ex_fp() return 1 on success or 0 on error if the XN_FLAG_COMPAT is set, which is the same as X509_NAME_print(). Otherwise, it returns -1 on error or other values on success.
Should the check be -1 == X509_NAME_print_ex
?
src/net/ossl.cc
Outdated
static const auto max_len = 4096; | ||
char buffer[max_len]; | ||
memset(buffer, 0, max_len); | ||
BIO_read(out.get(), buffer, max_len - 1); | ||
return sstring(buffer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
char * bio_ptr = nullptr; auto len = BIO_get_mem_data(out.get(), &bio_ptr); return sstring(bio_ptr, len);
BIO_get_mem_data() returns the total number of bytes available on success, 0 if b is NULL, or a negative value in case of other errors.
Looks like the error condition should be checked?
@@ -1444,6 +1444,10 @@ SEASTAR_THREAD_TEST_CASE(test_dn_name_handling) { | |||
fout.get(); | |||
|
|||
auto dn = fdn.get(); | |||
BOOST_REQUIRE(dn.has_value()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there tests for the failure modes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, i believe these unit tests could be enhanced a bit more
(cherry picked from commit 5466b77)
- Now that code has been moved around, the session class' methods need to be known to consumers for example tls_connected_socket_impl. - Solution is to create a new interface, called session_impl that tls::session inherits from. session_ref now stores a pointer to this interface instead of the concrete instance. - This will be useful for defining other session types in future commits (cherry picked from commit 0ffb0fc)
- Now that code has been moved around the private _impl within the certificate credentials class cannot be accessed externally due to the fact that impl is forward defined in tls.hh - Solution is to create new method on the certificate_credentials class that will forward the call to the underlying impl (cherry picked from commit d44bf62)
- Also disables websockets (cherry picked from commit eb2cfd5)
- The implementation of the struct isn't neccessary to get up and running with a simple server. However the methods must have implementations or the code won't compile. (cherry picked from commit 173b1a6)
(cherry picked from commit 789669a)
(cherry picked from commit 1cf20ec)
(cherry picked from commit 1298e57)
(cherry picked from commit 0e8cfae)
(cherry picked from commit 10c083a)
- Caching the x509 certs as they are in the process of being verified is neccessary because openssl does not provide an API to get certificates that have failed verification. - Examining failed certificates for the reason they failed will be implemented in future commits. (cherry picked from commit 5e14b14)
(cherry picked from commit b453dd7)
(cherry picked from commit 699444a)
(cherry picked from commit 62f6a66)
(cherry picked from commit dfb03e3)
- Differences in dn name handling will cause issues in projects that expected a subject or issuer to be in a particular format. This is the case for features like mtls principal mapping within redpanda. - The method X509_NAME_oneline printed a nicely human readable version of the distinguished name parameters however this differed from what GnuTLS would print. - The change swaps out the use of X509_NAME_oneline in favor of X509_NAME_print_ex which accepts an integer for which can be used to control how the distinguished name is formatted. (cherry picked from commit c09cd25)
c09cd25
to
5809ab4
Compare
Note to reviewers, base has been modified to |
- x509_verify was never the correct function to verify the signature of a private key, replacing with x509_check_private_key instead. Fixes: CORE-2025
- Flush() aquires the _out_sem and so does do_shutdown, a deadlock was observed due to this fact. Flush would wait for shutdown to complete, which was being held by wait_for_eof. After a 10s timeout, eof would manually be set and the stall would be resolved. - The fix is to not call wait_for_eof within the context of the _out_sem.
- With this behavior redpanda will not log disconnects as ERRORs during shutdown
- This line will cause redpanda to crash as its currently set to pass GnuTLS specific priority strings.
9512dd9
to
c33efd7
Compare
src/net/ossl.cc
Outdated
@@ -824,7 +850,7 @@ class session : public enable_shared_from_this<session>, public session_impl { | |||
f = pull_encrypted_and_send(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this by maybe_send_data()
?
@@ -1204,7 +1227,8 @@ class session : public enable_shared_from_this<session>, public session_impl { | |||
future<> server_handshake() { | |||
return wait_for_input().then([this]{ | |||
if (eof()) { | |||
return make_exception_future<>(std::runtime_error("EOF observed during handshake")); | |||
_error = std::make_exception_ptr(std::system_error(ENOTCONN, std::system_category())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could there be an existing error that should be preferred/not overwritten?
} | ||
|
||
future<> client_handshake() { | ||
return do_handshake(SSL_connect, [this]{ | ||
return pull_encrypted_and_send().then([this]{ | ||
return wait_for_input().then([this]{ | ||
if (eof()) { | ||
return make_exception_future<>(std::runtime_error("EOF observed during handshake")); | ||
_error = std::make_exception_ptr(std::system_error(ENOTCONN, std::system_category())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could there be an existing error that should be preferred/not overwritten?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to add a test?
The commit message talks about Redpanda, but I think it would be more appropriate to discuss this in terms of compatibility with existing behaviour in GnuTLS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The commit message talks about Redpanda, but I think it would be more appropriate to discuss this in terms of compatibility with existing behaviour in GnuTLS.
In this situation, GnuTLS will return an error code which turns into a system_error
which Redpanda does parse to see if it's a reconnect type error.
if (!ext) { | ||
return {}; | ||
} | ||
auto names = general_names_ptr((GENERAL_NAMES*)X509V3_EXT_d2i(ext.get())); | ||
auto names = general_names_ptr((GENERAL_NAMES*)X509V3_EXT_d2i(ext)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer static_cast
.
seastar_set_dep_args (OpenSSL REQUIRED | ||
VERSION 3.0.0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this version correct? Is it an exact match? A minimum?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a minimum.
src/net/ossl.cc
Outdated
if (!_options.wait_for_eof_on_shutdown) { | ||
return make_ready_future<>(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a note about why this is "sidestepped".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This matches original behavior in the GnuTLS implementation
@@ -315,11 +315,9 @@ class tls::certificate_credentials::impl { | |||
if (!pkey) { | |||
throw ossl_error("Error attempting to parse private key"); | |||
} | |||
#if 0 // https://github.com/redpanda-data/core-internal/issues/1233 | |||
if (!X509_verify(x509_cert.get(), pkey.get())) { | |||
if (!X509_check_private_key(x509_cert.get(), pkey.get())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can a test be added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seastar's TLS unit tests do verify the coherency of the key and cert
@@ -828,7 +828,7 @@ class session : public enable_shared_from_this<session>, public session_impl { | |||
} | |||
|
|||
future<> do_shutdown() { | |||
if(_error || !connected() || eof()) { | |||
if(_error || !connected()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can a test be added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a number of shutdown unit tests in Seastar that verify the shutdown behavior
58f6053
to
83fae91
Compare
Leaving this in draft, but will close. Please review #115 |
REVIEWERS: Please review #115 instead
A broken_promise exception was observed when calling
close()
due to the fact that the input stream was called while it was waiting on a pending operation to complete.This occurred on call to close because of how
do_shutdown()
is implemented.do_shutdown()
acquires the output semaphore, but not the input, then attempts towait_for_input()
. If there is already a waiter on the read path, then the broken_promise exception will be observed whendo_shutdown()
callsdo_get()
.The fix is to simply modify
do_shutdown()
to callwait_for_eof()
instead of wait_for_input(),however this would be incomplete because there are other instances in the implementation where out of band operations on the read / write path come from opposite paths.To resolve the above concern new methods have been developed (maybe_wait_for_data and maybe_send_data) which attempt to acquire the in/out semaphore before performing the desired operation. If the semaphore is already acquired, then the method just returns to the caller, which would lead to a retry until the operation can be performed.
Fixes: CORE-539