Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

QUIC: Blocking calls are unreliable in multi-threaded usage #24166

Open
hlandau opened this issue Apr 16, 2024 · 4 comments
Open

QUIC: Blocking calls are unreliable in multi-threaded usage #24166

hlandau opened this issue Apr 16, 2024 · 4 comments
Assignees
Labels
branch: master Merge to master branch branch: 3.2 Merge to openssl-3.2 branch: 3.3 Merge to openssl-3.3 triaged: bug The issue/pr is/fixes a bug triaged: documentation The issue/pr deals with documentation (errors)

Comments

@hlandau
Copy link
Member

hlandau commented Apr 16, 2024

Problem

The implementation of blocking calls (e.g. to SSL_read) in the QUIC implementation in OpenSSL 3.2 and 3.3 has a somewhat fundamental conceptual error in the implementation approach which renders blocking calls (SSL_set_blocking_mode(ssl, 1)) unreliable when used with multiple threads performing (auto-)ticking calls on the same connection.

To recap, a blocking call to e.g. SSL_read calls the function block_until_pred in quic_impl.c, which is designed to block until a certain condition is met:

SSL_read(conn0)
  block_until_pred()
    waits on the network read/write BIOs and a timeout, calls select()/poll()

However, consider the following sequence involving two threads T1 and T2:

T1: connects conn0 and subsidiary streams stream0, stream1
T1: enters blocking call to SSL_read(stream0), calls to OS poll(conn0_fd)

T2: enters a call to SSL_write(stream1) (which happens to not block);
T2:   call to stream1 autoticks conn0 (SSL_handle_events(conn0))
T2:     recvmsg() receives datagram and processes it before T1's call to poll(2) wakes up
T2:     stream0 is now ready to read

T1: still blocking, and will continue to block until event timeout

The underlying issue here is that if a datagram is received on a socket and a thread is currently blocking in a call to poll() on that socket, this does not guarantee that that thread will be woken up, for example if another thread calls recv() on that socket before a context switch occurs.

Resolution

The only viable resolution for this issue is to create another OS resource which can be passed to select()/poll() and use this to artificially wake other threads.12

On Linux, eventfd(2) etc. exists for this purpose. On other platforms, including Windows, this functionality can be emulated using a pipe or socketpair. It is possible to support all of our supported platforms this way.

Essentially, a socketpair or eventfd is made readable by writing a byte to it, causing poll() in other threads to wake. The readability condition is then cleared by reading the byte from the FD. The final solution actually needs to be a fair bit more complicated than this as there is a need to reliably wake up all waiting threads, not just one; the solution will probably look similar to the condition variable emulation code for Windows XP I wrote found in crypto/thread/arch/thread_win.c.

In any case, the problem with this approach is that it requires creating OS socket resources behind the back of the application, which feels at best rude for a library, particularly one like OpenSSL which seeks to allow the application to control all I/O. I intend to find a better story here (e.g. allowing the application to choose between non-blocking usage only, blocking but single-threaded usage, neither of which requires internal sockets to be created, and blocking multi-threaded usage with internal polling sockets created by OpenSSL).

In any case, IMO the solution is too invasive to backport to 3.2 and 3.3 and this will need to be documented as a known issue.

Footnotes

  1. At least, in the non-thread-assisted case. When we are allowed to spin up a thread, as in thread-assisted mode, all I/O processing could be moved to the assist thread and a condition variable could then be used instead. Note that there are compelling reasons to do this in many use cases, especially server-side, so this is likely to become a supported model in the future anyway.

  2. Technically you can wake a poll() call via a signal which results in EINTR. This is obviously not a sane or portable solution.

@hlandau hlandau added the issue: bug report The issue was opened to report a bug label Apr 16, 2024
@hlandau hlandau self-assigned this Apr 16, 2024
@t8m t8m added branch: master Merge to master branch triaged: bug The issue/pr is/fixes a bug branch: 3.2 Merge to openssl-3.2 branch: 3.3 Merge to openssl-3.3 triaged: documentation The issue/pr deals with documentation (errors) and removed issue: bug report The issue was opened to report a bug labels Apr 16, 2024
@nhorman
Copy link
Contributor

nhorman commented Apr 16, 2024

Couldn't this problem be solved more efficiently with the use of a muted, condition variable, and state variable (the last of which records which streams are readable)? It would save a potentially significant number of syscalls, and as you note, avoid having to create socket resources under the covers

@hlandau
Copy link
Member Author

hlandau commented Apr 17, 2024

Couldn't this problem be solved more efficiently with the use of a muted, condition variable, and state variable (the last of which records which streams are readable)? It would save a potentially significant number of syscalls, and as you note, avoid having to create socket resources under the covers

The problem is that modern OSes generally don't have syscalls which can block on both a condition variable and a socket at the same time.

This basically means your options for blocking on something are:

  • Block on a condition variable and have it woken up by another thread (which means we have to spin up another thread to do background processing, and that thread makes the poll() call)
  • Block on some set of sockets (as described above)

So there will always be more resources consumed — the choice is between creating a socketpair/eventfd and creating a thread.

@nhorman
Copy link
Contributor

nhorman commented Apr 17, 2024

Sorry, Ii wasn't being clear, all I meant to suggest was that you could implement what you needed without creating an additional thread - i.e. a "someone is waiting" heuristic. something like the following pseudo code:

lock_mutex
   if another_thread_is_blocking_on_the_socket
      wait_on_condition_variable
   set another_thread_is_blocking_on_the_socket = true
unlock_mutex
block_on_socket_ready
read_socket_data
lock_mutex
set_another_thread_is_blocking_on_the_socket = false
signal_condition_variable
unlock_mutex
return socket data

Thats pretty simplistic of course, and it doesn't account for cases in which the socket data has to be from a given stream in the quic connection, but you could setup a per stream signal variable, so that whatever thread gets into your critical section only wakes the correct thread (which may be itself, in which case you should signal all waiting threads, so another one can take over the blocking operation. The resource fanout is still linear of course if you do a per stream condition variable, but it might be less egregious than having to pump data through various socket pairs, or spawn a new entire thread.

@hlandau
Copy link
Member Author

hlandau commented Apr 24, 2024

@nhorman So, this is a nice idea, but won't quite work. There are various reasons, but here's an arbitrary one:

T1 calls some blocking API call, and enters block_until_pred and goes to sleep
  There was no data queued for the network when this happened, so poll() is listening to POLLIN but not POLLOUT
    (since the UDP socket kernel buffers are not full, listening to POLLOUT would make it wake up immediately)

Later T2 calls a non-blocking API call, and the connection is ticked
Data is generated and queued for transmission to the network
The amount of data generated fills the UDP socket kernel buffers, which means some
  data needs to be transmitted in the future when POLLOUT is asserted
 
The data to be transmitted may e.g. be an application message which yields a response which T1 is waiting for
Therefore to make forward progress, T1's internal poll(2) call must return and be re-executed
  so that it is now listening for POLLOUT
Since this can't happen without T2 having some way to notify T1 to wake up, T1 still has circumstances where
  blocking calls will incorrectly hang (at least until an event timeout)

But it's a nice idea and closing in on a decent solution. My take on the fix here (which does still need a notifier) is in #24257.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch: master Merge to master branch branch: 3.2 Merge to openssl-3.2 branch: 3.3 Merge to openssl-3.3 triaged: bug The issue/pr is/fixes a bug triaged: documentation The issue/pr deals with documentation (errors)
Projects
Status: New
Status: New
Development

No branches or pull requests

3 participants