QUIC: Blocking calls are unreliable in multi-threaded usage #24166
Couldn't this problem be solved more efficiently with the use of a mutex, condition variable, and state variable (the last of which records which streams are readable)? It would save a potentially significant number of syscalls, and as you note, avoid having to create socket resources under the covers.
The problem is that modern OSes generally don't have syscalls which can block on both a condition variable and a socket at the same time. This basically means your options for blocking on something are:
So there will always be more resources consumed — the choice is between creating a socketpair/eventfd and creating a thread.
Sorry, I wasn't being clear. All I meant to suggest was that you could implement what you needed without creating an additional thread, i.e. a "someone is waiting" heuristic. Something like the following pseudocode:
That's pretty simplistic of course, and it doesn't account for cases in which the socket data has to come from a given stream in the QUIC connection, but you could set up a per-stream signal variable, so that whichever thread gets into your critical section only wakes the correct thread (which may be itself, in which case you should signal all waiting threads so another one can take over the blocking operation). The resource fanout is still linear of course if you use a per-stream condition variable, but it might be less egregious than having to pump data through various socket pairs, or spawn an entire new thread.
@nhorman So, this is a nice idea, but won't quite work. There are various reasons, but here's an arbitrary one:
But it's a nice idea and closing in on a decent solution. My take on the fix here (which does still need a notifier) is in #24257.
Problem
The implementation of blocking calls (e.g. to SSL_read) in the QUIC implementation in OpenSSL 3.2 and 3.3 has a somewhat fundamental conceptual error in the implementation approach which renders blocking calls (SSL_set_blocking_mode(ssl, 1)) unreliable when used with multiple threads performing (auto-)ticking calls on the same connection.

To recap, a blocking call to e.g. SSL_read calls the function block_until_pred in quic_impl.c, which is designed to block until a certain condition is met. However, consider the following sequence involving two threads T1 and T2:
The underlying issue here is that if a datagram is received on a socket and a thread is currently blocking in a call to poll() on that socket, this does not guarantee that that thread will be woken up, for example if another thread calls recv() on that socket before a context switch occurs.
Resolution
The only viable resolution for this issue is to create another OS resource which can be passed to select()/poll() and use this to artificially wake other threads.[1][2]
On Linux, eventfd(2) etc. exists for this purpose. On other platforms, including Windows, this functionality can be emulated using a pipe or socketpair. It is possible to support all of our supported platforms this way.
Essentially, a socketpair or eventfd is made readable by writing a byte to it, causing poll() in other threads to wake. The readability condition is then cleared by reading the byte from the FD. The final solution actually needs to be a fair bit more complicated than this, as there is a need to reliably wake up all waiting threads, not just one; the solution will probably look similar to the condition variable emulation code for Windows XP I wrote, found in crypto/thread/arch/thread_win.c.
In any case, the problem with this approach is that it requires creating OS socket resources behind the back of the application, which feels at best rude for a library, particularly one like OpenSSL which seeks to allow the application to control all I/O. I intend to find a better story here (e.g. allowing the application to choose between non-blocking usage only, blocking but single-threaded usage (neither of which requires internal sockets to be created), and blocking multi-threaded usage with internal polling sockets created by OpenSSL).
Either way, IMO the solution is too invasive to backport to 3.2 and 3.3, and this will need to be documented as a known issue.
Footnotes
1. At least, in the non-thread-assisted case. When we are allowed to spin up a thread, as in thread-assisted mode, all I/O processing could be moved to the assist thread and a condition variable could then be used instead. Note that there are compelling reasons to do this in many use cases, especially server-side, so this is likely to become a supported model in the future anyway.
2. Technically you can wake a poll() call via a signal which results in EINTR. This is obviously not a sane or portable solution.