[Pal/lib] Add spinlocks to mbedTLS-specific SSL recv/send #2059

dimakuv · 2021-01-05T11:26:02Z

Description of the changes

Linux-SGX PAL wraps all pipe/UNIX domain socket communication in TLS sessions. Previously, Graphene assumed that only one thread at a time accesses one TLS session (i.e., no multi-threading support). This commit adds spinlocks to lib_SSL* functions to support such (rare) multi-threading scenarios.

Note that we cannot rely on pthreads and/or mutexes so we use simple spinlocks. Because this is in the "common library" used by PAL and LibOS, with no "complex" dependencies allowed.

Also, there is a prerequisite (first) commit in this PR: remove api.h from spinlock.h. See commit messages for explanations.

Fixes #2058.

How to test this PR?

This PR adds a corresponding LibOS test pipe_multithread (kudos to @lejunzhu for the initial version of this test and debugging this).

This change is

The common header "spinlock.h" contains spinlock implementation. It may be used in LibOS, PAL, and other parts of Graphene. Previously, it included "api.h" which is PAL-specific. This prevented using "spinlock.h" in PAL-agnostic parts of Graphene such as mbedTLS crypto adapters. This commit removes this dependency and fixes a few emerged include-related bugs.

Linux-SGX PAL wraps all pipe/UNIX domain socket communication in TLS sessions. Previously, Graphene assumed that only one thread at a time accesses one TLS session (i.e., no multi-threading support). This commit adds spinlocks to `lib_SSL*` functions to support such (rare) multi-threading scenarios. Note that we cannot rely on pthreads and/or mutexes so we use simple spinlocks. This commit adds a corresponding LibOS test.

dimakuv

Reviewable status: 0 of 8 files reviewed, all discussions resolved, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)

LibOS/shim/test/regression/Makefile, line 173 at r1 (raw file):

CFLAGS-sigaction_per_process += -pthread
CFLAGS-signal_multithread += -pthread
CFLAGS-pthread_set_get_affinity += -pthread

FYI: This wasn't sorted, so I just added sorting.

Pal/lib/crypto/adapters/mbedtls_adapter.c, line 388 at r1 (raw file):

     *       recv() and the other will spin and burn CPU cycles. We consider "shared SSL context"
     *       a rare case and use simple spinlocks instead of mutexes. */
    assert(spinlock_is_locked(&ssl_ctx->lock));

FYI: I consider the performance overhead of spinlocks here minimal. We never hit multi-threaded pipes until #2058 (actually, I know that Erlang does the same), so most apps won't notice this. And the apps that do use this multi-threaded pattern... well, I just hope it's not on critical path because these are just pipes for intra-process communication.

Pal/src/pal_internal.h, line 12 at r1 (raw file):

#include <stdarg.h>
#include <sys/types.h>

FYI: That's because one function here uses ssize_t type.

dimakuv · 2021-01-05T11:54:40Z

Jenkins, retest Jenkins-Debug please (LTP test alarm03 failed, cannot reproduce locally and generally unrelated)

lejunzhu · 2021-01-06T00:42:21Z

I think protecting mbedtls_ssl_read and mbedtls_ssl_write with the same lock may cause other problems. Read usually blocks for a long time. If the application is trying to send when recv'ing in another thread, it will cause a deadlock.

For example, this test C program runs fine under Linux and Graphene SGX before this PR, but with this PR it will hang.
test.txt

I don't know a real world example like this, but I guess it is possible.

Looking at mbedTLS code, read and write does not seem to cause conflict, except when handshake or re-negotiation is pending. And I can't think why an application will call recv() from two threads. Therefore we can probably call mbedtls_ssl_read without a lock.

dimakuv · 2021-01-06T12:23:10Z

I don't know a real world example like this, but I guess it is possible.

The attached test program reads and writes on the same file descriptor from two threads. This looks like an unrealistic scenario. Typically applications create a socket pair or a pipe pair and use one end for recv and another end for send.

...And I can't think why an application will call recv() from two threads. Therefore we can probably call mbedtls_ssl_read without a lock.

I definitely don't like the idea of having mbedtls_ssl_read() not protected with a lock. What happens if we do mbedtls_ssl_read(my_ssl_ctx) and a simultaneous mbedlts_ssl_write(my_ssl_ctx)? Even though the write is protected with a lock, it modifies my_ssl_ctx. If the read looks at my_ssl_ctx simultaneously, then it may observe inconsistent state of the SSL context, and all hell breaks loose. It's better to deadlock in such situations than to have memory/SSL state corruptions.

lejunzhu · 2021-01-07T00:56:06Z

It seems that calling fork() after a thread have started recv() will also cause deadlock. See:
test.txt

Perhaps we can drop the lock before ocall_recv, then re-acquire it after it returns? This should be fine as long as re-negotiation is disabled.

Or, at least we can use spinlock with a timeout and print some warning message, so users can see and report the problem when it happens.

yamahata

Reviewable status: 0 of 8 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv)

Pal/lib/crypto/adapters/mbedtls_adapter.c, line 417 at r1 (raw file):

     *       a rare case and use simple spinlocks instead of mutexes. */
    assert(spinlock_is_locked(&ssl_ctx->lock));
    ssize_t ret = ssl_ctx->pal_send_cb(fd, buf, buf_size);

pal_send_cb eventually calls blocking host system call?
it would cause troubles.
Anyway I don't know about MT-requirement for mbedtls. So I leave this comment non-blocking.

dimakuv

Perhaps we can drop the lock before ocall_recv, then re-acquire it after it returns? This should be fine as long as re-negotiation is disabled.

But this doesn't help with the SSL context... mbedTLS internally changes its own metadata before and after the recv() syscall/ocall. So there can be a situation when two threads try to recv on the same SSL context, and one unlocks the SSL context and blocks on recv(), and the other thread now can proceed, and observes inconsistent state of the SSL context... and stuff breaks.

I googled around but couldn't find a good solution. I have a feeling that it's just not possible to escape the issue of "blocking recv/send syscall while holding the lock". @mkow @boryspoplawski @yamahata Maybe you know how to circumvent this?

Reviewable status: 0 of 8 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @lejunzhu)

Pal/lib/crypto/adapters/mbedtls_adapter.c, line 417 at r1 (raw file):

Previously, yamahata wrote…

pal_send_cb eventually calls blocking host system call?
it would cause troubles.
Anyway I don't know about MT-requirement for mbedtls. So I leave this comment non-blocking.

Yes, it will cause troubles (deadlocks in scenarios @lejunzhu mentioned)... I can't come up with a way to fix this.

lejunzhu · 2021-01-07T13:04:06Z

But this doesn't help with the SSL context... mbedTLS internally changes its own metadata before and after the recv() syscall/ocall. So there can be a situation when two threads try to recv on the same SSL context, and one unlocks the SSL context and blocks on recv(), and the other thread now can proceed, and observes inconsistent state of the SSL context... and stuff breaks.

I can't think why an application needs to use two recv() simultaneously. If it does happen, perhaps we can add a flag in ssl_ctx indicating the first mbedtls_ssl_read() is already running, and just return fail to the second one before it even enters mbedTLS.

boryspoplawski

I see one solution: create two TLS contexts, one for each communication direction. Then either:

recv changes only context internals, but does not send anything on the socket (like some metadata, idk)
recv also sends some data on the socket.
If it's 1) then we could wrap the very same fd into both context, otherwise (2) we need to create two separate connections to emulate each way of the original one.

Reviewable status: 0 of 8 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @lejunzhu)

boryspoplawski

Btw I assumed mbedTLS does not give us any way to implement this (but I'm not that familiar with it, so this needs to be confirmed).

Reviewable status: 0 of 8 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @lejunzhu)

mkow

sounds like a risky assumption, I guess TLS may sometimes send some data when receiving, e.g. to rotate symmetric keys, in case of long-lasting connections.

Reviewable status: 0 of 8 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @lejunzhu)

lejunzhu

Maybe non-blocking I/O can solve this. Roughly, calls to mbedtls_ssl_handshake(), mbedtls_ssl_read() and mbedtls_ssl_write() will look like this:

while (1) {
  lock();
  ret = mbedtls_ssl_...(); /* handshake/read/write */
  unlock();
  if (ret == MBEDTLS_ERR_SSL_WANT_WRITE) {
    ocall_poll(stream_fd, POLLOUT); /* wait until we can write */
    continue;
  } else if (ret == MBEDTLS_ERR_SSL_WANT_READ) {
    ocall_poll(stream_fd, POLLIN); /* wait until we can read */
    continue;
  } else {
    break; /* Go on and process the return value as usual */
  }
}

Threads will only block at ocall_poll(). All calls to mbedTLS API are serialized, therefore the internal state must be consistent.

Reviewable status: 0 of 8 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @lejunzhu)

dimakuv

To be honest, I didn't understand Borys's idea. (As Michal mentioned, 1 is a risky assumption so we could go with 2, but I don't really understand 2. Sounds very complicated.)

I like @lejunzhu's workaround. @lejunzhu Do I understand correctly that we do this:

We always create non-blocking host stream_fd for TLS connections
If the user app wants non-blocking pipes/sockets, then nothing to be done additionally
But if the user app wants blocking pipes/sockets, then we additionally do ocall_poll() to wait for data to be available for read/write outside of the lock.

Is my understanding correct? From the first glance, this looks like a good workaround. @boryspoplawski @mkow Any drawbacks to this hack?

Reviewable status: 0 of 8 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @lejunzhu)

dimakuv

Also summoning @yamahata

Reviewable status: 0 of 8 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @lejunzhu)

lejunzhu · 2021-01-14T00:59:31Z

To be honest, I didn't understand Borys's idea. (As Michal mentioned, 1 is a risky assumption so we could go with 2, but I don't really understand 2. Sounds very complicated.)

I like @lejunzhu's workaround. @lejunzhu Do I understand correctly that we do this:

We always create non-blocking host stream_fd for TLS connections

If the user app wants non-blocking pipes/sockets, then nothing to be done additionally

But if the user app wants blocking pipes/sockets, then we additionally do ocall_poll() to wait for data to be available for read/write outside of the lock.

Is my understanding correct? From the first glance, this looks like a good workaround. @boryspoplawski @mkow Any drawbacks to this hack?

Yes, this is what I'm suggesting.

llly · 2021-03-12T06:40:31Z

I worked out a commit llly@e4dc52e based on this PR. It's similar to @lejunzhu 's idea,
The test program in this PR can work.
The test program in #2058 doesn't return errno13 anymore, but the counts are incorrect.

c1=99994, c2=100006, e13=0
c1=100009, c2=99991, e13=0

I don't know what caused this behavior.

dimakuv added 2 commits January 5, 2021 02:58

dimakuv commented Jan 5, 2021

View reviewed changes

dimakuv mentioned this pull request Jan 5, 2021

[Pal/Linux-SGX] Calling send() from multiple threads causes error in mbedTLS #2058

Open

yamahata reviewed Jan 7, 2021

View reviewed changes

dimakuv commented Jan 7, 2021

View reviewed changes

boryspoplawski reviewed Jan 9, 2021

View reviewed changes

mkow reviewed Jan 11, 2021

View reviewed changes

lejunzhu reviewed Jan 13, 2021

View reviewed changes

dimakuv commented Jan 13, 2021

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Pal/lib] Add spinlocks to mbedTLS-specific SSL recv/send #2059

[Pal/lib] Add spinlocks to mbedTLS-specific SSL recv/send #2059

dimakuv commented Jan 5, 2021 •

edited by mkow

dimakuv left a comment

dimakuv commented Jan 5, 2021

lejunzhu commented Jan 6, 2021

dimakuv commented Jan 6, 2021

lejunzhu commented Jan 7, 2021

yamahata left a comment

dimakuv left a comment

lejunzhu commented Jan 7, 2021

boryspoplawski left a comment

boryspoplawski left a comment

mkow left a comment

lejunzhu left a comment

dimakuv left a comment

dimakuv left a comment

lejunzhu commented Jan 14, 2021

llly commented Mar 12, 2021

[Pal/lib] Add spinlocks to mbedTLS-specific SSL recv/send #2059

Are you sure you want to change the base?

[Pal/lib] Add spinlocks to mbedTLS-specific SSL recv/send #2059

Conversation

dimakuv commented Jan 5, 2021 • edited by mkow

Description of the changes

How to test this PR?

dimakuv left a comment

Choose a reason for hiding this comment

dimakuv commented Jan 5, 2021

lejunzhu commented Jan 6, 2021

dimakuv commented Jan 6, 2021

lejunzhu commented Jan 7, 2021

yamahata left a comment

Choose a reason for hiding this comment

dimakuv left a comment

Choose a reason for hiding this comment

lejunzhu commented Jan 7, 2021

boryspoplawski left a comment

Choose a reason for hiding this comment

boryspoplawski left a comment

Choose a reason for hiding this comment

mkow left a comment

Choose a reason for hiding this comment

lejunzhu left a comment

Choose a reason for hiding this comment

dimakuv left a comment

Choose a reason for hiding this comment

dimakuv left a comment

Choose a reason for hiding this comment

lejunzhu commented Jan 14, 2021

llly commented Mar 12, 2021

dimakuv commented Jan 5, 2021 •

edited by mkow