Skip to content
This repository has been archived by the owner on Jan 20, 2022. It is now read-only.

[Pal/lib] Add spinlocks to mbedTLS-specific SSL recv/send #2059

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

dimakuv
Copy link
Contributor

@dimakuv dimakuv commented Jan 5, 2021

Description of the changes

Linux-SGX PAL wraps all pipe/UNIX domain socket communication in TLS sessions. Previously, Graphene assumed that only one thread at a time accesses one TLS session (i.e., no multi-threading support). This commit adds spinlocks to lib_SSL* functions to support such (rare) multi-threading scenarios.

Note that we cannot rely on pthreads and/or mutexes so we use simple spinlocks. Because this is in the "common library" used by PAL and LibOS, with no "complex" dependencies allowed.

Also, there is a prerequisite (first) commit in this PR: remove api.h from spinlock.h. See commit messages for explanations.

Fixes #2058.

How to test this PR?

This PR adds a corresponding LibOS test pipe_multithread (kudos to @lejunzhu for the initial version of this test and debugging this).


This change is Reviewable

The common header "spinlock.h" contains spinlock implementation. It
may be used in LibOS, PAL, and other parts of Graphene. Previously,
it included "api.h" which is PAL-specific. This prevented using
"spinlock.h" in PAL-agnostic parts of Graphene such as mbedTLS
crypto adapters. This commit removes this dependency and fixes
a few emerged include-related bugs.
Linux-SGX PAL wraps all pipe/UNIX domain socket communication in
TLS sessions. Previously, Graphene assumed that only one thread
at a time accesses one TLS session (i.e., no multi-threading
support). This commit adds spinlocks to `lib_SSL*` functions to
support such (rare) multi-threading scenarios. Note that we cannot
rely on pthreads and/or mutexes so we use simple spinlocks.

This commit adds a corresponding LibOS test.
Copy link
Contributor Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 8 files reviewed, all discussions resolved, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)


LibOS/shim/test/regression/Makefile, line 173 at r1 (raw file):

CFLAGS-sigaction_per_process += -pthread
CFLAGS-signal_multithread += -pthread
CFLAGS-pthread_set_get_affinity += -pthread

FYI: This wasn't sorted, so I just added sorting.


Pal/lib/crypto/adapters/mbedtls_adapter.c, line 388 at r1 (raw file):

     *       recv() and the other will spin and burn CPU cycles. We consider "shared SSL context"
     *       a rare case and use simple spinlocks instead of mutexes. */
    assert(spinlock_is_locked(&ssl_ctx->lock));

FYI: I consider the performance overhead of spinlocks here minimal. We never hit multi-threaded pipes until #2058 (actually, I know that Erlang does the same), so most apps won't notice this. And the apps that do use this multi-threaded pattern... well, I just hope it's not on critical path because these are just pipes for intra-process communication.


Pal/src/pal_internal.h, line 12 at r1 (raw file):

#include <stdarg.h>
#include <sys/types.h>

FYI: That's because one function here uses ssize_t type.

@dimakuv
Copy link
Contributor Author

dimakuv commented Jan 5, 2021

Jenkins, retest Jenkins-Debug please (LTP test alarm03 failed, cannot reproduce locally and generally unrelated)

@lejunzhu
Copy link
Contributor

lejunzhu commented Jan 6, 2021

I think protecting mbedtls_ssl_read and mbedtls_ssl_write with the same lock may cause other problems. Read usually blocks for a long time. If the application is trying to send when recv'ing in another thread, it will cause a deadlock.

For example, this test C program runs fine under Linux and Graphene SGX before this PR, but with this PR it will hang.
test.txt

I don't know a real world example like this, but I guess it is possible.

Looking at mbedTLS code, read and write does not seem to cause conflict, except when handshake or re-negotiation is pending. And I can't think why an application will call recv() from two threads. Therefore we can probably call mbedtls_ssl_read without a lock.

@dimakuv
Copy link
Contributor Author

dimakuv commented Jan 6, 2021

I don't know a real world example like this, but I guess it is possible.

The attached test program reads and writes on the same file descriptor from two threads. This looks like an unrealistic scenario. Typically applications create a socket pair or a pipe pair and use one end for recv and another end for send.

...And I can't think why an application will call recv() from two threads. Therefore we can probably call mbedtls_ssl_read without a lock.

I definitely don't like the idea of having mbedtls_ssl_read() not protected with a lock. What happens if we do mbedtls_ssl_read(my_ssl_ctx) and a simultaneous mbedlts_ssl_write(my_ssl_ctx)? Even though the write is protected with a lock, it modifies my_ssl_ctx. If the read looks at my_ssl_ctx simultaneously, then it may observe inconsistent state of the SSL context, and all hell breaks loose. It's better to deadlock in such situations than to have memory/SSL state corruptions.

@lejunzhu
Copy link
Contributor

lejunzhu commented Jan 7, 2021

It seems that calling fork() after a thread have started recv() will also cause deadlock. See:
test.txt

Perhaps we can drop the lock before ocall_recv, then re-acquire it after it returns? This should be fine as long as re-negotiation is disabled.

Or, at least we can use spinlock with a timeout and print some warning message, so users can see and report the problem when it happens.

Copy link
Contributor

@yamahata yamahata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 8 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv)


Pal/lib/crypto/adapters/mbedtls_adapter.c, line 417 at r1 (raw file):

     *       a rare case and use simple spinlocks instead of mutexes. */
    assert(spinlock_is_locked(&ssl_ctx->lock));
    ssize_t ret = ssl_ctx->pal_send_cb(fd, buf, buf_size);

pal_send_cb eventually calls blocking host system call?
it would cause troubles.
Anyway I don't know about MT-requirement for mbedtls. So I leave this comment non-blocking.

Copy link
Contributor Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we can drop the lock before ocall_recv, then re-acquire it after it returns? This should be fine as long as re-negotiation is disabled.

But this doesn't help with the SSL context... mbedTLS internally changes its own metadata before and after the recv() syscall/ocall. So there can be a situation when two threads try to recv on the same SSL context, and one unlocks the SSL context and blocks on recv(), and the other thread now can proceed, and observes inconsistent state of the SSL context... and stuff breaks.

I googled around but couldn't find a good solution. I have a feeling that it's just not possible to escape the issue of "blocking recv/send syscall while holding the lock". @mkow @boryspoplawski @yamahata Maybe you know how to circumvent this?

Reviewable status: 0 of 8 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @lejunzhu)


Pal/lib/crypto/adapters/mbedtls_adapter.c, line 417 at r1 (raw file):

Previously, yamahata wrote…

pal_send_cb eventually calls blocking host system call?
it would cause troubles.
Anyway I don't know about MT-requirement for mbedtls. So I leave this comment non-blocking.

Yes, it will cause troubles (deadlocks in scenarios @lejunzhu mentioned)... I can't come up with a way to fix this.

@lejunzhu
Copy link
Contributor

lejunzhu commented Jan 7, 2021

But this doesn't help with the SSL context... mbedTLS internally changes its own metadata before and after the recv() syscall/ocall. So there can be a situation when two threads try to recv on the same SSL context, and one unlocks the SSL context and blocks on recv(), and the other thread now can proceed, and observes inconsistent state of the SSL context... and stuff breaks.

I can't think why an application needs to use two recv() simultaneously. If it does happen, perhaps we can add a flag in ssl_ctx indicating the first mbedtls_ssl_read() is already running, and just return fail to the second one before it even enters mbedTLS.

Copy link
Contributor

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see one solution: create two TLS contexts, one for each communication direction. Then either:

  1. recv changes only context internals, but does not send anything on the socket (like some metadata, idk)
  2. recv also sends some data on the socket.
    If it's 1) then we could wrap the very same fd into both context, otherwise (2) we need to create two separate connections to emulate each way of the original one.

Reviewable status: 0 of 8 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @lejunzhu)

Copy link
Contributor

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw I assumed mbedTLS does not give us any way to implement this (but I'm not that familiar with it, so this needs to be confirmed).

Reviewable status: 0 of 8 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @lejunzhu)

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. sounds like a risky assumption, I guess TLS may sometimes send some data when receiving, e.g. to rotate symmetric keys, in case of long-lasting connections.

Reviewable status: 0 of 8 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @lejunzhu)

Copy link
Contributor

@lejunzhu lejunzhu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe non-blocking I/O can solve this. Roughly, calls to mbedtls_ssl_handshake(), mbedtls_ssl_read() and mbedtls_ssl_write() will look like this:

while (1) {
  lock();
  ret = mbedtls_ssl_...(); /* handshake/read/write */
  unlock();
  if (ret == MBEDTLS_ERR_SSL_WANT_WRITE) {
    ocall_poll(stream_fd, POLLOUT); /* wait until we can write */
    continue;
  } else if (ret == MBEDTLS_ERR_SSL_WANT_READ) {
    ocall_poll(stream_fd, POLLIN); /* wait until we can read */
    continue;
  } else {
    break; /* Go on and process the return value as usual */
  }
}

Threads will only block at ocall_poll(). All calls to mbedTLS API are serialized, therefore the internal state must be consistent.

Reviewable status: 0 of 8 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @lejunzhu)

Copy link
Contributor Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I didn't understand Borys's idea. (As Michal mentioned, 1 is a risky assumption so we could go with 2, but I don't really understand 2. Sounds very complicated.)

I like @lejunzhu's workaround. @lejunzhu Do I understand correctly that we do this:

  1. We always create non-blocking host stream_fd for TLS connections
  2. If the user app wants non-blocking pipes/sockets, then nothing to be done additionally
  3. But if the user app wants blocking pipes/sockets, then we additionally do ocall_poll() to wait for data to be available for read/write outside of the lock.

Is my understanding correct? From the first glance, this looks like a good workaround. @boryspoplawski @mkow Any drawbacks to this hack?

Reviewable status: 0 of 8 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @lejunzhu)

Copy link
Contributor Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also summoning @yamahata

Reviewable status: 0 of 8 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @lejunzhu)

@lejunzhu
Copy link
Contributor

To be honest, I didn't understand Borys's idea. (As Michal mentioned, 1 is a risky assumption so we could go with 2, but I don't really understand 2. Sounds very complicated.)

I like @lejunzhu's workaround. @lejunzhu Do I understand correctly that we do this:

  1. We always create non-blocking host stream_fd for TLS connections
  2. If the user app wants non-blocking pipes/sockets, then nothing to be done additionally
  3. But if the user app wants blocking pipes/sockets, then we additionally do ocall_poll() to wait for data to be available for read/write outside of the lock.

Is my understanding correct? From the first glance, this looks like a good workaround. @boryspoplawski @mkow Any drawbacks to this hack?

Yes, this is what I'm suggesting.

@llly
Copy link
Contributor

llly commented Mar 12, 2021

I worked out a commit llly@e4dc52e based on this PR. It's similar to @lejunzhu 's idea,
The test program in this PR can work.
The test program in #2058 doesn't return errno13 anymore, but the counts are incorrect.

c1=99994, c2=100006, e13=0
c1=100009, c2=99991, e13=0

I don't know what caused this behavior.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Pal/Linux-SGX] Calling send() from multiple threads causes error in mbedTLS
6 participants