[PAL/Linux-SGX] Re-init attestation key after AESM restarted #1881

dimakuv · 2024-05-13T09:04:15Z

Description of the changes

Gramine uses the AESM daemon to request SGX Quotes from the Quoting Enclave. AESM runs in the background and may be restarted. For some reason, after a restart AESM "forgets" the platform's attestation key and requires an additional INIT_QUOTE_REQUEST before it is able to generate SGX Quotes again. Moreover, the Quoting Enclave's Target Info may change at this point, so Gramine must update its cached QE Target Info. This behavior is applicable only to DCAP attestation and is documented in Intel SGX PSW/DCAP software. Gramine didn't implement this behavior and thus failed with "AESM service returned error 42".

This PR complies with the SGX PSW/DCAP documentation and adds the send of INIT_QUOTE_REQUEST + update of QE Target Info + retry of SGX quote retrieval.

This is a re-creation of #1877, with a better design. Closes #1877.

How to test this PR?

Change CI-Examples/python/scripts/sgx-quote.py to this:

import os
import sys
import time

while True:
    if not os.path.exists("/dev/attestation/quote"):
        print("Cannot find `/dev/attestation/quote`; "
                "are you running under SGX, with remote attestation enabled?")
        sys.exit(1)

    with open('/dev/attestation/attestation_type') as f:
        print(f"Detected attestation type: {f.read()}")

    with open("/dev/attestation/user_report_data", "wb") as f:
        f.write(b'\0'*64)

    with open("/dev/attestation/quote", "rb") as f:
        quote = f.read()

    print(f"Extracted SGX quote with size = {len(quote)} and the following fields:")
    print(f"  ATTRIBUTES.FLAGS: {quote[96:104].hex()}  [ Debug bit: {quote[96] & 2 > 0} ]")
    print(f"  ATTRIBUTES.XFRM:  {quote[104:112].hex()}")
    print(f"  MRENCLAVE:        {quote[112:144].hex()}")
    print(f"  MRSIGNER:         {quote[176:208].hex()}")
    print(f"  ISVPRODID:        {quote[304:306].hex()}")
    print(f"  ISVSVN:           {quote[306:308].hex()}")
    print(f"  REPORTDATA:       {quote[368:400].hex()}")
    print(f"                    {quote[400:432].hex()}")

    sys.stdout.flush()
    time.sleep(5)

Apply the above Python script, run it in one terminal, and in another terminal do sudo systemctl restart aesmd. Without this PR, Gramine will fail. With this PR, Gramine will continue execution of the endless loop.

This change is

Gramine uses the AESM daemon to request SGX Quotes from the Quoting Enclave. AESM runs in the background and may be restarted. For some reason, after a restart AESM "forgets" the platform's attestation key and requires an additional INIT_QUOTE_REQUEST before it is able to generate SGX Quotes again. Moreover, the Quoting Enclave's Target Info may change at this point, so Gramine must update its cached QE Target Info. This behavior is applicable only to DCAP attestation and is documented in Intel SGX PSW/DCAP software. Gramine didn't implement this behavior and thus failed with "AESM service returned error 42". This commit complies with the SGX PSW/DCAP documentation and adds the send of INIT_QUOTE_REQUEST + update of QE Target Info + retry of SGX quote retrieval. Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii@intel.com>

llly

Reviewed 6 of 6 files at r1, all commit messages.
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)

mkow

Reviewed 5 of 6 files at r1, all commit messages.
Reviewable status: all files reviewed, 8 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv)

-- commits line 2 at r1:

Suggestion:

Re-init attestation key after an AESM restart

-- commits line 15 at r1:
ditto later in this sentence

Suggestion:

sending

pal/src/host/linux-sgx/enclave_ocalls.h line 136 at r1 (raw file):

 *
 * \param      is_epid            True if EPID is used, false if DCAP/ECDSA is used.
 * \param[out] qe_targetinfo      Retrieved Quoting Enclave's target info; buffer must be submitted

I'd drop this note altogether, isn't it obvious? It's a single-level pointer, so there's no way for the function to allocate a buffer by itself and return it.

Suggestion:

buffer must be provided

pal/src/host/linux-sgx/enclave_ocalls.h line 141 at r1 (raw file):

 * \returns 0 on success, negative Linux error code otherwise.
 *
 * The obtained QE Target Info is not validated in any way (i.e., this function does not check

Suggestion:

this function does not check whether

pal/src/host/linux-sgx/enclave_ocalls.h line 144 at r1 (raw file):

 * Target Info contents make sense).
 */
int ocall_get_qe_targetinfo(bool is_epid, sgx_target_info_t* qe_targetinfo);

Double checking if I understand the security model here correctly: qe_targetinfo is completely untrusted, and thus we can't assume that we're talking directly to a real QE (we can be talking to a fake one, or be MitM-ed).
This is not a problem, because only the authentic QE has access to the keys to generate a valid quote (so, the fake QE can't), and our report targets whatever we got in qe_targetinfo and the MitMer can't re-encrypt it and then send it to a real QE (due to how EREPORT works).
Is this right?

pal/src/host/linux-sgx/enclave_ocalls.c line 2199 at r1 (raw file):

    if (!ocall_qe_ti_args) {
        sgx_reset_ustack(old_ustack);
        return -ENOMEM;

Almost all other places return -EPERM in this case.

pal/src/host/linux-sgx/enclave_platform.c line 8 at r1 (raw file):

/* this function assumes proper synchronization by callers, such that
 * g_pal_linuxsgx_state.qe_targetinfo is not racey */

This should be documented in the header, not here. Also, it's not only about calling this function multiple times concurrently, but also about anything else which could be reading g_pal_linuxsgx_state.qe_targetinfo at the same time.

pal/src/host/linux-sgx/enclave_platform.c line 8 at r1 (raw file):

/* this function assumes proper synchronization by callers, such that
 * g_pal_linuxsgx_state.qe_targetinfo is not racey */

Do we actually have that synchronization in place? Also, it's quite a surprising/risky interface. Why not just use a lock?

dimakuv mentioned this pull request May 13, 2024

[PAL/Linux-SGX] Re-init attestation key after AESM restarted #1877

Open

llly reviewed May 14, 2024

View reviewed changes

mkow requested changes May 19, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[PAL/Linux-SGX] Re-init attestation key after AESM restarted #1881

[PAL/Linux-SGX] Re-init attestation key after AESM restarted #1881

dimakuv commented May 13, 2024 •

edited by mkow

llly left a comment

mkow left a comment

[PAL/Linux-SGX] Re-init attestation key after AESM restarted #1881

Are you sure you want to change the base?

[PAL/Linux-SGX] Re-init attestation key after AESM restarted #1881

Conversation

dimakuv commented May 13, 2024 • edited by mkow

Description of the changes

How to test this PR?

llly left a comment

Choose a reason for hiding this comment

mkow left a comment

Choose a reason for hiding this comment

dimakuv commented May 13, 2024 •

edited by mkow