Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PAL/Linux-SGX] Re-init attestation key after AESM restarted #1877

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

dimakuv
Copy link
Contributor

@dimakuv dimakuv commented May 8, 2024

Description of the changes

Gramine uses the AESM daemon to request SGX Quotes from the Quoting Enclave. AESM runs in the background and may be restarted. For some reason, after a restart AESM "forgets" the platform's attestation key and requires an additional INIT_QUOTE_REQUEST before it is able to generate SGX Quotes again. This behavior is documented in Intel SGX PSW/DCAP software, but Gramine didn't implement this and thus failed with "AESM service returned error 42".

This PR complies with the SGX PSW/DCAP documentation and adds the send of INIT_QUOTE_REQUEST + retry of SGX quote retrieval.

For documentation, see e.g. this comment: https://github.com/intel/linux-sgx/blob/80a6625c497056c43e78d993e414ca99a9efed5c/common/inc/sgx_uae_quote_ex.h#L139-L140

How to test this PR?

This was detected by a customer.

A simple reproducer is like this:

diff --git a/CI-Examples/python/scripts/sgx-quote.py b/CI-Examples/python/scripts/sgx-quote.py
index 8a79b4f1..3c979553 100644
--- a/CI-Examples/python/scripts/sgx-quote.py
+++ b/CI-Examples/python/scripts/sgx-quote.py
@@ -5,27 +5,32 @@

 import os
 import sys
+import time

-if not os.path.exists("/dev/attestation/quote"):
-    print("Cannot find `/dev/attestation/quote`; "
-          "are you running under SGX, with remote attestation enabled?")
-    sys.exit(1)
-
-with open('/dev/attestation/attestation_type') as f:
-    print(f"Detected attestation type: {f.read()}")
-
-with open("/dev/attestation/user_report_data", "wb") as f:
-    f.write(b'\0'*64)
-
-with open("/dev/attestation/quote", "rb") as f:
-    quote = f.read()
-
-print(f"Extracted SGX quote with size = {len(quote)} and the following fields:")
-print(f"  ATTRIBUTES.FLAGS: {quote[96:104].hex()}  [ Debug bit: {quote[96] & 2 > 0} ]")
-print(f"  ATTRIBUTES.XFRM:  {quote[104:112].hex()}")
-print(f"  MRENCLAVE:        {quote[112:144].hex()}")
-print(f"  MRSIGNER:         {quote[176:208].hex()}")
-print(f"  ISVPRODID:        {quote[304:306].hex()}")
-print(f"  ISVSVN:           {quote[306:308].hex()}")
-print(f"  REPORTDATA:       {quote[368:400].hex()}")
-print(f"                    {quote[400:432].hex()}")
+while True:
+    if not os.path.exists("/dev/attestation/quote"):
+        print("Cannot find `/dev/attestation/quote`; "
+                "are you running under SGX, with remote attestation enabled?")
+        sys.exit(1)
+
+    with open('/dev/attestation/attestation_type') as f:
+        print(f"Detected attestation type: {f.read()}")
+
+    with open("/dev/attestation/user_report_data", "wb") as f:
+        f.write(b'\0'*64)
+
+    with open("/dev/attestation/quote", "rb") as f:
+        quote = f.read()
+
+    print(f"Extracted SGX quote with size = {len(quote)} and the following fields:")
+    print(f"  ATTRIBUTES.FLAGS: {quote[96:104].hex()}  [ Debug bit: {quote[96] & 2 > 0} ]")
+    print(f"  ATTRIBUTES.XFRM:  {quote[104:112].hex()}")
+    print(f"  MRENCLAVE:        {quote[112:144].hex()}")
+    print(f"  MRSIGNER:         {quote[176:208].hex()}")
+    print(f"  ISVPRODID:        {quote[304:306].hex()}")
+    print(f"  ISVSVN:           {quote[306:308].hex()}")
+    print(f"  REPORTDATA:       {quote[368:400].hex()}")
+    print(f"                    {quote[400:432].hex()}")
+
+    sys.stdout.flush()
+    time.sleep(5)

Apply the above patch, run it in one terminal, and in another terminal do sudo systemctl restart aesmd. Without this PR, Gramine will fail. With this PR, Gramine will continue execution of the endless loop.


This change is Reviewable

Gramine uses the AESM daemon to request SGX Quotes from the Quoting
Enclave. AESM runs in the background and may be restarted. For some
reason, after a restart AESM "forgets" the platform's attestation key
and requires an additional INIT_QUOTE_REQUEST before it is able to
generate SGX Quotes again. This behavior is documented in Intel SGX
PSW/DCAP software, but Gramine didn't implement this and thus failed
with "AESM service returned error 42".

This commit complies with the SGX PSW/DCAP documentation and adds the
send of INIT_QUOTE_REQUEST + retry of SGX quote retrieval.

Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii@intel.com>
Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw. I would be a bit easier to test if you just pasted the new version of sgx-quote.py as is, instead of the diff - it replaces the whole script anyways ;)

Reviewed 1 of 2 files at r1, all commit messages.
Reviewable status: 1 of 2 files reviewed, 3 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv)

a discussion (no related file):

Apply the above patch, run it in one terminal, and in another terminal do sudo systemctl restart aesmd. Without this PR, Gramine will fail.

It doesn't fail for me on a client CPU with EPID attestation. What's the exact setup to reproduce this?



pal/src/host/linux-sgx/host_ocalls.c line 725 at r1 (raw file):

                         &ocall_quote_args->nonce, &ocall_quote_args->quote,
                         &ocall_quote_args->quote_len);
    if (ret == -EAGAIN) {

I'd prefer a do-while instead of this copy-paste. Also, I think it should actually be a loop, not just a single retry (in case we race with another restart).


pal/src/host/linux-sgx/host_platform.c line 22 at r1 (raw file):

/* retrieving the SGX quote may return error 42, which means that the attestation key is not
 * available and AESM service must re-generate the key; when Gramine sees such error, it performs a
 * dummy INIT_QUOTE_REQUEST and then re-tries retrieving the SGX quote */

I'd say that this explanation belongs to the function retrying the get-quote request, not to the constant.

Copy link
Contributor

@kailun-qin kailun-qin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r1, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv and @mkow)

a discussion (no related file):

Previously, mkow (Michał Kowalczyk) wrote…

Apply the above patch, run it in one terminal, and in another terminal do sudo systemctl restart aesmd. Without this PR, Gramine will fail.

It doesn't fail for me on a client CPU with EPID attestation. What's the exact setup to reproduce this?

Yes, the issue was originally reported on a DCAP setup.

I think this specific error (AESM_ATT_KEY_NOT_INITIALIZED) and the issue should only be seen on DCAP where an attestation key re-generation in QE is involved. While w/ EPID, the attestation key is actually provisioned to and sealed by QVE and later unsealed by QE (during init and get quote) where such keygen process is irrelevant.



pal/src/host/linux-sgx/host_ocalls.c line 725 at r1 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

I'd prefer a do-while instead of this copy-paste. Also, I think it should actually be a loop, not just a single retry (in case we race with another restart).

+1, I think better with a retry limit


pal/src/host/linux-sgx/host_platform.c line 305 at r1 (raw file):

        Response__GetQuoteResponse* r = res->getquoteres;
        if (r->errorcode == AESM_ATT_KEY_NOT_INITIALIZED) {

I think the PR does not apply to EPID, pls see the discussions above.

Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii@intel.com>
Copy link
Contributor Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 2 files reviewed, 5 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @kailun-qin, and @mkow)

a discussion (no related file):

Previously, kailun-qin (Kailun Qin) wrote…

Yes, the issue was originally reported on a DCAP setup.

I think this specific error (AESM_ATT_KEY_NOT_INITIALIZED) and the issue should only be seen on DCAP where an attestation key re-generation in QE is involved. While w/ EPID, the attestation key is actually provisioned to and sealed by QVE and later unsealed by QE (during init and get quote) where such keygen process is irrelevant.

Yes, this was reported with DCAP, and I also reproduced on a DCAP machine. I didn't have an EPID machine around, so I just assumed it has the same issue.

I reverted the changes to the EPID paths now. Thanks Kailun for explanations.



-- commits line 10 at r1:
Add somewhere here that this doesn't concern EPID attestation. Only DCAP.


pal/src/host/linux-sgx/host_ocalls.c line 725 at r1 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

+1, I think better with a retry limit

Done.


pal/src/host/linux-sgx/host_platform.c line 22 at r1 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

I'd say that this explanation belongs to the function retrying the get-quote request, not to the constant.

Done.


pal/src/host/linux-sgx/host_platform.c line 305 at r1 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

I think the PR does not apply to EPID, pls see the discussions above.

Done.

Copy link
Contributor

@llly llly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 2 files reviewed, 6 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @kailun-qin, and @mkow)


pal/src/host/linux-sgx/host_ocalls.c line 736 at r2 (raw file):

         * a dummy INIT_QUOTE_REQUEST and then re-try retrieving the SGX quote.
         */
        sgx_target_info_t dummy_qe_targetinfo = {0};

New qe_targetinfo should not by dummy.
QE can be upgraded after aesm_service restart.
Must store it tog_pal_linuxsgx_state.qe_targetinfo.

Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii@intel.com>
Copy link
Contributor Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 6 files reviewed, 6 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin and @mkow)


pal/src/host/linux-sgx/host_ocalls.c line 736 at r2 (raw file):

Previously, llly (Li Xun) wrote…

New qe_targetinfo should not by dummy.
QE can be upgraded after aesm_service restart.
Must store it tog_pal_linuxsgx_state.qe_targetinfo.

I tried to implement this in r3 of this PR, but I now realize that if the flow is more complicated and my current PR still fails if QE Target Info was modified:

  1. SGX Report must be generated inside the SGX enclave from the QE Target Info (in trusted PAL).
  2. This SGX Report is put inside the GET_QUOTE request to AESM (in untrusted PAL).
  3. AESM calls into Quoting Enclave (QE).
  4. QE checks that QE Target Info inside the GET_QUOTE's SGX Report corresponds to its own Target Info.
  5. Only if Target Infos match, then QE generates the SGX Quote.
  6. AESM returns the SGX Quote to Gramine's untrusted PAL, or (if AESM restarted) returns an error 42.
  7. If Gramine's untrusted PAL gets error 42, Gramine must get the new QE Target Info (in untrusted PAL, by talking to AESM), then jump into the SGX enclave (in trusted PAL) and restart the whole sequence from step 1.

The problem with my current PR is that we retry the loop only in untrusted PAL (basically in step 2). In fact, we must wrap the whole sequence of steps 1-7 into a while loop.

So my "hack" of ocall_get_quote() returning a new QE Target Info happens too late -- we may get into a situation when AESM updated QE Target Info, so it won't generate a new SGX Quote but return some error "hey, Gramine, you sent me an old QE Target Info, I don't trust you".

@mkow @kailun-qin @llly My current idea is to introduce a new OCALL like ocall_update_qe_targetinfo() which will be called if ocall_get_quote() returns -EAGAIN (corresponds to AESM error 42). Does this sounds reasonable to you?

Copy link
Contributor

@llly llly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 5 files at r3, all commit messages.
Reviewable status: 3 of 6 files reviewed, 7 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @kailun-qin, and @mkow)


pal/src/host/linux-sgx/enclave_platform.c line 14 at r3 (raw file):

    /* must align all arguments to sgx_report() so that EREPORT doesn't complain */
    __sgx_mem_aligned sgx_report_t report;
    __sgx_mem_aligned sgx_target_info_t targetinfo = g_pal_linuxsgx_state.qe_targetinfo;

Mark


pal/src/host/linux-sgx/host_ocalls.c line 736 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I tried to implement this in r3 of this PR, but I now realize that if the flow is more complicated and my current PR still fails if QE Target Info was modified:

  1. SGX Report must be generated inside the SGX enclave from the QE Target Info (in trusted PAL).
  2. This SGX Report is put inside the GET_QUOTE request to AESM (in untrusted PAL).
  3. AESM calls into Quoting Enclave (QE).
  4. QE checks that QE Target Info inside the GET_QUOTE's SGX Report corresponds to its own Target Info.
  5. Only if Target Infos match, then QE generates the SGX Quote.
  6. AESM returns the SGX Quote to Gramine's untrusted PAL, or (if AESM restarted) returns an error 42.
  7. If Gramine's untrusted PAL gets error 42, Gramine must get the new QE Target Info (in untrusted PAL, by talking to AESM), then jump into the SGX enclave (in trusted PAL) and restart the whole sequence from step 1.

The problem with my current PR is that we retry the loop only in untrusted PAL (basically in step 2). In fact, we must wrap the whole sequence of steps 1-7 into a while loop.

So my "hack" of ocall_get_quote() returning a new QE Target Info happens too late -- we may get into a situation when AESM updated QE Target Info, so it won't generate a new SGX Quote but return some error "hey, Gramine, you sent me an old QE Target Info, I don't trust you".

@mkow @kailun-qin @llly My current idea is to introduce a new OCALL like ocall_update_qe_targetinfo() which will be called if ocall_get_quote() returns -EAGAIN (corresponds to AESM error 42). Does this sounds reasonable to you?

Reasonable for me.
I add a "Mark" comment in enclave_platform.c. It uses g_pal_linux.sgx_state.qe_targetinfo to create a report.
I think you can add a loop there that if ocall_get_quote returns 42, callocall_update_qe_targetinfo() then go back to create a new report and call ocall_get_quote again until success.

Copy link
Contributor Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 3 of 6 files reviewed, 5 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin and @mkow)


-- commits line 10 at r1:

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Add somewhere here that this doesn't concern EPID attestation. Only DCAP.

Done in #1881


pal/src/host/linux-sgx/host_ocalls.c line 736 at r2 (raw file):

Previously, llly (Li Xun) wrote…

Reasonable for me.
I add a "Mark" comment in enclave_platform.c. It uses g_pal_linux.sgx_state.qe_targetinfo to create a report.
I think you can add a loop there that if ocall_get_quote returns 42, callocall_update_qe_targetinfo() then go back to create a new report and call ocall_get_quote again until success.

Thank you @llly !

I re-created this patch as #1881 (as the implementation drastically changed). I'm blocking this PR, and it will be closed automatically after #1881 is merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants