-
Notifications
You must be signed in to change notification settings - Fork 178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PAL/Linux-SGX] Re-init attestation key after AESM restarted #1877
base: master
Are you sure you want to change the base?
Conversation
Gramine uses the AESM daemon to request SGX Quotes from the Quoting Enclave. AESM runs in the background and may be restarted. For some reason, after a restart AESM "forgets" the platform's attestation key and requires an additional INIT_QUOTE_REQUEST before it is able to generate SGX Quotes again. This behavior is documented in Intel SGX PSW/DCAP software, but Gramine didn't implement this and thus failed with "AESM service returned error 42". This commit complies with the SGX PSW/DCAP documentation and adds the send of INIT_QUOTE_REQUEST + retry of SGX quote retrieval. Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw. I would be a bit easier to test if you just pasted the new version of sgx-quote.py
as is, instead of the diff - it replaces the whole script anyways ;)
Reviewed 1 of 2 files at r1, all commit messages.
Reviewable status: 1 of 2 files reviewed, 3 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv)
a discussion (no related file):
Apply the above patch, run it in one terminal, and in another terminal do
sudo systemctl restart aesmd
. Without this PR, Gramine will fail.
It doesn't fail for me on a client CPU with EPID attestation. What's the exact setup to reproduce this?
pal/src/host/linux-sgx/host_ocalls.c
line 725 at r1 (raw file):
&ocall_quote_args->nonce, &ocall_quote_args->quote, &ocall_quote_args->quote_len); if (ret == -EAGAIN) {
I'd prefer a do-while instead of this copy-paste. Also, I think it should actually be a loop, not just a single retry (in case we race with another restart).
pal/src/host/linux-sgx/host_platform.c
line 22 at r1 (raw file):
/* retrieving the SGX quote may return error 42, which means that the attestation key is not * available and AESM service must re-generate the key; when Gramine sees such error, it performs a * dummy INIT_QUOTE_REQUEST and then re-tries retrieving the SGX quote */
I'd say that this explanation belongs to the function retrying the get-quote request, not to the constant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 2 files at r1, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv and @mkow)
a discussion (no related file):
Previously, mkow (Michał Kowalczyk) wrote…
Apply the above patch, run it in one terminal, and in another terminal do
sudo systemctl restart aesmd
. Without this PR, Gramine will fail.It doesn't fail for me on a client CPU with EPID attestation. What's the exact setup to reproduce this?
Yes, the issue was originally reported on a DCAP setup.
I think this specific error (AESM_ATT_KEY_NOT_INITIALIZED
) and the issue should only be seen on DCAP where an attestation key re-generation in QE is involved. While w/ EPID, the attestation key is actually provisioned to and sealed by QVE and later unsealed by QE (during init and get quote) where such keygen process is irrelevant.
pal/src/host/linux-sgx/host_ocalls.c
line 725 at r1 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
I'd prefer a do-while instead of this copy-paste. Also, I think it should actually be a loop, not just a single retry (in case we race with another restart).
+1, I think better with a retry limit
pal/src/host/linux-sgx/host_platform.c
line 305 at r1 (raw file):
Response__GetQuoteResponse* r = res->getquoteres; if (r->errorcode == AESM_ATT_KEY_NOT_INITIALIZED) {
I think the PR does not apply to EPID, pls see the discussions above.
Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 2 files reviewed, 5 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @kailun-qin, and @mkow)
a discussion (no related file):
Previously, kailun-qin (Kailun Qin) wrote…
Yes, the issue was originally reported on a DCAP setup.
I think this specific error (
AESM_ATT_KEY_NOT_INITIALIZED
) and the issue should only be seen on DCAP where an attestation key re-generation in QE is involved. While w/ EPID, the attestation key is actually provisioned to and sealed by QVE and later unsealed by QE (during init and get quote) where such keygen process is irrelevant.
Yes, this was reported with DCAP, and I also reproduced on a DCAP machine. I didn't have an EPID machine around, so I just assumed it has the same issue.
I reverted the changes to the EPID paths now. Thanks Kailun for explanations.
-- commits
line 10 at r1:
Add somewhere here that this doesn't concern EPID attestation. Only DCAP.
pal/src/host/linux-sgx/host_ocalls.c
line 725 at r1 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
+1, I think better with a retry limit
Done.
pal/src/host/linux-sgx/host_platform.c
line 22 at r1 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
I'd say that this explanation belongs to the function retrying the get-quote request, not to the constant.
Done.
pal/src/host/linux-sgx/host_platform.c
line 305 at r1 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
I think the PR does not apply to EPID, pls see the discussions above.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 2 files reviewed, 6 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @kailun-qin, and @mkow)
pal/src/host/linux-sgx/host_ocalls.c
line 736 at r2 (raw file):
* a dummy INIT_QUOTE_REQUEST and then re-try retrieving the SGX quote. */ sgx_target_info_t dummy_qe_targetinfo = {0};
New qe_targetinfo
should not by dummy.
QE can be upgraded after aesm_service restart.
Must store it tog_pal_linuxsgx_state.qe_targetinfo
.
Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 6 files reviewed, 6 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin and @mkow)
pal/src/host/linux-sgx/host_ocalls.c
line 736 at r2 (raw file):
Previously, llly (Li Xun) wrote…
New
qe_targetinfo
should not by dummy.
QE can be upgraded after aesm_service restart.
Must store it tog_pal_linuxsgx_state.qe_targetinfo
.
I tried to implement this in r3 of this PR, but I now realize that if the flow is more complicated and my current PR still fails if QE Target Info was modified:
- SGX Report must be generated inside the SGX enclave from the QE Target Info (in trusted PAL).
- This SGX Report is put inside the GET_QUOTE request to AESM (in untrusted PAL).
- AESM calls into Quoting Enclave (QE).
- QE checks that QE Target Info inside the GET_QUOTE's SGX Report corresponds to its own Target Info.
- Only if Target Infos match, then QE generates the SGX Quote.
- AESM returns the SGX Quote to Gramine's untrusted PAL, or (if AESM restarted) returns an error 42.
- If Gramine's untrusted PAL gets error 42, Gramine must get the new QE Target Info (in untrusted PAL, by talking to AESM), then jump into the SGX enclave (in trusted PAL) and restart the whole sequence from step 1.
The problem with my current PR is that we retry the loop only in untrusted PAL (basically in step 2). In fact, we must wrap the whole sequence of steps 1-7 into a while loop.
So my "hack" of ocall_get_quote()
returning a new QE Target Info happens too late -- we may get into a situation when AESM updated QE Target Info, so it won't generate a new SGX Quote but return some error "hey, Gramine, you sent me an old QE Target Info, I don't trust you".
@mkow @kailun-qin @llly My current idea is to introduce a new OCALL like ocall_update_qe_targetinfo()
which will be called if ocall_get_quote()
returns -EAGAIN (corresponds to AESM error 42). Does this sounds reasonable to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 5 files at r3, all commit messages.
Reviewable status: 3 of 6 files reviewed, 7 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @kailun-qin, and @mkow)
pal/src/host/linux-sgx/enclave_platform.c
line 14 at r3 (raw file):
/* must align all arguments to sgx_report() so that EREPORT doesn't complain */ __sgx_mem_aligned sgx_report_t report; __sgx_mem_aligned sgx_target_info_t targetinfo = g_pal_linuxsgx_state.qe_targetinfo;
Mark
pal/src/host/linux-sgx/host_ocalls.c
line 736 at r2 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
I tried to implement this in r3 of this PR, but I now realize that if the flow is more complicated and my current PR still fails if QE Target Info was modified:
- SGX Report must be generated inside the SGX enclave from the QE Target Info (in trusted PAL).
- This SGX Report is put inside the GET_QUOTE request to AESM (in untrusted PAL).
- AESM calls into Quoting Enclave (QE).
- QE checks that QE Target Info inside the GET_QUOTE's SGX Report corresponds to its own Target Info.
- Only if Target Infos match, then QE generates the SGX Quote.
- AESM returns the SGX Quote to Gramine's untrusted PAL, or (if AESM restarted) returns an error 42.
- If Gramine's untrusted PAL gets error 42, Gramine must get the new QE Target Info (in untrusted PAL, by talking to AESM), then jump into the SGX enclave (in trusted PAL) and restart the whole sequence from step 1.
The problem with my current PR is that we retry the loop only in untrusted PAL (basically in step 2). In fact, we must wrap the whole sequence of steps 1-7 into a while loop.
So my "hack" of
ocall_get_quote()
returning a new QE Target Info happens too late -- we may get into a situation when AESM updated QE Target Info, so it won't generate a new SGX Quote but return some error "hey, Gramine, you sent me an old QE Target Info, I don't trust you".@mkow @kailun-qin @llly My current idea is to introduce a new OCALL like
ocall_update_qe_targetinfo()
which will be called ifocall_get_quote()
returns -EAGAIN (corresponds to AESM error 42). Does this sounds reasonable to you?
Reasonable for me.
I add a "Mark" comment in enclave_platform.c
. It uses g_pal_linux.sgx_state.qe_targetinfo
to create a report.
I think you can add a loop there that if ocall_get_quote
returns 42, callocall_update_qe_targetinfo()
then go back to create a new report and call ocall_get_quote
again until success.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 3 of 6 files reviewed, 5 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin and @mkow)
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Add somewhere here that this doesn't concern EPID attestation. Only DCAP.
Done in #1881
pal/src/host/linux-sgx/host_ocalls.c
line 736 at r2 (raw file):
Previously, llly (Li Xun) wrote…
Reasonable for me.
I add a "Mark" comment inenclave_platform.c
. It usesg_pal_linux.sgx_state.qe_targetinfo
to create a report.
I think you can add a loop there that ifocall_get_quote
returns 42, callocall_update_qe_targetinfo()
then go back to create a new report and callocall_get_quote
again until success.
Thank you @llly !
I re-created this patch as #1881 (as the implementation drastically changed). I'm blocking this PR, and it will be closed automatically after #1881 is merged.
Description of the changes
Gramine uses the AESM daemon to request SGX Quotes from the Quoting Enclave. AESM runs in the background and may be restarted. For some reason, after a restart AESM "forgets" the platform's attestation key and requires an additional INIT_QUOTE_REQUEST before it is able to generate SGX Quotes again. This behavior is documented in Intel SGX PSW/DCAP software, but Gramine didn't implement this and thus failed with "AESM service returned error 42".
This PR complies with the SGX PSW/DCAP documentation and adds the send of INIT_QUOTE_REQUEST + retry of SGX quote retrieval.
For documentation, see e.g. this comment: https://github.com/intel/linux-sgx/blob/80a6625c497056c43e78d993e414ca99a9efed5c/common/inc/sgx_uae_quote_ex.h#L139-L140
How to test this PR?
This was detected by a customer.
A simple reproducer is like this:
Apply the above patch, run it in one terminal, and in another terminal do
sudo systemctl restart aesmd
. Without this PR, Gramine will fail. With this PR, Gramine will continue execution of the endless loop.This change is