Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interrupt enclu execution on main enclave thread exiting #493

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

maxknv
Copy link

@maxknv maxknv commented Jul 4, 2023

  • Issue SIGUSR1 to worker threads on exiting the main sysloop
  • Register SIGUSR1 handler to rewrite thread's IP if it's on enclu

Resolves #165

@maxknv maxknv force-pushed the max/gh-165-abort-hanged-enclave-thread branch from 8226083 to a1a2e0f Compare July 5, 2023 10:52
intel-sgx/enclave-runner/src/usercalls/mod.rs Outdated Show resolved Hide resolved
intel-sgx/fortanix-sgx-tools/src/bin/ftxsgx-runner.rs Outdated Show resolved Hide resolved
intel-sgx/fortanix-sgx-tools/src/bin/ftxsgx-runner.rs Outdated Show resolved Hide resolved
});
if is_enclu {
// At enclu instruction - force IP to the next instruction after enclu
unsafe { (*(_context as *mut ucontext_t)).uc_mcontext.gregs[REG_RIP as usize] += 3 }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This causes the enclave execution not to be re-entered, but it does not kill the thread. But what happens in that worker thread in those cases?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some garbage CoResult will be returned and sending it to IO queue will fail as sysloop is terminated by that moment

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should adjust RAX and potentially other registers as well to something meaningful

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now enclave exiting flag is passed into coenter() to break enclu/AEX loop and special value CoResult::Abort will be returned if an enclave thread was interrupted that way. Why we may still need register adjustments?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because you need to guarantee the VDSO code branches properly?

@maxknv maxknv force-pushed the max/gh-165-abort-hanged-enclave-thread branch 4 times, most recently from 739418a to a501d3e Compare July 10, 2023 08:28
@maxknv maxknv changed the title #165 Abort hanged enclave thread by SIGUSR1 Abort hanged enclave thread by SIGUSR1 Jul 10, 2023
@maxknv maxknv requested a review from raoulstrackx July 10, 2023 08:39
intel-sgx/enclave-runner/src/tcs.rs Outdated Show resolved Hide resolved
intel-sgx/enclave-runner/src/tcs.rs Outdated Show resolved Hide resolved
intel-sgx/fortanix-sgx-tools/src/bin/ftxsgx-runner.rs Outdated Show resolved Hide resolved
});
if is_enclu {
// At enclu instruction - force IP to the next instruction after enclu
unsafe { (*(_context as *mut ucontext_t)).uc_mcontext.gregs[REG_RIP as usize] += 3 }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should adjust RAX and potentially other registers as well to something meaningful


let hdl = signal::SigHandler::SigAction(handle_sigusr1);
let sig_action = signal::SigAction::new(hdl, signal::SaFlags::empty(), signal::SigSet::empty());
signal::sigaction(signal::SIGUSR1, &sig_action).unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

USR1 is ok, but isn't e.g. HUP more appropriate? Combined with only conditionally rewriting the IP.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to HUP.
Calling this handler is now conditional for vdso/non-vdso. Is that what you mean by conditional rewriting IP?

@maxknv maxknv force-pushed the max/gh-165-abort-hanged-enclave-thread branch from f0a9c61 to 5d62085 Compare July 12, 2023 16:05
sgx_result = run.function;
match sgx_result.try_into() {
enclu_leaf = run.function;
match enclu_leaf.try_into() {
Ok(Enclu::EExit) => { /* normal case */ },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is no longer the "normal case". When you interrupt the enclave in the signal handler, it explicitly writes $EEXIT to enclu_leaf, even though it should be ERESUME. See the vdso. You need to detect such cases and reenter the enclave. You can check the vdso to see which registers you can use for this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added EXITING flag which is tested in the signup handler addresses this issue as well

// Issuing a signal to return execution control back to the enclave-runner's worker thread.
// * In non-vdso case execution control is claimed during AEX using special handler provided by AEP address
// in coenter() function and signal is not required
unsafe { libc::pthread_kill(handler.as_pthread_t() as _, signal::SIGHUP as _); }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this conditional on has_vdso_sgx_enter_enclave?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are 2 different solutions for vdso/non-vdso:
with vdso: enclave execution is interrupted by a signal and RIP rewriting.
withouth vdso: enclave execution interrupted using special instruction under AEP, no signal is needed

Comment on lines 195 to 199
lea 1f(%rip), %rcx // set SGX AEP
push %rbx // store original rbx value on the stack
mov {0}, %rbx
enclu
1: pop %rbx // restore original rbx value in case we interrupt enclave during AEX
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the change here. This looks like it's intended for RIP rewriting, but the current signal handler is only enabled with the VDSO.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first of all, AEP is changed here to check exiting flag on each AEX and interrupt enclu if needed.
secondly, push/pop instructions for RBX just allow for preserving the original value in RBX.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments on %rbx don't add much value. The point here is that due to a limitation of LLVM we can't use rbx as an input to the inline assembly directly, nor can we say it's content is clobbered.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rdx comments removed

@maxknv maxknv changed the title Abort hanged enclave thread by SIGUSR1 Interrupt enclu execution on main enclave thread exiting Jul 14, 2023
intel-sgx/enclave-runner/src/tcs.rs Show resolved Hide resolved
intel-sgx/enclave-runner/src/tcs.rs Outdated Show resolved Hide resolved
sgx_result = run.function;
match sgx_result.try_into() {
enclu_leaf = run.function;
match enclu_leaf.try_into() {
Ok(Enclu::EExit) => { /* normal case */ },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add a comment here that you reach this when the signal handler intervenes and forces the enclave not to be re-entered?

intel-sgx/enclave-runner/src/tcs.rs Outdated Show resolved Hide resolved
intel-sgx/enclave-runner/src/tcs.rs Outdated Show resolved Hide resolved
if has_vdso_sgx_enter_enclave() {
// * The enclave thread may be in a long-running `AEX/enclu[ERESUME]` loop.
// Issuing a signal to return execution control back to the enclave-runner's worker thread.
// * In non-vdso case execution control is claimed during AEX using special handler provided by AEP address
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean with "special handler"?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed the comment

intel-sgx/enclave-runner/src/usercalls/mod.rs Outdated Show resolved Hide resolved
// Interrupt enclave execution by setting IP to the instruction following the ENCLU to mimic normal ENCLU[EXIT])
unsafe {
(*(_context as *mut ucontext_t)).uc_mcontext.gregs[REG_RIP as usize] += 3;
(*(_context as *mut ucontext_t)).uc_mcontext.gregs[REG_RAX as usize] = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 0? This doesn't mimic a normal EEXIT.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose 0 just as some deterministic value... Idk which one is better, both looks not very nice. but ok, ill switch to EExit

intel-sgx/enclave-runner/src/usercalls/mod.rs Outdated Show resolved Hide resolved
if is_enclu && EXITING.with(|cell| cell.borrow().as_ref().is_some_and(|val| val.load(Ordering::SeqCst) == true)) {
// Interrupt enclave execution by setting IP to the instruction following the ENCLU to mimic normal ENCLU[EXIT])
unsafe {
(*(_context as *mut ucontext_t)).uc_mcontext.gregs[REG_RIP as usize] += 3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also rename _context to context?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@maxknv maxknv force-pushed the max/gh-165-abort-hanged-enclave-thread branch 2 times, most recently from 3b3ad1a to cd0790f Compare July 17, 2023 09:49
intel-sgx/enclave-runner/src/tcs.rs Outdated Show resolved Hide resolved
intel-sgx/enclave-runner/src/tcs.rs Outdated Show resolved Hide resolved
intel-sgx/enclave-runner/src/tcs.rs Outdated Show resolved Hide resolved
intel-sgx/enclave-runner/src/tcs.rs Show resolved Hide resolved
intel-sgx/enclave-runner/src/usercalls/mod.rs Outdated Show resolved Hide resolved
intel-sgx/enclave-runner/src/usercalls/mod.rs Outdated Show resolved Hide resolved
@raoulstrackx
Copy link
Contributor

Looks good to me! @mkaynov can you rebase so I can take the final version for a spin?

@maxknv maxknv force-pushed the max/gh-165-abort-hanged-enclave-thread branch 4 times, most recently from 8268018 to bbe7c37 Compare July 25, 2023 14:25
@maxknv maxknv force-pushed the max/gh-165-abort-hanged-enclave-thread branch from bbe7c37 to 9d998d9 Compare July 26, 2023 10:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enclave threads should be terminated on exit
4 participants