RFC: Kernel Memory Protection #3599

lschuermann opened this issue Aug 9, 2023 · 1 comment
Labels: bug, enhancement, rfc (issue designed for discussion and to solicit feedback), risc-v (RISC-V architecture)

lschuermann commented Aug 9, 2023

An increasing number of microcontrollers feature mechanisms to limit memory accesses not only in unprivileged user mode, but also in privileged machine mode. Examples of such memory protection units include the RISC-V ePMP, (to a certain extent) the RISC-V PMP, and potentially the MPU in ARM Cortex-M systems with TrustZone-M.

Tock's security and threat model relies in large part on the compile-time guarantees of the Rust programming language. However, this does not mean that the kernel is immune to attacks from userspace. Even assuming a correct Rust compiler, the kernel's unsafe code or low-level assembly can introduce vulnerabilities where userspace may cause the kernel to overwrite arbitrary memory locations, or execute arbitrary userspace code in machine mode. Thus, to reduce this attack surface, security-oriented chips such as the OpenTitan EarlGrey SoC use their ePMP to limit the memory sections accessible in machine mode and the permissions on them (see #3597).

Issues with the current interface

Tock currently features the kernel::platform::mpu::KernelMPU interface to configure such kernel memory protection regions. However, this interface has a host of issues, for example:

  • It does not account for the complex interplay between regions affecting userspace, the kernel, or (depending on the hardware) both. Rule precedence is implicit and implementation-dependent in this API: for example, to set up a read-only flash region with a read-execute kernel .text section, on RISC-V you would first need to configure the larger flash region, and then the .text section. This assumes that the implementation will set up the PMP entries in reverse order, such that entries added later take precedence over earlier ones.
  • The interface does not adequately account for pre-locked regions in the memory-protection implementation. These regions may alias some of the memory to be protected, hence entry placement relative to those locked regions is important.
  • Looking at the allocate_kernel_region method documentation, statements such as "note that kernel level permissions also apply to apps" are, for instance, simply not true with the ePMP MML mode.
  • The current interface seems to suggest that the kernel is able to re-configure the kernel MPU (or at least add regions to it) at runtime. This conflicts with the semantics exposed by the RISC-V PMP (non-ePMP), which enforces all locked entries for both the kernel and userspace. The PMP requires a special "deny-all" user-mode entry after all other user-mode regions to properly limit user-mode access.
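The precedence issue from the first bullet can be illustrated with a small host-side model. This is only a sketch: `LockedEntry`, `machine_mode_allows`, and the addresses are hypothetical and not part of Tock. It assumes the RISC-V PMP rule that the lowest-numbered matching entry decides an access, and it models only locked entries as seen from machine mode.

```rust
/// Kind of memory access being checked.
#[derive(Clone, Copy)]
enum Access {
    Read,
    Write,
    Execute,
}

/// Simplified stand-in for one locked PMP entry (hypothetical, not a Tock type).
#[derive(Clone, Copy)]
struct LockedEntry {
    start: usize,
    end: usize, // exclusive
    read: bool,
    write: bool,
    execute: bool,
}

impl LockedEntry {
    fn permits(&self, access: Access) -> bool {
        match access {
            Access::Read => self.read,
            Access::Write => self.write,
            Access::Execute => self.execute,
        }
    }
}

/// Machine-mode check over locked entries: the first (lowest-numbered)
/// matching entry decides. Without any match, machine-mode access succeeds.
fn machine_mode_allows(entries: &[LockedEntry], addr: usize, access: Access) -> bool {
    entries
        .iter()
        .find(|e| e.start <= addr && addr < e.end)
        .map_or(true, |e| e.permits(access))
}

// Hypothetical layout: 1 MiB of read-only flash whose first 256 KiB hold
// the read-execute kernel .text section.
const FLASH: LockedEntry = LockedEntry {
    start: 0x2000_0000,
    end: 0x2010_0000,
    read: true,
    write: false,
    execute: false,
};
const TEXT: LockedEntry = LockedEntry {
    start: 0x2000_0000,
    end: 0x2004_0000,
    read: true,
    write: false,
    execute: true,
};

fn main() {
    let pc = 0x2000_1000; // an address inside .text
    // .text entry first: instruction fetch from .text is permitted.
    println!("text first, execute:  {}", machine_mode_allows(&[TEXT, FLASH], pc, Access::Execute));
    // Flash entry first: it shadows .text, so the fetch is denied.
    println!("flash first, execute: {}", machine_mode_allows(&[FLASH, TEXT], pc, Access::Execute));
    // Reads and writes behave the same in either order in this layout.
    println!("text first, read:     {}", machine_mode_allows(&[TEXT, FLASH], pc, Access::Read));
    println!("text first, write:    {}", machine_mode_allows(&[TEXT, FLASH], pc, Access::Write));
}
```

In this model the same two regions yield different execute permissions depending only on their order, which is why an API that leaves entry order implicit is hard to use correctly.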

Example of a KernelMPU mis-configuration

These unclear semantics result in behavior such as the following. When running the following libtock-c application on a system without kernel memory protection enabled (for example, the LiteX sim board), the application faults:

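#include <stdint.h>
#include <stdio.h>

int main(void) {
  // Write to an address that the application has not been granted access to.
  *((uint32_t *) 0x40000000) = 0xDEADBEEF;
  printf("Hello Fault!\r\n");
  return 0;
}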

However, when we now copy the kernel MPU setup code from the OpenTitan EarlGrey board initialization and apply it to the LiteX sim platform, we see the following:

Verilated LiteX+VexRiscv: initialization complete, entering main loop.
Hello Fault!

The above example demonstrates that enabling the KernelMPU actually allows access to all of the configured kernel protection regions from user mode. This is because locked PMP regions apply to both user mode and kernel mode. The PMP implementation fails to add a "deny-all" fallback user-mode region with a higher priority than all kernel protection regions. Doing so would prevent adding additional kernel-mode regions, which is something the API does not account for. I would argue that this is not expected behavior.
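The user-mode side of this behavior can be sketched with a minimal host-side model. Everything here is an assumption for illustration: `Entry`, `user_mode_may_write`, and the addresses are hypothetical, and the model covers only the user-mode lookup (first matching entry decides, no match means denied). Machine-mode behavior is deliberately left out.

```rust
/// Simplified stand-in for one PMP entry as seen from user mode
/// (hypothetical, not a Tock type). The lock bit is omitted because it
/// does not change the outcome of a user-mode access.
#[derive(Clone, Copy)]
struct Entry {
    start: usize,
    end: usize, // exclusive
    write: bool,
}

/// User-mode write check: the first (lowest-numbered) matching entry
/// decides. Without any match, the access is denied.
fn user_mode_may_write(entries: &[Entry], addr: usize) -> bool {
    entries
        .iter()
        .find(|e| e.start <= addr && addr < e.end)
        .map_or(false, |e| e.write)
}

// Hypothetical read-write RAM region intended for the kernel only.
const KERNEL_RAM: Entry = Entry {
    start: 0x4000_0000,
    end: 0x4010_0000,
    write: true,
};
// Hypothetical region granted to a running application.
const USER_RAM: Entry = Entry {
    start: 0x4008_0000,
    end: 0x4008_1000,
    write: true,
};
// "Deny-all" user-mode entry spanning (almost) the whole address space;
// the exclusive end leaves out only the very last byte.
const DENY_ALL_USER: Entry = Entry {
    start: 0,
    end: usize::MAX,
    write: false,
};

fn main() {
    let kernel_addr = 0x4000_0000;
    let app_addr = 0x4008_0000;

    // Only kernel regions configured: user mode inherits their permissions.
    println!("kernel regions only:    {}", user_mode_may_write(&[KERNEL_RAM], kernel_addr));

    // User regions, then deny-all, then kernel regions.
    let layout = [USER_RAM, DENY_ALL_USER, KERNEL_RAM];
    println!("deny-all, kernel addr:  {}", user_mode_may_write(&layout, kernel_addr));
    println!("deny-all, app addr:     {}", user_mode_may_write(&layout, app_addr));
}
```

With the deny-all entry in place, a kernel region added later would have to be ordered relative to it, which the current allocation-style API gives no way to express.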

For reference, here is the KernelMPU initialization code as applied to the LiteX sim board:

    // These symbols are defined in the linker script.
    extern "C" {
        /// Beginning of the ROM region containing app images.
        static _sapps: u8;
        /// End of the ROM region containing app images.
        static _eapps: u8;
        /// Beginning of the RAM region for app memory.
        static mut _sappmem: u8;
        /// End of the RAM region for app memory.
        static _eappmem: u8;
        /// The start of the kernel stack (Included only for kernel PMP)
        static _sstack: u8;
        /// The end of the kernel stack (Included only for kernel PMP)
        static _estack: u8;
        /// The start of the kernel text (Included only for kernel PMP)
        static _stext: u8;
        /// The end of the kernel text (Included only for kernel PMP)
        static _etext: u8;
        /// The start of the kernel BSS (Included only for kernel PMP)
        static _szero: u8;
        /// The end of the kernel BSS (Included only for kernel PMP)
        static _ezero: u8;
    }

    use kernel::platform::mpu::{self, KernelMPU};

    let mut mpu_config = chip.pmp.new_kernel_config().unwrap();

    // The kernel stack, BSS and relocation data
    chip.pmp
        .allocate_kernel_region(
            &_sstack as *const u8,
            &_ezero as *const u8 as usize - &_sstack as *const u8 as usize,
            mpu::Permissions::ReadWriteOnly,
            &mut mpu_config,
        )
        .unwrap();
    // The kernel text, Manifest and vectors
    chip.pmp
        .allocate_kernel_region(
            &_stext as *const u8,
            &_etext as *const u8 as usize - &_stext as *const u8 as usize,
            mpu::Permissions::ReadExecuteOnly,
            &mut mpu_config,
        )
        .unwrap();
    // The app locations
    chip.pmp.allocate_kernel_region(
        &_sapps as *const u8,
        &_eapps as *const u8 as usize - &_sapps as *const u8 as usize,
        mpu::Permissions::ReadWriteOnly,
        &mut mpu_config,
    );
    // The app memory locations
    chip.pmp.allocate_kernel_region(
        &_sappmem as *const u8,
        &_eappmem as *const u8 as usize - &_sappmem as *const u8 as usize,
        mpu::Permissions::ReadWriteOnly,
        &mut mpu_config,
    );
    // Access to the MMIO devices
    chip.pmp
        .allocate_kernel_region(
            0xf000_0000 as *const u8,
            0x1000_0000,
            mpu::Permissions::ReadWriteOnly,
            &mut mpu_config,
        )
        .unwrap();

    chip.pmp.enable_kernel_mpu(&mut mpu_config);

The need for new interface(s)

To properly support kernel memory protection implementations, we need to re-design the KernelMPU interface. As even our already supported hardware demonstrates, it fails to account for the plethora of different hardware configurations and their associated semantics. As the kernel MPU would be an integral part of the system's security, we need APIs which configure the MPU in a predictable manner, without the risk of accidentally weakening security by exposing spurious memory regions to unprivileged applications.

I hope that we can discuss the set of requirements and different hardware semantics within this issue, and define a set of interfaces which adequately capture these constraints.

@lschuermann lschuermann added bug enhancement rfc Issue designed for discussion and to solicit feedback. risc-v RISC-V architecture labels Aug 9, 2023
@alistair23

Wow, that is a nasty bug!

  • It does not account for the complex interplay between regions affecting userspace, the kernel, or (depending on the hardware) both.

Yeah, that's fair. Maybe a priority argument could help boards fix this up.

  • The interface does not adequately account for pre-locked regions in the memory-protection implementation

That's a hard problem to get right though. The hope is that a board can work around those. In theory the current API could work, as it could probe existing locked regions and then not configure regions already covered by them.
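As a rough illustration of the probing idea, the following host-side sketch filters out requested regions that are already covered by a pre-locked one. `Region`, `is_covered`, and `regions_to_configure` are hypothetical names, not existing Tock APIs. The sketch also makes simplifying assumptions: it ignores permissions (a real check would need to compare them), it only detects coverage by a single locked region, and it uses `Vec`, which would not be available in the `no_std` kernel.

```rust
/// Address range with an exclusive end (hypothetical, not a Tock type).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Region {
    start: usize,
    end: usize,
}

/// True if a single pre-locked region fully contains `region`.
fn is_covered(region: Region, locked: &[Region]) -> bool {
    locked
        .iter()
        .any(|l| l.start <= region.start && region.end <= l.end)
}

/// Keep only the requested regions that no pre-locked region already covers.
fn regions_to_configure(requested: &[Region], locked: &[Region]) -> Vec<Region> {
    requested
        .iter()
        .copied()
        .filter(|r| !is_covered(*r, locked))
        .collect()
}

fn main() {
    // Hypothetical region locked by an earlier boot stage.
    let locked = [Region { start: 0x2000_0000, end: 0x2010_0000 }];
    let requested = [
        // Inside the locked region: nothing left to configure.
        Region { start: 0x2000_0000, end: 0x2004_0000 },
        // Outside of it: still needs its own entry.
        Region { start: 0x1000_0000, end: 0x1002_0000 },
    ];
    println!("{:x?}", regions_to_configure(&requested, &locked));
}
```

Even in this simplified form, the result depends on where the locked entries sit relative to the new ones, which is the placement problem raised in the original post.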

  • The current interface seems to suggest that the kernel is able to re-configure (or at least add regions) to the kernel MPU at runtime. This collides with the semantics exposed by the RISC-V PMP (non-ePMP) which enforces all locked entries for both the kernel and user-space. It requires a special "deny-all" user-mode entry after all other user-mode regions to properly limit user-mode access.

That must just be some bad documentation, because the kernel regions should not be changed at run time. The documentation for enable_kernel_mpu() states this, but that should be clearer.

Overall I think the interface can be improved. One interesting thing would be to update it to support improvements to the ARM MPU so the interface could be used on ARM as well. Obviously fixing the bug you mentioned is critical as well.

Another interesting goal would be to try and enable it more by default. I can't remember if the ARM MPU can support this, but it would be nice to enable at least some basic W^X protections for all boards.

@lschuermann lschuermann changed the title Tracking / Discussion: Kernel Memory Protection RFC: Kernel Memory Protection Aug 18, 2023
lschuermann added a commit to lschuermann/tock that referenced this issue Jan 12, 2024
As discussed in issue tock#3599 [1] and PR tock#3597 [2],
the `KernelMPU` trait is not a particularly good abstraction for
implementing a memory protection mechanism also affecting the
kernel. Some of its issues are:

- It does not account for the complex interplay between regions
  affecting userspace, the kernel, or (depending on the hardware)
  both. Rule precedence is implicit and implementation dependent in
  this API: for example, to set up a read-only flash region with a
  read-execute kernel .text section, on RISC-V you would first need to
  configure the larger flash region, and then the .text section. This
  assumes that the implementation will set up the PMP entries in
  reverse-order, such that entries added later precede earlier ones.

- The interface does not adequately account for pre-locked regions in
  the memory-protection implementation. These regions may alias some
  of the memory to be protected, hence entry placement relative to
  those locked regions is important.

- Looking at the allocate_kernel_region method documentation,
  statements such as "note that kernel level permissions also apply to
  apps" are, for instance, simply not true with the ePMP MML mode.

- The current interface seems to suggest that the kernel is able to
  re-configure (or at least add regions) to the kernel MPU at
  runtime. This collides with the semantics exposed by the RISC-V PMP
  (non-ePMP) which enforces all locked entries for both the kernel and
  user-space. It requires a special "deny-all" user-mode entry after
  all other user-mode regions to properly limit user-mode access.

With the introduction of the `KernelProtectionPMP` and `EarlGreyEPMP`,
which both implement some form of kernel-mode memory protection, but
take their memory regions as arguments in their constructor, we do not
need the `KernelMPU` trait any longer. At some point it might make
sense to resurrect this trait with a clearer and portable set of API
semantics.

[1]: tock#3599
[2]: tock#3597
lschuermann added a commit to lschuermann/tock that referenced this issue Jan 16, 2024
lschuermann added a commit to lschuermann/tock that referenced this issue Jan 18, 2024
lschuermann added a commit to lschuermann/tock that referenced this issue Jan 25, 2024