RFC: Kernel Memory Protection #3599

lschuermann · 2023-08-09T15:58:22Z

An increasing number of microcontrollers feature mechanisms to limit memory accesses not only within an unprivileged user-mode, but also in their privileged machine-mode. Examples of these memory protection units include the RISC-V ePMP, (to a certain extent) the RISC-V PMP, and potentially the MPU in ARM Cortex-M systems with TrustZone-M.

Tock's security and threat model rely in large part on the compile-time guarantees of the Rust programming language. However, this does not mean that the kernel is immune to attacks from userspace. Even assuming a correct Rust compiler, the kernel's unsafe code or low-level assembly can introduce vulnerabilities where userspace may cause the kernel to overwrite arbitrary memory locations, or execute arbitrary userspace code in machine-mode. Thus, to reduce this attack surface, security-oriented chips such as the OpenTitan EarlGrey SoC utilize their ePMP to limit their machine-mode accessible memory sections and permissions (see #3597).

Issues with the current interface

Tock currently features the kernel::platform::mpu::KernelMPU interface to configure such kernel memory protection regions. However, this interface has a host of issues, for example:

- It does not account for the complex interplay between regions affecting userspace, the kernel, or (depending on the hardware) both. Rule precedence is implicit and implementation dependent in this API: for example, to set up a read-only flash region with a read-execute kernel .text section, on RISC-V you would first need to configure the larger flash region, and then the .text section. This assumes that the implementation will set up the PMP entries in reverse-order, such that entries added later precede earlier ones.
- The interface does not adequately account for pre-locked regions in the memory-protection implementation. These regions may alias some of the memory to be protected, hence entry placement relative to those locked regions is important.
- Looking at the allocate_kernel_region method documentation, statements such as "note that kernel level permissions also apply to apps" are, for instance, simply not true with the ePMP MML mode.
- The current interface seems to suggest that the kernel is able to re-configure (or at least add regions) to the kernel MPU at runtime. This collides with the semantics exposed by the RISC-V PMP (non-ePMP) which enforces all locked entries for both the kernel and user-space. It requires a special "deny-all" user-mode entry after all other user-mode regions to properly limit user-mode access.
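
To make the precedence problem in the first point concrete, here is a minimal Rust sketch. It assumes a simplified model in which the lowest-numbered matching PMP entry decides the outcome, and it only considers locked entries. The `Entry` and `Perms` types, the `effective_perms` helper, and all addresses are made up for illustration; none of this is Tock code.

```rust
/// Permissions carried by one (locked) PMP entry in this simplified model.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Perms {
    pub read: bool,
    pub write: bool,
    pub execute: bool,
}

/// Hypothetical PMP entry covering the half-open range `start..end`.
#[derive(Clone, Copy, Debug)]
pub struct Entry {
    pub start: u64,
    pub end: u64,
    pub perms: Perms,
}

/// The first (lowest-numbered) entry containing `addr` decides the
/// permissions; any later entry that also matches is ignored.
pub fn effective_perms(entries: &[Entry], addr: u64) -> Option<Perms> {
    entries
        .iter()
        .find(|e| e.start <= addr && addr < e.end)
        .map(|e| e.perms)
}

pub const RO: Perms = Perms { read: true, write: false, execute: false };
pub const RX: Perms = Perms { read: true, write: false, execute: true };

// Made-up layout: a 1 MiB flash region whose first 256 KiB hold kernel .text.
pub const FLASH: Entry = Entry { start: 0x2000_0000, end: 0x2010_0000, perms: RO };
pub const TEXT: Entry = Entry { start: 0x2000_0000, end: 0x2004_0000, perms: RX };
```

With the slice `[TEXT, FLASH]`, an address inside `.text` resolves to read-execute. With `[FLASH, TEXT]` it stays read-only, because the larger flash entry shadows the `.text` entry. An API that hides this ordering leaves the result up to the implementation.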

Example of a `KernelMPU` mis-configuration

These unclear semantics result in behavior such as the following. When running the following libtock-c application on a system without kernel memory protection enabled (for example, the LiteX sim board), the application faults:

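#include <stdint.h>
#include <stdio.h>

int main(void) {
  // Write to an address the app does not own; this should fault.
  *((uint32_t *) 0x40000000) = 0xDEADBEEF;
  printf("Hello Fault!\r\n");
  return 0;
}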

However, when we now copy the kernel MPU setup code from the OpenTitan EarlGrey board initialization and apply it to the LiteX sim platform, we see the following:

Verilated LiteX+VexRiscv: initialization complete, entering main loop.
Hello Fault!

The above example demonstrates that enabling the KernelMPU actually allows user-mode access to all of the configured kernel protection regions. This is because locked PMP regions apply to both user mode and machine mode. The PMP implementation fails to add a "deny-all" fallback user-mode region with a higher priority than all kernel protection regions. Doing so would prevent adding additional kernel-mode regions, which is something the API does not account for. I would argue that this is not expected behavior.
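
The effect can be reproduced in a small user-mode-only model. The sketch below assumes the standard `pmpcfg` bit layout (R, W, X in bits 0 to 2, the address-matching mode A in bits 3 and 4, L in bit 7) and abstracts address matching into plain `start..end` ranges. `PmpEntry`, `user_may_write`, `kernel_ram`, `deny_all_user`, and the 64 KiB region size are hypothetical. It deliberately says nothing about how the extra entry affects machine-mode accesses.

```rust
// Bit positions within a RISC-V pmpcfg byte.
pub const PMP_R: u8 = 1 << 0;
pub const PMP_W: u8 = 1 << 1;
pub const PMP_A_MASK: u8 = 0b11 << 3; // A == 0 means the entry is OFF
pub const PMP_A_TOR: u8 = 0b01 << 3;
pub const PMP_L: u8 = 1 << 7;

/// Hypothetical view of one PMP entry: the range it matches plus its cfg byte.
pub struct PmpEntry {
    pub start: u64,
    pub end: u64,
    pub cfg: u8,
}

/// User-mode write check: the first enabled entry containing `addr` decides.
/// If no entry matches, user mode is denied.
pub fn user_may_write(entries: &[PmpEntry], addr: u64) -> bool {
    entries
        .iter()
        .filter(|e| (e.cfg & PMP_A_MASK) != 0)
        .find(|e| e.start <= addr && addr < e.end)
        .map_or(false, |e| (e.cfg & PMP_W) != 0)
}

/// A locked read-write kernel RAM region (made-up size).
pub fn kernel_ram() -> PmpEntry {
    PmpEntry {
        start: 0x4000_0000,
        end: 0x4001_0000,
        cfg: PMP_L | PMP_A_TOR | PMP_R | PMP_W,
    }
}

/// An unlocked entry without permissions that spans the 32-bit address space.
pub fn deny_all_user() -> PmpEntry {
    PmpEntry { start: 0, end: 1 << 32, cfg: PMP_A_TOR }
}
```

In this model, `user_may_write(&[kernel_ram()], 0x4000_0000)` is `true`, which matches the "Hello Fault!" output above. Placing `deny_all_user()` ahead of `kernel_ram()` turns the same check into `false`.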

For reference, here is the KernelMPU initialization code on the LiteX sim board:

    // These symbols are defined in the linker script.
    extern "C" {
        /// Beginning of the ROM region containing app images.
        static _sapps: u8;
        /// End of the ROM region containing app images.
        static _eapps: u8;
        /// Beginning of the RAM region for app memory.
        static mut _sappmem: u8;
        /// End of the RAM region for app memory.
        static _eappmem: u8;
        /// The start of the kernel stack (Included only for kernel PMP)
        static _sstack: u8;
        /// The end of the kernel stack (Included only for kernel PMP)
        static _estack: u8;
        /// The start of the kernel text (Included only for kernel PMP)
        static _stext: u8;
        /// The end of the kernel text (Included only for kernel PMP)
        static _etext: u8;
        /// The start of the kernel BSS (Included only for kernel PMP)
        static _szero: u8;
        /// The end of the kernel BSS (Included only for kernel PMP)
        static _ezero: u8;
    }

    use kernel::platform::mpu::{self, KernelMPU};

    let mut mpu_config = chip.pmp.new_kernel_config().unwrap();

    // The kernel stack, BSS and relocation data
    chip.pmp
        .allocate_kernel_region(
            &_sstack as *const u8,
            &_ezero as *const u8 as usize - &_sstack as *const u8 as usize,
            mpu::Permissions::ReadWriteOnly,
            &mut mpu_config,
        )
        .unwrap();
    // The kernel text, Manifest and vectors
    chip.pmp
        .allocate_kernel_region(
            &_stext as *const u8,
            &_etext as *const u8 as usize - &_stext as *const u8 as usize,
            mpu::Permissions::ReadExecuteOnly,
            &mut mpu_config,
        )
        .unwrap();
    // The app locations
    chip.pmp.allocate_kernel_region(
        &_sapps as *const u8,
        &_eapps as *const u8 as usize - &_sapps as *const u8 as usize,
        mpu::Permissions::ReadWriteOnly,
        &mut mpu_config,
    );
    // The app memory locations
    chip.pmp.allocate_kernel_region(
        &_sappmem as *const u8,
        &_eappmem as *const u8 as usize - &_sappmem as *const u8 as usize,
        mpu::Permissions::ReadWriteOnly,
        &mut mpu_config,
    );
    // Access to the MMIO devices
    chip.pmp
        .allocate_kernel_region(
            0xf000_0000 as *const u8,
            0x1000_0000,
            mpu::Permissions::ReadWriteOnly,
            &mut mpu_config,
        )
        .unwrap();

    chip.pmp.enable_kernel_mpu(&mut mpu_config);

The need for new interface(s)

To properly support kernel memory protection implementations, we need to re-design the KernelMPU interface. As even our already supported hardware demonstrates, it fails to account for the plethora of different hardware configurations and their associated semantics. As the kernel MPU would be an integral part of the system's security, we need APIs which configure the MPU in a predictable manner, without the risk of accidentally weakening security by exposing spurious memory regions to unprivileged applications.

I hope that we can discuss the set of requirements and different hardware semantics within this issue, and define a set of interfaces which adequately capture these constraints.


alistair23 · 2023-08-09T17:33:35Z

Wow, that is a nasty bug!

> It does not account for the complex interplay between regions affecting userspace, the kernel, or (depending on the hardware) both.

Yeah, that's fair. Maybe a priority argument could help boards fix this up.

> The interface does not adequately account for pre-locked regions in the memory-protection implementation

That's a hard problem to get right though. The hope is that a board can work around those. In theory the current API could work, as it could probe existing locked regions and then not configure regions already covered by them.
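
A rough sketch of that probing idea, under simplifying assumptions: locked regions are given as a priority-ordered slice, and only the first locked region overlapping the requested range is considered. `Region`, `Perms`, `Probe`, and `probe` are hypothetical names, not part of the current API.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Perms {
    ReadOnly,
    ReadWrite,
    ReadExecute,
}

/// Hypothetical region covering the half-open range `start..end`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Region {
    pub start: u64,
    pub end: u64,
    pub perms: Perms,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Probe {
    /// A locked region already covers the range with the same permissions.
    Covered,
    /// A locked region overlaps the range but does not match the request.
    Conflict,
    /// No locked region overlaps; a new entry would be needed.
    Free,
}

/// Conservative check against pre-locked regions, highest priority first.
pub fn probe(locked: &[Region], wanted: &Region) -> Probe {
    // Find the first locked region that overlaps the requested range.
    let first = locked
        .iter()
        .find(|l| l.start < wanted.end && wanted.start < l.end);
    match first {
        None => Probe::Free,
        Some(l)
            if l.start <= wanted.start
                && wanted.end <= l.end
                && l.perms == wanted.perms =>
        {
            Probe::Covered
        }
        Some(_) => Probe::Conflict,
    }
}
```

The `Conflict` case is where it gets hard: the board still has to decide whether a new entry can be placed ahead of the locked one at all.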

> The current interface seems to suggest that the kernel is able to re-configure (or at least add regions) to the kernel MPU at runtime. This collides with the semantics exposed by the RISC-V PMP (non-ePMP) which enforces all locked entries for both the kernel and user-space. It requires a special "deny-all" user-mode entry after all other user-mode regions to properly limit user-mode access.

That must just be some bad documentation, because the kernel regions should not be changed at run time. The documentation for enable_kernel_mpu() states this, but that should be clearer.

Overall I think the interface can be improved. One interesting thing would be to update it to support improvements to the ARM MPU so the interface could be used on ARM as well. Obviously fixing the bug you mentioned is critical as well.

Another interesting goal would be to try and enable it more by default. I can't remember if the ARM MPU can support this, but it would be nice to enable at least some basic W^X protections for all boards.

As discussed in issue tock#3599 [1] and PR tock#3597 [2], the `KernelMPU` trait is not a particularly good abstraction for implementing a memory protection mechanism also affecting the kernel. Some of its issues are:

- It does not account for the complex interplay between regions affecting userspace, the kernel, or (depending on the hardware) both. Rule precedence is implicit and implementation dependent in this API: for example, to set up a read-only flash region with a read-execute kernel .text section, on RISC-V you would first need to configure the larger flash region, and then the .text section. This assumes that the implementation will set up the PMP entries in reverse-order, such that entries added later precede earlier ones.
- The interface does not adequately account for pre-locked regions in the memory-protection implementation. These regions may alias some of the memory to be protected, hence entry placement relative to those locked regions is important.
- Looking at the allocate_kernel_region method documentation, statements such as "note that kernel level permissions also apply to apps" are, for instance, simply not true with the ePMP MML mode.
- The current interface seems to suggest that the kernel is able to re-configure (or at least add regions) to the kernel MPU at runtime. This collides with the semantics exposed by the RISC-V PMP (non-ePMP) which enforces all locked entries for both the kernel and user-space. It requires a special "deny-all" user-mode entry after all other user-mode regions to properly limit user-mode access.

With the introduction of the `KernelProtectionPMP` and `EarlGreyEPMP`, which both implement some form of kernel-mode memory protection, but take their memory regions as arguments in their constructor, we do not need the `KernelMPU` trait any longer. At some point it might make sense to resurrect this trait with a clearer and portable set of API semantics.

[1]: tock#3599
[2]: tock#3597
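
As an illustration of the constructor-based approach, here is a minimal sketch. It is not the `KernelProtectionPMP` or `EarlGreyEPMP` API; `FixedKernelProtection`, `Region`, `Perm`, and the chosen set of four regions are assumptions made for this example. The point is only that the entry order is fixed inside the constructor instead of depending on the order of calls.

```rust
/// Hypothetical memory region given by base address and size in bytes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Region {
    pub start: u64,
    pub size: u64,
}

impl Region {
    fn end(&self) -> u64 {
        self.start + self.size
    }

    fn contains(&self, other: &Region) -> bool {
        self.start <= other.start && other.end() <= self.end()
    }
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Perm {
    ReadExecute,
    ReadOnly,
    ReadWrite,
}

/// Hypothetical protection unit that receives every region up front.
#[derive(Debug)]
pub struct FixedKernelProtection {
    // Index 0 has the highest priority.
    entries: [(Region, Perm); 4],
}

impl FixedKernelProtection {
    pub fn new(
        text: Region,
        flash: Region,
        ram: Region,
        mmio: Region,
    ) -> Result<Self, &'static str> {
        if !flash.contains(&text) {
            return Err("kernel .text must lie within the flash region");
        }
        // The order is decided here, once: .text precedes the flash
        // region that contains it, so it is never shadowed.
        Ok(Self {
            entries: [
                (text, Perm::ReadExecute),
                (flash, Perm::ReadOnly),
                (ram, Perm::ReadWrite),
                (mmio, Perm::ReadWrite),
            ],
        })
    }

    pub fn entries(&self) -> &[(Region, Perm)] {
        &self.entries
    }
}
```

Because all regions are known at construction time, checks such as "`.text` lies within flash" can run before any hardware register is written, and no later call can add a region.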

lschuermann added the bug, enhancement, rfc, and risc-v labels Aug 9, 2023

lschuermann mentioned this issue Aug 9, 2023 in: arch/rv32i: re-design PMP architecture & implement OpenTitan EarlGrey-specific ePMP #3597 (merged)

lschuermann changed the title ~~Tracking / Discussion: Kernel Memory Protection~~ RFC: Kernel Memory Protection Aug 18, 2023
