arm tlb: guard TLB lockdown count #1247

lsf37 · 2024-05-02T00:37:32Z

lockTLBEntry uses the global tlbLockCount as input without checking bounds. This is fine, because the function is called at most 2 times on unicore platforms, but this is only apparent when checking the entire possible calling context.

Make this bound obvious locally by doing nothing if the function is called with values of tlbLockCount of 2 or greater. This is safe, because TLB lockdown is a performance change only. Also add an assert for debug mode, because we want to know if calling context ever changes.
remove TLB locking for Cortex A9. The code shared the A8 locking sequence with A9, but not all of these instructions are supported on A9. As far as we have been able to tell, they are silently ignored on the A9 platforms that seL4 supports, but it is of course better to remove them in case future platforms behave differently, which they would be allowed to do.

To properly add TLB locking for A9 in the future, the code would need to use the corresponding A9 instructions and take SMP into account for e.g. the sabre platforms, where lockTLBEntry is called 2 times per core, instead of 2 times overall.

These two potential issues were reported by The UK's National Cyber Security Centre (NCSC).

lsf37 · 2024-05-02T04:56:11Z

Well, it looks like the number of times this function is called is not 4 on the sabre platform..

lsf37 · 2024-05-02T05:27:06Z

It's 8, for which the bit operations are also still safe. There is something sub-optimal going on, though: it's 8, because it locks 2 TLB entries and does that each time a core is initialised (4 cores on sabre).

Sabre is the only A9 multicore platform, and for A9 it turns out that the lockdown code is wrong anyway (doesn't do anything as far as we can see), so I'm not going set the bound up, but will instead add another commit for removing A9 lockdown instead. I was going to do that in a separate PR, but with this, it's better if the A9 lockdown removal comes first.

Indanz

Only suggestion I have is to remove it altogether and see if it makes any difference in the benchmark results.

Edit: Or letting the caller pass on a unique identifier instead of using a global var, activate_kernel_vspace() is the only caller.

The call in src/arch/arm/64/kernel/vspace.c and code in src/arch/arm/armv/armv8-a/32/machine_asm.S can also be removed, now it's Cortex-A8 only. Probably want a define around tlbLockCount in src/model/statedata.c too.

Indanz · 2024-05-02T09:42:37Z

Both failing runs failed with:

No log to check boot failure on.

But otherwise all is well.

lsf37 · 2024-05-07T20:51:41Z

Only suggestion I have is to remove it altogether and see if it makes any difference in the benchmark results.

It'd be fine to do this in a separate experiment, but I'd like to keep this one to the issue it is trying to fix. I don't have any data either way if TLB locking is good or bad for performance (and under which work loads), but we do know that removing it for A9 doesn't change anything, because it was never really there.

It may well be the case that it never makes sense (I doubt it, I'm sure it was added for a reason), but it may also be the case that it does make sense on larger TLBs on more recent CPUs, so I don't think the setup as such should be removed.

The code previously used the same instructions for Cortex A8 and A9, but the Cortex A8 instructions are undocumented for A9, and A9 provides a slightly different TLB interface. As far as we can tell, the instructions were simply ignored by the supported A8 platforms, so there was no current correctness issue. Since the instructions had no effect, this commit removes A9 TLB lockdown support. This potential issue was discovered and reported by the UK's National Cyber Security Centre (NCSC). Signed-off-by: Gerwin Klein <gerwin.klein@proofcraft.systems>

lockTLBEntry uses the global tlbLockCount as input without checking bounds. This is fine, because the function is called at most 2 times per core, but this is only apparent when checking the entire possible calling context. Make this bound obvious locally by doing nothing if the function is called with values of tlbLockCount of 2 or greater. This is safe, because TLB lockdown is a performance change only. Also add an assert for debug mode, becase we want to know if calling context ever changes. This potential issue was reported by The UK's National Cyber Security Centre (NCSC). Signed-off-by: Gerwin Klein <gerwin.klein@proofcraft.systems>

lsf37 added the hw-test sel4test hardware builds + runs for this PR label May 2, 2024

lsf37 requested a review from Indanz May 2, 2024 00:38

lsf37 force-pushed the lsf37/tlb branch from 83c7ff0 to fbd2243 Compare May 2, 2024 00:41

Indanz approved these changes May 2, 2024

View reviewed changes

lsf37 force-pushed the lsf37/tlb branch from 29d7bd9 to 099114c Compare May 2, 2024 05:36

lsf37 requested a review from Indanz May 2, 2024 05:37

Indanz approved these changes May 2, 2024

View reviewed changes

lsf37 self-assigned this May 7, 2024

lsf37 added the proof-test run C proofs on PR (use when preprocess test failed) label May 7, 2024

lsf37 added 2 commits May 8, 2024 10:34

Indanz force-pushed the lsf37/tlb branch from 099114c to 6732dbf Compare May 8, 2024 09:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

arm tlb: guard TLB lockdown count #1247

arm tlb: guard TLB lockdown count #1247

lsf37 commented May 2, 2024 •

edited

lsf37 commented May 2, 2024

lsf37 commented May 2, 2024

Indanz left a comment •

edited

Indanz commented May 2, 2024

lsf37 commented May 7, 2024

arm tlb: guard TLB lockdown count #1247

Are you sure you want to change the base?

arm tlb: guard TLB lockdown count #1247

Conversation

lsf37 commented May 2, 2024 • edited

lsf37 commented May 2, 2024

lsf37 commented May 2, 2024

Indanz left a comment • edited

Choose a reason for hiding this comment

Indanz commented May 2, 2024

lsf37 commented May 7, 2024

lsf37 commented May 2, 2024 •

edited

Indanz left a comment •

edited