IRQ Fastpath #1227

danshea00 · 2024-03-19T11:31:07Z

Add an IRQ fastpath for aarch64 MCS. This fastpath is similar to the signal fastpath but also handles the case when the destination thread is of higher priority than the interrupted thread, in which case a direct context switch occurs.
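The decision described above can be sketched as a small model. This is only an illustration, not the code in this PR: `thread_t`, `irq_action_t` and `irq_fastpath_action` are hypothetical names, and treating equal priorities as "no switch" is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model; these are not seL4's kernel structures. */
typedef struct {
    uint8_t priority; /* larger value = higher priority */
} thread_t;

typedef enum {
    IRQ_RESUME_INTERRUPTED, /* deliver the signal, resume the interrupted thread */
    IRQ_SWITCH_TO_DEST      /* deliver the signal, switch directly to the destination */
} irq_action_t;

/* Choose what to do once the IRQ's notification has been signalled.
 * Assumption: a destination of equal priority does not preempt. */
irq_action_t irq_fastpath_action(const thread_t *interrupted, const thread_t *dest)
{
    assert(interrupted != NULL && dest != NULL);
    if (dest->priority > interrupted->priority) {
        return IRQ_SWITCH_TO_DEST;
    }
    return IRQ_RESUME_INTERRUPTED;
}
```

The first branch corresponds to the "context switching" benchmark case and the second to the "non context switching" case.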

seL4Bench IRQUser benchmarks (the non-context-switching benchmark is here):

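| Board | Context switching: Slowpath | Context switching: Fastpath | Context switching: Diff | Non context switching: Slowpath | Non context switching: Fastpath | Non context switching: Diff |
|---|---|---|---|---|---|---|
| imx8mm | 808 (13) | 594 (11) | 26% | 762 (10) | 367 (3) | 52% |
| imx8mq | 1365 (3) | 625 (3) | 54% | 1393 (20) | 367 (3) | 74% |
| odroidc4 | 836 (5) | 745 (15) | 11% | 814 (49) | 512 (39) | 37% |
| tqma | 863 (5) | 757 (10) | 12% | 868 (7) | 468 (3) | 46% |
| tx1 | 810 (5) | 735 (10) | 9% | 767 (6) | 489 (5) | 36% |
| zcu106 | 705 (10) | 557 (12) | 21% | 749 (14) | 492 (44) | 34% |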
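
The Diff values are consistent with the relative reduction (slowpath - fastpath) / slowpath. That reading is inferred from the numbers rather than stated anywhere, and `percent_reduction` below is a hypothetical helper, not part of seL4Bench:

```c
#include <assert.h>

/* Relative reduction of the fastpath result over the slowpath result, in percent.
 * Assumption: this is how the Diff column was derived. */
double percent_reduction(double slowpath, double fastpath)
{
    assert(slowpath > 0.0);
    return 100.0 * (slowpath - fastpath) / slowpath;
}
```

For the imx8mm context-switching row, `percent_reduction(808, 594)` gives roughly 26.5, which matches the 26% in the table.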

Indanz · 2024-03-19T12:00:24Z

First impression: There is something funny going on with imx8mq and the speedup isn't big enough to justify a fast path. I would try optimising the normal IRQ path first.

Have you tested with SMP too?

danshea00 · 2024-03-19T12:48:11Z

I agree that something is wrong with the slowpath for the imx8mq - most of the speedup in that case is from using the fastpath version of switchToThread.

I have done some SMP testing. seL4Test passes with the SMP tests enabled.

lsf37 · 2024-03-26T12:53:17Z

Do we know how much slower the slow path becomes with this patch? The additional checks are probably not much, but it would be nice to quantify.

If there is a chance to improve the slow path speed to get to similar results, that would be great of course. @danshea00 can you comment on which bits in here you think brought the highest payoff in performance improvement?

danshea00 · 2024-03-27T07:26:48Z

Most of the speedup is from the simplified scheduling logic. Here is a rough breakdown of where time is going on the imx8mm (update timestamp also includes budget checking/charging):

I just measured the worst case slowpath overhead (from the start of the fastpath to after the final slowpath_irq) to be 120 (8) cycles on the imx8mm. I can add numbers for other boards if that would be useful.

Indanz · 2024-03-27T10:36:48Z

Nice graph, how did you collect those numbers?

In your new test, the while loop in high_prio_fn should be while (i < N_RUNS); otherwise nothing bounds how far into results[] it will write. I also don't see how high_prio_fn is currently stopped, other than by crashing. On non-MCS, I'm not sure how you can distinguish between kernel timer ticks and user space IRQs.
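
A minimal sketch of the bounded loop, assuming a results[] array of N_RUNS entries. `collect_results`, the callback type and `fake_wait_and_stamp` are hypothetical stand-ins for the benchmark's "wait on the notification, then read the cycle counter" step, and the N_RUNS value is a placeholder:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_RUNS 16 /* placeholder value, not the benchmark's real run count */

/* Hypothetical callback: block until the IRQ is delivered, return a timestamp. */
typedef uint64_t (*wait_and_stamp_fn)(void *ctx);

/* Record exactly N_RUNS samples, then return so the thread can finish cleanly. */
size_t collect_results(uint64_t *results, wait_and_stamp_fn wait_and_stamp, void *ctx)
{
    assert(results != NULL && wait_and_stamp != NULL);
    size_t i = 0;
    while (i < N_RUNS) { /* the bound keeps every write inside results[] */
        results[i] = wait_and_stamp(ctx);
        i++;
    }
    return i;
}

/* Stand-in for the real wait: returns an increasing counter instead of cycles. */
uint64_t fake_wait_and_stamp(void *ctx)
{
    uint64_t *counter = ctx;
    return (*counter)++;
}
```

Returning after N_RUNS iterations also gives high_prio_fn a defined exit point.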

danshea00 · 2024-03-27T12:43:54Z

I used tracepoints to record each section of the slowpath and fastpath. I just ran this with the two benchmarks, so each 'split' is the mean of 110 runs.

Thanks for the feedback on the new benchmark. At the moment the test ends with high_prio_fn waiting on the notification when low_prio_fn exits its loop and sends on the done_ep. I'll clean things up if I end up making a PR to seL4Bench.

Add an IRQ fastpath for aarch64 MCS. This fastpath is similar to the signal fastpath, but also handles the case when the destination thread is of higher priority than the interrupted thread, in which case a direct context switch occurs.

Signed-off-by: Dan Shea <danshea00@gmail.com>

danshea00 force-pushed the irq_fastpath_squashed branch from 712b901 to 2b30202 on April 11, 2024 06:48
