Improve register allocation for calls with safepoint instruction #101596

kotlarmilos · 2024-04-26T08:46:27Z

Description

Mono uses a linear scan register allocation algorithm to assign registers to arguments and IL locals. Normally this algorithm is a good tradeoff between speed and efficiency. When we were using preemptive GC, it did a good job of placing values into registers: for example, if there was an unconditional call and a local needed to live across that call, the local would go into a callee-saved register.
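To make the rule above concrete, here is a minimal sketch of the "lives across a call, so use a callee-saved register" decision. It is a toy model, not Mono's allocator: the `LiveInterval` and `RegClass` types and both function names are hypothetical, and positions are just indices into an imagined instruction stream.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; these are not Mono's actual data structures. */
typedef struct {
    int start; /* position where the value is first defined */
    int end;   /* position of the value's last use */
} LiveInterval;

typedef enum { REG_CALLER_SAVED, REG_CALLEE_SAVED } RegClass;

/* True if some call sits strictly inside the interval, i.e. the value is
 * defined before the call and is still needed after it. */
static bool crosses_call(LiveInterval iv, const int *call_pos, size_t n_calls)
{
    for (size_t i = 0; i < n_calls; i++) {
        if (iv.start < call_pos[i] && call_pos[i] < iv.end)
            return true;
    }
    return false;
}

/* A value that must survive a call goes into a callee-saved register;
 * anything else can use a cheaper caller-saved one. */
static RegClass pick_reg_class(LiveInterval iv, const int *call_pos, size_t n_calls)
{
    return crosses_call(iv, call_pos, n_calls) ? REG_CALLEE_SAVED
                                               : REG_CALLER_SAVED;
}
```

In this model every call counts the same, no matter how rarely it executes, which is exactly the property that becomes a problem below.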

The problem is that with cooperative and hybrid GC, every (recursive) function now has what looks like an unconditional call right after the prolog.

As a result, a simple recursive factorial function ends up looking like this:

# prolog
mov x26, x0
# safepoint code
<gc_safe_point> # IR opcode for a safepoint; clobbers all caller-saved registers
# other code
...
# recursive function call
sub w0, w26, #0x1
bl gram_Fac__int_    # recursive call
... 
# rest of the function

The safepoint instruction right at function entry causes us to shuffle all the arguments into callee-saved registers.

Normally we treat the gc_safe_point as an opaque call-like IR instruction. In the LLVM backend this is what we want - LLVM has its own safepoint lowering pass that is aware that this call is extremely unlikely and allocates registers accordingly.

In the non-LLVM backends, however, this opcode persists all the way into the arch-specific backends where it gets replaced by, essentially:

<if global_gc_flag is unset, jump to continue_label:>
call runtime_gc_safepoint_icall
continue_label: nop

As a result, linear scan sees an unconditional call, but in reality it's very unlikely that we actually do a call here.
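The lowered form above behaves like a flag poll. A rough C equivalent, for illustration only: the issue names `global_gc_flag` and `runtime_gc_safepoint_icall` but does not show their types or signatures, so the declarations here are assumptions, and the counter exists only to make the behavior observable.

```c
#include <stdbool.h>

/* Assumed declarations: the real flag and icall in the runtime may have
 * different types and signatures. */
static volatile bool global_gc_flag = false;
static int safepoint_calls = 0; /* illustration only */

static void runtime_gc_safepoint_icall(void)
{
    /* A real runtime would park the thread for the GC here. */
    safepoint_calls++;
}

/* C-level equivalent of the lowered gc_safe_point: test the flag and
 * call into the runtime only when a GC has been requested. */
static void gc_safe_point(void)
{
    if (global_gc_flag)
        runtime_gc_safepoint_icall();
    /* continue_label: execution resumes here either way */
}
```

On the common path the flag is unset and no call happens, so no caller-saved register is actually clobbered there.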

The idea is that we should add a lowering pass that runs earlier, before register allocation, and replaces gc_safe_point with a conditional branch (marked unlikely) and a call when we're targeting a non-LLVM backend. This would allow linear scan to weight the call accordingly and hopefully keep function arguments in caller-saved registers.
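One way linear scan could "weight the call accordingly" is to compare an expected spill cost against the fixed cost of using a callee-saved register. The sketch below is an assumption about how such a heuristic might look, not a description of Mono's allocator: the types, the function names, and both cost constants are made up for illustration.

```c
#include <stddef.h>

/* Made-up costs, in arbitrary units. */
#define SPILL_RELOAD_COST 2.0 /* store before a call plus load after it */
#define CALLEE_SAVE_COST  2.0 /* save in the prolog plus restore in the epilog */

typedef struct {
    int pos;            /* position of the call in the instruction stream */
    double probability; /* estimated chance the call executes, 0.0 to 1.0 */
} CallSite;

typedef struct {
    int start; /* first definition */
    int end;   /* last use */
} LiveRange;

/* Expected cost of keeping the value in a caller-saved register: each call
 * it lives across costs a spill and reload, scaled by how likely the call is. */
static double caller_saved_cost(LiveRange r, const CallSite *calls, size_t n)
{
    double cost = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (r.start < calls[i].pos && calls[i].pos < r.end)
            cost += SPILL_RELOAD_COST * calls[i].probability;
    }
    return cost;
}

/* Nonzero if a caller-saved register is expected to be cheaper. */
static int prefers_caller_saved(LiveRange r, const CallSite *calls, size_t n)
{
    return caller_saved_cost(r, calls, n) < CALLEE_SAVE_COST;
}
```

With these numbers, a value that only crosses a safepoint call with probability 0.001 stays in a caller-saved register, while a value that crosses two calls that always execute is better off in a callee-saved one.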

We might not want to do this for every GC safepoint. For example, the ones on back branches might be OK to keep as a single opcode. (So we might, for example, add a new decomposable_gc_safepoint opcode, replace only that one with a jump, and place it only in the prolog, not on back branches.)
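A sketch of what such a selective lowering pass could look like over a flat opcode stream. Everything here is hypothetical: the enum values only mirror the names used in this issue, the three replacement opcodes are invented for the example, and Mono's real IR is not a plain array.

```c
#include <stddef.h>

/* Hypothetical opcodes for illustration. */
typedef enum {
    OP_OTHER,
    OP_GC_SAFE_POINT,             /* stays opaque, e.g. on back branches */
    OP_DECOMPOSABLE_GC_SAFEPOINT, /* the variant placed in the prolog */
    OP_SAFEPOINT_BRANCH,          /* skip the call if the GC flag is unset */
    OP_SAFEPOINT_CALL,            /* the unlikely call into the runtime */
    OP_CONTINUE_LABEL             /* branch target after the call */
} Opcode;

/* Expands each decomposable safepoint into branch + call + label and copies
 * every other opcode through unchanged. Writes the result to `out`, which
 * must have room for 3 * n_in opcodes, and returns the new length. */
static size_t lower_safepoints(const Opcode *in, size_t n_in, Opcode *out)
{
    size_t n_out = 0;
    for (size_t i = 0; i < n_in; i++) {
        if (in[i] == OP_DECOMPOSABLE_GC_SAFEPOINT) {
            out[n_out++] = OP_SAFEPOINT_BRANCH;
            out[n_out++] = OP_SAFEPOINT_CALL;
            out[n_out++] = OP_CONTINUE_LABEL;
        } else {
            out[n_out++] = in[i];
        }
    }
    return n_out;
}
```

Because the pass would run before register allocation, the allocator would see an explicit branch around the call instead of a single call-like opcode.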


dotnet-policy-service · 2024-04-26T08:46:51Z

Tagging subscribers to this area: @lambdageek, @steveisok
See info in area-owners.md if you want to be subscribed.

kotlarmilos added the area-Codegen-JIT-mono label Apr 26, 2024

kotlarmilos added this to the Future milestone Apr 26, 2024

lambdageek added the enhancement Product code improvement that does NOT require public API changes/additions label Apr 26, 2024
