This repository has been archived by the owner on Sep 2, 2023. It is now read-only.
Purpose
Primarily improve code density and efficiency of function prologues and epilogues. Other load/store operations can also be optimized (e.g. loading N variables from memory).
Also consider adding support for pre-decrement and post-increment addressing, and for optionally returning (J LR) after a post-increment load (effectively baking the entire epilogue into a single instruction).
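As a rough illustration of the code density argument, the sketch below counts instructions for a prologue/epilogue that saves N registers. The counting model is an assumption, not something measured: one store or load per register, one instruction to adjust the stack pointer, and one return jump. The function names are hypothetical.

```python
def conventional_prologue(n_regs):
    # Assumed: 1 stack pointer adjustment + one store per saved register.
    return 1 + n_regs

def conventional_epilogue(n_regs):
    # Assumed: one load per saved register + 1 stack pointer adjustment + 1 return jump.
    return n_regs + 1 + 1

# With the proposed instructions, a pre-decrement STM would cover the whole
# prologue, and a post-increment LDM with RET would cover the whole epilogue.
STM_PROLOGUE = 1
LDM_RET_EPILOGUE = 1

for n in (2, 4, 8):
    before = conventional_prologue(n) + conventional_epilogue(n)
    after = STM_PROLOGUE + LDM_RET_EPILOGUE
    print(f"{n} saved registers: {before} instructions -> {after}")
```

Under this model the saving grows linearly with the number of saved registers, which is why prologues and epilogues are the primary target.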
Encoding
We could repurpose the currently unused format A zero-opcode, and allow load/store operations of a range of registers (specifying first to last) to/from a memory address that is specified by any register (typically SP in function prologues/epilogues).
                                  V             T       OP
+-----------+---------+---------+---+---------+---+-------------+
|0 0 0 0 0 0|  REGa   |  REGb   |e f|  REGc   |g h|0 0 0 0 0 0 0|
+-----------+---------+---------+---+---------+---+-------------+
REGa - First register to be loaded/stored.
REGc - Last register to be loaded/stored.
REGb - Register holding the memory address (e.g. SP).
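The field layout can be sketched as a pair of pack/unpack helpers. The bit positions are an assumption read off the field widths in the diagram (6 + 5 + 5 + 2 + 5 + 2 + 7 = 32 bits, most significant bit on the left). The function names are hypothetical, and register number 29 in the example is only a stand-in for the address register.

```python
def encode_ldm_stm(reg_a, reg_b, reg_c, e, f, g, h):
    """Pack the fields into a 32-bit word; bits 31..26 and 6..0 stay zero."""
    for reg in (reg_a, reg_b, reg_c):
        if not 0 <= reg < 32:
            raise ValueError("register number must fit in 5 bits")
    for bit in (e, f, g, h):
        if bit not in (0, 1):
            raise ValueError("operation bits must be 0 or 1")
    return ((reg_a << 21) | (reg_b << 16) | (e << 15) | (f << 14)
            | (reg_c << 9) | (g << 8) | (h << 7))

def decode_ldm_stm(word):
    """Unpack a 32-bit word into its fields (inverse of encode_ldm_stm)."""
    return {
        "reg_a": (word >> 21) & 0x1F,  # first register in the range
        "reg_b": (word >> 16) & 0x1F,  # register holding the address
        "e": (word >> 15) & 1,
        "f": (word >> 14) & 1,
        "reg_c": (word >> 9) & 0x1F,   # last register in the range
        "g": (word >> 8) & 1,
        "h": (word >> 7) & 1,
    }

# Example: range R16..R18, address in register 29 (assumed), e f g h = 0 0 0 1.
word = encode_ldm_stm(16, 29, 18, 0, 0, 0, 1)
fields = decode_ldm_stm(word)
```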
We have four bits (e, f, g and h) that can encode the operation. Here is an example:
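| e | f | g | h | Operation |
|---|---|---|---|-----------|
| 0 | 0 | 0 | 0 | LDM {REGa-REGc}, [REGb] |
| 0 | 0 | 0 | 1 | LDM {REGa-REGc}, [REGb]+ |
| 0 | 0 | 1 | 0 | STM {REGa-REGc}, [REGb] |
| 0 | 0 | 1 | 1 | STM {REGa-REGc}, -[REGb] |
| 0 | 1 | 0 | 0 | TBD |
| 0 | 1 | 0 | 1 | LDM {REGa-REGc}, [REGb]+, RET (return after load and increment) |
| 0 | 1 | 1 | 0 | TBD |
| 0 | 1 | 1 | 1 | TBD |
| 1 | 0 | 0 | 0 | TBD (vector?) |
| 1 | 0 | 0 | 1 | TBD (vector?) |
| 1 | 0 | 1 | 0 | TBD (vector?) |
| 1 | 0 | 1 | 1 | TBD (vector?) |
| 1 | 1 | 0 | 0 | TBD (vector?) |
| 1 | 1 | 0 | 1 | TBD (vector?) |
| 1 | 1 | 1 | 0 | TBD (vector?) |
| 1 | 1 | 1 | 1 | TBD (vector?) |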
Alternatively we could use the vacant format A & C opcodes 4 & 12 (load & store, respectively), though that is likely to waste encoding space and possibly confuse the interpretation of some bits in the instruction word.
Open questions
Handling LR (and VL)
Since STM/LDM are to be used in function prologues/epilogues, the LR register is likely to be pushed/popped. However LR is currently R30, while the register allocator typically selects R16, R17, ... for callee-saved registers, which makes it impossible to form a useful register range that includes R30 (unless all registers R16-R29 are pushed/popped). There are two simple solutions:
1. Move LR to a lower register number, e.g. R16.
2. Use one of the bits in the instruction word to indicate that LR needs to be stored/loaded too.
There is a similar problem with VL (functions that use vector operations need to push VL). This speaks in favor of re-arranging the register numbers so that VL = R16, LR = R17 (for instance).
Another consideration: In a function epilogue we want to load LR as early as possible so that it is available when doing the RET instruction. This also speaks in favor of moving LR to a low register number (assuming that LDM loads the registers from memory in the order that they are listed).
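The range problem can be shown with a small calculation. Because a single LDM/STM names only a first and a last register, the range has to span everything between the lowest and highest register that needs saving. The helper name and the example register sets are illustrative assumptions; the second case uses the renumbering suggested above (VL = R16, LR = R17), with callee-saved registers assumed to start at R18.

```python
def range_to_cover(needed):
    """Return (first, last, count) of the smallest contiguous range covering `needed`."""
    first, last = min(needed), max(needed)
    return first, last, last - first + 1

# Current numbering: a function saves R16..R18 plus LR (R30).
current = range_to_cover({16, 17, 18, 30})

# Hypothetical renumbering: LR = R17, callee-saved registers from R18 upwards.
renumbered = range_to_cover({17, 18, 19, 20})

print(current)     # (16, 30, 15): 15 registers transferred to save 4
print(renumbered)  # (17, 20, 4): exactly the 4 registers that are needed
```

With LR at the low end of the range it would also be among the first registers loaded, provided LDM loads in ascending register order.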
Vector registers
Should we allow loading/storing multiple vector registers? At the very least we should reserve room in the encoding to allow for it in a later ISA change, preferably in a way that maps well to how the V field in the instruction word (bits 14 & 15) is interpreted by other vector load/store instructions.
It would be very valuable to have some efficient way to store vector registers along with their vector length, with auto increment/decrement, which is very similar to what the LDM/STM instructions do for scalar registers.
Costs
Another sequencer is required, similar to the vector register sequencer. The simplest design would add a new pipeline stage before ID (just pumping out regular instructions), but a more advanced solution would embed the sequencer as part of the ID stage.
Implementations that support memory exceptions need to support resuming the load/store instruction mid-way.
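The resumption requirement can be sketched as a sequencer that steps through the register range one element at a time and reports the element index at which a memory fault occurred. This is a toy model under stated assumptions: 4-byte registers, a fault is modeled as a missing key in a dictionary, and the start address is latched before the sequence begins so that it can be replayed. All names are hypothetical.

```python
WORD = 4  # assumed register width in bytes

class MemoryFault(Exception):
    """Raised when an element cannot be loaded; carries the element index."""
    def __init__(self, index):
        super().__init__(f"fault at element {index}")
        self.index = index

def ldm_sequence(regs, mem, first, last, base_addr, start=0):
    """Load regs[first..last] one element at a time, beginning at element `start`."""
    for i in range(start, last - first + 1):
        addr = base_addr + i * WORD
        if addr not in mem:
            raise MemoryFault(i)  # elements before i have already been loaded
        regs[first + i] = mem[addr]

regs = [0] * 32
mem = {0x100: 1, 0x104: 2}  # the third word is not accessible yet
resume_at = None
try:
    ldm_sequence(regs, mem, 16, 18, 0x100)
except MemoryFault as fault:
    resume_at = fault.index  # state that must survive the exception

mem[0x108] = 3  # the exception handler makes the memory available
if resume_at is not None:
    ldm_sequence(regs, mem, 16, 18, 0x100, start=resume_at)
```

The state that has to be preserved across the exception is small (the element index and the latched start address), but it has to live somewhere architecturally visible or be reconstructible by the handler.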