Add LDM/STM (LoaD Multiple/STore Multiple) #141

mbitsnbites · 2022-06-28T06:19:23Z

Purpose

Primarily improve code density and efficiency of function prologues and epilogues. Other load/store operations can also be optimized (e.g. load N variables from memory).

Also consider adding support for pre-decrement and post-increment addressing, plus an option to return (J LR) after a post-increment load, effectively baking the entire epilogue into a single instruction.

Encoding

We could repurpose the currently unused format A zero-opcode, and allow load/store operations of a range of registers (specifying first to last) to/from a memory address that is specified by any register (typically SP in function prologues/epilogues).

                                  V             T       OP
+-----------+---------+---------+---+---------+---+-------------+
|0 0 0 0 0 0| REGa    | REGb    |e f| REGc    |g h|0 0 0 0 0 0 0|
+-----------+---------+---------+---+---------+---+-------------+

REGa - First register to be loaded/stored.
REGc - Last register to be loaded/stored.
REGb - Register holding the memory address (e.g. SP).

We have four bits (e, f, g and h) that can encode the operation. Here is an example:

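| e | f | g | h | Operation |
|---|---|---|---|-----------|
| 0 | 0 | 0 | 0 | `LDM {REGa-REGc}, [REGb]` |
| 0 | 0 | 0 | 1 | `LDM {REGa-REGc}, [REGb]+` |
| 0 | 0 | 1 | 0 | `STM {REGa-REGc}, [REGb]` |
| 0 | 0 | 1 | 1 | `STM {REGa-REGc}, -[REGb]` |
| 0 | 1 | 0 | 0 | TBD |
| 0 | 1 | 0 | 1 | `LDM {REGa-REGc}, [REGb]+, RET` (return after load and increment) |
| 0 | 1 | 1 | 0 | TBD |
| 0 | 1 | 1 | 1 | TBD |
| 1 | 0 | 0 | 0 | TBD (vector?) |
| 1 | 0 | 0 | 1 | TBD (vector?) |
| 1 | 0 | 1 | 0 | TBD (vector?) |
| 1 | 0 | 1 | 1 | TBD (vector?) |
| 1 | 1 | 0 | 0 | TBD (vector?) |
| 1 | 1 | 0 | 1 | TBD (vector?) |
| 1 | 1 | 1 | 0 | TBD (vector?) |
| 1 | 1 | 1 | 1 | TBD (vector?) |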
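
To make the proposed bit layout concrete, here is a minimal Python sketch of an encoder and decoder for the example table. It is only an illustration under assumptions: the field positions are read off the diagram with bit 31 leftmost, `e` is taken to be bit 15 and `g` bit 8, the operation names in `OPS` are made up for this sketch, and `SP = 29` is an assumed register number.

```python
# Field positions read off the encoding diagram (bit 31 is the leftmost bit).
# Assumption: e = bit 15, f = bit 14 (V field); g = bit 8, h = bit 7 (T field).
REGA_SHIFT = 21  # first register in the range
REGB_SHIFT = 16  # register holding the memory address
V_SHIFT = 14     # bits e, f
REGC_SHIFT = 9   # last register in the range
T_SHIFT = 7      # bits g, h

# Hypothetical names for the operations defined in the table, keyed to "efgh".
OPS = {
    "LDM": 0b0000,
    "LDM_POSTINC": 0b0001,
    "STM": 0b0010,
    "STM_PREDEC": 0b0011,
    "LDM_POSTINC_RET": 0b0101,
}
NAMES = {bits: name for name, bits in OPS.items()}

SP = 29  # assumption: register number of SP, used only in the example below


def encode(op, first, last, base):
    """Build the 32-bit instruction word for a register range first..last."""
    if not (0 <= first <= last <= 31) or not (0 <= base <= 31):
        raise ValueError("register numbers must be 0..31 with first <= last")
    efgh = OPS[op]
    ef, gh = efgh >> 2, efgh & 0b11
    # Bits 31..26 and 6..0 stay zero (format A, zero opcode).
    return (
        (first << REGA_SHIFT)
        | (base << REGB_SHIFT)
        | (ef << V_SHIFT)
        | (last << REGC_SHIFT)
        | (gh << T_SHIFT)
    )


def decode(word):
    """Split an instruction word back into (op, first, last, base)."""
    if (word >> 26) != 0 or (word & 0x7F) != 0:
        raise ValueError("not a format A zero-opcode instruction")
    first = (word >> REGA_SHIFT) & 0x1F
    base = (word >> REGB_SHIFT) & 0x1F
    last = (word >> REGC_SHIFT) & 0x1F
    efgh = (((word >> V_SHIFT) & 0b11) << 2) | ((word >> T_SHIFT) & 0b11)
    return NAMES[efgh], first, last, base


# Example: STM {R16-R20}, -[SP]
word = encode("STM_PREDEC", 16, 20, SP)
print(f"{word:#010x}")
print(decode(word))
```

Decoding a word whose `efgh` bits map to one of the TBD rows raises a `KeyError` in this sketch, which stands in for "reserved encoding".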

Alternatively we could use the vacant format A & C opcodes 4 & 12 (load & store, respectively), though that is likely to waste encoding space and possibly confuse the interpretation of some bits in the instruction word.

Open questions

Handling LR (and VL)

Since STM/LDM are to be used in function prologues/epilogues, the LR register is likely to be pushed/popped. However LR is currently R30, while the register allocator typically selects R16, R17, ... for callee-saved registers, which makes it impossible to form a useful register range that includes R30 (unless all registers R16-R29 are pushed/popped). There are two simple solutions:

- Move LR to a lower register number, e.g. R16.
- Use one of the bits in the instruction word to indicate that LR needs to be stored/loaded too.

There is a similar problem with VL (functions that use vector operations need to push VL). This speaks in favor of re-arranging the register numbers so that VL = R16, LR = R17 (for instance).
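
The cost of the current numbering can be quantified with a small sketch. The helper below is hypothetical: it computes the smallest contiguous register range that covers a function's callee-saved registers plus LR, and counts how many registers in that range are saved without being needed. The renumbered case assumes the allocator would then start at the register right after LR.

```python
def range_cost(callee_saved, lr):
    """Return (first, last, wasted) for the smallest contiguous register
    range that covers the given callee-saved registers plus LR."""
    needed = set(callee_saved) | {lr}
    first, last = min(needed), max(needed)
    wasted = (last - first + 1) - len(needed)
    return first, last, wasted


# Current numbering: LR = R30, allocator picks R16, R17, R18.
print(range_cost([16, 17, 18], lr=30))

# LR moved to R16 (assumption: allocator then picks R17, R18, R19).
print(range_cost([17, 18, 19], lr=16))
```

With LR at R30 the range has to span R16-R30, so eleven registers are pushed and popped for no reason; with LR adjacent to the first callee-saved register nothing is wasted.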

Another consideration: In a function epilogue we want to load LR as early as possible so that it is available when doing the RET instruction. This also speaks in favor of moving LR to a low register number (assuming that LDM loads the registers from memory in the order that they are listed).

Vector registers

Should we allow loading/storing multiple vector registers? At the very least we should reserve room in the encoding to allow for it in a later ISA change, preferably in a way that maps well to how the V field in the instruction word (bits 14 & 15) is interpreted by other vector load/store instructions.

It would be very valuable to have some efficient way to store vector registers along with their vector length, with auto increment/decrement, which is very similar to what the LDM/STM instructions do for scalar registers.

Costs

- Another sequencer is required, similar to the vector register sequencer. The simplest design would add a new pipeline stage before ID (just pumping out regular instructions), but a more advanced solution would embed the sequencer as part of the ID stage.
- Implementations that support memory exceptions need to support resuming the load/store instruction mid-way.
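
As a rough model of the "pumping out regular instructions" design, the sketch below expands one LDM/STM into a list of simple operations. Several things here are assumptions rather than part of the proposal: registers are laid out at ascending addresses in listed order, each register occupies 4 bytes, pre-decrement adjusts the base before the stores, post-increment adjusts it after the loads, and the tuple format and operation names are invented for illustration.

```python
WORD_BYTES = 4  # assumption: each scalar register is stored as one 32-bit word


def expand(op, first, last, base):
    """Expand one LDM/STM into a list of regular operations.

    Emitted tuples (hypothetical format):
      ("add", reg, imm)             reg += imm
      ("load", reg, base, offset)   reg = mem[base + offset]
      ("store", reg, base, offset)  mem[base + offset] = reg
      ("ret",)                      jump to LR
    """
    count = last - first + 1
    size = count * WORD_BYTES
    uops = []
    if op == "STM_PREDEC":
        # Make room first, then store at non-negative offsets.
        uops.append(("add", base, -size))
    kind = "store" if op.startswith("STM") else "load"
    for i in range(count):
        uops.append((kind, first + i, base, i * WORD_BYTES))
    if op in ("LDM_POSTINC", "LDM_POSTINC_RET"):
        # Release the stack space after all loads have been issued.
        uops.append(("add", base, size))
    if op == "LDM_POSTINC_RET":
        uops.append(("ret",))
    return uops


SP = 29  # assumption: register number of SP

# Prologue: STM {R16-R18}, -[SP]
for uop in expand("STM_PREDEC", 16, 18, SP):
    print(uop)

# Epilogue: LDM {R16-R18}, [SP]+, RET
for uop in expand("LDM_POSTINC_RET", 16, 18, SP):
    print(uop)
```

The sketch ignores the resume-after-exception requirement and the case where the base register lies inside the loaded range; a real sequencer would need to define both.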

mbitsnbites added this to the v0.4 milestone Jun 28, 2022

mbitsnbites mentioned this issue Jun 28, 2022

Idea for compressed instructions: Use a special 32-bit VLIW format #103

Open

mbitsnbites added the investigate label Apr 13, 2023
