Add facilities for masked vector load/store #108

mbitsnbites · 2020-08-28T13:51:23Z

Masked vector store operations are a very useful feature (think, e.g., of 3D frustum culling). It is not clear how they should be implemented in the MRISC32 ISA, but using one vector register as a mask register (essentially treating it as a byte select mask) during the store would probably do the trick.

The mask

We should be able to exploit the fact that other instructions, namely SEL (bitwise select) and FMA (fused multiply-accumulate), already need three vector register file read ports, and use the register file read ports as follows:

1R (scalar) for the base address.
1R (vector) for the data to store.
1R (vector, optional) for the address offset (for scatter store).
1R (vector) for the mask register.
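The operand usage above can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the MRISC32 definition: memory is a little-endian, byte-addressed `bytearray`, elements are 32-bit words, an element is stored when its mask element is non-zero, and the scatter address is `base + offset * scale` (as in `[r7, v1*2]`). The function name `masked_scatter_store_w` and the `scale` parameter are hypothetical.

```python
def masked_scatter_store_w(mem, base, data, offsets, mask, scale=4):
    """Hypothetical model of a masked scatter store of 32-bit words.

    mem     -- bytearray modelling little-endian, byte-addressed memory
    base    -- scalar base address (the scalar read port)
    data    -- vector of words to store (vector read port 1)
    offsets -- vector of element indices (vector read port 2, scatter only)
    mask    -- vector of words; non-zero means "store" (vector read port 3)
    """
    for d, off, m in zip(data, offsets, mask):
        if m == 0:
            continue  # masked-off element: its address is never touched
        addr = base + off * scale
        mem[addr:addr + 4] = (d & 0xFFFFFFFF).to_bytes(4, "little")
```

For example, storing three words with offsets `[2, 0, 1]` and mask `[0xFFFFFFFF, 0, 0xFFFFFFFF]` to base address 8 writes only the first and third elements (to addresses 16 and 12); address 8 is left untouched.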

The mask register could be interpreted as a byte mask, allowing it to act on half-words and bytes as well as on full words. Thus any s[cc][.h|.b] instruction (along with bitwise instructions such as and/or/xor) can be used to produce a valid store mask.
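To illustrate the byte-mask interpretation, the sketch below assumes that each mask byte simply enables (non-zero) or disables (zero) the corresponding data byte, and models a contiguous store of 32-bit words on a little-endian memory. `seq_b` is a hypothetical stand-in for a byte-wise compare such as `seq.b`, assumed to yield 0xFF in every byte lane where the comparison holds and 0x00 elsewhere.

```python
def seq_b(a, b):
    """Hypothetical byte-wise compare: 0xFF in each byte lane where a == b."""
    r = 0
    for lane in range(4):
        sh = 8 * lane
        if (a >> sh) & 0xFF == (b >> sh) & 0xFF:
            r |= 0xFF << sh
    return r


def masked_store_bytes(mem, base, data, mask):
    """Store 32-bit words contiguously, writing only bytes whose mask byte is non-zero."""
    for i, (d, m) in enumerate(zip(data, mask)):
        addr = base + 4 * i
        for lane in range(4):  # byte lane 0 is the lowest address (little endian)
            if (m >> (8 * lane)) & 0xFF:
                mem[addr + lane] = (d >> (8 * lane)) & 0xFF
```

Because the mask is per byte, the same store covers half-word masking (a mask word of 0x0000FFFF writes only the low half-word) and byte masking, with no separate masked `sth`/`stb` forms needed for that purpose.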

The instruction(s)

There are several alternatives for how to encode & interpret the special "store vector register with mask" instruction. For instance:

1. Dedicate one (or more?) vector register as a mask register, and call it VM. Just like the scalar VL register, it can be used as a regular register by all instructions (and a HW implementation may choose to keep a separate copy of the relevant bits of the VM register in order to avoid using a regular register read port). The special store-with-mask instruction would then implicitly use that register as the mask. E.g.:
   a. stw/m v6, [r8, #4] ; Store v6 to address r8, with stride 4, and use vm as the mask register
2. Split the 32 vector registers into 16+16 registers, where one half (e.g. the odd registers, or registers v16-v31) is implicitly used as the mask register for the store instruction, e.g. as follows:
   a. stw/m v6:v7, [r8, #4] ; Store v6 to address r8, with stride 4, and use v7 as the mask register
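The two alternatives differ mainly in how the mask register is derived from the instruction's register field. The hypothetical decode functions below sketch this, assuming (purely for illustration) that VM is v31 in the first alternative. In the second alternative there are only 16 valid data registers, so a 4-bit field is enough to select both the data and the mask register, whereas the first alternative keeps a full 5-bit data field but always uses the same mask.

```python
VM = 31  # hypothetical: the vector register acting as "vm" in alternative 1


def decode_alt1(vd_field):
    """Alternative 1: 5-bit data register field, mask implicitly taken from VM."""
    return vd_field, VM


def decode_alt2_pairs(vd_field):
    """Alternative 2, even/odd pairs: a 4-bit field selects v0:v1, v2:v3, ..., v30:v31."""
    data = 2 * vd_field
    return data, data + 1


def decode_alt2_halves(vd_field):
    """Alternative 2, split halves: a 4-bit field selects data v0-v15 and mask v16-v31."""
    return vd_field, vd_field + 16
```

With the pair variant, `stw/m v6:v7` corresponds to a field value of 3.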

We could also repurpose the folding vector mode (VM=01) for masking, since folding is not supported by load/store instructions anyway.

Outstanding questions

1. For stride-based stores, should the address be incremented for non-stored elements?
- There are probably pros and cons to both variants, but incrementing the address for both stored and non-stored elements is simpler to implement in hardware, and has a nice logic to it, especially if we want to support masked bytes / half-words too.
- For scatter stores, the solution is trivial (ignore the addresses for the non-stored elements).
2. Do we need/want to support other sizes than word (i.e. do we need masked versions of sth and stb in addition to stw)?
3. Do we need/want to use masking for other instructions than store?
- All instructions that could raise exceptions (or produce "undefined"/NaN results) are candidates, e.g. load (could raise a page fault) and div/mod/fdiv (division by zero).
- If implemented, what should the result be for masked elements? Preserving the old value could mean that we need to read the original value from the register file (or somewhere else), adding register file read ports and forwarding dependencies. Just zeroing the result is simpler, and probably the right thing (TM) for some instructions (e.g. loads).
- In a serial (as opposed to parallel) hardware vector implementation, skipping elements can gain performance (elements that are not computed are zero-cycle no-ops). One example is, apparently, the Convex C series machines from the 1980s. On the other hand, adding support for skipping elements in a parallel implementation is more work and does not improve performance.
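The stride question and the "zero the masked elements" option for loads can be compared with a small behavioral model. This is a sketch with hypothetical function names and an `advance_on_skip` flag that exists only to contrast the two policies; memory is a little-endian `bytearray` of 32-bit words. Note that the non-advancing policy packs the stored elements together, which is essentially what other ISAs call a compress store.

```python
def masked_strided_store(mem, base, stride, data, mask, advance_on_skip=True):
    """Hypothetical strided masked word store.

    advance_on_skip=True:  element i always maps to base + i*stride
                           (the simpler-to-implement option discussed above).
    advance_on_skip=False: only stored elements consume an address slot,
                           so the stored elements end up packed together.
    """
    addr = base
    for d, m in zip(data, mask):
        if m:
            mem[addr:addr + 4] = (d & 0xFFFFFFFF).to_bytes(4, "little")
            addr += stride
        elif advance_on_skip:
            addr += stride


def masked_load_zero(mem, base, stride, mask, vl):
    """Hypothetical masked strided load: masked-off elements read as zero and
    their addresses are never accessed (so they could not page-fault)."""
    out = []
    for i in range(vl):
        if mask[i]:
            addr = base + i * stride
            out.append(int.from_bytes(mem[addr:addr + 4], "little"))
        else:
            out.append(0)
    return out
```

With `advance_on_skip=True`, element i always lands at `base + i*stride`, so the same address computation can be shared with the unmasked store; only the write enable depends on the mask.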


mbitsnbites · 2021-01-30T10:18:49Z

Idea: We don't use the folding vector mode for load/store. Can we repurpose it for masking? E.g. sth/m v3, [r7, v1*2]

This only works for gather-scatter, though (but stride-based load/store can easily be emulated using gather-scatter in combination with LDEA to produce the strided address offsets).
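The emulation of a strided access via gather-scatter can be sketched as follows. The model assumes that a vector LDEA yields element k = base + k*stride (`ldea_vector` is a hypothetical stand-in, not the actual instruction semantics), and that the masked gather uses the scaled-index addressing `base + offset * scale` seen in `[r7, v1*2]`, with masked-off elements reading as zero. `masked_gather_w` is likewise hypothetical.

```python
def ldea_vector(base, stride, vl):
    """Hypothetical model of a vector LDEA: element k = base + k*stride."""
    return [base + k * stride for k in range(vl)]


def masked_gather_w(mem, base, offsets, mask, scale=1):
    """Hypothetical masked gather of 32-bit little-endian words.

    Masked-off elements read as zero and their addresses are never accessed.
    """
    out = []
    for off, m in zip(offsets, mask):
        if m:
            addr = base + off * scale
            out.append(int.from_bytes(mem[addr:addr + 4], "little"))
        else:
            out.append(0)
    return out
```

For instance, a masked load of every other word (a stride of 8 bytes) becomes an LDEA producing the indices `[0, 2, 4, 6]` followed by a gather with a scale of 4.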

mbitsnbites changed the title ~~Add facilities for masked vector store~~ Add facilities for masked vector load/store Jan 30, 2021
