This repository has been archived by the owner on Sep 2, 2023. It is now read-only.
Masking vector store operations is a very useful feature (e.g. think about 3D frustum culling). It is not clear how it should be implemented in the MRISC32 ISA, but using one vector register as a mask register (essentially treat it as a byte select mask) during the store would probably do the trick.
The mask
We should be able to utilize the fact that we already need three vector register file read ports for other instructions (SEL (bitwise select) and FMA (fused multiply-accumulate)), and use the register file read ports as follows:
1R (scalar) for the base address.
1R (vector) for the data to store.
1R (vector, optional) for the address offset (for scatter store).
1R (vector) for the mask register.
The mask register could be interpreted as a byte mask, allowing it to act on half-words and bytes as well as full words. Thus any s[cc][.h|.b] instruction (along with other logical instructions such as and/or/xor) can be used to produce a valid store mask.
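The byte-mask interpretation can be sketched in a few lines of Python acting as pseudo-hardware. This is only an illustration under stated assumptions: `set_compare` and `store_word_masked` are hypothetical helper names, memory is modelled as a little-endian `bytearray`, a compare is assumed to yield all ones for a true element and all zeros otherwise, and a byte is assumed to be stored when any bit of its mask byte is set.

```python
def set_compare(a, b, elem_bits, cond):
    """Model an s[cc]-style compare on the packed elements of a 32-bit word.

    Assumption: a true element becomes all ones, a false element all zeros.
    elem_bits is 32 (word), 16 (half-word) or 8 (byte).
    """
    elem_mask = (1 << elem_bits) - 1
    result = 0
    for shift in range(0, 32, elem_bits):
        ea = (a >> shift) & elem_mask
        eb = (b >> shift) & elem_mask
        if cond(ea, eb):
            result |= elem_mask << shift
    return result


def store_word_masked(memory, addr, value, mask):
    """Store only the bytes of value whose mask byte is non-zero.

    Assumption: little-endian byte order, memory is a bytearray.
    """
    for byte in range(4):
        if (mask >> (8 * byte)) & 0xFF:
            memory[addr + byte] = (value >> (8 * byte)) & 0xFF


# A byte-wise compare produces a mask that selects individual bytes.
mask = set_compare(0x11223344, 0x11FF33FF, 8, lambda x, y: x == y)
memory = bytearray(8)
store_word_masked(memory, 4, 0xAABBCCDD, mask)
```

Because the same store routine consumes masks produced at word, half-word or byte granularity, no separate mask format is needed per element size.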
The instruction(s)
There are several alternatives for how to encode & interpret the special "store vector register with mask" instruction. For instance:
Dedicate one (or more?) vector register as a mask register, and call it VM. Just like the scalar VL register, it can be used as a regular register by all instructions (and a HW implementation may choose to keep a separate copy of the relevant bits of the VM register in order to avoid using a regular register read port). The special store-with-mask instruction would then implicitly use that register as the mask. E.g.:
a. stw/m v6, [r8, #4] ; Store v6 to address r8, with stride 4, and use vm as the mask register
Split the 32 vector registers into 16+16 registers, where one half (e.g. the odd registers, or registers v16-v31) is implicitly used as the mask register for the store instruction, e.g. as follows:
a. stw/m v6:v7, [r8, #4] ; Store v6 to address r8, with stride 4, and use v7 as the mask register
We could also repurpose the folding vector mode (VM=01) for masking, since folding is not supported by load/store instructions anyway.
Outstanding questions
For stride based stores, should the address be incremented for non-stored elements?
There are probably pros and cons with both variants, but incrementing the address for both stored and non-stored elements is simpler to implement in hardware, and has a nice logic to it - especially if we want to support masked bytes / half-words too.
For scatter stores, the solution is trivial (ignore the addresses for the non-stored elements).
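The two addressing variants can be sketched as follows. This is a minimal model, not a specification: the function names are hypothetical, memory is a plain `dict` from address to word, and masks are simplified to one truthy/falsy value per word element.

```python
def store_words_strided_masked(memory, base, stride, data, masks):
    """Masked stride store: element i always targets base + i * stride."""
    for i, (value, mask) in enumerate(zip(data, masks)):
        addr = base + i * stride  # advances for masked-out elements too
        if mask:
            memory[addr] = value


def store_words_scatter_masked(memory, base, offsets, data, masks):
    """Masked scatter store: the offset of a masked-out element is ignored."""
    for value, offset, mask in zip(data, offsets, masks):
        if mask:
            memory[base + offset] = value


strided = {}
store_words_strided_masked(strided, 100, 4, [10, 20, 30, 40], [1, 0, 1, 0])

scattered = {}
# The second offset is deliberately bogus; it is never used.
store_words_scatter_masked(scattered, 0, [8, 999999, 0], [1, 2, 3], [1, 0, 1])
```

With this choice the stride store leaves holes in memory at the masked-out positions rather than compacting the stored elements.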
Do we need/want to support other sizes than word (i.e. do we need masked versions of sth and stb in addition to stw)?
Do we need/want to use masking for other instructions than store?
All instructions that could raise exceptions (or produce "undefined"/NaN results) are candidates, e.g. load (could raise a page fault) and div/mod/fdiv (division by zero).
If implemented, what should the result be for masked elements? Preserving the old value could mean that we need to read the original value from the register file (or somewhere else), adding register file read ports and forwarding dependencies. Just zeroing the result is simpler, and probably the right thing (TM) for some instructions (e.g. loads).
In a serial (as opposed to parallel) hardware vector implementation, skipping elements can gain performance (elements that are not computed are zero-cycle no-ops). One example is apparently the Convex C series machines from the 1980s. On the other hand, adding support for skipping elements in a parallel implementation is more work and does not improve performance.
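To make the "zero the masked elements" option concrete, here is a small sketch for a masked load and a masked integer division. The names are hypothetical, memory is a `dict` where a missing key stands in for a page fault, and the division example assumes non-negative operands.

```python
def load_words_masked_zeroing(memory, base, stride, masks):
    """Masked stride load: masked-out elements are not read and become 0."""
    result = []
    for i, mask in enumerate(masks):
        if mask:
            result.append(memory[base + i * stride])
        else:
            result.append(0)  # no memory access, so no fault can occur
    return result


def div_masked_zeroing(a, b, masks):
    """Masked division: masked-out elements are not computed and become 0."""
    return [x // y if m else 0 for x, y, m in zip(a, b, masks)]


memory = {0: 5, 8: 7}  # address 4 is "unmapped"
loaded = load_words_masked_zeroing(memory, 0, 4, [1, 0, 1])
quotients = div_masked_zeroing([8, 9, 10], [2, 0, 5], [1, 0, 1])
```

Neither function needs the previous contents of the destination, which is the property that avoids the extra register file read port mentioned above.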
Idea: We don't use the folding vector mode for load/store. Can we repurpose it for masking? E.g. sth/m v3, [r7, v1*2]
Only works for gather-scatter though (although stride based load/store can easily be emulated using gather-scatter in combination with LDEA to produce the address stride).
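The emulation of a stride based store via scatter can be sketched like this. It is an illustration only: `stride_offsets` stands in for what LDEA is assumed to produce in vector mode (offsets 0, stride, 2*stride, ...), the function names are hypothetical, and memory is a `dict` from address to value.

```python
def stride_offsets(stride, vector_length):
    """Offsets 0, stride, 2*stride, ... (assumed result of a vector LDEA)."""
    return [i * stride for i in range(vector_length)]


def scatter_store_masked(memory, base, offsets, data, masks):
    """Masked scatter store using a per-element offset vector."""
    for value, offset, mask in zip(data, offsets, masks):
        if mask:
            memory[base + offset] = value


memory = {}
offsets = stride_offsets(2, 4)  # half-word stride
scatter_store_masked(memory, 0x100, offsets, [1, 2, 3, 4], [1, 1, 0, 1])
```

The cost of this approach is one extra vector register for the offsets and one extra instruction to generate them.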
mbitsnbites changed the title from "Add facilities for masked vector store" to "Add facilities for masked vector load/store" on Jan 30, 2021