
arch-riscv: remove branching from vset{i}vl{i} instruction and allow registers vl and vtype to be renamed #885

Open
saul44203 opened this issue Feb 21, 2024 · 6 comments
Labels: arch-riscv (The RISC-V ISA)

@saul44203 commented Feb 21, 2024

Currently the vset{i}vl{i} family of instructions is branch-like. This works well for some applications, but others, especially those where a vset{i}vl{i} instruction sits inside a loop and changes the vl/vtype state on every iteration, perform poorly due to constant squashing.
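To make the problematic pattern concrete, here is a sketch of a segmented kernel written with the RVV C intrinsics. It is only an illustration (not code from gem5 or from a specific workload), and the intrinsic names are assumed to follow the v1.0 intrinsics API; the point is that the vsetvli at the head of the inner loop writes a different vl on nearly every iteration:

```cpp
#include <riscv_vector.h>  // RVV intrinsics (compile with e.g. -march=rv64gcv)
#include <stddef.h>

// Scale each variable-length segment of a packed array. Because segment
// lengths differ, the vsetvli generated for __riscv_vsetvl_e32m8 writes a
// different vl on nearly every inner-loop iteration, which keeps squashing
// if vset{i}vl{i} is modelled as a branch.
void scale_segments(size_t nseg, const size_t *seg_len, float a, float *data)
{
    for (size_t s = 0; s < nseg; ++s) {
        for (size_t j = 0; j < seg_len[s];) {
            size_t vl = __riscv_vsetvl_e32m8(seg_len[s] - j);   // vl changes per chunk
            vfloat32m8_t v = __riscv_vle32_v_f32m8(data, vl);   // load with current vl
            v = __riscv_vfmul_vf_f32m8(v, a, vl);               // v *= a
            __riscv_vse32_v_f32m8(data, v, vl);                 // store with current vl
            data += vl;
            j += vl;
        }
    }
}
```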

The proposal aims to eliminate the branch-like behavior of the vset{i}vl{i} instructions and instead make the vl and vtype registers renameable, allowing normal data dependencies to be created with subsequent vector instructions in the execution flow.

@adriaarmejach and I discussed two approaches:

  • One where both registers are treated as integer registers.
  • One where both registers belong to a new class of renameable misc registers.

The latter approach seems cleaner (the new register class addition would be similar to the changes in commit fed81f3), but it involves more code overall than the first; a rough sketch of what it might look like follows after this paragraph.
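The sketch below is purely hypothetical: it loosely mirrors how gem5 enumerates register classes in src/cpu/reg_class.hh, and the new names (VecConfRegClass, VECCONF_VL, and so on) are made up for illustration rather than taken from commit fed81f3.

```cpp
// Hypothetical sketch only: where a new renameable register class for the
// vector-configuration state could slot in. The surrounding enumerators
// roughly follow gem5's RegClassType; the new names below are assumptions.
enum RegClassType
{
    IntRegClass,
    FloatRegClass,
    VecRegClass,
    VecElemClass,
    VecPredRegClass,
    VecConfRegClass,   // hypothetical: renameable vl/vtype would live here
    CCRegClass,
    MiscRegClass,      // remaining, non-renameable misc/CSR state
    InvalidRegClass
};

// The RISC-V ISA description would then declare vl and vtype as members of
// the new class, so vset{i}vl{i} writes them as ordinary renamed destination
// registers and later vector instructions read them as ordinary sources.
enum VecConfRegIndex { VECCONF_VL, VECCONF_VTYPE, NUM_VECCONF_REGS };
```

With something along these lines, the dependence from a vset{i}vl{i} to a following vector instruction becomes a normal RAW dependence through the renamed vl/vtype entries.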

Open to discussion.

@powerjg (Contributor) commented Feb 21, 2024

I'm going to push back on this solution because the spec says that vset{i}vl{i} is a control dependency, not a data dependency. See https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc#9-vector-memory-consistency-model

Correct me if I'm not interpreting the spec correctly :)

@aarmejach (Contributor)

Yes, the spec says vl dependencies are treated as control dependencies, which allows speculation as long as vl remains unchanged. However, this is not the case in many applications, especially sparse kernels that typically change vl often and end up performing worse than their scalar versions.

Having vl as a data dependency is more restrictive and does not break the spec: subsequent vector instructions simply always wait on the vl write instead of being speculated. Some commercial proposals are actually implementing vl dependencies as data dependencies and allow both the vtype and vl CSRs to be renamed, which is bound to perform well both when vl stays constant and in sparse applications.

I understand that gem5 should stay as close to the spec as possible, so if this is not something we want to have upstream, we will keep it internal.

@powerjg (Contributor) commented Feb 23, 2024

It's an intriguing option. I also think looking into speculating on vl would be an interesting direction.

I think I know the answer to this question, but I'll ask anyway: do you think it's feasible to make it a configuration-time parameter whether vl is treated as a control or a data dependence? It would be quite interesting if it were parameterized, similar to how the LSQs can be configured to enforce TSO.

The other direction we can discuss is how to make the ISA spec implementation of these instructions flexible enough to enable both "in core" vector operations (i.e., the vector ops are renamed/executed/etc. in the same pipeline as all other instructions) and a vector co-processor design. That way, we could have different implementations that hit different points in the design space.

I think this needs to be a longer conversation. To have the conversation we could go two ways:

  1. We could discuss it at our next developer meeting.
  2. We could go through a proposal/design document phase.

In other words, I would like to discuss the overall goals/requirements for this model, the possible implementations, and the tradeoffs of those implementations. We could do this either synchronously on Zoom or asynchronously with a lot more writing.

@ivanaamit added the arch-riscv (The RISC-V ISA) label on Mar 5, 2024
@aarmejach (Contributor)

Sorry it took a while to get back to you on this one.

I wanted to understand how many changes would be needed in order to see whether the parametric solution is feasible. As we both anticipated, changes on the ISA definition side are necessary, and making those parametric would not be easy. But I will take another look after we have tested the prototype implementation we have.

The second topic is an interesting one. We are currently focused on scaling our simulations and adding other features (that are being upstreamed), but in the mid term I think this discussion will be of interest. I will try to join a developer meeting (schedule permitting), and we can kick-start this discussion and move on to focused Zoom sessions if necessary.

@ivanaamit (Contributor)

Hi @aarmejach, we have a developer meeting scheduled for next week on April 11th at 4pm UTC. Would you be available to join? If so, how would you prefer to discuss this topic? Are you interested in preparing a presentation with a couple of slides? Thanks.

@aarmejach (Contributor)

Hi @ivanaamit, sure, I'll join the meeting. To be honest, I have not thought much about the "in core" vs. "co-processor" execution model in gem5, as it is not something we are considering at the moment.

Regarding this issue, we have implemented a solution in which the vset{i}vl{i} instructions set the IsQuiesceFetch flag, which stalls the fetching/decoding of subsequent instructions until the vset{i}vl{i} has completed execution (it does not need to be committed, just to finish executing), at which point the pipeline can continue fetching/decoding with the updated vl and vtype in the PC state.

We went down this path as it required few changes and sidesteps the issue that vtype is needed to decode subsequent instructions correctly. So I think it would solve #860.
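As a minimal illustration of where that flag would be set, assuming the flag name from the comment above: the class and constructor below are simplified stand-ins, since in gem5 the vset{i}vl{i} instructions are generated from the RISC-V ISA description rather than hand-written, so this is not a drop-in patch.

```cpp
// Simplified stand-in for the generated vset{i}vl{i} StaticInst. The only
// point illustrated is the flag assignment; IsQuiesceFetch is the flag
// named in the comment above and is not assumed to exist upstream.
class VSetVLI : public RiscvStaticInst
{
  public:
    VSetVLI(ExtMachInst machInst)
        : RiscvStaticInst("vsetvli", machInst, IntAluOp)
    {
        // Stall fetch/decode of younger instructions until this instruction
        // has executed (not necessarily committed), so that they are decoded
        // with the updated vl/vtype carried in the PC state.
        flags[IsQuiesceFetch] = true;
    }
};
```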
