Jiahan CS 6120 Final Project Blog #423

jiahanxie353 · 2023-12-11T18:52:18Z

Closes #410

In this final project, I worked with @michaelmaitland on supporting LLVM GlobalISel for the RISC-V vector extension on part of the vectorized ALU operations.

michaelmaitland · 2023-12-11T19:08:16Z

content/blog/2023-12-11-rvv-llvm-gisel/index.md

+
+# Introduction
+
+The open [RISC-V instruction set architecture (ISA)](https://riscv.org/technical/specifications/) has an interesting extension, [the RISC-V "V" Vector Extension](https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc). The unique part of RISC-V Vector Extension is that its vector registers have flexible widths, VLEN, which makes programming in RISC-V Vector Extension agnostic to the vector register sizes. This feature really distinguishes the RISC-V Vector Extension from the traditional SIMD extensions, such as [x86 SSE](https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions) (with a fixed size 128-bit vector length)/[AVX](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions) (256-bit), and [Arm NEON](https://developer.arm.com/Architectures/Neon) (128-bit). The increasing vector lengths can pose a challenge to the traditional SIMD extensions as they have to address compatibility and support all existing fixed size vector lengths in their ISAs. On the contrary, with the vector lengths agnostic principles, binary code generated by RISC-V assembly is automatically portable between different CPUs.


RISC-V folks more commonly refer to VL as the unique part.

The vl register holds an unsigned integer specifying the number of elements to be updated with results from a vector instruction

This is in contrast to many SIMD approaches where the instruction mnemonic and the different register sizes impose a fixed number of elements processed. For example, the ARM Neon VADD.

On a given hardware implementation, the widths of registers in RISC-V are flexible due to LMUL. This is also unique to RISC-V.

The VLEN however, is fixed for that hardware implementation.

michaelmaitland · 2023-12-11T19:17:36Z



If a processor implements the RISC-V V extension, the vector code is portable to other CPUs that support the V extension. However, the same is true elsewhere: if a processor implements, say, the ARM Neon extension, then the NEON code is portable to other ARM processors that support the Neon extension.

The real power of vector length agnostic is when it comes to loops. Take the following example:

for (i = 0; i < N; i++) A[i] = B[i] + C[i]

On RISC-V, we can generate the following (pseudo) code

    ph:
      i = 0
    body:
      vsetvli (N-i)               // set the VL as N-i, or VLMAX if N-i is too large for the hardware to process
      vadd.vv a_ptr, b_ptr, c_ptr
      i += VL                     // increment i by the number of elements processed in this iteration
      branch_if_done exit, body   // decide whether to exit loop
    exit:
      ret

On a SIMD architecture, it is more complicated. You need to have two versions of the loop: the vector loop, and a scalar remainder loop. If N-i is smaller than the number of elements processed in the vector loop, then the scalar remainder loop needs to be executed. What is the best size for the number of elements to be processed in the vector loop? It depends on whether N will often be large or small. You may not know this ahead of time.
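To make the contrast concrete, here is a rough Python model of the two loop shapes (a sketch, not real RVV or SIMD code). It assumes `vsetvli` can be modeled as `min(avl, vlmax)`, which is a simplification of the real rule, and the names `vla_add`, `simd_add`, `vlmax`, and `width` are made up for illustration.

```python
def vsetvli(avl, vlmax):
    """Simplified model (assumption): grant the requested count, capped at VLMAX."""
    return min(avl, vlmax)


def vla_add(b, c, vlmax):
    """Vector-length-agnostic loop: a single loop body, no scalar remainder."""
    n = len(b)
    a = [0] * n
    i = 0
    while i < n:
        vl = vsetvli(n - i, vlmax)      # VL for this iteration
        for j in range(i, i + vl):      # stands in for one vadd.vv
            a[j] = b[j] + c[j]
        i += vl                         # advance by the elements just processed
    return a


def simd_add(b, c, width):
    """Fixed-width SIMD loop: a vector body plus a scalar remainder loop."""
    n = len(b)
    a = [0] * n
    i = 0
    while i + width <= n:               # full-width vector iterations only
        for j in range(i, i + width):
            a[j] = b[j] + c[j]
        i += width
    while i < n:                        # scalar remainder for the leftovers
        a[i] = b[i] + c[i]
        i += 1
    return a
```

Both functions compute the same result; the difference is that `simd_add` needs the second loop whenever `N` is not a multiple of `width`, while `vla_add` simply receives a shorter `vl` on its last trip.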

michaelmaitland · 2023-12-11T19:21:26Z

content/blog/2023-12-11-rvv-llvm-gisel/index.md

+- `SEW`: Selected Element Width (in bits), set dynamically by the programmers. It sets the width/length of a single element in a vector element. Each vector element can compose multiple single element.
+- `VLEN`: Vector register LENgth (in bits). The number of bits in a single vector register. It is hardware dependent. 
+- `VL`: Vector element Length (in bits) that the programmers actually deal with, which can be treated as the vector operation building blocks. It defines how many elements the vector operations will execute.
+- `LMUL`: The vector Length MULtiplier. It is used for grouping vector registers. It is a power of 2 and it ranges from 1/8 to 8. For instance, when `LMUL=8`, only `v0`, `v8`, `v16`, and `v24` indices are allowed to be used, as for example, group `v8` encodes 8 vector elements `v8`, `v9`, ..., `v15`. Note it can also be fraction numbers because sometimes we want to use only parts of the vector registers.


Technically the ABI imposes the fact that v0, v8, v16 can be used, not the RISC-V Vector spec. You could have used v1, v9, v17. The reason the ABI does it the way it does is so that v17-v31 can be used in the caller/callee save scheme. The important part here is that on LMUL 8, the register vN acts as a grouping of vN, vN+1, ..., vN+7 registers. That forces us to partition what registers can be passed and preserve the expected semantics of the instruction.

@michaelmaitland, just to clarify: one important constraint here (aside from the ABI) is that, when LMUL=8, you are not allowed to use v1, v2, v9 at all, right? That is, the only "allowed" registers are at indices $\mathit{LMUL} \cdot k$ for natural numbers $k$? (Otherwise accessing these unaligned registers is an error?)

That is correct. According to the spec:

When LMUL=2, the vector register group contains vector register v n and vector register v n+1, providing twice the vector length in bits. Instructions specifying an LMUL=2 vector register group with an odd-numbered vector register are reserved. When LMUL=4, the vector register group contains four vector registers, and instructions specifying an LMUL=4 vector register group using vector register numbers that are not multiples of four are reserved. When LMUL=8, the vector register group contains eight vector registers, and instructions specifying an LMUL=8 vector register group using register numbers that are not multiples of eight are reserved.

I think my statement above that You could have used v1, v9, v17 is incorrect by this part of the spec.
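The alignment rule quoted from the spec can be sketched as a small check. This is only an illustration of the rule as quoted: `group_registers` is a hypothetical helper, and it models integer `LMUL` values only.

```python
def group_registers(base, lmul):
    """Return the registers in the group that starts at v<base>.

    Raises ValueError when the base register is not a multiple of LMUL,
    which the quoted spec text describes as a reserved encoding.
    """
    if lmul not in (1, 2, 4, 8):
        raise ValueError("this sketch only models integer LMUL of 1, 2, 4, or 8")
    if not 0 <= base < 32:
        raise ValueError("vector registers are numbered v0 through v31")
    if base % lmul != 0:
        raise ValueError(f"v{base} is reserved as a group base for LMUL={lmul}")
    return [f"v{base + k}" for k in range(lmul)]
```

For example, `group_registers(8, 8)` yields `v8` through `v15`, while `group_registers(9, 8)` raises because 9 is not a multiple of 8.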

Got it; thanks!


michaelmaitland · 2023-12-11T19:41:02Z

content/blog/2023-12-11-rvv-llvm-gisel/index.md

+```
+The complete chart can be found in [this `RISCV/RISCVRegisterInfo.td` file](https://github.com/llvm/llvm-project/blob/75d6795e420274346b14aca8b6bd49bfe6030eeb/llvm/lib/Target/RISCV/RISCVRegisterInfo.td). And note that `MF` stands for fractional `LMUL` and `M`s are integer `LMUL`s.
+
+Some values are `None` because currently RISC-V vectors assume `VLEN=64`. Take the combination (`MF8`, `i16`) as an example. If we were to write it in terms of LLVM scalable vectors, it would be `nx1/2i16` ((64 x 1/8) / 16 = 1/2), which is illegal. Now consider a legal (`LMUL`, `SEW`) combination: (`i32`, `M4`). Since `VLEN` = 64 and `SEW` = 32, there are 2 basic block elements in a single vector element. And since the grouping factor is 4, there are 2*4 = 8 multiples of elements, hence `nxv8i32 == <vscale x 8 x i32>`.


ditto about basic block being confusing here.
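The arithmetic in the quoted paragraph might be easier to follow as a tiny calculation. The sketch below assumes, as the post does, a base of `VLEN=64`; `scalable_type` and `min_vlen` are hypothetical names and this is not how the TableGen file computes the chart.

```python
from fractions import Fraction


def scalable_type(lmul, sew, min_vlen=64):
    """Map an (LMUL, SEW) pair to a scalable vector type name.

    Returns e.g. 'nxv8i32', or None when (min_vlen * LMUL) / SEW is not a
    whole number of elements.
    """
    count = Fraction(min_vlen) * Fraction(lmul) / sew
    if count.denominator != 1:
        return None                     # e.g. (MF8, i16) would be "nxv1/2i16"
    return f"nxv{count.numerator}i{sew}"
```

With these assumptions, `scalable_type(Fraction(1, 8), 16)` is `None` and `scalable_type(4, 32)` is `nxv8i32`, matching the two cases in the text.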


michaelmaitland · 2023-12-11T19:46:10Z

content/blog/2023-12-11-rvv-llvm-gisel/index.md

+
+Let's recall the result produced by the Register Bank Select pass: `%vc:vrb(<vscale x 1 x i8>) = G_ADD %va:vrb(<vscale x 1 x i8>), %vb:vrb(<vscale x 1 x i8>)`. We'd like to use the corresponding MIR of the RISC-V vector add instruction to replace the generic add `G_ADD` instruction. Please note that I said "the corresponding MIR" because we will not be generating the actual RISC-V `vadd` or `vsetvli` in the current pass. It's because the process of instruction selection involves transforming code into target-specific MIR/machine instructions. Later down the pipeline, the `RISCVInsertVSETVLI` pass, for example, will be executed. Additionally, the `RISCVAsmPrinter` will translate MIR into MCInst at a later stage, representing the final assembly language form. With that being said, what we actually want to get out of the instruction selection pass is in this form: `%vc:vr = PseudoVADD_VV_MF8 %va, %vb, -1, 3 /* e8 */, 3 /* ta, ma */`, where `PseudoVADD_VV_MF8` is the vector instruction pseudo for vector-vector add with `LMUL` = 1/8, the position where -1 stands is for the `VL` operand and -1 means `VLMAX`, the first 3 stands for `SEW` as log2(8) = 3, and the second 3 is the encoding for the policy tail agnostic and mask agnostic. RISC-V vector instruction pseudos in LLVM are essentially used for efficiently handling the complex, `vtype`-dependent behavior of vector instructions, such as in register allocation.
+
+Implementation-wise, the [`select` function](https://llvm.org/doxygen/classllvm_1_1InstructionSelector.html#a50058a922d4f75ed765c34742c5066c6) is invoked, which in turn calls [the corresponding RISC-V `selectImpl` function](https://github.com/llvm/llvm-project/blob/d96f46dd20157be9c11e16d8bdd3ebf900df41fc/llvm/lib/Target/RISCV/GISel/RISCVInstructionSelector.cpp#L56). To achieve this final phase, there are essentially four steps to take. First is to identify the vectorized opcode/gMIR; then we create the lowered version of that gMIR using the vector instruction pseudos; and we need to erase the old instruction once the lowered version has been picked; finally we choose a register from the register bank. For this final phase in GlobalISel, LLVM TableGen might generate `selectImpl` and we can use it out-of-the-box; otherwise, we need to implement extra logic to customize the selection pass outlined above. Luckily, TableGen does pick up and we only need to implement some helper functions to mesh everything together.


We don't actually choose a register from the register bank. We already have the register. In the case above, it's the virtual register %vc. What we're really doing is constraining the type of this virtual register so that register allocation has enough information to convert the virtual register into a physical register.
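For readers trying to parse the trailing `-1, 3, 3` operands of the pseudo in the quoted paragraph, a small decoder may help. This is a sketch only: the function name is hypothetical, and the bit assignment (bit 0 for tail agnostic, bit 1 for mask agnostic) is an assumption chosen to be consistent with `3` meaning `ta, ma`.

```python
TAIL_AGNOSTIC = 1   # assumed bit assignment for this sketch
MASK_AGNOSTIC = 2   # assumed bit assignment for this sketch


def describe_pseudo_operands(vl, log2_sew, policy):
    """Turn the (VL, log2(SEW), policy) immediates into readable text."""
    vl_text = "VLMAX" if vl == -1 else str(vl)      # -1 is the VLMAX sentinel
    sew_text = f"e{1 << log2_sew}"                  # 3 -> e8, 5 -> e32
    tail = "ta" if policy & TAIL_AGNOSTIC else "tu"
    mask = "ma" if policy & MASK_AGNOSTIC else "mu"
    return {"vl": vl_text, "sew": sew_text, "policy": f"{tail}, {mask}"}
```

Calling `describe_pseudo_operands(-1, 3, 3)` gives `VLMAX`, `e8`, and `ta, ma`, which lines up with the comments in the MIR snippet.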

michaelmaitland · 2023-12-11T19:53:23Z



LLVM TableGen always generates selectImpl. The "might" part has to do with whether there are TableGen patterns that exist to help us go from MachineInstr with GlobalISel GenericOpcode -> SelectionDAG SDNode with SelectionDAG ISD opcode -> MachineInstr with RISC-V opcode.

SelectionDAG implements the ISD -> MachineInstr with RISC-V opcode transformation. In this case G_ADD maps pretty well onto ISD::ADD. So GlobalISel defines an equivalence.

As we discussed before, it would be better if we didn't go through SelectionDAG at all, but this approach makes it easy for architectures to onboard onto GISel, and hopefully in the future we can remove going through SelectionDAG entirely.

michaelmaitland · 2023-12-11T19:53:44Z

content/blog/2023-12-11-rvv-llvm-gisel/index.md

+
+# Were You Successful?
+
+This project is a success and we have become one of the first developers to support GlobalISel for the RISC-V vector extension.


And the first developers to support scalable vectors in GISEL for any target.

sampsyo

Thanks for the very detailed and technical writeup! This sounds like a tremendous amount of work, and I'm glad you were able to make some progress. I have marked a few places where the post could be a tiny bit clearer, especially for outsiders who do not know that much about RVV.

One high-level ingredient I'd be interested in, if you feel like adding it (not a requirement): what fundamentally made this project interesting/difficult? What I mean is that you have identified several challenges that come from two sources:

RVV stuff is hard to think about
GlobalISel is somewhat complicated to extend

…but is there anything specific to the intersection between those two things? Like, is there anything about RVV in particular that makes global instruction selection harder? Or is this project approximately equal to the sum of its parts?

sampsyo · 2023-12-17T18:27:09Z

content/blog/2023-12-11-rvv-llvm-gisel/index.md

+# What Was the Goal?


You don't need to include the bullet points from the syllabus as section headers in your blog post. This would flow a bit more naturally if you just transitioned directly into a recap of the goal as part of the intro. "In this project, our goal was to…"

sampsyo · 2023-12-17T18:28:05Z

content/blog/2023-12-11-rvv-llvm-gisel/index.md

+# What Was the Goal?
+The goal was to support LLVM Global Instruction Selection for RISC-V Vector Extensions on some ALU operations, such as `vadd`, `vsub`, `vand`, `vor`, and `vxor`.


Maybe it would be helpful to include a few words (not even a full sentence) here on what global isel is? I know you get into detail below, but a tiny overview here might help people understand the post's context.

sampsyo · 2023-12-17T18:28:42Z

content/blog/2023-12-11-rvv-llvm-gisel/index.md

+The most important parameters are undoubtedly `SEW`, `VLEN`, `VL`, and `LMUL`; and one of the most interesting and powerful instructions is `vset{i}vl{i}`.
+
+Let's begin with the crucial parameters:
+- `SEW`: Selected Element Width (in bits), set dynamically by the programmers. It sets the width/length of a single element in a vector element. Each vector element can compose multiple single element.


I'm not sure what this means: "Each vector element can compose multiple single element." These sound like the same thing?

sampsyo · 2023-12-17T18:30:37Z

content/blog/2023-12-11-rvv-llvm-gisel/index.md

+
+Let's begin with the crucial parameters:
+- `SEW`: Selected Element Width (in bits), set dynamically by the programmers. It sets the width/length of a single element in a vector element. Each vector element can compose multiple single element.
+- `VLEN`: Vector register LENgth (in bits). The number of bits in a single vector register. It is hardware dependent. 


You might want to clarify "it is hardware dependent" here… earlier, you have said that this is flexible ("its vector registers have flexible widths, VLEN"), but here it seems to imply this is a hardware parameter. Maybe clarify which one is right?

sampsyo · 2023-12-17T18:31:35Z

content/blog/2023-12-11-rvv-llvm-gisel/index.md

+Let's begin with the crucial parameters:
+- `SEW`: Selected Element Width (in bits), set dynamically by the programmers. It sets the width/length of a single element in a vector element. Each vector element can compose multiple single element.
+- `VLEN`: Vector register LENgth (in bits). The number of bits in a single vector register. It is hardware dependent. 
+- `VL`: Vector element Length (in bits) that the programmers actually deal with, which can be treated as the vector operation building blocks. It defines how many elements the vector operations will execute.


Can you clarify whether this must be less than or equal to VLEN?

Also, this bullet point seems to say that it is both in bits and in elements ("how many elements"). Can you clarify which is the actual quantity and which is implied?
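One way to pin down the units in question is a short calculation. The sketch below uses the relation `VLMAX = LMUL * VLEN / SEW` from the RVV spec; the helper name `vlmax` and the sample values are only for illustration.

```python
from fractions import Fraction


def vlmax(vlen, sew, lmul):
    """VLMAX in elements: LMUL * VLEN / SEW (VLEN and SEW are in bits)."""
    return int(Fraction(lmul) * vlen / sew)


# VL counts elements and is bounded by VLMAX, not by VLEN directly.
elements = vlmax(vlen=128, sew=32, lmul=8)
bits_covered = elements * 32            # can exceed VLEN because of grouping
```

Under these sample values `elements` is 32 and `bits_covered` is 1024, so with `LMUL=8` a single instruction can touch more bits than one `VLEN`-bit register holds. In other words, `VL` is a count of elements with `VL <= VLMAX`, and the bit width is implied by multiplying by `SEW`.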


sampsyo · 2023-12-17T18:57:30Z

content/blog/2023-12-11-rvv-llvm-gisel/index.md

+
+
+
+# What Were the Hardest Parts?


One more chance to retitle your sections to be more meaningful to the outside world.

sampsyo · 2023-12-17T18:57:55Z

content/blog/2023-12-11-rvv-llvm-gisel/index.md

+
+# What Were the Hardest Parts?
+
+Definitely learning the whole LLVM and its GlobalISel infrastructure, and it was also hard to understand the vector length agnostic features/instructions in the RISC-V vector extension.


Can you make this a complete sentence? Again, imagine that the reader is someone who cares about the topic but does not care that this was a 6120 course project.


sampsyo · 2023-12-17T18:58:47Z

content/blog/2023-12-11-rvv-llvm-gisel/index.md

+
+Learning the RISC-V vector extension was also a headache at the beginning because I had to figure out the difference between the RISC-V vector extension and standard SIMD vector instructions. Learning the semantic meaning of `vsetvli`, differentiating the concepts of `ELEN` and `VLEN`, and understanding how `SEW`, `LMUL`, and `VLMAX` come into play was also confusing.
+
+# Were You Successful?


Maybe this doesn't need a separate section heading from the above discussion?

… power of vl agnostic over traditional simd is when it comes loops

…ed, not the rvv spec

…ate vlmax

… compared to VLEN

jiahanxie353 · 2024-01-08T06:48:05Z

Thanks for all the detailed and constructive feedback @sampsyo @michaelmaitland !

I tried my best to answer the questions. Please let me know if anything needs further support or clarification!

michaelmaitland

A small round with minor verbiage changes to be more specific. I think this is the last round of changes I have before approval.

michaelmaitland · 2024-01-16T20:42:23Z

content/blog/2023-12-11-rvv-llvm-gisel/index.md

+
+# Introduction
+
+The open [RISC-V instruction set architecture (ISA)](https://riscv.org/technical/specifications/) has an interesting extension, [the RISC-V "V" Vector Extension](https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc). The unique part of RISC-V Vector Extension is that its vector instructions can deal with flexible vector lengths, VL, which makes programming in RISC-V Vector Extension agnostic to the vector register sizes. This feature really distinguishes the RISC-V Vector Extension from the traditional SIMD extensions, such as [x86 SSE](https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions) (with a fixed size 128-bit vector length)/[AVX](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions) (256-bit), and [Arm NEON](https://developer.arm.com/Architectures/Neon) (128-bit). Traditional SIMD extensions with fixed vector lengths face challenges when dealing with the changing data sizes. They must maintain compatibility and support all existing fixed-size vector lengths in their instruction set architectures. This often leads to inefficiencies, especially in loop operations where the data size/loop stride may not align perfectly with the fixed vector size, necessitating additional scalar processing for the remaining elements. And the most suitable size for the number of elements to be processed in the vector loop is hard to decide ahead of time. In contrast, the RISC-V Vector Extension eliminates this concern with its vector length agnostic principle. Particularly in loop scenarios, the RISC-V's ability to adaptively handle varying data sizes stands out. For instance, in a simple loop adding two arrays, the RISC-V can dynamically adjust the vector length for each iteration by dynamically setting the vector length. This means it can process as many elements as possible in each pass, depending on the hardware capabilities and the remaining data. 
This adaptive approach really simplifies the code by eliminating the need for separate scalar loops for the leftover elements.


nit: often -> may

michaelmaitland · 2024-01-16T20:43:24Z



the RISC-V -> the RISC-V program

michaelmaitland · 2024-01-16T20:44:49Z

content/blog/2023-12-11-rvv-llvm-gisel/index.md

+In this project, our goal was to support LLVM Global Instruction Selection (GlobalISel), a framework that operates on whole function for instruction selection, for the RISC-V Vector Extension on some ALU operations, such as `vadd`, `vsub`, `vand`, `vor`, and `vxor`. Apart from adding support for RISC-V vector types and operations for GlobalISel by going down GlobalISel's pipeline, it's a challenge to bridge the LLVM world (concepts like scalable vector) and the RISC-V world (concepts like vector length and vector register grouping factor) together. And we will showcase how we address the challenge in the following sections.


function -> functions

michaelmaitland · 2024-01-16T20:47:26Z

content/blog/2023-12-11-rvv-llvm-gisel/index.md

+The most important parameters are undoubtedly `SEW`, `VLEN`, `VL`, and `LMUL`; and one of the most interesting and powerful instructions is `vset{i}vl{i}`.
+
+Let's begin with the crucial parameters:
+- `SEW`: Selected Element Width (in bits), set dynamically by the programmers. It sets the width/length of a single element in a vector element/register. Each vector element can compose `VLEN`/`SEW` single elements.


Each vector element can compose `VLEN`/`SEW` single elements. -> Each vector can contain `VLEN`/`SEW` elements.

michaelmaitland · 2024-01-16T20:47:57Z

content/blog/2023-12-11-rvv-llvm-gisel/index.md

+
+Let's begin with the crucial parameters:
+- `SEW`: Selected Element Width (in bits), set dynamically by the programmers. It sets the width/length of a single element in a vector element/register. Each vector element can compose `VLEN`/`SEW` single elements.
+- `ELEN`: The maximum size in bits of a vector element that any operation can produce or consume.


ELEN is missing from list on line 25 above

michaelmaitland · 2024-01-16T20:50:07Z

content/blog/2023-12-11-rvv-llvm-gisel/index.md

+- `LMUL`: The vector Length MULtiplier. It is used for grouping vector registers. It is a power of 2 and it ranges from 1/8 to 8. For instance, when `LMUL=8`, the ABI imposes that only `v0`, `v8`, `v16`, and `v24` indices are allowed to be used, as for example, group `v8` encodes 8 vector elements `v8`, `v9`, ..., `v15`. Note it can also be fraction numbers because sometimes we want to use only parts of the vector registers.
+
+Beyond these basic and most important parameters, there are two more parameters we will be dealing with, `AVL` and `VLMAX`:
+- `AVL`: Application Vector Length. The application specifies the total number of elements to be processed as a candidate for `VL`.


You introduce AVL above when you talk about VL. Maybe add a note above to (see below about AVL) so the reader knows you will explain AVL.

michaelmaitland · 2024-01-16T20:55:36Z

content/blog/2023-12-11-rvv-llvm-gisel/index.md

+# A Primer on LLVM Global Instruction Selection
+LLVM Global Instruction Selection ([GlobalISel](https://llvm.org/docs/GlobalISel/index.html)) is a framework that provides a set of reusable passes and utilities for instruction selection — translation from LLVM IR to target-specific Machine IR (MIR). It is "global" in the sense that it operates on the whole function rather than a single basic block. 
+
+GlobalISel is intended to be a replacement for [SelectionDAG](https://llvm.org/docs/CodeGenerator.html#introduction-to-selectiondags) and [FastISel](https://llvm.org/doxygen/classllvm_1_1FastISel.html), to solve performance, granularity, and modularity problems. GlobalISel does not need to introduce a new dedicated IR as in SelectionDAG so GlobalISel can provide faster code generation; GlobalISel operates on a function, whereas SelectionDAG only considers a basic block, losing some global optimization opportunities; in addition, GlobalISel enables more code reuse for instruction selection for different targets.


GlobalISel does not need to introduce a new dedicated IR as -> GlobalISel does not introduce a new dedicated IR since it works on the already existing and documented MIR, compared to SelectionDAG which has its own ISD representation.

Also, are you sure that not having a separate IR is a reason for the faster code generation?

michaelmaitland · 2024-01-16T20:56:16Z

content/blog/2023-12-11-rvv-llvm-gisel/index.md

+# A Primer on LLVM Global Instruction Selection
+LLVM Global Instruction Selection ([GlobalISel](https://llvm.org/docs/GlobalISel/index.html)) is a framework that provides a set of reusable passes and utilities for instruction selection — translation from LLVM IR to target-specific Machine IR (MIR). It is "global" in the sense that it operates on the whole function rather than a single basic block. 
+
+GlobalISel is intended to be a replacement for [SelectionDAG](https://llvm.org/docs/CodeGenerator.html#introduction-to-selectiondags) and [FastISel](https://llvm.org/doxygen/classllvm_1_1FastISel.html), to solve performance, granularity, and modularity problems. GlobalISel does not need to introduce a new dedicated IR as in SelectionDAG so GlobalISel can provide faster code generation; GlobalISel operates on a function, whereas SelectionDAG only considers a basic block, losing some global optimization opportunities; in addition, GlobalISel enables more code reuse for instruction selection for different targets.


in addition, GlobalISel enables more code reuse for instruction selection for different targets.

Can you cite your source here?

michaelmaitland · 2024-01-16T21:00:47Z

content/blog/2023-12-11-rvv-llvm-gisel/index.md

+```
+The complete chart can be found in [this `RISCV/RISCVRegisterInfo.td` file](https://github.com/llvm/llvm-project/blob/75d6795e420274346b14aca8b6bd49bfe6030eeb/llvm/lib/Target/RISCV/RISCVRegisterInfo.td). And note that `MF` stands for fractional `LMUL` and `M`s are integer `LMUL`s.
+
+Some values are `None` because currently the LLVM community assumes the RISC-V vectors to have `VLEN=64`. Take the (`LMUL`, `SEW`) combination (`1/8`, `i16`) as an example. If we were to write it in terms of LLVM scalable vectors, it would be `nxv1/2i16` ((64 x 1/8) / 16 = 1/2), which is illegal. Now consider a legal (`LMUL`, `SEW`) combination: (`4`, `i32`). Since `VLEN` = 64 and `SEW` = 32, 64/32 = 2 elements fit in a single vector register. And since the grouping factor is 4, the group holds 2*4 = 8 elements, hence `nxv8i32 == <vscale x 8 x i32>`.


I think some values are N/A, not None
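The (`LMUL`, `SEW`) arithmetic in the hunk above can be checked with a small sketch. This is only an illustration under the post's stated assumption of a 64-bit block; `RVV_BITS_PER_BLOCK` and `scalable_type` are hypothetical names, not LLVM APIs, and the Python `None` return simply marks a combination with no legal type.

```python
from fractions import Fraction

# Assumption taken from the post: scalable types are sized against VLEN = 64.
RVV_BITS_PER_BLOCK = 64


def scalable_type(lmul, sew):
    """Return the scalable vector type for an (LMUL, SEW) pair.

    Returns None when the element count per vscale unit is not a whole number,
    i.e. when the combination has no legal scalable vector type.
    """
    count = Fraction(RVV_BITS_PER_BLOCK) * Fraction(lmul) / sew
    if count.denominator != 1:
        return None
    return f"<vscale x {count.numerator} x i{sew}>"


print(scalable_type(Fraction(1, 8), 16))  # (64 * 1/8) / 16 = 1/2, so no legal type
print(scalable_type(4, 32))               # (64 * 4) / 32 = 8 elements of i32
```

Using `Fraction` keeps fractional `LMUL` values exact, so the 1/2 case is detected without floating-point rounding.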

first draft of the report for the final project

6e6e912

michaelmaitland reviewed Dec 11, 2023

sampsyo requested changes Dec 17, 2023

jiahanxie353 added 18 commits January 7, 2024 09:43

fix some typo and wording

67501e5

correct the real unique part of riscv v extension; emphasize the real power of vl agnostic over traditional simd is when it comes loops

07a249b

qualify that it's the ABI imposes the fact that v0, v8, v16 can be used, not the rvv spec

b16530f

reword vector 'basic block' to simply just vector 'element'; recalculate vlmax

f7ed8e4

change VF to M for LLVM scalable vectors

09890b9

clarify that lowerCall is not implemented yet

0ca7820

rephrase about G_ADD legality and 'basic block'

92b40b2

PseudoVADD_VV_MF8 is RISCV specific

fae32ac

rephrase the instruction selection implementation part

40d4edf

provide an overview for globalisel

e20aa27

clarify on the vector element and single element part

b34d68f

clarify on the VL part regarding its actual quantity and its quantity compared to VLEN

1ceb011

include a one-sentence summary of combiner

b45a98f

clarify on the llvm scalable vectors vscale part

e0f0a96

add more clarifications on the elen and vlen part

54b8e76

clarify on the elen 32/64 part; rephrase 'what were the hardest parts'

16eb5bf

retitle some sections

87e85ed

add discussion regarding what makes this project interesting/challenging

218544b

michaelmaitland reviewed Jan 16, 2024



		Let's recall the result produced by the Register Bank Select pass: `%vc:vrb(<vscale x 1 x i8>) = G_ADD %va:vrb(<vscale x 1 x i8>), %vb:vrb(<vscale x 1 x i8>)`. We'd like to replace the generic `G_ADD` instruction with the corresponding MIR of the RISC-V vector add instruction. I say "the corresponding MIR" because this pass does not generate the actual RISC-V `vadd` or `vsetvli`: instruction selection only transforms code into target-specific MIR/machine instructions. Later in the pipeline, the `RISCVInsertVSETVLI` pass, for example, will be executed, and at a later stage the `RISCVAsmPrinter` will translate MIR into MCInst, representing the final assembly language form. What we actually want out of the instruction selection pass has this form: `%vc:vr = PseudoVADD_VV_MF8 %va, %vb, -1, 3 /* e8 */, 3 /* ta, ma */`. Here `PseudoVADD_VV_MF8` is the vector instruction pseudo for vector-vector add with `LMUL` = 1/8; the -1 sits in the `VL` operand position and means `VLMAX`; the first 3 stands for `SEW`, as log2(8) = 3; and the second 3 encodes the tail agnostic, mask agnostic policy. RISC-V vector instruction pseudos in LLVM are essentially used for efficiently handling the complex, `vtype`-dependent behavior of vector instructions, such as in register allocation.
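To make the three trailing operands of `PseudoVADD_VV_MF8 %va, %vb, -1, 3, 3` concrete, here is a minimal decoding sketch. The helper `describe_operands` is hypothetical, and the policy bit values (1 for tail agnostic, 2 for mask agnostic) are an assumption chosen to match the "3 means ta, ma" reading in the paragraph above.

```python
# Assumed encodings, chosen to agree with the paragraph above.
VLMAX_SENTINEL = -1   # a VL operand of -1 means "use VLMAX"
TAIL_AGNOSTIC = 1     # policy bit 0
MASK_AGNOSTIC = 2     # policy bit 1


def describe_operands(vl, log2_sew, policy):
    """Decode the (VL, log2 SEW, policy) operands of a vector pseudo."""
    vl_text = "VLMAX" if vl == VLMAX_SENTINEL else str(vl)
    tail = "ta" if policy & TAIL_AGNOSTIC else "tu"
    mask = "ma" if policy & MASK_AGNOSTIC else "mu"
    return {"vl": vl_text, "sew": 1 << log2_sew, "policy": f"{tail}, {mask}"}


# The operands from the example: -1, 3, 3
print(describe_operands(-1, 3, 3))
```

For the example operands this reads as `VL` = `VLMAX`, `SEW` = 8 bits, and a tail agnostic, mask agnostic policy.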

		Implementation-wise, the [`select` function](https://llvm.org/doxygen/classllvm_1_1InstructionSelector.html#a50058a922d4f75ed765c34742c5066c6) is invoked, which in turn calls [the corresponding RISC-V `selectImpl` function](https://github.com/llvm/llvm-project/blob/d96f46dd20157be9c11e16d8bdd3ebf900df41fc/llvm/lib/Target/RISCV/GISel/RISCVInstructionSelector.cpp#L56). This final phase essentially takes four steps. First, we identify the vectorized opcode/gMIR; then we create the lowered version of that gMIR using the vector instruction pseudos; next we erase the old instruction once the lowered version has been picked; finally we choose a register from the register bank. For this final phase in GlobalISel, LLVM TableGen might generate `selectImpl` so that we can use it out of the box; otherwise, we need to implement extra logic to customize the selection pass outlined above. Luckily, TableGen does pick these patterns up, and we only need to implement some helper functions to mesh everything together.


		# Were You Successful?

		This project is a success, and we are among the first developers to support GlobalISel for the RISC-V vector extension.


		# What Was the Goal?


		# What Were the Hardest Parts?

		The hardest part was definitely learning LLVM as a whole and its GlobalISel infrastructure. It was also hard to understand the vector length agnostic features and instructions in the RISC-V vector extension.


		Learning the RISC-V vector extension was also a headache at the beginning because I had to figure out the difference between the RISC-V vector extension and standard SIMD vector instructions. Learning the semantics of `vsetvli`, differentiating the concepts of `ELEN` and `VLEN`, and understanding how `SEW`, `LMUL`, and `VLMAX` come into play was also confusing.

		# Were You Successful?


		The open [RISC-V instruction set architecture (ISA)](https://riscv.org/technical/specifications/) has an interesting extension, [the RISC-V "V" Vector Extension](https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc). The unique part of the RISC-V Vector Extension is that its vector instructions can deal with flexible vector lengths, VL, which makes programming in the RISC-V Vector Extension agnostic to the vector register sizes. This feature really distinguishes the RISC-V Vector Extension from the traditional SIMD extensions, such as [x86 SSE](https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions) (with a fixed size 128-bit vector length)/[AVX](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions) (256-bit), and [Arm NEON](https://developer.arm.com/Architectures/Neon) (128-bit). Traditional SIMD extensions with fixed vector lengths face challenges when dealing with changing data sizes. They must maintain compatibility and support all existing fixed-size vector lengths in their instruction set architectures. This often leads to inefficiencies, especially in loops where the data size or loop stride does not align perfectly with the fixed vector size, necessitating additional scalar processing for the remaining elements. The most suitable number of elements to process per iteration of the vector loop is also hard to decide ahead of time. In contrast, the RISC-V Vector Extension eliminates this concern with its vector length agnostic principle. Its ability to adapt to varying data sizes stands out particularly in loops. For instance, in a simple loop adding two arrays, RISC-V code can set the vector length anew in each iteration. This means it can process as many elements as possible in each pass, depending on the hardware capabilities and the remaining data. This adaptive approach really simplifies the code by eliminating the need for separate scalar loops for the leftover elements.
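The array-add loop described above can be modeled in a few lines. This is a simplified simulation, not RISC-V code: `vsetvl` and `vector_add` are hypothetical helpers, `vl` is taken as `min(AVL, VLMAX)` (the specification permits other choices when `AVL` lies between `VLMAX` and 2 x `VLMAX`), and only integer `LMUL` values are handled.

```python
def vsetvl(avl, vlmax):
    """Simplified model of vsetvli: grant as many elements as fit, up to VLMAX."""
    return min(avl, vlmax)


def vector_add(a, b, vlen=128, sew=32, lmul=1):
    """Add two equal-length lists in strips of vl elements, with no scalar tail loop."""
    vlmax = vlen * lmul // sew      # VLMAX = LMUL * VLEN / SEW
    out = []
    i, n = 0, len(a)
    while i < n:
        vl = vsetvl(n - i, vlmax)   # AVL is the number of remaining elements
        # One "vector instruction" processes exactly vl elements.
        out.extend(x + y for x, y in zip(a[i:i + vl], b[i:i + vl]))
        i += vl
    return out


# The same loop works unchanged for different hardware vector lengths.
print(vector_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], vlen=128))
print(vector_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], vlen=256))
```

With `vlen=128` the five elements are handled as a strip of 4 followed by a strip of 1; with `vlen=256` a single strip of 5 suffices. The loop body is identical in both cases, which is the portability argument made above.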

		In this project, our goal was to support LLVM Global Instruction Selection (GlobalISel), a framework that performs instruction selection on a whole function, for the RISC-V Vector Extension on some ALU operations, such as `vadd`, `vsub`, `vand`, `vor`, and `vxor`. Apart from adding support for RISC-V vector types and operations at each stage of GlobalISel's pipeline, it is a challenge to bridge the LLVM world (concepts like scalable vectors) and the RISC-V world (concepts like vector length and the vector register grouping factor). We will showcase how we address this challenge in the following sections.
