



XXXX



University of Thessaly  
Department of Electrical and Computer Engineering

# RISC - V VECTOR EXTENSION (RVV)

Presented by: Charalampos Zachariadis,  
Filippos Markovitsis,  
Stefanos Ziakas,  
Eleni Athanailidi



XXXX

ECE338 – Parallel Computer Architecture





# CONTENTS

01

Project Aim

02

Terminology

03

Design Acceptances

04

Instructions

05

Hardware Implementation

06

Integration

07

Testing & Validation

08

Future Work & Evaluation





## PROJECT AIM

.....



- Explore **vector instructions** for **RISC-V** architectures.
- Implement hardware to execute custom instructions using **Verilog**.
- **Validate** our implementation.

**Ultimate Goal:** A red target icon with an arrow hitting the bullseye.

**Integrate** our designed hardware to an existing RISC-V implementation used currently only for scalar data.

.....



x x x x





# TERMINOLOGY

Vector Register

A register that holds multiple elements with varying size (e.g., 8-bit, 16-bit, 32-bit, 64-bit)

VLEN

The length of the vector registers in bits

VL

Determines the number of elements a vector instruction will operate on during a single execution.



x x x x





# TERMINOLOGY



AVL

Indicates the vector length (VL) the program wants to use  
( $vl \leq avl \leq 2 \cdot vlmax$ )

SEW (Standard Element Width)

Specifies the width of individual elements in the vector register (e.g., 8-bit, 16-bit, 32-bit, 64-bit)

LMUL (Length Multiplier)

Determines the grouping of vector registers to allow different vector lengths

>>>





# DESIGN ACCEPTANCES



Minimal usage of **CSR** registers



Use only the CSR registers  
needed to our instruction set  
(vtype, VL, AVL)





# DESIGN ACCEPTANCES



Minimal usage of **CSR** registers



Use only the CSR registers  
needed to our instruction set  
(vtype, VL, AVL)





# DESIGN ACCEPTANCES

Minimal usage of **CSR** registers



The are accessed only in  
the vsetivli instruction  
**(no need for addresses)**





# DESIGN ACCEPTANCES



- vsetvli, vsetvl instruction  
not supported



- AVL is set from the uimm  
from vsetivli (5 bits)



x x x x





# DESIGN ACCEPTANCES



- truncated vtype register



xxxx





# DESIGN ACCEPTANCES



- Masking Exclusively Done with zeros

SEW = 16

SEW = 16

SEW = 16

SEW = 16



LMUL = 1, VL = 3



x x x x





# DESIGN ACCEPTANCES



- Masking Exclusively Done with zeros

SEW = 16

rvs

SEW = 16

rvs

SEW = 16

rvs

SEW = 16

0000

LMUL = 1, VL = 3



x x x x





# INSTRUCTIONS



- **add, sub, mul, and, or, xor** : Arithmetic and Logical vv, vx, vi operations
- **vsetivli (rd, uimm, vtype)**: Sets the vector length (VL) and vector type (vtype) based on an immediate AVL (Application Vector Length) and vtype.
- (premature) **vsb.v v3, imm(s1)**: Stores single bytes in memory from vRegs





# HARDWARE IMPLEMENTATION



## Vector Register File

- 32 64-bit Registers
- vtype, vl, avl CSR logic





# HARDWARE IMPLEMENTATION



## vsetvl

- Extracts vector config parameters: SEW, LMUL, and AVL from setup instructions
- Activates only when opcode = VR FORMAT and funct3 = VC FORMAT
- Asserts vcsr\_wen to enable updates to vector control registers (vl, vtype)
- Outputs defaults and blocks changes if instruction is not a valid config command





# HARDWARE IMPLEMENTATION



## vl setup

- Determines vector length (VL) based on SEW, LMUL, and requested AVL
- Calculates vlmax using current configuration and limits VL to min(AVL, vlmax)
- Validates SEW and LMUL; vector config only allowed if vcsr\_wen == 1
- Disables vector execution if config is invalid or not permitted





# HARDWARE IMPLEMENTATION



## control\_vunit

- Decodes vector instructions and generates control signals for the vector ALU (vALU)
- Uses opcode, funct3, and funct6 to determine operation type and control behavior
- Sets signals for operand source, write enable, operation type, and grouping
- Disables vector operations if instruction is not recognized or not in vector format





# HARDWARE IMPLEMENTATION



## sign\_ext\_64

- Extends 5-bit and 32-bit signed inputs to 64-bit values
- simm5: Sign bit replicated 59 times to form simm64
- scalar\_in: Sign bit replicated 32 times to form scalar\_in\_64
- Ensures correct signed interpretation for vector operations





# HARDWARE IMPLEMENTATION



## sign\_ext\_64 selector

- Selects first vALU operand for vector-scalar operations
- Chooses between scalar\_in\_64 and simm64, based on valu\_src control signal
  - valu\_src = 0: select scalar input
  - valu\_src = 1: select immediate
- Supports flexible operand handling for different instruction types





# HARDWARE IMPLEMENTATION



## vALU

- executes the logical and arithmetic vector operations
- The operations rely on the SEW
- It executes all the instructions between single vector registers within one cycle





# HARDWARE IMPLEMENTATION

<<<<

## vgrouping\_selector

- Handles register grouping when LMUL > 1 by splitting operations across multiple physical registers
- Uses a counter to track sub-iterations and update register indices (raA, raB, rdest) accordingly
- Asserts a stall while grouped operations are in progress to prevent premature instruction fetch
- Counter and Stall logic



>>>>





# HARDWARE IMPLEMENTATION



## vl masking

- Applies Tail Agnostic Zero Masking to elements beyond given *vl*
- Provides a clean separation of useless and useful bits inside the vector register
- Dynamic, based on given VL

|              |                    |
|--------------|--------------------|
| vALU_Out     | masked_vALU_Out    |
| grouping_cnt | <b>vl</b> _masking |
| vtype_alu    | vl                 |

x x x x





# HARDWARE IMPLEMENTATION



## vcontrol\_bypass

- Resolves data hazards in the vector pipeline using data forwarding logic
- Monitors register dependencies between pipeline stages (IDEX, EXMEM, MEMWB)
- Forwards latest ALU results to avoid stalls when source and destination registers match
- Uses the scalar RISC-V bypass unit to handle dependencies involving the base RegFile





# INTEGRATION





# INTEGRATION





# TESTING & VALIDATION



# Testing AVL, LMUL, SEW & Read-After-Write Dependencies

|                                                   |                    |    |   |    |    |     |     |     |      |      |      |      |       |       |       |        |        |        |         |         |    |
|---------------------------------------------------|--------------------|----|---|----|----|-----|-----|-----|------|------|------|------|-------|-------|-------|--------|--------|--------|---------|---------|----|
| \data[20][31:0] = 1                               | 0                  | 1  | 2 | 3  | 4  | 5   | 6   | 7   | 8    | 9    | 10   | 11   | 12    | 13    | 14    | 15     | 16     | 17     | 18      | 19      | 20 |
| \data[21][31:0] = 10                              | 0                  | 10 |   |    |    |     |     |     |      |      | 20   |      |       |       |       |        |        |        |         |         |    |
| \data[0][63:0] = 00000000000000000000000000000000 | 000000000000000000 |    |   |    |    |     |     |     |      |      |      |      |       |       |       |        |        |        |         |         |    |
| \data[1][63:0] = 0                                | 0                  | 2  | 8 | 22 | 52 | 114 | 240 | 494 | 1004 | 2026 | 4072 | 8166 | 16356 | 32738 | 65504 | 131038 | 262108 | 524250 | 1048536 | 2097110 |    |

# Infinite Loop Combination of regular regs with vectors



University of Thessaly | [www.e-ce.uth.gr](http://www.e-ce.uth.gr)





# TESTING & VALIDATION



```
vs_data_out[31:0] =H  
data_addr[31:0] =88000000  
\charMemory[0][0:511] =
```

|            |          |      |      |          |          |          |      |          |
|------------|----------|------|------|----------|----------|----------|------|----------|
| + ""       | H        | "    | e    | "        | l        | "        | l    | "        |
| + 00000000 | 88000000 | 000+ | 880+ | 88000002 | 88000004 | 88000006 | 880+ | 00000000 |
| .....+     |          | + H  | + He | + Hel    | + Hell   |          |      |          |

Vector Store in memory

x x x x





# FUTURE WORK



- VLUL < 1 (e.g. 1/2, 1/4, 1/8)
- vsetvl and vsetvli implementation
- Support loads from memory and more efficient stores in it
- More instructions supported → complete set of CSR registers in order to adjust their functionality

VALU

x x x x





# EVALUATION



Instructions Exploration



Vector Architecture



Testing & Validation



Zedboard Execution



>>>





XXXX

University of Thessaly  
Department of Electrical and Computer Engineering



# THANK YOU

Presented by: Charalampos Zachariadis,  
Filippos Markovitsis,  
Stefanos Ziakas,  
Eleni Athanailidi

XXXX



ECE338 – Parallel Computer Architecture

