



**University of Minho**

**Master in Industrial Electronic and Computers Engineering**

**Embedded Systems**

**Dedicated Processors Project**

---

## **RISC-V Processor**

---

### **Authors - Group 5**

Bruno Silva A88289

Rafael Pedro A88271

Silviane Correia A90037

Sofia Alves A90263

### **Professors:**

Tiago Gomes

Rui Machado

Sérgio Pereira

May 31, 2022

# Contents

|                                                                       |            |
|-----------------------------------------------------------------------|------------|
| <b>List of Figures</b>                                                | <b>iii</b> |
| <b>Acronyms</b>                                                       | <b>v</b>   |
| <b>1 Introduction</b>                                                 | <b>1</b>   |
| <b>2 RV32I Instructions</b>                                           | <b>2</b>   |
| 2.1 RV32I instruction implementation . . . . .                        | 3          |
| 2.1.1 Implementation of immediate and register instructions . . . . . | 5          |
| 2.1.2 Implementation of branch instructions . . . . .                 | 8          |
| 2.1.3 Implementation of store instructions . . . . .                  | 9          |
| 2.1.4 Implementation of load instructions . . . . .                   | 10         |
| 2.1.5 Implementation of jalr instruction . . . . .                    | 11         |
| 2.1.6 Implementation of lui instruction . . . . .                     | 11         |
| 2.1.7 Implementation of auipc instruction . . . . .                   | 12         |
| 2.2 Tests . . . . .                                                   | 13         |
| 2.2.1 Type R . . . . .                                                | 13         |
| 2.2.1.1 SLL - Shift Left Logical . . . . .                            | 13         |
| 2.2.1.2 SLTU - Set Less Then Unsigned . . . . .                       | 14         |
| 2.2.1.3 XOR . . . . .                                                 | 15         |
| 2.2.1.4 SRL - Shift Right Logical . . . . .                           | 16         |
| 2.2.1.5 SRA - Shift Right Arithmetic . . . . .                        | 17         |
| 2.2.2 Type I . . . . .                                                | 18         |
| 2.2.2.1 SLTI - Set Less Than Immediate . . . . .                      | 18         |
| 2.2.2.2 SLTIU - Set Less Than Immediate Unsigned . . . . .            | 19         |
| 2.2.2.3 XORI - Xor Immediate . . . . .                                | 19         |
| 2.2.2.4 ORI - Or Immediate . . . . .                                  | 20         |
| 2.2.2.5 ANDI - And Immediate . . . . .                                | 21         |
| 2.2.2.6 SLLI - Shift Left Logical Immediate . . . . .                 | 22         |
| 2.2.2.7 SR LI - Shift Right Logical Immediate . . . . .               | 23         |
| 2.2.2.8 SRAI - Shift Right Logical Arithmetic . . . . .               | 24         |
| 2.2.2.9 LB - Load Byte . . . . .                                      | 25         |
| 2.2.2.10 LH - Load Half . . . . .                                     | 25         |
| 2.2.2.11 LW - Load Byte Unsigned . . . . .                            | 26         |
| 2.2.2.12 LBU - Load Byte Unsigned . . . . .                           | 26         |
| 2.2.2.13 LHU - Load Half Unsigned . . . . .                           | 26         |
| 2.2.2.14 JALR - Jump And Link Register . . . . .                      | 27         |
| 2.2.3 Type S . . . . .                                                | 28         |
| 2.2.3.1 SB - Store Byte . . . . .                                     | 28         |
| 2.2.3.2 SH - Store Half . . . . .                                     | 29         |

|                     |                                                |           |
|---------------------|------------------------------------------------|-----------|
| 2.2.4               | Type B . . . . .                               | 30        |
| 2.2.4.1             | BEQ - Branch Equal . . . . .                   | 30        |
| 2.2.4.2             | BNE - Branch Not Equal . . . . .               | 32        |
| 2.2.4.3             | BLT - Branch Less Than . . . . .               | 34        |
| 2.2.4.4             | BGE - Branch Greater Equal . . . . .           | 36        |
| 2.2.4.5             | BLTU - Branch Less Than Unsigned . . . . .     | 38        |
| 2.2.4.6             | BGEU - Branch Greater Equal Unsigned . . . . . | 40        |
| 2.2.5               | Type U . . . . .                               | 42        |
| 2.2.5.1             | LUI - Load Upper Immediate . . . . .           | 42        |
| 2.2.5.2             | AUIPC - Add Upper Immediate to PC . . . . .    | 43        |
| <b>3</b>            | <b>Pipeline Processor</b>                      | <b>44</b> |
| 3.1                 | Introduction to Pipeline . . . . .             | 44        |
| 3.2                 | Hazard unit . . . . .                          | 44        |
| 3.3                 | Stage Registers . . . . .                      | 46        |
| 3.4                 | MUXs . . . . .                                 | 52        |
| 3.5                 | Tests . . . . .                                | 53        |
| 3.6                 | Running the processor in the Zybo . . . . .    | 58        |
| <b>4</b>            | <b>Devices</b>                                 | <b>62</b> |
| <b>5</b>            | <b>Conclusion</b>                              | <b>65</b> |
| <b>Bibliography</b> |                                                | <b>66</b> |

# List of Figures

|    |                                             |    |
|----|---------------------------------------------|----|
| 1  | RISC-V 32-bit instruction formats . . . . . | 3  |
| 2  | RISC-V 32-bit instructions . . . . .        | 4  |
| 3  | Signed ALU implementation . . . . .         | 8  |
| 4  | Signed at unsigned comparisons . . . . .    | 9  |
| 5  | SLL- Instruction Test . . . . .             | 13 |
| 6  | SLTU- Instruction Test . . . . .            | 14 |
| 7  | XOR- Instruction Test . . . . .             | 15 |
| 8  | SRL- Instruction Test . . . . .             | 16 |
| 9  | SRA- Instruction Test . . . . .             | 17 |
| 10 | SLTI- Instruction Test . . . . .            | 18 |
| 11 | SLTIU- Instruction Test . . . . .           | 19 |
| 12 | SLTIU- Instruction Test . . . . .           | 20 |
| 13 | ORI- Instruction Test . . . . .             | 21 |
| 14 | ANDI- Instruction Test . . . . .            | 22 |
| 15 | SLLI - Instruction Test . . . . .           | 23 |
| 16 | SRLI- Instruction Test . . . . .            | 24 |
| 17 | SRAI - Instruction Test . . . . .           | 25 |
| 18 | LOADS - Instructions Test . . . . .         | 27 |
| 19 | JARL - Instruction Test . . . . .           | 28 |
| 20 | SB - Instruction Test . . . . .             | 29 |
| 21 | SH - Instruction Test . . . . .             | 30 |
| 22 | BEQ - Instruction Test Case 1 . . . . .     | 31 |
| 23 | BEQ - Instruction Test Case 2 . . . . .     | 32 |
| 24 | BNE - Instruction Test Case 1 . . . . .     | 33 |
| 25 | BNE - Instruction Test Case 2 . . . . .     | 34 |
| 26 | BLT - Instruction Test Case 1 . . . . .     | 35 |
| 27 | BLT - Instruction Test Case 2 . . . . .     | 36 |
| 28 | BGE - Instruction Test Case 1 . . . . .     | 37 |
| 29 | BGE - Instruction Test Case 2 . . . . .     | 38 |
| 30 | BLTU - Instruction Test Case 1 . . . . .    | 39 |
| 31 | BLTU - Instruction Test Case 2 . . . . .    | 40 |
| 32 | BGEU - Instruction Test Case 1 . . . . .    | 41 |
| 33 | BGEU - Instruction Test Case 2 . . . . .    | 42 |
| 34 | LUI - Instruction Test . . . . .            | 43 |
| 35 | AUIPC - Instruction Test . . . . .          | 43 |
| 36 | Pipeline Timing Diagram . . . . .           | 44 |
| 37 | Dataforward example . . . . .               | 54 |
| 38 | Pipeline processor . . . . .                | 55 |
| 39 | Dataforward example . . . . .               | 56 |
| 40 | Stall example . . . . .                     | 56 |

|    |                                                |    |
|----|------------------------------------------------|----|
| 41 | Flush example . . . . .                        | 57 |
| 42 | Hazard Unit - Flush Test . . . . .             | 58 |
| 43 | Processor running in the Zybo 1 . . . . .      | 59 |
| 44 | Instructions used in the testbench 1 . . . . . | 60 |
| 45 | Simulation 1 . . . . .                         | 60 |
| 46 | Processor running in the Zybo 2 . . . . .      | 61 |
| 47 | Instructions used in the testbench 2 . . . . . | 61 |
| 48 | Hardware block . . . . .                       | 62 |
| 49 | Zybo features . . . . .                        | 62 |
| 50 | Device connectivity test . . . . .             | 64 |
| 51 | Device connectivity test . . . . .             | 64 |

## Acronyms

**RV32I** RISC-V 32-bit integer instruction set

**ISA** Instruction Set Architecture

**PC** Program Counter

# 1 Introduction

In the context of the dedicated processor project subject of the embedded systems course, the group was challenged to add features to a given basic single-cycle processor. Some of the possible features that could be implemented in this project are in the following list:

- ALL RV32I instructions (except fence, ecall, ebreak, csrr-based)
- Type (single-cycle, multicycle, pipeline)
- Add a Peripheral at choice
- Add standard/custom buses/interfaces
- Communicate with PS (for code transfers)
- Add custom IP for Processor control (reset)
- Explore different memory options (BRAMs, DDR, logic (LUTs))

The group decided to focus on three features of the list, which are the implementation of all of the RV32I instructions, adding pipeline and adding peripherals. The following chapters of this report describe the implementation of these features.

## 2 RV32I Instructions

The RISC-V 32-bit integer instruction set (RV32I) forms the core of RISC-V instruction set. In the Instruction Set Architecture (ISA), there are six instruction types:

- R-type (register-type);
- I-type (immediate);
- S/B-type (store/branch);
- U/J-type (upper immediate/jump);

These instructions operate on operands, that can be stored in registers or memory, or they can be constants/immediates stored in the instruction itself.

Since retrieving operands from memory takes a long time and the instructions must run fast, the architecture provides 32 registers to store commonly used operands. These are kept in a multiported memory called a register file. There are 31 general-purpose registers  $x_1\text{--}x_{31}$ , which hold integer values and the register  $x_0$  is hardwired to the constant 0.

As mentioned before the RISC-V instructions can use immediate operands, they are called immediates because their values are immediately available from the instruction and do not require a register or memory access.

Data can also be stored in memory to have more than 32 variables (the 32 registers). The register file is small and fast and the memory is larger and slower. In this architecture, instructions operate exclusively on registers, so data stored in memory must be moved to a register before it can be processed. With this combination the program can access a large amount of data fairly quickly. RISC-V uses a byte-addressable memory, each byte in memory has a unique address, and a 32-bit word consists of four 8-bit bytes, so each word address is a multiple of 4.

Each instruction is 32 bits (4 bytes) long, because of this the address of subsequent instructions increases by four. The Program Counter (PC), maintains track of the current instruction. After each instruction completes, the PC holds the memory address of the current instruction and increments by four so that the processor can read or fetch the next instruction from memory.

## 2.1 RV32I instruction implementation

The assignment started from the basis of a single-cycle processor with some instructions already implemented, being the cases of the instructions:

```
JAL  BEQ  LW   SW   ADDI  ADD  SUB  SLT
      OR   AND
```

The task here is to implement the rest of the instructions:

|       |       |      |      |      |      |      |      |
|-------|-------|------|------|------|------|------|------|
| LUI   | AUIPC | JALR | SRA  | BNE  | BLT  | BGE  | BLTU |
| BGEU  | LB    | LH   | LBU  | LHU  | SB   | SH   | SLTI |
| SLTIU | XORI  | ORI  | ANDI | SLLI | SRLI | SRAI | SLL  |
| SLTU  | XOR   | SRL  |      |      |      |      |      |

For the implementation of the remaining instructions it is necessary to take into account the RISC-V 32-bit instruction formats, since every instruction is classified in a specific type and accordingly to this the format of its instruction is different, so it is mandatory to consult Figures 1 and 2. Note that the RISC-V ISA keeps the source (rs1 and rs2) and destination (rd) registers at the same position in all formats to simplify decoding.

| 31:25                           | 24:20  | 19:15  | 14:12  | 11:7                  | 6:0    |        |
|---------------------------------|--------|--------|--------|-----------------------|--------|--------|
| funct7                          | rs2    | rs1    | funct3 | rd                    | op     | R-Type |
| imm <sub>11:0</sub>             |        | rs1    | funct3 | rd                    | op     | I-Type |
| imm <sub>11:5</sub>             | rs2    | rs1    | funct3 | imm <sub>4:0</sub>    | op     | S-Type |
| imm <sub>12,10:5</sub>          | rs2    | rs1    | funct3 | imm <sub>4:1,11</sub> | op     | B-Type |
| imm <sub>31:12</sub>            |        |        |        | rd                    | op     | U-Type |
| imm <sub>20,10:1,11,19:12</sub> |        |        |        | rd                    | op     | J-Type |
| fs3                             | funct2 | fs2    | fs1    | funct3                | fd     | op     |
| 5 bits                          | 2 bits | 5 bits | 5 bits | 3 bits                | 5 bits | 7 bits |

Figure 1: RISC-V 32-bit instruction formats

| op            | funct3 | funct7   | Type | Instruction          | Description                   | Operation                                                   |
|---------------|--------|----------|------|----------------------|-------------------------------|-------------------------------------------------------------|
| 0000011 (3)   | 000    | -        | I    | lb rd, imm(rs1)      | load byte                     | $rd = \text{SignExt}([\text{Address}]_{7:0})$               |
| 0000011 (3)   | 001    | -        | I    | lh rd, imm(rs1)      | load half                     | $rd = \text{SignExt}([\text{Address}]_{15:0})$              |
| 0000011 (3)   | 010    | -        | I    | lw rd, imm(rs1)      | load word                     | $rd = [\text{Address}]_{31:0}$                              |
| 0000011 (3)   | 100    | -        | I    | lbu rd, imm(rs1)     | load byte unsigned            | $rd = \text{ZeroExt}([\text{Address}]_{7:0})$               |
| 0000011 (3)   | 101    | -        | I    | lhu rd, imm(rs1)     | load half unsigned            | $rd = \text{ZeroExt}([\text{Address}]_{15:0})$              |
| 0010011 (19)  | 000    | -        | I    | addi rd, rs1, imm    | add immediate                 | $rd = rs1 + \text{SignExt}(imm)$                            |
| 0010011 (19)  | 001    | 0000000* | I    | slli rd, rs1, uimm   | shift left logical immediate  | $rd = rs1 \ll uimm$                                         |
| 0010011 (19)  | 010    | -        | I    | slti rd, rs1, imm    | set less than immediate       | $rd = (rs1 < \text{SignExt}(imm))$                          |
| 0010011 (19)  | 011    | -        | I    | sltiu rd, rs1, imm   | set less than imm. unsigned   | $rd = (rs1 < \text{SignExt}(imm))$                          |
| 0010011 (19)  | 100    | -        | I    | xori rd, rs1, imm    | xor immediate                 | $rd = rs1 \wedge \text{SignExt}(imm)$                       |
| 0010011 (19)  | 101    | 0000000* | I    | srlti rd, rs1, uimm  | shift right logical immediate | $rd = rs1 \gg uimm$                                         |
| 0010011 (19)  | 101    | 0100000* | I    | srai rd, rs1, uimm   | shift right arithmetic imm.   | $rd = rs1 \ggg uimm$                                        |
| 0010011 (19)  | 110    | -        | I    | ori rd, rs1, imm     | or immediate                  | $rd = rs1   \text{SignExt}(imm)$                            |
| 0010011 (19)  | 111    | -        | I    | andi rd, rs1, imm    | and immediate                 | $rd = rs1 \& \text{SignExt}(imm)$                           |
| 0010111 (23)  | -      | -        | U    | auipc rd, upimm      | add upper immediate to PC     | $rd = \{upimm, 12'b0\} + PC$                                |
| 0100011 (35)  | 000    | -        | S    | sb rs2, imm(rs1)     | store byte                    | $[\text{Address}]_{7:0} = rs2_{7:0}$                        |
| 0100011 (35)  | 001    | -        | S    | sh rs2, imm(rs1)     | store half                    | $[\text{Address}]_{15:0} = rs2_{15:0}$                      |
| 0100011 (35)  | 010    | -        | S    | sw rs2, imm(rs1)     | store word                    | $[\text{Address}]_{31:0} = rs2$                             |
| 0110011 (51)  | 000    | 0000000  | R    | add rd, rs1, rs2     | add                           | $rd = rs1 + rs2$                                            |
| 0110011 (51)  | 000    | 0100000  | R    | sub rd, rs1, rs2     | sub                           | $rd = rs1 - rs2$                                            |
| 0110011 (51)  | 001    | 0000000  | R    | sll rd, rs1, rs2     | shift left logical            | $rd = rs1 \ll rs2_{4:0}$                                    |
| 0110011 (51)  | 010    | 0000000  | R    | slt rd, rs1, rs2     | set less than                 | $rd = (rs1 < rs2)$                                          |
| 0110011 (51)  | 011    | 0000000  | R    | sltu rd, rs1, rs2    | set less than unsigned        | $rd = (rs1 < rs2)$                                          |
| 0110011 (51)  | 100    | 0000000  | R    | xor rd, rs1, rs2     | xor                           | $rd = rs1 \wedge rs2$                                       |
| 0110011 (51)  | 101    | 0000000  | R    | srl rd, rs1, rs2     | shift right logical           | $rd = rs1 \gg rs2_{4:0}$                                    |
| 0110011 (51)  | 101    | 0100000  | R    | sra rd, rs1, rs2     | shift right arithmetic        | $rd = rs1 \ggg rs2_{4:0}$                                   |
| 0110011 (51)  | 110    | 0000000  | R    | or rd, rs1, rs2      | or                            | $rd = rs1   rs2$                                            |
| 0110011 (51)  | 111    | 0000000  | R    | and rd, rs1, rs2     | and                           | $rd = rs1 \& rs2$                                           |
| 0110111 (55)  | -      | -        | U    | lui rd, upimm        | load upper immediate          | $rd = \{upimm, 12'b0\}$                                     |
| 1100011 (99)  | 000    | -        | B    | beq rs1, rs2, label  | branch if =                   | $\text{if } (rs1 == rs2) \text{ PC} = \text{BTA}$           |
| 1100011 (99)  | 001    | -        | B    | bne rs1, rs2, label  | branch if ≠                   | $\text{if } (rs1 \neq rs2) \text{ PC} = \text{BTA}$         |
| 1100011 (99)  | 100    | -        | B    | blt rs1, rs2, label  | branch if <                   | $\text{if } (rs1 < rs2) \text{ PC} = \text{BTA}$            |
| 1100011 (99)  | 101    | -        | B    | bge rs1, rs2, label  | branch if ≥                   | $\text{if } (rs1 \geq rs2) \text{ PC} = \text{BTA}$         |
| 1100011 (99)  | 110    | -        | B    | bltu rs1, rs2, label | branch if < unsigned          | $\text{if } (rs1 < rs2) \text{ PC} = \text{BTA}$            |
| 1100011 (99)  | 111    | -        | B    | bgeu rs1, rs2, label | branch if ≥ unsigned          | $\text{if } (rs1 \geq rs2) \text{ PC} = \text{BTA}$         |
| 1100111 (103) | 000    | -        | I    | jalr rd, rs1, imm    | jump and link register        | $\text{PC} = rs1 + \text{SignExt}(imm), rd = \text{PC} + 4$ |
| 1101111 (111) | -      | -        | J    | jal rd, label        | jump and link                 | $\text{PC} = \text{JTA}, \quad rd = \text{PC} + 4$          |

Figure 2: RISC-V 32-bit instructions

Based on the table in the Figure 2 and the single-cycle processor, the alterations started in the main decoder. It was necessary to add the opcodes that were left and also to increase the size of some of the control signals to support all the RV32I instructions:

- the *ImmSrc* was changed to 3 bits;
- the *ResultSrc* was changed to 3 bits;
- the *Jump* was changed to 2 bits;

These alterations were also made throughout the project and the current main decoder is the presented below.

```

1 module maindec (
2     op ,
3     ResultSrc ,
4     MemWrite ,
5     Branch ,
6     ALUSrc ,
7     RegWrite ,
8     Jump ,
9     ImmSrc ,
10    ALUOp
11 );
12     input wire [6:0] op;
13     output wire [2:0] ResultSrc;
14     output wire MemWrite;
15     output wire Branch;
16     output wire ALUSrc;
17     output wire RegWrite;
18     output wire [1:0] Jump;
19     output wire [2:0] ImmSrc;
20     output wire [1:0] ALUOp;
21     reg [13:0] controls;
22
23     assign {RegWrite, ImmSrc, ALUSrc, MemWrite, ResultSrc, Branch,
24             ALUOp, Jump} = controls;
25     always @(*)
26         case (op)
27             7'b0000011: controls = 14'b1_000_1_0_001_0_00_00;
28             7'b0100011: controls = 14'b0_001_1_1_000_0_00_00;
29             7'b0110011: controls = 14'b1_xxx_0_0_000_0_10_00;
30             7'b1100011: controls = 14'b0_010_0_0_000_1_01_00;
31             7'b1100111: controls = 14'b1_100_0_0_010_0_00_11;
32             7'b0010011: controls = 14'b1_000_1_0_000_0_10_00;
33             7'b1101111: controls = 14'b1_011_0_0_010_0_00_01;
34             7'b0110111: controls = 14'b1_101_1_0_011_0_00_00;
35             7'b0010111: controls = 14'b1_101_1_0_100_0_00_00;
36             default: controls = 14'b0_000_0_0_000_0_00_00;
37         endcase
38 endmodule

```

Listing 1: maindec.v

### 2.1.1 Implementation of immediate and register instructions

The immediate and register instructions are very similar with regard to the operations they perform, the main difference is that the second source of immediate type instructions is an immediate and in the register type instructions it is a register.

Both of these types support: logical instructions, shift instructions and set less than instructions. The set less than instructions can be signed or unsigned. The instructions operations are all treated as signed unless the set less than unsigned

instructions.

To implement the I-type and R-type instructions it was necessary to change the implementation of the ALU decoder. The number of bits of the *ALUControl* was increased by one, the updated ALUControl has now 4 bits, to support all of the operations that are needed and this alteration was also made throughout the project.

```
1 output reg [3:0] ALUControl;
```

The logical instructions that are present in the RV32I are *or*, *xor* and *and*. The shift instructions that are present in the RV32I are *sll*(shift left logical), *slli*(shift left logical immediate),*srl*(shift right logical),*srlti*(shift right logical immediate),*sra*(shift right arithmetic) and *srai*(shift right arithmetic immediate). The set less then instructions that are present in the RV32I are *sltu*(set less then unsigned), *stli*(set less than immediate) and *stliu*(set less than immediate unsigned).

In the ALU decoder according to the *funct3* of these instructions the ALUControl sign associated with each of them was assigned. It should be noted that the same type of instructions(shift, logical and set less then) on the I-types and R-types have the same *funct3* and are viewed in the same way by the ALU.

It should also be noted that the *add* and *sub* have the same *funct3* and the *shift right arithmetic* and *shift right logical* also have the same *funct3*.

To distinguish the *add* and *sub* it was implemented this, that when it is a *sub* the *RtypeSub* is set.

```
1 assign RtypeSub = funct7b5 & opb5;
```

To distinguish the *shift right arithmetic* and *shift right logical* it was implemented this, that when it is a *shift right arithmetic* the *funct7b5* is set.

```
1 if (funct7b5)
2 ALUControl = 4'b1000; // shift right arithmetic
3 else
4 ALUControl = 4'b0111; // shift right logical
```

The updated implementation of the ALU decoder is presented below.

```
1 always @(*)
2   case (ALUOp)
3     2'b00: ALUControl = 4'b0000;
4     2'b01: ALUControl = 4'b0001;
5     default:
6       case (funct3)
7         3'b000:
8           if (RtypeSub)
9             ALUControl = 4'b0001; // sub
10          else
11            ALUControl = 4'b0000; // add
12         3'b001: ALUControl = 4'b0110; // shift left
13           logical
```

```

13          3'b010: ALUControl = 4'b0101; // set less than
14          3'b011: ALUControl = 4'b1001; // set less than
15      unsigned
16          3'b100: ALUControl = 4'b0100; // xor
17          3'b101:
18              if (funct7b5)
19                  ALUControl = 4'b1000; // shift right
20      arithmetic
21          else
22              ALUControl = 4'b0111; // shift right
23      logical
24          3'b110: ALUControl = 4'b0011; // or
25          3'b111: ALUControl = 4'b0010; // and
26          default: ALUControl = 4'bxxxx;
27      endcase
28  endcase

```

Listing 2: *aludecoder.v*

The next step was to edit the ALU, since it was unsigned and the instructions should be all treated as signed unless the instructions was unsigned. So it was necessary to add the keyword *signed* to the sources for the ALU.

```

1  input wire signed [31:0] SrcA;
2  input wire signed [31:0] SrcB;

```

But there are the unsigned instructions and to handle them is also added unsigned sources for the ALU.

```

1  wire [31:0] SrcA_u;
2  wire [31:0] SrcB_u;

```

The signed value is converted in an unsigned in order to set the unsigned sources when needed.

```

1  assign SrcA_u = SrcA;
2  assign SrcB_u = SrcB;

```

According to the instruction the sources will be chosen.

Finally according to the ALUControl defined in the ALU decoder the ALU implements the operations.

```

1  always @(*)
2      case (ALUControl)
3          4'b0000: ALUResult = SrcA + SrcB; // add
4          4'b0001: ALUResult = SrcA - SrcB; // sub
5          4'b0010: ALUResult = SrcA & SrcB; // and
6          4'b0011: ALUResult = SrcA | SrcB; // or
7          4'b0100: ALUResult = SrcA ^ SrcB; // xor
8          4'b0101: ALUResult = SrcA < SrcB; // set less than
9          4'b0110: ALUResult = SrcA << SrcB; // shift left
10      logical

```

```

10      4'b0111: ALUResult = SrcA >> SrcB; // shift right
11      logical
12      4'b1000: ALUResult = SrcA >>> SrcB; // shift right
13      arithmetic
14      4'b1001: ALUResult = SrcA_u < SrcB_u; // set less than
unsigned
      default: ALUResult = ALUResult;
      endcase

```

## 2.1.2 Implementation of branch instructions

The branch instructions are responsible of comparing the value between 2 registers and decide to jump or not based on the outcome of the comparison. In order to implement the branch instructions it was necessary to implement a better ALU first. In the ALU it was added 4 flags that would identify if the result is zero (Zero), if the operation overflowed (Overflow), if there is carry (Carry) and if the result was negative as illustrated in the figure.



Figure 3: Signed ALU implementation

```

1 assign Verflow = ~(SrcB_u[31] ^ SrcA_u[31] ^ ALUControl[0]) & (
2   SrcA_u[31] ^ Sum[31]) & ALUControl[1];
      assign Carry = ~ALUControl[1] & Cout;

```

```

3      assign Negative = ALUResult[31];
4      assign Zero = ALUResult == 32'b00000000000000000000000000000000;

```

The next step was to create a mux that checks for a condition associated with the branch type. For that the following table was implemented in the control unit.

**Table 5.2 Signed and unsigned comparisons**

| Comparison | Signed                                     | Unsigned            |
|------------|--------------------------------------------|---------------------|
| =          | Z                                          | Z                   |
| $\neq$     | $\bar{Z}$                                  | $\bar{Z}$           |
| <          | $N \oplus V$                               | $\bar{C}$           |
| $\leq$     | $Z + (N \oplus V)$                         | $Z + \bar{C}$       |
| >          | $\bar{Z} \bullet (\bar{N} \oplus \bar{V})$ | $\bar{Z} \bullet C$ |
| $\geq$     | $(\bar{N} \oplus \bar{V})$                 | C                   |

Figure 4: Signed at unsigned comparisons

```

1  mux8 #(32) pcSrcMux(
2      (Branch & Zero) | Jump[0],
3      (Branch & ~Zero) | Jump[0],
4      (Branch) | Jump[0],
5      (Branch) | Jump[0],
6      (Branch & Negative ^ Verflow) | Jump[0],
7      (Branch & ~(Negative ^ Verflow)) | Jump[0],
8      (Branch & Carry) | Jump[0],
9      (Branch & ~Carry) | Jump[0],
10     funct3,
11     PCSrc[0]
12 );

```

After the alu was implemented we can implement the rest of the branch set. For that we must take the comparison result and connect it to the PCSrc so it jumps whenever the condition is true or keeps the program flow if the branch is not taken.

### 2.1.3 Implementation of store instructions

The store instructions are responsible of taking a value from a register and move it inside the RAM. In order to store a value from a register into the dmem the value must be loaded from the regfile and moved into the dmem. Since there are different types of store the store mode selection should be made as late as possible (inside

the dmem) in order to not interfere with other instructions. With that in mind we had to first distinguish the store type, the parameter that defines the store type is the funct3, but since the name was too generic for this application we assigned the funct3 BUS to a new BUS named LoadStrType (this BUS will be useful both for store and load)

```
1 assign LdStrType=funct3;
```

Next, a switch case was created inside the dmem to select the type of store instruction.

```
1 always @(posedge clk)
2     if (we)
3         if(we==1'b1) begin
4             case(LdStrType)
5                 3'b000: RAM[a[31:2]] <= {24'b0,wd[7:0]};
6                 3'b001: RAM[a[31:2]] <= {16'b0,wd[15:0]};
7                 3'b010: RAM[a[31:2]] <= wd[31:0];
8                 3'b100: RAM[a[31:2]] <= {RAM[a[31:2]][31:8],wd
9                     [7:0]};
10                3'b101: RAM[a[31:2]] <= {RAM[a[31:2]][31:16],wd
11                     [15:0]};
12                    default: RAM[a[31:2]] <= 32'b0;
13                endcase
14            end else begin
15                RAM[a[31:2]] <= wd;
16            end
17        end
18    end
```

With that implemented is now possible to select how the value will be stored inside the dmem.

#### 2.1.4 Implementation of load instructions

The load instructions are responsible of taking a value from the RAM and storing it inside a register in the regfile. As with the store instructions, in the load instructions we will implement the load selection mode in the furthest stage of the instruction (inside the regfile). For that it will be necessary to use the LoadStrType previously mentioned. The regfile switch case is implemented the same way as the one in the dmem.

```
1 always @(posedge clk)
2     if (we3)
3         if(ResultSrc==2'b01) begin
4             case(LdStrType)
5                 3'b000: rf[a3] <= {24'b0,wd3[7:0]};
6                 3'b001: rf[a3] <= {16'b0,wd3[15:0]};
7                 3'b010: rf[a3] <= wd3[31:0];
8                 3'b100: rf[a3] <= {rf[a3][31:8],wd3[7:0]};
9                 3'b101: rf[a3] <= {rf[a3][31:16],wd3[15:0]};
10                default: rf[a3] <= 32'b0;
```

```

11         endcase
12     end else begin
13         rf[a3]=wd3;
14     end

```

### 2.1.5 Implementation of jalr instruction

The jalr instruction performs a jump and saves the sum of the original PC with a value from a register in a given register. In order to implement the instruction its first necessary to implement the sum of the PC with the register value, for that it was used an adder with the inputs of PC and the register value (from the regfile). The output PCImm is then added as an entry in the resultmux to be able to be selected when the instruction is called. The result is then written back into the regfile.

```

1 adder PCaddimm(
2     PC,
3     ImmExt,
4     PCImm
5 );
6
6 mux5 #(32) resultmux(
7     ALUResult,
8     ReadData,
9     PCPlus4,
10    ImmExt,
11    PCImm,
12    ResultSrc,
13    Result
14 );

```

The jump side of the jalr occurs exactly the same way as the already implemented jal

### 2.1.6 Implementation of lui instruction

The lui instruction is a special type of load that takes a 20bit immediate and appends zeros at the end, then saves the result in a given register. In order to implement this instruction we first added a control state for the extender that extracts the 20bit immediate from the instruction.

```

1 always @(*)
2     case (immsrc)
3         3'b000: immext = {{20 {instr[31]}}, instr[31:20]};
4         3'b001: immext = {{20 {instr[31]}}, instr[31:25], instr
5             [11:7]};
6         3'b010: immext = {{20 {instr[31]}}, instr[7], instr
7             [30:25], instr[11:8], 1'b0};
8         3'b011: immext = {{12 {instr[31]}}, instr[19:12], instr
9             [20], instr[30:21], 1'b0};

```

```

7         3'b100: immext = {20'b0, instr[31:20]};
8         3'b101: immext = {instr[31:12], 12'b0};
9         default: immext = 32'bxxxxxxxxxxxxxxxxxxxxxxxxxxxxx;
10        endcase

```

Then we added the option to take the result directly from the immext, allowing to save in the given register.

```

1 mux5 #(32) resultmux(
2     ALUResult,
3     ReadData,
4     PCPlus4,
5     ImmExt,
6     PCIImm,
7     ResultSrc,
8     Result
9 );

```

### 2.1.7 Implementation of auipc instruction

The auipc instruction is an extension from the lui, instead of saving the value directly in a register the value is added to the PC and then saved in the given register. For that we created an adder that sums the immediate with the PC and then the PCIImm was added to the resultmux.

```

1 adder PCaddimm(
2     PC,
3     ImmExt,
4     PCIImm
5 );

```

  

```

1 mux5 #(32) resultmux(
2     ALUResult,
3     ReadData,
4     PCPlus4,
5     ImmExt,
6     PCIImm,
7     ResultSrc,
8     Result
9 );

```

with that the result can be stored in the given register.

## 2.2 Tests

### 2.2.1 Type R

#### 2.2.1.1 SLL - Shift Left Logical

| Instruction    | Operation                        |
|----------------|----------------------------------|
| sll rd,rs1,rs2 | $rd = rs1 \ll rs2 \text{ (4:0)}$ |

In this instruction the test was made of set 2 in register 9, set 3 in register 4, and the respective result of the shift left in register 6. In this case the result should be b0000000000000000000000000000000010000.

**addi x9,x0,2**

Expected Data Address: 9

Expected Write Data: 2

Machine Code: 0x00200493

**addi x4,x0,3**

Expected Data Address: 4

Expected Write Data: 3

Machine Code: 0x00300213

**sll x6,x9, x4**

Expected Data Address: 6

Expected Write Data: b0000000000000000000000000000000010000

Machine Code: 0x00449333



Figure 5: SLL- Instruction Test

### 2.2.1.2 SLTU - Set Less Than Unsigned

| Instruction       | Operation          |
|-------------------|--------------------|
| sltu rd, rs1, rs2 | $rd = (rs1 < rs2)$ |

In this instruction the test was made of set -5 in register 9, set 4 in register 4, and the respective result of the comparison in register 6. In this case the result should be 0. Since sltiu will see the numbers as unsigned in reality -5 will be seen as 4294967291. This will make the result of the comparison 0, since 4294967291 is not less than 4.

**addi x9,x0,-5**

Expected Data Address: 9

Expected Write Data: -5

Machine Code: 0xFFB00493

**addi x4,x0,4**

Expected Data Address: 4

Expected Write Data: 4

Machine Code: 0x00400213

**sltu x6,x9, x4**

Expected Data Address: 6

Expected Write Data: 0

Machine Code: 0x0044B333



Figure 6: SLTU- Instruction Test

### 2.2.1.3 XOR

| Instruction      | Operation             |
|------------------|-----------------------|
| xor rd, rs1, rs2 | rd = rs1 $\wedge$ rs2 |

A xor was made with the value of 2 placed in register 9 and a 3 placed in register 4 and the result was stored in the register 6.

**addi x9,x0,2**

Expected Data Address: 9

Expected Write Data: 2

Machine Code: 0x00200493

**addi x4,x0,3**

Expected Data Address: 4

Expected Write Data: 3

Machine Code: 0x00300213

**xor x6,x9, x4**

Expected Data Address: 6

Expected Write Data: 1

Machine Code: 0x0044C333



Figure 7: XOR- Instruction Test

#### 2.2.1.4 SRL - Shift Right Logical

| Instruction      | Operation              |
|------------------|------------------------|
| srl rd, rs1, rs2 | $rd = rs1 >> rs2(4:0)$ |

In this instruction the test was made of set 10 in register 9, set 3 in register 4, and the respective result of the shift right in register 6. In this case the result should be b00000000000000000000000000000001.

**addi x9,x0,10**

Expected Data Address: 9

Expected Write Data: 10

Machine Code: 0x00A00493

**addi x4,x0,3**

Expected Data Address: 4

Expected Write Data: 3

Machine Code: 0x00300213

**srl x6,x9, x4**

Expected Data Address: 6

Expected Write Data: b00000000000000000000000000000001

Machine Code: 0x0044D333



Figure 8: SRL- Instruction Test

### 2.2.1.5 SRA - Shift Right Arithmetic

| Instruction      | Operation               |
|------------------|-------------------------|
| sra rd, rs1, rs2 | $rd = rs1 >>> rs2(4:0)$ |

addi x9,x0,-2

Expected Data Address: 9

Expected Write Data: -2

Machine Code: 0xFFE00493

addi x4,x0,3

Expected Data Address: 4

Expected Write Data: 3

Machine Code: 0x00300213

sra x6,x9, x4

Expected Data Address: 6

Machine Code: 0x4044D333



Figure 9: SRA- Instruction Test

## 2.2.2 Type I

### 2.2.2.1 SLTI - Set Less Than Immediate

| Instruction       | Operation                          |
|-------------------|------------------------------------|
| slti rd, rs1, imm | $rd = (rs1 < \text{SignExt}(imm))$ |

In order to test this instruction, a comparison was made between the value of -5, which was set in register 9 and -4, in the immediate. Since -5 is smaller than -4 the instruction should set the destination register to 1, which in this case was 6.

To be able to test this instruction, it was necessary to use a sum instruction to set register 9 to a value of -5.

**addi x9,x0,-5**

Machine Code: 0xFFB00493

Expected Data Address: 9

Expected Write Data: -5

**slti x6, x9, -4**

Machine Code: 0xFFC4A313

Expected Data Address: 6

Expected Write Data: 1



Figure 10: SLTI- Instruction Test

### 2.2.2.2 SLTIU - Set Less Than Immediate Unsigned

| Instruction        | Operation                          |
|--------------------|------------------------------------|
| sltiu rd, rs1, imm | $rd = (rs1 < \text{SignExt}(imm))$ |

This test was performed between the value of -5, which was set in register 9, and 4, in the immediate. Since sltiu will see the numbers as unsigned in reality -5 will be seen as 4294967291. This will make the result of the comparison 0, since 4294967291 is not less than 4.

**addi x9,x0,-5**

Expected Data Address: 9

Expected Write Data: -5

Machine Code: 0xFFB00493

**sltiu x6, x9, 4**

Expected Data Address: 6

Expected Write Data: 0

Machine Code: 0x0044B313



Figure 11: SLTIU- Instruction Test

### 2.2.2.3 XORI - Xor Immediate

A xor was made with the value of 2 placed in register 9 and an immediate with the value -4 and the result was stored in the register 6.

| Instruction       | Operation                               |
|-------------------|-----------------------------------------|
| xori rd, rs1, imm | $rd = (rs1 \wedge \text{SignExt}(imm))$ |

**addi x9,x0,2**

Expected Data Address: 9

Expected Write Data: 2

Machine Code: 0x00200493

**xori x6, x9, -4**

Expected Data Address: 6

Expected Write Data: 11111111111111111111111111111110

Machine Code: 0xFFC4C313



Figure 12: SLTIU- Instruction Test

#### 2.2.2.4 ORI - Or Immediate

| Instruction      | Operation                           |
|------------------|-------------------------------------|
| ori rd, rs1, imm | $rd = rs1 \mid \text{SignExt}(imm)$ |

An or was made between the value 2 in register 9 and the immediate value -4 and the result was stored in the register 6.

**addi x9,x0,2**

Expected Data Address: 9

Expected Write Data: 2

Machine Code: 0x00200493

ori x6, x9, -4

Expected Data Address: 6

Machine Code: 0xFFC4E313



Figure 13: ORI- Instruction Test

### 2.2.2.5 ANDI - And Immediate

| Instruction       | Operation                           |
|-------------------|-------------------------------------|
| andi rd, rs1, imm | $rd = (rs1 \& \text{SignExt}(imm))$ |

An and was made between the value 2 in register 9 and the immediate value -4 and the result was stored in the register 6.

addi x9,x0,2

Expected Data Address: 9

Expected Write Data: 2

Machine Code: 0x00200493

andi x6, x9, -4

Expected Data Address: 6

Expected Write Data: 0

Machine Code: 0xFFC4F313



Figure 14: ANDI- Instruction Test

#### 2.2.2.6 SLLI - Shift Left Logical Immediate

| Instruction        | Operation             |
|--------------------|-----------------------|
| slli rd, rs1, uimm | $rd = (rs1 \ll uimm)$ |

In this instruction the test was made of set 2 in register 9, set 4 in immediate, and the respective result of the shift left in register 6. In this case the result should be b00000000000000000000000000000000100000.

**addi x9,x0,2**

Expected Data Address: 9

Expected Write Data: 2

Machine Code: 0x00200493

**slli x6, x9, 4**

Expected Data Address: 6

Expected Write Data: 00000000000000000000000000000000100000

Machine Code: 0x00449313



Figure 15: SLLI - Instruction Test

#### 2.2.2.7 SRLI - Shift Right Logical Immediate

| Instruction        | Operation             |
|--------------------|-----------------------|
| srli rd, rs1, uimm | $rd = (rs1 \gg uimm)$ |

In this instruction the test was made of set 2 in register 9, set 4 in immediate, and the respective result of the shift right in register 6. In this case the result should be 0.

**addi x9,x0,2**

Expected Data Address: 9

Expected Write Data: 2

Machine Code: 0x00200493

**srli x6, x9, 4**

Expected Data Address: 6

Expected Write Data: 000000000000

Machine Code: 0x0044D313



Figure 16: SRLI- Instruction Test

#### 2.2.2.8 SRAI - Shift Right Logical Arithmetic

| Instruction        | Operation              |
|--------------------|------------------------|
| srai rd, rs1, uimm | $rd = (rs1 \ggg uimm)$ |

In this instruction the test was made of set -2 in register 9, set 4 in immediate, and the respective result of the shift right in register 6. In this case the result should be b1111111111, because the arithmetic shift makes the shift and replace the positions needed according with the signal of the number, so if it is negative it adds 1s and if it is positive it adds 0s.

**addi x9,x0,-2**

Expected Data Address: 9

Expected Write Data: -2

Machine Code: 0xFFE00493

**srai x6, x9, 4**

Expected Data Address: 6

Expected Write Data: 111111111111

Machine Code: 0x4044D313



Figure 17: SRAI - Instruction Test

### 2.2.2.9 LB - Load Byte

| Instruction     | Operation                                   |
|-----------------|---------------------------------------------|
| lb rd, imm(rs1) | $rd = \text{SignExt}([\text{Address}] 7:0)$ |

lb x5, 2(x0)

Machine Code: 0x00220283

Expected Register Value: 0x00000123 → 0x000000BC

### 2.2.2.10 LH - Load Half

| Instruction     | Operation                                      |
|-----------------|------------------------------------------------|
| lh rd, imm(rs1) | $rd = \text{SignExt}([\text{Address}]_{15:0})$ |

lh x9, 2(x0)

Machine Code: 0x00221303

Expected Register Value: 0x00000123 → 0x0000FABC

### 2.2.2.11 LW - Load Byte Unsigned

| Instruction     | Operation           |
|-----------------|---------------------|
| lw rd, imm(rs1) | rd = [Address] 31:0 |

**lw x7, 2(x0)**

Machine Code: 0x00222383

Expected Register Value: 0x00000123 → 0xFFFFFABC

### 2.2.2.12 LBU - Load Byte Unsigned

| Instruction      | Operation                    |
|------------------|------------------------------|
| lbu rd, imm(rs1) | rd = ZeroExt([Address] 7:0 ) |

**lbu x8, 2(x0)**

Machine Code: 0x00224403

Expected Register Value: 0x00000123 → 0x000001BC

### 2.2.2.13 LHU - Load Half Unsigned

| Instruction      | Operation                    |
|------------------|------------------------------|
| lhu rd, imm(rs1) | rd = ZeroExt([Address] 15:0) |

**lhu x6, 2(x0)**

Machine Code: 0x00225483

Expected Register Value: 0x00000123 → 0x00000ABC



Figure 18: LOADS - Instructions Test

#### 2.2.2.14 JALR - Jump And Link Register

| Instruction       | Operation                                     |
|-------------------|-----------------------------------------------|
| jalr rd, rs1, imm | $PC = rs1 + \text{SignExt}(imm), rd = PC + 4$ |

**jalr x6, x1, 0x8**

Expected Data Address: 0x6

Expected Write Data: 0x8

Expected PC: 0xC

Machine Code: 0x00808367



Figure 19: JARL - Instruction Test

### 2.2.3 Type S

#### 2.2.3.1 SB - Store Byte

| Instruction      | Operation                             |
|------------------|---------------------------------------|
| sb rs2, imm(rs1) | [Address]((rs1+imm)(7:0)) = rs2 (7:0) |

**sb x1, 2(x0)**

Machine Code: 0x00220123

Expected Register Value: 0x000000BC



Figure 20: SB - Instruction Test

### 2.2.3.2 SH - Store Half

| Instruction      | Operation                               |
|------------------|-----------------------------------------|
| sh rs2, imm(rs1) | [Address]((rs1+imm)(15:0)) = rs2 (15:0) |

**sh x1, 2(x0)**

Machine Code: 0x00220123

Expected Register Value: 0x0000FABC



Figure 21: SH - Instruction Test

## 2.2.4 Type B

To test the branch instructions it was decided to make 2 test benches for each instruction. The test benches were made following the typology presented below.

```

1
2 Test Case 1:
3
4 ADDI X2, X0, #005      #Machine Code: 0x00500113
5 ADDI X3, X0, #00C      #Machine Code: 0x00C00193
6 ## Bench Instruction
7 ADDI X1, X0, #000      #Machine Code: 0x00000093
8 ADDI X4, X0, #009      #Machine Code: 0x00900213
9
10 Test Case 2:
11
12 ADDI X2, X0, #FFB      #Machine Code: 0xFFB00113
13 ADDI X3, X0, #00C      #Machine Code: 0x00C00193
14 ## Bench Instruction
15 ADDI X1, X0, #000      #Machine Code: 0x00000093
16 ADDI X4, X0, #009      #Machine Code: 0x00900213

```

### 2.2.4.1 BEQ - Branch Equal

| Instruction         | Operation                |
|---------------------|--------------------------|
| beq rs1, rs2, label | if (rs1 == rs2) PC = BTA |

beq x2, x3, 0x08

Machine Code: 0x00310463

**Case 1: X2 = 5, X3 = 0x0C**

Expected Result: Not Taken

Expected PC: 0x08



Figure 22: BEQ - Instruction Test Case 1

**Case 2: X2 = -5, X3 = 0x0C**

Expected Result: Not Taken

Expected PC: 0x08



Figure 23: BEQ - Instruction Test Case 2

#### 2.2.4.2 BNE - Branch Not Equal

| Instruction         | Operation                      |
|---------------------|--------------------------------|
| bne rs1, rs2, label | if ( $rs1 \neq rs2$ ) PC = BTA |

**bne x2, x3, 0x08**

Machine Code: 0x00311463

**Case 1: X2 = 5, X3 = 0x0C**

Expected Result: Taken

Expected PC: 0x16



Figure 24: BNE - Instruction Test Case 1

### Case 2: X2 = - 5, X3 = 0x0C

Expected Result: Taken

Expected PC: 0x16



Figure 25: BNE - Instruction Test Case 2

#### 2.2.4.3 BLT - Branch Less Than

| Instruction         | Operation               |
|---------------------|-------------------------|
| blt rs1, rs2, label | if (rs1 < rs2) PC = BTA |

**blt x2, x3, 0x08**

Machine Code: 0x00314463

**Case 1: X2 = 5, X3 = 0x0C**

Expected Result: Taken

Expected PC: 0x16



Figure 26: BLT - Instruction Test Case 1

### Case 2: X2 = - 5, X3 = 0x0C

Expected Result: Taken

Expected PC: 0x16



Figure 27: BLT - Instruction Test Case 2

#### 2.2.4.4 BGE - Branch Greater Equal

| Instruction         | Operation                      |
|---------------------|--------------------------------|
| bge rs1, rs2, label | if ( $rs1 \geq rs2$ ) PC = BTA |

**bge x2, x3, 0x08**

Machine Code: 0x00315463

**Case 1: X2 = 5, X3 = 0x0C**

Expected Result: Not Taken

Expected PC: 0x08



Figure 28: BGE - Instruction Test Case 1

### Case 2: X2 = -5, X3 = 0x0C

Expected Result: Not Taken

Expected PC: 0x08



Figure 29: BGE - Instruction Test Case 2

#### 2.2.4.5 BLTU - Branch Less Than Unsigned

| Instruction          | Operation                   |
|----------------------|-----------------------------|
| bltu rs1, rs2, label | if ( $rs1 < rs2$ ) PC = BTA |

**bltu x2, x3, 0x08**

Machine Code: 0x00316463

**Case 1: X2 = 5, X3 = 0x0C**

Expected Result: Taken

Expected PC: 0x16



Figure 30: BLTU - Instruction Test Case 1

### Case 2: X2 = -5, X3 = 0x0C

Expected Result: Not Taken

Expected PC: 0x08



Figure 31: BLTU - Instruction Test Case 2

#### 2.2.4.6 BGEU - Branch Greater Equal Unsigned

| Instruction         | Operation                      |
|---------------------|--------------------------------|
| bge rs1, rs2, label | if ( $rs1 \geq rs2$ ) PC = BTA |

**bgeu x2, x3, 0x08**

Machine Code: 0x00317463

**Case 1: X2 = 5, X3 = 0x0C**

Expected Result: Not Taken

Expected PC: 0x08



Figure 32: BGEU - Instruction Test Case 1

**Case 2: X2 = -5, X3 = 0x0C**

Expected Result: Taken

Expected PC: 0x16



Figure 33: BGEU - Instruction Test Case 2

## 2.2.5 Type U

### 2.2.5.1 LUI - Load Upper Immediate

| Instruction   | Operation         |
|---------------|-------------------|
| lui rd, upimm | rd = upimm, 12'b0 |

This instruction adds 12 zeros to the 20 bit immediate, in this test the 20 bit immediate was 0xABCD and the result was stored in the register x2.

**LUI x2, 0xABCD**

Expected Data Address: 0x2

Expected Write Data: 0xABCD000

Machine Code: 0xABCD137



Figure 34: LUI - Instruction Test

#### 2.2.5.2 AUIPC - Add Upper Immediate to PC

| Instruction     | Operation                |
|-----------------|--------------------------|
| auipc rd, upimm | $rd = upimm, 12'b0 + PC$ |

This instruction adds 12 zeros to the 20 bit immediate and adds it to the PC, in this test the 20 bit immediate was 0xABCD and the result was stored in the register x2.

#### AUIPC x4, 0xABCD

Expected Data Address: 0x4

Expected Write Data: 0xABCD004

Machine Code: 0xABCD217



Figure 35: AUIPC - Instruction Test

# 3 Pipeline Processor

## 3.1 Introduction to Pipeline

Pipelining is a way to implement temporal parallelism and it is commonly used in modern processors as it significantly increases the processor performance at low cost. Adding pipelining to a processor increases the throughput (instructions executed per second). However, the more stages the pipeline has, the higher the latency (time needed for a given instruction to execute completely) will be. Adding more than 8 pipeline stages is counter-productive to the Risc-V processor when trying to increase performance.

To add pipelining to the single-cycle processor that was previously implemented, it was decided to add 5 stages, which will be Fetch, Decode, Execute, Memory Read/Write and Writeback (the same ones considered in single-cycle). The timing diagram for the implemented pipelined processor is represented in Figure 36.

The following sub chapters describe the way the group managed to add pipelining to the single-cycle processor.



Figure 36: Pipeline Timing Diagram

## 3.2 Hazard unit

The hazard unit is a sector of a pipeline processor responsible for managing pipeline hazards. Those can occur when a data by one instruction that is still not fully saved is requested by another instruction, or when a branch (or jump) is being processed and the program flow changes (the jump is taken).

To solve these hazards there are 3 run time solutions. It can either data forward a value from a late stage of the pipeline back to an earlier stage in order to fulfill an instruction request without losing clock cycles, it can stall the newest instructions in order to allow the previous ones to calculate a value required in one of the new instructions or it can flush the newest instructions whenever there is a branch that jumps to an unpredicted state (this processor predicts all branches as not taken while

the jump condition is unknown).

According with the pipelined model of a riscv processor it will be required the following inputs and outputs in the hazard unit:

```
1 module hazardunit(
2     clk,
3     reset,
4
5     StallF,
6     StallD,
7     FlushD,
8     FlushE,
9     ForwardAE,
10    ForwardBE,
11
12    Rs1D,
13    Rs2D,
14    RdE,
15    Rs1E,
16    Rs2E,
17    PCSrcE,
18    ResultSrcE,
19    RdM,
20    RegWriteM,
21    RdW,
22    RegWriteW
23 );
```

Starting from the data forward solution (least time expensive solution) it can be applied to either the srcA or the srcB of the ALU and can be provided either from the result of the ALU (forward 1 cycle) or the result of the operation (forward 2 cycles).

The data forward of 1 cycle occurs when the destiny register from the instruction in the Memory access stage ( $\text{Rs1E} == \text{RdM}$  or  $\text{Rs2E} == \text{RdM}$ ), while the data forward of 2 cycles occurs when the destiny register from the instruction in the Write back stage ( $\text{Rs1E} == \text{RdW}$  or  $\text{Rs2E} == \text{RdW}$ ).

An aditional condition there was necessary to add is to prevent the data forward from happening when the destiny register is x0 because that address is locked at the const value of 0.

```
1
2 assign ForwardAE[1]=(((Rs1E==RdM) & RegWriteM) & ~(Rs1E==5'b00000));
3
4 assign ForwardAE[0]=(((Rs1E==RdW) & RegWriteW) & ~(Rs1E==5'b00000));
```

```

5
6 assign ForwardBE[1]=(((Rs2E==RdM) & RegWriteM) & ~(Rs2E==5'b00000));
7
8 assign ForwardBE[0]=(((Rs2E==RdW) & RegWriteW) & ~(Rs2E==5'b00000));

```

The stall solution is used when it is not possible to use the data forward to solve a data hazard. This solution stalls early stage instructions in order to allow the old ones to process a value required by the new ones. This solution may be used in coordination with the data forward.

Stalls occur when a load instruction (those take 5 cycles to obtain the loaded value) is still in the Execution stage of the pipeline and the destiny value is requested by an instruction in the Decode stage of the pipeline.

Stalls occur when the destiny register of a load instruction in the Execution stage is the same as a source register of an instruction in the decoding stage.

The stall affects the fetch and decode registers by holding all their values.

```

1
2 assign lwStall=(ResultSrcE==3'b001) & ((Rs1D==RdE) | (Rs2D==RdE));
3
4 assign StallF=lwStall;
5
6 assign StallD=lwStall;

```

The last solution is the flush. That is applied when a jump or a branch is executed and its function is to clean the instructions currently in the pipeline that aren't suppose to be executed when the jump occurs.

A flush occurs when the program counter is set to a given value instead of incrementing 4 addresses and when a stall occurs.

```

1
2 assign FlushD=PCSrcE;
3
4 assign FlushE=lwStall | PCSrcE;

```

### 3.3 Stage Registers

Stage registers are responsible for dividing the processor in stages and save the state of an instruction from stage to stage. They must be able to respond synchronously to the clock cycle and assure the coherence of the instruction order.

Stage registers must be able to be reseted (to a known state) and, in some cases, even flushed (reseted to zero) or stalled (keep the same value from cycle to cycle).

The first register is the Fetch register and it was already half implemented from the single cycle processor, we just had to integrate the StallF control.

The inputs, outputs and control variables for this register are listed bellow:

```
1 module flop (
2   clk,
3   reset,
4   StallF,
5   d,
6   q
7 );
8 );
```

$\text{PCF}' \Leftrightarrow d$   $\text{PCF} \Leftrightarrow q$

The next step was to implement the resister accordingly to it's function.

In the image the q and d represent PCF and PCF' respectively (the default names were kept).

```
1 always @ (posedge clk or posedge reset)
2   if (StallF)
3     q <= q;
4   else if (reset)
5     q <= 0;
6   else
7     q <= d;
```

The second register is the Decode register and it is responsible for dividing the Fetch and Decode stages. It was fully implemented by us.

The inputs, outputs and control variables for this register are listed bellow:

```
1
2 module pipelineregD(
3   clk,
4   reset,
5
6   StallD,
7   FlushD,
8
9   InstrF,
10  PCF,
```

```

11 PCPlus4F ,
12
13 InstrD ,
14 PCD ,
15 PCPlus4D
16 );

```

The next step was to implement the register accordingly to it's function.

```

1
2 always @ (posedge clk)
3     if (reset) begin
4         InstrD <= 0;
5         PCD <= 0;
6         PCPlus4D <= 0;
7     end else if (StallD) begin
8         InstrD <= InstrD;
9         PCD <= PCD;
10        PCPlus4D <= PCPlus4D;
11    end else if (FlushD) begin
12        InstrD <= 0;
13        PCD <= 0;
14        PCPlus4D <= 0;
15    end else begin
16        InstrD <= InstrF;
17        PCD <= PCF;
18        PCPlus4D <= PCPlus4F;
19    end

```

The third register is the Execute register and it is responsible for dividing the Decode and Execute stages. It was fully implemented by us.

Note that it was important to also pipeline the control unit. The funct3 was implemented out of what is represented in the pipeline diagram because in the single cycle processor we required that BUS to select the type of branch in the branch instructions.

The inputs, outputs and control variables for this register are listed bellow:

```

1
2 module pipelineregE(
3 clk ,
4 reset ,
5
6 FlushE ,
7
8 PCPlus4D ,
9 ImmExtd ,
10 RdD ,
11 Rs2D ,
12 Rs1D ,

```

```

13 PCD ,
14 RD2D ,
15 RD1D ,
16
17 ALUSrcD ,
18 ALUControlD ,
19 BranchD ,
20 JumpD ,
21 MemWriteD ,
22 ResultSrcD ,
23 RegWriteD ,
24 funct3D ,
25
26 PCPlus4E ,
27 ImmExtE ,
28 RdE ,
29 Rs2E ,
30 Rs1E ,
31 PCE ,
32 RD2E ,
33 RD1E ,
34
35 ALUSrcE ,
36 ALUControlE ,
37 BranchE ,
38 JumpE ,
39 MemWriteE ,
40 ResultSrcE ,
41 RegWriteE ,
42 funct3E
43 );

```

The next step was to implement the register accordingly to it's function.

```

1
2 always @ (posedge clk)
3     if (reset) begin
4         PCPlus4E <= 0;
5         ImmExtE <= 0;
6         RdE <= 0;
7         Rs2E <= 0;
8         Rs1E <= 0;
9         PCE <= 0;
10        RD2E <= 0;
11        RD1E <= 0;
12
13        ALUSrcE <= 0;
14        ALUControlE <= 0;
15        BranchE <= 0;
16        JumpE <= 0;
17        MemWriteE <= 0;

```

```

18         ResultSrcE <= 0;
19         RegWriteE <= 0;
20         funct3E <= 0;
21     end else if (FlushE) begin
22         PCPlus4E <= 0;
23         ImmExtE <= 0;
24         RdE <= 0;
25         Rs2E <= 0;
26         Rs1E <= 0;
27         PCE <= 0;
28         RD2E <= 0;
29         RD1E <= 0;
30
31         ALUSrcE <= 0;
32         ALUControlE <= 0;
33         BranchE <= 0;
34         JumpE <= 0;
35         MemWriteE <= 0;
36         ResultSrcE <= 0;
37         RegWriteE <= 0;
38         funct3E <= 0;
39     end else begin
40         PCPlus4E <= PCPlus4D;
41         ImmExtE <= ImmExtD;
42         RdE <= RdD;
43         Rs2E <= Rs2D;
44         Rs1E <= Rs1D;
45         PCE <= PCD;
46         RD2E <= RD2D;
47         RD1E <= RD1D;
48
49         ALUSrcE <= ALUSrcD;
50         ALUControlE <= ALUControlD;
51         BranchE <= BranchD;
52         JumpE <= JumpD;
53         MemWriteE <= MemWriteD;
54         ResultSrcE <= ResultSrcD;
55         RegWriteE <= RegWriteD;
56         funct3E <= funct3D;
57     end

```

The fourth register is the Memory (access) register and it is responsible for dividing the Execute and Memory stages. It was fully implemented by us. In this register we also had to pipeline the control unit

The inputs, outputs and control variables for this register are listed below:

```

1
2 module pipelineregM(
3   clk,
4   reset,

```

```

5
6 ALUResultE ,
7 WriteDataE ,
8 RdE ,
9 PCPlus4E ,
10
11 RegWriteE ,
12 ResultSrcE ,
13 MemWriteE ,
14
15 ALUResultM ,
16 WriteDataM ,
17 RdM ,
18 PCPlus4M ,
19
20 RegWriteM ,
21 ResultSrcM ,
22 MemWriteM
23 );

```

The next step was to implement the register accordingly to it's function.

```

1
2 always @ (posedge clk) begin
3     if (reset) begin
4         ALUResultM <= 0;
5         WriteDataM <= 0;
6         RdM <= 0;
7         PCPlus4M <= 0;
8
9         RegWriteM <= 0;
10        ResultSrcM <= 0;
11        MemWriteM <= 0;
12    end else begin
13        ALUResultM <= ALUResultE;
14        WriteDataM <= WriteDataE;
15        RdM <= RdE;
16        PCPlus4M <= PCPlus4E;
17
18        RegWriteM <= RegWriteE;
19        ResultSrcM <= ResultSrcE;
20        MemWriteM <= MemWriteE;
21    end
22 end

```

The last register is the Write (back) register and it is responsible for dividing the Memory and Write stages. It was fully implemented by us.  
In this register we had to pipeline the control unit.

The inputs, outputs and control variables for this register are listed bellow:

```

1
2 module pipelineregW(
3   clk,
4   reset,
5
6   ALUResultM,
7   ReadDataM,
8   RdM,
9   PCPlus4M,
10
11  RegWriteM,
12  ResultSrcM,
13
14  ALUResultW,
15  ReadDataW,
16  RdW,
17  PCPlus4W,
18
19  RegWriteW,
20  ResultSrcW
21 );

```

The next step was to implement the register accordingly to its function.

```

1
2 always @(posedge clk) begin
3   if (reset) begin
4     ALUResultW <= 0;
5     ReadDataW <= 0;
6     RdW <= 0;
7     PCPlus4W <= 0;
8     RegWriteW <= 0;
9     ResultSrcW <= 0;
10  end else begin
11    ALUResultW <= ALUResultM;
12    ReadDataW <= ReadDataM;
13    RdW <= RdM;
14    PCPlus4W <= PCPlus4M;
15    RegWriteW <= RegWriteM;
16    ResultSrcW <= ResultSrcM;
17  end
18 end

```

### 3.4 MUXs

In order to implement the dataforward it's necessary to add MUXs to the sources of the ALU in order to select from where the values are loaded. The possible source options are:

- RD1E/RD2E
- ALUResultM
- ResultW

Taking this in mind the MUXs were created by using the variables ForwardAE and ForwardBE as selectors.

```

1   mux3 #(32) AEmux(
2     RD1E,
3     Result,
4     ALUResultM,
5     ForwardAE,
6     SrcA
7   );
8

```

```

1   mux3 #(32) BEmux(
2     RD2E,
3     Result,
4     ALUResultM,
5     ForwardBE,
6     WriteData
7   );
8

```

Note that SrcB has a control MUX between the hazard MUX and the ALU. For that reason the output of the muxBE is WriteDataE instead of SrcB directly.

### 3.5 Tests

In order to validate the pipeline implementation it was necessary to run some testbenches with hazards. That way it would be possible to detect if the hazard unit is working as intended and if the instructions are shifting through the pipeline stages without collisions or unintentional delays (stall is an intentional and required delay).

The first test will validate the dataforward solution for an data hazard. In order to test, we'll be using the example from the recomended bibliography.



Figure 7.54 Abstract pipeline diagram illustrating forwarding

Figure 37: Dataforward example

As can be analysed from the example the instructions "sub", "or" and "and" require the value from a register (x8) that is being actualized in the first instruction (add), however that value requires the full 5 cycles in order to be written back into x8 and be able to be used. this can be solved with dataforward.

In the first situation the instruction "sub" is in the Execution stage and the x8 value is already available in the Memory stage, so the hazard unit enables the mux that allows the ALUResultE to be loaded directly into the SrcA avoiding any hazard.

In the second example the x8 value is in the Write back stage while the or instruction is requesting it in the Execution stage. As with the previous example the hazard unit will load the result directly into the SrcA in order to prevent the hazard.

In the last example the x8 value is still in the Write back stage but is being requested in the regfile, so internally the regfile must be able to dataforward internally the x8 register in order to prevent any time losses. That can be achieved with the following code:

```

1
2 01200113
3 01300193
4 01400213
5 01500293
6 01600313
7 01700393
8 01800413
9 01900493

```

this attribution saves the clock cycle of saving the value in the register and then load it again.

In order to test the implementation first we will set the registers to a known value to analyse the testbench easily. The registers from x2 to x9 will all be set from 0x12 to 0x19 in increments of 1.

Converting to machine code:

```

1
2 01200113
3 01300193
4 01400213
5 01500293
6 01600313
7 01700393
8 01800413
9 01900493

```

Then we convert the example instructions:

```

1
2 00520433
3 40340133
4 008364B3
5 002473B3

```

By running the simulation we acquire the following result:



Figure 38: Pipeline processor

As we can observe from the simulation the value calculated in the first instruction ( $0x14 + 0x15 = 0x29$ ) is data forwarded from the Memory stage back to the Execution stage in order to fulfill the SrcA in the 2nd instruction.

In order to validate the data hazard solving using stalls there was executed a second test. this test was also taken from the recommended bibliography.



Figure 7.57 Abstract pipeline diagram illustrating stall to solve hazards

Figure 39: Dataforward example

The hazards are visible in the "and" and "or" instructions. In order to take the value from  $x_7$  and use it in the next instructions we require an extra cycle because the load instruction only disponibilizes the value in the Memory stage when it takes the value from the dmem. Since the datapath shifting occurs to all the stages simultaneously it is necessary to implement a control for this hazard that prevents certain stages from advancing while the elder instructions are still finalizing being executed. When finally the load instruction reaches the Write back stage, then the value can be dataforwarded as seen in the previous test.

For simulation purposes we converted the instructions into machine code

```

1 0402A383
2 00338433
3 00736133
4 402381B3

```

After the simulation was ran we obtained the following chart.



Figure 40: Stall example

As we can observe the instructions in the Decode and Execution stages stall as well as the PC stops, for one cycle, during the 7th cycle (PC=0x28). This allows the result to acquire the value of x8 (in this example: 0x29) and dataforward it to the SrcB.

The third test will validate the flush solution for a control hazard. The test had as reference an example from the recommended bibliography.



Figure 7.59 Abstract pipeline diagram illustrating flushing when a branch is taken

Figure 41: Flush example

The *beq* instruction represent a control hazard because the branch decision has not been taken by the time the next instruction is retrieved.

Predicting whether the branch will be taken and then executing instructions based on that prediction is an alternative to blocking the pipeline. If the forecast was incorrect, the processor can discard the instructions once the branch decision is ready. The two instructions after the branch must be flushed by clearing the pipeline registers for those instructions if the branch should have been taken. In the test made there was just an instruction made after the *beq* and it should be flushed because the branch will be taken.

For the test the registers x3 and x2 were set to 12 and the machine code for these instructions was the following:

```

1 00C00113
2 00C00193

```

Then based on the example, the instructions made were:

```

beq x3,x2,8
addi x2, x0, 5
addi x9, x0, 14

```

That in machine code are respectively:

```

1 00218463
2 00500113
3 00E00493

```

By running the simulation we acquire the following result:



Figure 42: Hazard Unit - Flush Test

If the branch was taken it should execute the instruction that sets register x9 to 14. And since x3 and x2 are equal the branch was taken, the instruction that was after the *beq* was flushed and the register x9 was set to 14, proving that the Hazard Unit and the Pipeline are working well.

### 3.6 Running the processor in the Zybo

After the pipeline processor being implemented we created a constraint file in order to set some constraints in the board. The first constraints were to set the proprieties for all the wires and declare the pin voltage for every wire signal.

```

1 set_property IOSTANDARD LVCMOS33 [get_ports clk]
2 set_property IOSTANDARD LVCMOS33 [get_ports MemWrite]
3 set_property IOSTANDARD LVCMOS33 [get_ports reset]
4 set_property IOSTANDARD LVCMOS33 [get_ports success]
```

A wire called "success" was added because, in order to check if the processor is running the program successfully we could output the DataAdr and WriteData BUSs, but it would require mapping a lot of ports and it wouldn't be perceptible to see if succeeded. Because of that we implemented a condition for the success pin that sets it true when the 100th position ( $25 \times 4$ bytes) of the RAM is 25, that will prove that when we run the given testbench and the success pin is set to true, the processor is running correctly in the zybo.

```

1 always @(*)
2     if (RAM[25]==25)
3         successful <=1;
4     else if (reset)
5         successful <=0;
```

After set the properties for every wire it was necessary to assign the respective ports.

```

1 set_property PACKAGE_PIN K17 [get_ports clk]
2 set_property PACKAGE_PIN Y16 [get_ports reset]
3 set_property PACKAGE_PIN D18 [get_ports MemWrite]
4 set_property PACKAGE_PIN G14 [get_ports success]
```

The clock signal was mapped in the Zybo's clk port while the reset signal was mapped in the first push-button and the last two signals were mapped in the first two wires.

After the hardware was mapped it was necessary to create and set a clock for the processor. With that we created a clock with 10us and 50% duty cycle which has a higher period than the critical path time plus the contamination time.

```

1 create_clock -period 10000.000 -name clk -waveform {0.000 5000.000}
[get_ports clk]
```

With that the processor was ready to run in the board.

First we tested with the code that was given with the testbench.



Figure 43: Processor running in the Zybo 1



Figure 44: Instructions used in the testbench 1



Figure 45: Simulation 1

As it can be seen in the images, the processor is running the code correctly turning on the sucess led at the end.

Next we cleared the instruction responsible for setting the 100th byte of the RAM as 25.



Figure 46: Processor running in the Zybo 2



Figure 47: Instructions used in the testbench 2

As it can be seen in the images, the processor did not write the value in the correct position of the RAM, so it wasn't successful.

This proves that the processor can run the instructions as successfully as it can be seen in the simulations.

## 4 Devices

In order to implement devices a new IP was created. The said IP has the function to interface with peripherals. In this case we will be interfacing with the GPIO peripherals (pushbuttons, switches and LEDs).



Figure 48: Hardware block

In order to interface with the **dmem** there are two BUSes, one for the input peripherals (switches and pushbuttons) and one for output peripherals (LEDs), the first one supports 3 pushbuttons (the two buttons mapped in the MultiplexedI/O weren't used as well as the BTN3 that was used for reset) and 4 switches, while the secound interface supports 4 LEDs (the fifth is MultiplexedI/O, so we didn't use).

### Features

- 650MHz dual-core Cortex-A9 processor
- DDR3 memory controller with 8 DMA channels
- High-bandwidth peripheral controllers: 1G Ethernet, USB 2.0, SDIO
- Low-bandwidth peripheral controller: SPI, UART, CAN, I2C
- Reprogrammable logic equivalent to Artix-7 FPGA
  - 4,400 logic slices, each with four 6-input LUTs and 8 flip-flops
  - 240 KB of fast block RAM
  - Two clock management tiles, each with a phase-locked loop (PLL) and mixed-mode clock manager (MMCM)
  - 80 DSP slices
  - Internal clock speeds exceeding 450MHz
  - On-chip analog-to-digital converter (XADC)
- ZYNQ XC7Z010-1CLG400C
- 512MB x32 DDR3 w/ 1050Mbps bandwidth
- Dual-role (Source/Sink) HDMI port
- 16-bits per pixel VGA source port
- Trinode (1Gbit/100Mbit/10Mbit) Ethernet PHY
- MicroSD slot (supports Linux file system)
- OTG USB 2.0 PHY (supports host and device)
- External EEPROM (programmed with 48-bit globally unique EUI-48/64™ compatible identifier)
- Audio codec with headphone out, microphone and line in jacks
- 128Mb Serial Flash w/ QSPI interface
- On-board JTAG programming and UART to USB converter
- **GPIO: 6 pushbuttons, 4 slide switches, 5 LEDs**
- Six Pmod ports (1 processor-dedicated, 1 dual analog/digital, 3 high-speed differential, 1 logic-dedicated)

Figure 49: Zybo features

On the other hand we also have the external ports for the peripherals (pushbuttons, switches and LEDs).

The first step in order to connect the device interface is to reserve a memory address. For that we reserved the address 0x0. The first 4 bits will be connected to the LEDs, the 4 next bits connected to the switches, the 3 following bits connected to the push buttons and lastly 21 bits reserved for future use.

```
1 if(a[31:2]==0) begin //GPIO Peripheral
2   // [31:11] - reserved (future use)
3   // [10:4] - input peripherals (pushbuttons+switches) read only
4   // [3:0] - output peripherals (LEDs) read+write
5   RAM[a[31:2]][3:0] <= wd[3:0];
6 end
```

As it can be seen by the implementation the input and output pins are allowed to be read, but only the output pins can be written. For that reason, when we try to write to this address only the 4 less significative bits are overwritten.

On the other hand, when we read we can only read the non-reserved pins.

```
1 assign rd = a[31:2] ? RAM[a[31:2]] : {21'b00000000000000000000000000000000, RAM
2   [a[31:2]][10:0]};
```

Lastly the device module is responsible to connect to the peripherals.

```
1 assign GPIOInterfaceIn={pushbuttons,switches};
2 assign LEDs=GPIOInterfaceOut;
```

In order to test the implementation we assigned the LED bus with the number 0x5 and generated the bitstream to test on the board.

```
1 always @(*)
2   RAM[0][3:0]=5;
```



Figure 50: Device connectivity test



Figure 51: Device connectivity test

As expected the LEDs 0 and 2 turned on and the LEDs 1 and 3 were kept off. That confirms that the peripherals are physically mapped in the memory through the device module.

This feature can be used by programs to communicate with the peripherals and interact with external stimulus.

## 5 Conclusion

This project represented a landmark in our knowledge about processors. The pipeline represented an interesting challenge, trying to conciliate all the instructions and organizing into stages granting a better throughput.

The integration of peripherals in the project allowed the processor to connect with the external world opening the possibility to integrate in a greater system.

With this project the group faced the opportunity to apply and analyse the flow of programs in an hardware system created from basic components into a processing unit.

## Bibliography

- [1] Sarah L Harris and David Money Harris. *Digital Design and Computer Architecture*. RISC-V Edition. Katey Birtcher, 2022. ISBN: 9780128200643.
- [2] David A. Patterson and John L. Hennessy. *Computer Organization and Design*. RISC-V Edition. Morgan Imprint Publishers, 2018. ISBN: 9780128122754.
- [3] David A. Patterson and John L. Hennessy. *Computer Architecture*. 6th edition. Morgan Imprint Publishers, 2019. ISBN: 9780128119051.