



**Faculty of Engineering & Technology Electrical &  
Computer Engineering Department**

**COMPUTER ARCHITECTURE-ENCS4370**

*Project #2*

*Multi-Cycle RISC Processor*

---

**Prepared by:**

|                              |                    |                   |
|------------------------------|--------------------|-------------------|
| <b>Name:</b> Francis Miadi   | <b>ID:</b> 1210100 | <b>Section:</b> 1 |
| <b>Name:</b> Miar Taweelel   | <b>ID:</b> 1210447 | <b>Section:</b> 1 |
| <b>Name:</b> Leena Abuhammad | <b>ID:</b> 1211460 | <b>Section:</b> 1 |

**Instructor:** Dr. Ayman Hroub

**Hand in Date:** 14/01/2025

## Abstract

This project presents the design and verification of a 16-bit multicycle RISC processor. The processor has a 16-bit program counter (PC), eight 16-bit general-purpose registers, a 16-bit return register (RR) for function calls, and Performance registers that keep track on various program execution metrics. This design uses Word-addressable memories to store data and instructions in two distinct physical memories, supporting three different instruction types: R-, I-, and J-type. Each instruction is carried out by the multicycle architecture in several steps: fetch, decode, execute, access memory, and write back. To ensure the datapath and control path functionality, boolean equations and control signals were created. A testbench for the primary structural module and waveform simulations for every module were used for verification in order to confirm their functionality. Active-HDL was used for RTL design, while Canva was employed for drawing the datapath. This report provides insights into design, implementation, and testing, supported by diagrams and test cases.

## **Partners Contribution**

**The Table below shows the contribution of each team member in the project  
(Note: all of us contributed in implementing separate Functional units that were  
used in the structural Module):**

| Name    | Contribution                                                                                        |
|---------|-----------------------------------------------------------------------------------------------------|
| Francis | <b>Full Micro-Architecture diagram<br/>Design (Data and Control paths and<br/>Signals) + Report</b> |
| Miar    | <b>Full design verification, Debugging,<br/>fixing, and testing (testBench) +<br/>Report</b>        |
| Leena   | <b>Structural Module implementation +<br/>FSM Diagram + Report</b>                                  |

# Table of Contents

|                                               |            |
|-----------------------------------------------|------------|
| <b>Table of Figures.....</b>                  | <b>VI</b>  |
| <b>List of Tables .....</b>                   | <b>VII</b> |
| <b>1. Theory .....</b>                        | <b>1</b>   |
| <b>2. Pre-implementation .....</b>            | <b>2</b>   |
| <b>2.1 Instruction types and format .....</b> | <b>2</b>   |
| <b>2.1.1 R-Type (Register type).....</b>      | <b>2</b>   |
| <b>2.1.2 I-Type (Immediate type).....</b>     | <b>2</b>   |
| <b>2.1.3 J-Type (Jump Type).....</b>          | <b>3</b>   |
| <b>2.2 Instructions Encoding.....</b>         | <b>3</b>   |
| <b>2.2.1 R-Type (Register Type) .....</b>     | <b>3</b>   |
| <b>2.2.2 I-Type (Immediate Type) .....</b>    | <b>4</b>   |
| <b>2.2.3 J-Type (Jump Type).....</b>          | <b>6</b>   |
| <b>2.3 Functional Units .....</b>             | <b>7</b>   |
| <b>2.3.1 Register File.....</b>               | <b>7</b>   |
| <b>2.3.2 ALU .....</b>                        | <b>7</b>   |
| <b>2.3.3 Memories .....</b>                   | <b>8</b>   |
| <b>2.3.4 Instruction Segmentation .....</b>   | <b>8</b>   |
| <b>2.3.5 Branch Adder .....</b>               | <b>9</b>   |
| <b>2.3.6 Extender.....</b>                    | <b>9</b>   |
| <b>2.3.7 Concatenation Unit .....</b>         | <b>9</b>   |
| <b>2.3.8 Incrementer .....</b>                | <b>9</b>   |
| <b>2.4 Registers.....</b>                     | <b>10</b>  |
| <b>2.4.1 Program Counter (PC) .....</b>       | <b>10</b>  |
| <b>2.4.2 Instruction Register (IR) .....</b>  | <b>10</b>  |
| <b>2.4.3 Operand Registers.....</b>           | <b>11</b>  |
| <b>2.4.4 ALU Output Register.....</b>         | <b>11</b>  |
| <b>2.4.5 Memory Data Register.....</b>        | <b>11</b>  |
| <b>2.4.6 Return Register .....</b>            | <b>11</b>  |
| <b>2.4.7 Iterations Register.....</b>         | <b>12</b>  |
| <b>2.4.8 Performance Registers.....</b>       | <b>12</b>  |
| <b>3. Implementation .....</b>                | <b>13</b>  |

|                                                              |    |
|--------------------------------------------------------------|----|
| <b>3.1 Micro-Architecture Diagram .....</b>                  | 13 |
| <b>3.2 Control Units and Signals .....</b>                   | 16 |
| <b>3.2.1 Main Control Unit.....</b>                          | 16 |
| <b>3.2.2 ALU Control Unit .....</b>                          | 19 |
| <b>3.2.3 PC Control Unit .....</b>                           | 19 |
| <b>4. RTL .....</b>                                          | 20 |
| <b>4.1 PC module.....</b>                                    | 20 |
| <b>4.2 Instruction memory .....</b>                          | 20 |
| <b>4.3 IR_Seg .....</b>                                      | 21 |
| <b>4.4 IR_Reg .....</b>                                      | 21 |
| <b>4.5 Reg_File .....</b>                                    | 22 |
| <b>4.6 ALU .....</b>                                         | 22 |
| <b>4.7 Data_Memory.....</b>                                  | 23 |
| <b>4.8 Branch Adder.....</b>                                 | 23 |
| <b>4.9 Extender.....</b>                                     | 24 |
| <b>4.10 Reg.....</b>                                         | 24 |
| <b>4.11 regMux2to1.....</b>                                  | 25 |
| <b>4.12 Mux5to1 and Mux2to1 .....</b>                        | 25 |
| <b>4.13 Pc Control.....</b>                                  | 26 |
| <b>4.14 ALU Control.....</b>                                 | 27 |
| <b>4.15 Main Control.....</b>                                | 28 |
| <b>4.16 Iterations .....</b>                                 | 29 |
| <b>4.17 Performance registers.....</b>                       | 30 |
| <b>5. Main modules simulation .....</b>                      | 31 |
| <b>5.1 ALU Waves.....</b>                                    | 31 |
| <b>5.2 Reg_file Waves .....</b>                              | 32 |
| <b>5.3 Instruction memory Waves .....</b>                    | 33 |
| <b>5.4 Data memory Waves.....</b>                            | 34 |
| <b>6. Simulation and testing .....</b>                       | 35 |
| <b>6.1 Instruction Set .....</b>                             | 35 |
| <b>6.2 Test Bench Structure .....</b>                        | 36 |
| <b>6.3 Test Bench Execution Output .....</b>                 | 38 |
| <b>6.4 Test Bench Execution Output after adding FOR.....</b> | 47 |

|                                               |    |
|-----------------------------------------------|----|
| <b>6.5 Performance Registers Output .....</b> | 48 |
| <b>6.6 Multi-Cycle Scheduling .....</b>       | 51 |
| <b>7. Conclusion .....</b>                    | 53 |
| <b>8. References .....</b>                    | 54 |

## Table of Figures

|                                                        |    |
|--------------------------------------------------------|----|
| Figure 1.1 – Multi-cycle Processor flow chart [1]..... | 1  |
| Figure 2.3.1.1 – Register File.....                    | 7  |
| Figure 2.3.2.1 – ALU .....                             | 7  |
| Figure 2.3.3.1 – Instruction and Data Memories .....   | 8  |
| Figure 2.3.4.1 – Instruction Segmentation Unit.....    | 8  |
| Figure 2.3.5.1 – Instruction Segmentation Unit.....    | 9  |
| Figure 2.3.6.1 – Extender.....                         | 9  |
| Figure 2.3.7.1 – Concatenation Unit.....               | 9  |
| Figure 2.3.8.1 – Incrementer.....                      | 9  |
| Figure 2.4.1.1 – Program Counter (PC).....             | 10 |
| Figure 2.4.2.1 – Instruction Register (IR).....        | 10 |
| Figure 2.4.3.1 – Operand Registers .....               | 11 |
| Figure 2.4.4.1 – ALU Output Register .....             | 11 |
| Figure 2.4.5.1 – Memory Data Register .....            | 11 |
| Figure 2.4.6.1 – Return Register .....                 | 11 |
| Figure 2.4.7.1 – Iterations Register.....              | 12 |
| Figure 2.4.8.1 – Performance Registers.....            | 12 |
| Figure 3.1.1 – Micro-Architecture Diagram .....        | 13 |
| Figure 3.2.1.1 – Processor finite state machine .....  | 17 |

## **List of Tables**

|                                                              |           |
|--------------------------------------------------------------|-----------|
| <b>Table 2.2.1 – Instructions and Stages.....</b>            | <b>6</b>  |
| <b>Table 3.2.1.1 – Main Control Signals truth table.....</b> | <b>16</b> |
| <b>Table 3.2.1.2 – FSM summarization table.....</b>          | <b>18</b> |
| <b>Table 3.2.2.1 – ALU Control Signals truth table.....</b>  | <b>19</b> |
| <b>Table 3.2.3.1 – PC Control Signals truth table.....</b>   | <b>19</b> |

## 1. Theory

Multi-cycle data path break up instructions into separate steps. It reduces average instruction time. Each step takes a single clock cycle each functional unit can be used more than once in an instruction, as long as it is used in different clock cycles. It reduces the amount of hardware needed. [1]



*Figure 1.1 – Multi-cycle Processor flow chart [1]*

The five stages of the multicycle processor are:

1. **Fetch Instruction:** Retrieves the instruction from the instruction memory using the address stored in the Program Counter (PC).
2. **Decode:** Decodes the instruction to determine the required operations and fetches data and operands from registers.
3. **Execution:** Executes arithmetic, computes memory addresses.
4. **Memory Access:** Accesses data memory for load/store instructions.
5. **Write Back:** Writes results into registers for instructions that have a destination register.

The stages above are separated by clocked registers, to store each stage's result. [1]

## 2. Pre-implementation

Before starting implementing the data control paths and the RTL, a plan has to be made to ensure the correctness of each implemented phase in the future. This plan is the Pre-implementation phase, it includes studying the Instruction types and their format, Instructions Encoding, determining the required combinational and sequential functional units, analyze each unit and determine its inputs, outputs, and whether there is a clock demand or not in it.

After that, control signals for each instruction were determined, and Boolean expressions for each signal were formulated. This ensured accurate control logic to guide and control a route for the data to flow through the data path components during execution to generate correct results.

### 2.1 Instruction types and format

#### 2.1.1 R-Type (Register type)

|                           |                       |                       |                       |                             |
|---------------------------|-----------------------|-----------------------|-----------------------|-----------------------------|
| <i>Opcode<sup>4</sup></i> | <i>Rd<sup>3</sup></i> | <i>Rs<sup>3</sup></i> | <i>Rt<sup>3</sup></i> | <i>Function<sup>3</sup></i> |
|---------------------------|-----------------------|-----------------------|-----------------------|-----------------------------|

- **Opcode:** 4-bit opcode
- **Rd:** 3-bit destination register
- **Rs:** 3-bit first source register
- **Rt:** 3-bit second register
- **Function:** 3-bit operand to differentiate between instructions with the same opcode

#### 2.1.2 I-Type (Immediate type)

|                           |                       |                       |                              |
|---------------------------|-----------------------|-----------------------|------------------------------|
| <i>Opcode<sup>4</sup></i> | <i>Rs<sup>3</sup></i> | <i>Rt<sup>3</sup></i> | <i>Immediate<sup>6</sup></i> |
|---------------------------|-----------------------|-----------------------|------------------------------|

- **Opcode:** 4-bit opcode
- **Rs:** 3-bit source register
- **Rt:** 3-bit destination register
- **immediate:** 6-bit operand, its value is zero-extended for logical instructions and sign-extended for all other instructions

### 2.1.3 J-Type (Jump Type)

| <i>Opcod</i> <sup>4</sup> | <i>Offset</i> <sup>9</sup> | <i>Function</i> <sup>3</sup> |
|---------------------------|----------------------------|------------------------------|
|---------------------------|----------------------------|------------------------------|

- **Opcod:** 4-bit opcode
- **Offset:** 9-bit operand for jump address calculation
- **Function:** 3-bit operand to differentiate between instructions with the same opcode

## 2.2 Instructions Encoding

The implemented processor supports the following subset of instructions, in order to determine the path that each instruction should go through, and what stages it will enter, each instruction has to be encoded to exactly determine its operands and what route it should take.

### 2.2.1 R-Type (Register Type)

| <i>Instruction</i>    | <i>Meaning</i>                                                            | <i>Opcod</i> | <i>Function</i> |
|-----------------------|---------------------------------------------------------------------------|--------------|-----------------|
| <i>AND Rd, Rs, Rt</i> | $\text{Reg}(\text{Rd}) = \text{Reg}(\text{Rs}) \& \text{Reg}(\text{Rt})$  | 0000         | 000             |
| <i>ADD Rd, Rs, Rt</i> | $\text{Reg}(\text{Rd}) = \text{Reg}(\text{Rs}) + \text{Reg}(\text{Rt})$   | 0000         | 001             |
| <i>SUB Rd, Rs, Rt</i> | $\text{Reg}(\text{Rd}) = \text{Reg}(\text{Rs}) - \text{Reg}(\text{Rt})$   | 0000         | 010             |
| <i>SLL Rd, Rs, Rt</i> | $\text{Reg}(\text{Rd}) = \text{Reg}(\text{Rs}) \ll \text{Reg}(\text{Rt})$ | 0000         | 011             |
| <i>SRL Rd, Rs, Rt</i> | $\text{Reg}(\text{Rd}) = \text{Reg}(\text{Rs}) \gg \text{Reg}(\text{Rt})$ | 0000         | 100             |

- ❖ **IF:**  $\text{Instruction} \leftarrow \text{Instruction\_Mem}[\text{PC}]$ ,  $\text{pc} = \text{pc}+1$
- ❖ **ID:**  $\text{Reg\_A} \leftarrow \text{Reg}(\text{Rs})$ ,  $\text{Reg\_B} \leftarrow \text{Reg}(\text{Rt})$
- ❖ **EX:**  $\text{ALU\_output} \leftarrow \text{OPCODE} (\text{Reg\_A}, \text{Reg\_B})$
- ❖ **WB:**  $\text{Reg}(\text{Rd}) \leftarrow \text{ALU\_output}$

### 2.2.2 I-Type (Immediate Type)

| <i>Instruction</i>      | <i>Meaning</i>                                                                  | <i>Opcode</i> | <i>Function</i> |
|-------------------------|---------------------------------------------------------------------------------|---------------|-----------------|
| <i>ANDI Rt, Rs, Imm</i> | $\text{Reg}(\text{Rt}) = \text{Reg}(\text{Rs}) \& \text{zero\_ext}(\text{Imm})$ | 0010          | NA              |
| <i>ADDI Rt, Rs, Imm</i> | $\text{Reg}(\text{Rt}) = \text{Reg}(\text{Rs}) + \text{zero\_ext}(\text{Imm})$  | 0011          | NA              |

- ❖ IF:  $\text{Instruction} \leftarrow \text{Instruction\_Mem}[\text{PC}]$ ,  $\text{pc} = \text{pc}+1$
- ❖ ID:  $\text{Reg\_A} \leftarrow \text{Reg}(\text{Rs})$ ,  $\text{Reg\_B} \leftarrow \text{zero\_ext}(\text{Imm})$
- ❖ EX:  $\text{ALU\_output} \leftarrow \text{OPCODE}(\text{Reg\_A}, \text{Reg\_B})$
- ❖ WB:  $\text{Reg}(\text{Rd}) \leftarrow \text{ALU\_output}$

| <i>Instruction</i>    | <i>Meaning</i>                                                                             | <i>Opcode</i> | <i>Function</i> |
|-----------------------|--------------------------------------------------------------------------------------------|---------------|-----------------|
| <i>LW Rt, Imm(Rs)</i> | $\text{Reg}(\text{Rt}) = \text{Mem}(\text{Reg}(\text{Rs}) + \text{sign\_ext}(\text{Imm}))$ | 0100          | NA              |

- ❖ IF:  $\text{Instruction} \leftarrow \text{Instruction\_Mem}[\text{PC}]$ ,  $\text{pc} = \text{pc}+1$
- ❖ ID:  $\text{Reg\_A} \leftarrow \text{Reg}(\text{Rs})$ ,  $\text{Reg\_B} \leftarrow \text{sign\_ext}(\text{Imm})$
- ❖ EX:  $\text{ALU\_output} \leftarrow \text{ADD}(\text{Reg\_A}, \text{Reg\_B})$
- ❖ Mem:  $\text{Data\_out} \leftarrow \text{Data\_Mem}[\text{ALU\_output}]$
- ❖ WB:  $\text{Reg}(\text{Rt}) \leftarrow \text{Data\_out}$

| <i>Instruction</i>    | <i>Meaning</i>                                                                             | <i>Opcode</i> | <i>Function</i> |
|-----------------------|--------------------------------------------------------------------------------------------|---------------|-----------------|
| <i>SW Rt, Imm(Rs)</i> | $\text{Mem}(\text{Reg}(\text{Rs}) + \text{sign\_ext}(\text{Imm})) = \text{Reg}(\text{Rt})$ | 0101          | NA              |

- ❖ IF:  $\text{Instruction} \leftarrow \text{Instruction\_Mem}[\text{PC}]$ ,  $\text{pc} = \text{pc}+1$
- ❖ ID:  $\text{Reg\_A} \leftarrow \text{Reg}(\text{Rs})$ ,  $\text{Reg\_B} \leftarrow \text{sign\_ext}(\text{Imm})$
- ❖ EX:  $\text{ALU\_output} \leftarrow \text{ADD}(\text{Reg\_A}, \text{Reg\_B})$
- ❖ Mem:  $\text{Data\_Mem}[\text{ALU\_output}] \leftarrow \text{Reg}(\text{Rt})$

| <i>Instruction</i>     | <i>Meaning</i>                                                              | <i>Opcode</i> | <i>Function</i> |
|------------------------|-----------------------------------------------------------------------------|---------------|-----------------|
| <i>BEQ Rs, Rt, Imm</i> | if (Reg(Rs) == Reg(Rt))<br>Next PC = branch target<br>else Next PC = PC + 1 | 0110          | NA              |

- ❖ IF:  $\text{Instruction} \leftarrow \text{Instruction\_Mem}[\text{PC}]$
- ❖ ID:  $\text{Reg\_A} \leftarrow \text{Reg}(\text{Rs}), \text{Reg\_B} \leftarrow \text{Reg}(\text{Rt})$
- ❖ EX:  $\text{Zero} \leftarrow \text{subtract}(\text{Reg\_A}, \text{Reg\_B})$
- ❖ if ( $\text{Zero} == 1$ )
  - $\text{PC} = \text{PC} + \text{sign\_ext}(\text{Imm})$
  - else  $\text{PC} = \text{PC} + 1$

| <i>Instruction</i>     | <i>Meaning</i>                                                              | <i>Opcode</i> | <i>Function</i> |
|------------------------|-----------------------------------------------------------------------------|---------------|-----------------|
| <i>BNE Rs, Rt, Imm</i> | if (Reg(Rs) != Reg(Rt))<br>Next PC = branch target<br>else Next PC = PC + 1 | 0111          | NA              |

- ❖ IF:  $\text{Instruction} \leftarrow \text{Instruction\_Mem}[\text{PC}]$
- ❖ ID:  $\text{Reg\_A} \leftarrow \text{Reg}(\text{Rs}), \text{Reg\_B} \leftarrow \text{Reg}(\text{Rt})$
- ❖ EX:  $\text{Zero} \leftarrow \text{subtract}(\text{Reg\_A}, \text{Reg\_B})$
- ❖ if ( $\text{Zero} == 0$ )
  - $\text{PC} = \text{PC} + \text{sign\_ext}(\text{Imm})$
  - else  $\text{PC} = \text{PC} + 1$

| <i>Instruction</i> | <i>Meaning</i>                                                                                    | <i>Opcode</i> | <i>Function</i> |
|--------------------|---------------------------------------------------------------------------------------------------|---------------|-----------------|
| <i>FOR Rs, Rt</i>  | if ( $\text{Reg}(\text{Rt}) != 0$ )<br>Next PC = $\text{Reg}(\text{Rs})$<br>else Next PC = PC + 1 | 1000          | NA              |

- ❖ IF:  $\text{Instruction} \leftarrow \text{Instruction\_Mem}[\text{PC}]$
- ❖ ID:  $\text{PC} \leftarrow \text{Reg}(\text{Rs}), \text{iterations} \leftarrow \text{Reg}(\text{Rt})$

### 2.2.3 J-Type (Jump Type)

| <i>Instruction</i> | <i>Meaning</i>        | <i>Opcode</i> | <i>Function</i> |
|--------------------|-----------------------|---------------|-----------------|
| <i>JMP Offset</i>  | Next PC = jump target | 0001          | 000             |

- ❖ IF:  $\text{Instruction} \leftarrow \text{Instruction\_Mem}[\text{PC}]$
- ❖ ID:  $\text{PC} \leftarrow \text{PC}[15:9] \parallel 9\text{-bit offset}$

| <i>Instruction</i> | <i>Meaning</i>                                      | <i>Opcode</i> | <i>Function</i> |
|--------------------|-----------------------------------------------------|---------------|-----------------|
| <i>CALL Offset</i> | Next PC = jump target<br>PC + 1 is stored on the RR | 0001          | 001             |

- ❖ IF:  $\text{Instruction} \leftarrow \text{Instruction\_Mem}[\text{PC}]$
- ❖ ID:  $\text{PC} \leftarrow \text{PC}[15:9] \parallel 9\text{-bit offset}, \text{RR} \leftarrow \text{PC} + 1$

| <i>Instruction</i> | <i>Meaning</i>            | <i>Opcode</i> | <i>Function</i> |
|--------------------|---------------------------|---------------|-----------------|
| <i>RET</i>         | Next PC = value of the RR | 0001          | 010             |

- ❖ IF:  $\text{Instruction} \leftarrow \text{Instruction\_Mem}[\text{PC}]$
- ❖ ID:  $\text{PC} \leftarrow \text{RR}$

The Concept of Multi-Cycle Processor implies that different instruction types require different number of stages as shown in the instructions encoding above, which is also illustrated in the table below:

| <b>Instruction</b> | <b>Number of stages</b> | <b>Stages</b>       |
|--------------------|-------------------------|---------------------|
| ALU                | 4                       | IF, ID, EX, WB      |
| Load               | 5                       | IF, ID, EX, MEM, WB |
| Store              | 4                       | IF, ID, EX, MEM     |
| Branch             | 3                       | IF, ID, EX          |
| Jump               | 2                       | IF, ID              |

## 2.3 Functional Units

After analyzing the instruction types, formats, and encoding, it was concluded that functional units are required to perform the operations defined in the instruction set. Additionally, registers and memories are needed to store instructions, data, and results. To achieve this, the following components were identified as essential.

### 2.3.1 Register File



There are eight 16-bit general-purpose registers in the Register File. It can read from two registers and write to one register at the same time since it has two read busses (A and B) and one write bus. It has five inputs and two outputs, RA and RB are inputs that receive the addresses of the registers that their data is needed and transferred on Bus\_A and Bus\_B respectively as outputs, RW is the input that takes the address of the register that its data is wanted to be written during the Write Back (WB) stage, in which the control unit generates a write enable signal (EN) that controls whether or not data is written to the register file, if it is enabled, then the data from the input Bus\_W will be written on the register with the address RW.

---

*Figure 2.3.1.1 – Register File*

### 2.3.2 ALU



The ALU has 3 inputs and two outputs, it receives inputs A and B, to perform the operation that is determined by the input (OP) which is generated by the ALU control unit, This ALU provides 5 operations: ADD, AND, SUB, SLL, and SRL. When the operation is done, two outputs will be generated, the result of the operation and the ZERO flag which will be 1 if the result of the operation equals to zero, otherwise it will be 0.

---

*Figure 2.3.2.1 – ALU*

### 2.3.3 Memories



The program instructions that the processor will execute are stored in the instruction memory. This memory has two inputs and one output, it receives an Address as an input which is the value of the PC, if the enable (En) is active then the correct instruction will be fetched from the Instruction Memory during the Instruction Fetch (IF) stage.



The data memory is used to store and retrieve data required during program execution. This memory has four inputs and one output. It receives an Address input to specify the memory location, a Data\_in input for storing values, and control signals (W\_EN for write enable and R\_EN for read enable) to choose the operating mode. If R\_EN is active, the data at the input address will be the output of the memory through Data\_out. If W\_EN is active, the value provided at Data\_in will be written to the specified address. This process occurs during the Memory Access (MEM) stage.

*Figure 2.3.3.1 – Instruction and Data Memories*

### 2.3.4 Instruction Segmentation



The Instruction Segmentation (IR\_Seg) unit divides its input into segments of variant sizes based on the instruction type input (IR\_Type). For IR\_Type = 0 (R\_Type), it segments the instruction (excluding the opcode) as follows: Rd (first 3 bits), Rs (second 3 bits), Rt (third 3 bits), and Function (last 3 bits), with imm = 0 and offset = 0. Similarly, it segments I\_Type and J\_Type instructions when IR\_Type = 1 or 2, respectively.

*Figure 2.3.4.1 – Instruction Segmentation Unit*

### 2.3.5 Branch Adder



The Branch Adder takes the current program counter (PC) and a sign extended immediate value as inputs to compute the branch target address during the decode stage. This calculated target is used later in the execution stage if the branch is determined to be taken.

*Figure 2.3.5.1 – Instruction Segmentation Unit*

### 2.3.6 Extender



The Extender takes a 6 bits immediate and extends it to a 16 bits immediate. The type of extension (zero or sign extension) is determined by the EXT\_Op input. If EXT\_Op = 0 (zero extension), 10 zeros are added to the immediate. If EXT\_Op = 1 (sign extension), the 10 bits added are zeros for a positive immediate or ones for a negative immediate.

*Figure 2.3.6.1 – Extender*

### 2.3.7 Concatenation Unit



The concatenation unit receives the offset and PC [15:9] and concatenates them to calculate the jump target address as follows: jump target address = PC [15:9] || offset.

*Figure 2.3.7.1 – Concatenation Unit*

### 2.3.8 Incrementer



The incrementer is a simple unit that receives the current PC and adds 1 to it.

*Figure 2.3.8.1 – Incrementer*

## 2.4 Registers

Each processor requires a program counter register, and the multi cycle processor requires more registers besides the program counter, registers that are between the stages to hold the output of the previous stage in order to use in the next stage at the next clock cycle, and in this project there is an additional required registers, they are called the performance register and the return back register. In total, our processor includes 22 registers, distributed as follows: 8 registers in the register file and 14 additional registers.

### 2.4.1 Program Counter (PC)



This Register is called the program counter (PC) register, it is a 16 bit register that stores the address of the instruction to be fetched from the instruction memory during the Instruction Fetch stage (IF). At the beginning of each IF cycle, it will be updated to point to the next instruction to be fetched, i.e. the PC register holds the address of the current instruction until the it ends, After that the pc will store the next instruction address.

*Figure 2.4.1.1 – Program Counter (PC)*

### 2.4.2 Instruction Register (IR)



This register is called the Instruction Register (IR\_Reg). It stores the instruction fetched from the instruction memory during the Instruction Fetch (IF) stage. Once the instruction is loaded into the IR\_Reg, it divides the instruction into two outputs: the first 4 bits, which represent the opcode, and the remaining bits, which are sent to the Decode stage as an input to the Instruction segmentation unit. The IR\_Reg holds the instruction until it is processed in the next stages.

*Figure 2.4.2.1 – Instruction Register (IR)*

### 2.4.3 Operand Registers



The operand registers, Reg\_A and Reg\_B, store the values of Bus\_A and Bus\_B, respectively, which are outputs from the register file during the Decode stage. These registers then forward their values to the multiplexer that controls the ALU inputs.

*Figure 2.4.3.1 – Operand Registers*

### 2.4.4 ALU Output Register



The ALU\_out register stores the result of the operation performed by the ALU during the Execute stage (EX). This register holds the computed value temporarily and forwards it to the next stage which depends on the instruction type.

*Figure 2.4.4.1 – ALU Output Register*

### 2.4.5 Memory Data Register



The Memory Data Register (MDR) exists in the Write Back stage (WB) but holds the output from the Memory Access stage (Mem). It stores this data to be written to a register in the register file and forwards it to the multiplexer that controls the Bus\_W source for the write back operation.

*Figure 2.4.5.1 – Memory Data Register*

### 2.4.6 Return Register



This is a 16 bit register that stores the return address during function call when the Return\_EN is High.

*Figure 2.4.6.1 – Return Register*

## 2.4.7 Iterations Register



The Iteration Register receives a number that represent the number of iteration for the “for” instruction, this value can only be written on it when the Enable signal is High, it also has a built in decrementer that decrements the value by one with each executed instruction in the loop.

*Figure 2.4.7.1 – Iterations Register*

## 2.4.8 Performance Registers



The Performance Registers Unit increments the appropriate register according to the type of instruction by using the opcode of each instruction. Total executed instructions, total load instructions, total store instructions, total ALU instructions, total control instructions, and total clock cycles are among the metrics it monitors. The performance of the CPU is tracked by updating each of these parameters appropriately.

*Figure 2.4.8.1 – Performance Registers*

### 3. Implementation

#### 3.1 Micro-Architecture Diagram

All of the discussed and explained components in the Pre-implementation part will now be assembled to make the Micro-Architecture Diagram as shown in Figure [3.1.1] below, which consists of Data and control paths:



Figure 3.1.1 – Micro-Architecture Diagram

The diagram above represents the data and control path for the designed multi cycle processor, it consists of functional units for performing operations on the instruction fields, registers to store temporary data throughout the instruction processing, and control units to control the data flow in the data path. The processor operates in five stages: Instruction Fetch (IF), Instruction Decode (ID), Instruction Execute (EX), Memory Access (MEM), and Write Back (WB). Instructions progress through these stages as follows:

➤ **Instruction Fetch (IF):**

The PC register contains the address of the instruction to be fetched from the instruction memory, this address has different sources, in which the PC register's input is a  $5 \times 1$  mux's output, the inputs of this mux are PC +1, RR value, Branch target, jump target, and Rs, allowing the PC to be any of these five values. Once the address is selected, the instruction is fetched and stored in the IR\_Reg to ensure the value stability for the next stage.

➤ **Instruction Decode (ID)**

The stored Value in the IR\_Reg divides the instruction into two outputs: the first 4 bits, which represent the opcode, and the remaining bits, which are sent to the Decode stage as an input to the Instruction segmentation unit, which will divide its input into segments of variant sizes based on the instruction type input (IR\_Type). For IR\_Type = 0 (R\_Type), it segments the instruction (excluding the opcode) as follows: Rd (first 3 bits), Rs (second 3 bits), Rt (third 3 bits), and Function (last 3 bits), with imm = 0 and offset = 0. Similarly, it segments I\_Type and J\_Type instructions when IR\_Type = 1 or 2, respectively, with the appropriate dividing based on each instruction format. After segmentation, the Register File processes the decoded instruction by reading the values from the source registers (Rs and Rt) and forwarding them to the operand registers, Reg\_A and Reg\_B. These registers hold the values for use in the Execute (EX) stage. Additionally, the branch target address will be calculated in this stage, this calculated target is used later in the execution stage if the branch is determined to be taken.

➤ **Instruction Execution (EX):**

The ALU has two inputs, one of them is the value of Reg\_A, and the other one is an outputs of a 2x1 muxe, the mux's inputs are either an extended immediate or the stored value in Reg\_B, the ALU also has a ALU\_ctrl input that determine the operation that will be performed on its inputs. After finishing the operation, two outputs will be generated, ALU output and ZERO output, the ALU output is the result of the performed operation on the two operands, and the ZERO is an output that becomes one when the result of the performed operation equals zero. At the end of the execution stage, the result will be stored in the ALU\_Out register to ensure the value stability for the next stage.

➤ **Memory Access (MEM)**

This stage is for instructions that has access to the Data memory, i.e. Load and Store instructions, in these instructions case, the ALU\_Out register will be holding a calculated address by the ALU in the EX stage, this address will be an input for the Data memory to read from, or write into it. The read or write operation is determined by Write Enable (mem\_W) and Read Enable (mem\_R). in the case of the Store instruction, the mem\_W will be active, thus, the address will be used to write into the Data memory at that address. For the Load instruction, the mem\_R will be active, thus, the address will be used to read from. The value in that address will be the output of the memory (Data\_out), and this output will be stored in the memory data register (MDR) for further usage in the write back stage (WB).

➤ **Write Back (WB)**

This stage is for writing on a destination register in the register file while the input Write\_EN is active, using a mux that chooses the data to be from the ALU\_out register or the MDR, based on its selection line.

## 3.2 Control Units and Signals

All of the explained stages above need a manager to control the data flow and the paths that it should take, this manager is the control unit. Actually, there are three control units, Main control, ALU control, and the PC control.

### 3.2.1 Main Control Unit

This Unit controls the whole data path control signals, it turns components on and off, provides selection lines data to muxes, and changes the mode of operation for some components. All of these signals are generated based on three received inputs, which are the Opcode, function, and ZERO, these three values determine the path that instruction will go through in order to be processed, and the signals are shown in the table below with their corresponding instruction:

|        | Instruction | Opcode | Func     | Zero                      | IR_type    | Iteration_EN | Return_EN | Dest_reg |
|--------|-------------|--------|----------|---------------------------|------------|--------------|-----------|----------|
| Rtype  | And         | "0000" | "000"    | 0                         | 0 (R-type) | 0            | 0         | 0 (Rd)   |
|        | Add         | "0000" | "001"    | x                         | 0 (R-type) | 0            | 0         | 0 (Rd)   |
|        | Sub         | "0000" | "010"    | x                         | 0 (R-type) | 0            | 0         | 0 (Rd)   |
|        | SLL         | "0000" | "011"    | x                         | 0 (R-type) | 0            | 0         | 0 (Rd)   |
|        | SRL         | "0000" | "100"    | x                         | 0 (R-type) | 0            | 0         | 0 (Rd)   |
| I-type | ANDI        | "0010" | 111 (NA) | x                         | 1(I-type)  | 0            | 0         | 1(Rt)    |
|        | ADDI        | "0011" | 111 (NA) | x                         | 1(I-type)  | 0            | 0         | 1(Rt)    |
|        | LW          | "0100" | 111 (NA) | x                         | 1(I-type)  | 0            | 0         | 1(Rt)    |
|        | SW          | "0101" | 111 (NA) | x                         | 1(I-type)  | 0            | 0         | X        |
|        | BEQ         | "0110" | 111 (NA) | should be 1 after the sub | 1(I-type)  | 0            | 0         | X        |
|        | BNE         | "0111" | 111 (NA) | should be 0 after the sub | 1(I-type)  | 0            | 0         | X        |
|        | FOR         | "1000" | 111 (NA) | x                         | 1(I-type)  | 1            | 0         | X        |
| J-Type | JMP Offset  | "0001" | "000"    | x                         | 2 (J-type) | 0            | 0         | X        |
|        | CALL Offset | "0001" | "001"    | x                         | 2 (J-type) | 0            | 1         | X        |
|        | RET         | "0001" | "010"    | x                         | 2 (J-type) | 0            | 0         | X        |

|        | Instruction | Opcode | Func     | Zero                      | Write_EN | EXT_Op     | Write_Src    | ALU_SrcB      | mem_W | mem_R |
|--------|-------------|--------|----------|---------------------------|----------|------------|--------------|---------------|-------|-------|
| Rtype  | And         | "0000" | "000"    | 0                         | 1        | 0 (Zero)   | 1 (From ALU) | 1 (Reg_B)     | 0     | 0     |
|        | Add         | "0000" | "001"    | x                         | 1        | 1 (signed) | 1 (From ALU) | 1 (Reg_B)     | 0     | 0     |
|        | Sub         | "0000" | "010"    | x                         | 1        | 1 (signed) | 1 (From ALU) | 1 (Reg_B)     | 0     | 0     |
|        | SLL         | "0000" | "011"    | x                         | 1        | 0 (Zero)   | 1 (From ALU) | 1 (Reg_B)     | 0     | 0     |
|        | SRL         | "0000" | "100"    | x                         | 1        | 0 (Zero)   | 1 (From ALU) | 1 (Reg_B)     | 0     | 0     |
| I-type | ANDI        | "0010" | 111 (NA) | x                         | 1        | 0 (Zero)   | 1 (From ALU) | 0 (immediate) | 0     | 0     |
|        | ADDI        | "0011" | 111 (NA) | x                         | 1        | 1 (signed) | 1 (From ALU) | 0 (immediate) | 0     | 0     |
|        | LW          | "0100" | 111 (NA) | x                         | 1        | 1 (signed) | 0 (from Mem) | 0 (immediate) | 0     | 1     |
|        | SW          | "0101" | 111 (NA) | x                         | 0        | 1 (signed) | X            | 0 (immediate) | 1     | 0     |
|        | BEQ         | "0110" | 111 (NA) | should be 1 after the sub | 0        | 1 (signed) | X            | 1 (Reg_B)     | 0     | 0     |
|        | BNE         | "0111" | 111 (NA) | should be 0 after the sub | 0        | 1 (signed) | X            | 1 (Reg_B)     | 0     | 0     |
|        | FOR         | "1000" | 111 (NA) | x                         | 0        | X          | X            | X             | 0     | 0     |
| J-Type | JMP Offset  | "0001" | "000"    | x                         | X        | X          | X            | X             | X     | X     |
|        | CALL Offset | "0001" | "001"    | x                         | X        | X          | X            | X             | X     | X     |
|        | RET         | "0001" | "010"    | x                         | X        | X          | X            | X             | X     | X     |

Table 3.2.1.1 – Main Control Signals truth table

## ❖ Boolean Functions

- Iteration\_En == FOR
- Return\_EN = CALL Offset
- Dest\_Reg = ANDI + ADDI + LW
- Write\_EN =  $\overline{SW} + BEQ + BNE + FOR$
- EXT\_Op =  $\overline{And + SLL + SRL + ANDI}$
- Write\_Src =  $\overline{LW}$
- ALU\_SrcB =  $\overline{ANDI + ADDI + LW + SW}$
- mem\_W = SW
- mem\_R = LW

## ❖ State Diagram



Figure 3.2.1.1 – Processor finite state machine

➤ **Summary Table of the Instruction Execution Flow:**

| State     | Description                              | Key Actions                                         |
|-----------|------------------------------------------|-----------------------------------------------------|
| <b>0</b>  | Instruction Fetch                        | PC increments by 1, fetches the instruction.        |
| <b>1</b>  | Instruction Decode/Fetch/BTA Calculation | Decodes instruction, determines operation type.     |
| <b>2</b>  | Address Calculation                      | Calculates address for memory operations (LW/SW).   |
| <b>3</b>  | Load Memory Access                       | Reads data from memory if LW.                       |
| <b>4</b>  | LW Result Stored in Register File        | Stores the loaded data into the specified register. |
| <b>5</b>  | Store Memory Access                      | Writes data to memory if SW.                        |
| <b>6</b>  | ALU Computation (I-Type)                 | Executes ADDI, ANDI instructions using ALU.         |
| <b>7</b>  | ALU Result Stored to Register File       | Stores ALU result back to register file.            |
| <b>8</b>  | ALU Computation (R-Type)                 | Executes ADD, AND, SUB, etc., using ALU.            |
| <b>9</b>  | For Computation                          | Iterates based on Rt value, exits loop if zero.     |
| <b>10</b> | Branch Completion                        | Determines if branch (BEQ, BNE) is taken.           |
| <b>11</b> | Call Completion                          | Stores return address and jumps to subroutine.      |
| <b>12</b> | Return Completion                        | Returns to the saved address after subroutine.      |

**Table 3.2.1.2 – FSM summarization table**

### 3.2.2 ALU Control Unit

The ALU in this project provides 5 operations: ADD, AND, SUB, SLL, and SRL. The ALU Control Unit controls the operation of the ALU based on the instruction and the kind of logical or arithmetic calculation requirement, it receives the Opcode and the function as inputs, and generates ALU\_Ctrl as an output as follows:

|        | Instruction | Opcode | Func     | ALU_Ctrl |
|--------|-------------|--------|----------|----------|
| Rtype  | And         | "0000" | "000"    | "001"    |
|        | Add         | "0000" | "001"    | "010"    |
|        | Sub         | "0000" | "010"    | "011"    |
|        | SLL         | "0000" | "011"    | "100"    |
|        | SRL         | "0000" | "100"    | "101"    |
| I-type | ANDI        | "0010" | 111 (NA) | "001"    |
|        | ADDI        | "0011" | 111 (NA) | "010"    |
|        | LW          | "0100" | 111 (NA) | "010"    |
|        | SW          | "0101" | 111 (NA) | "010"    |
|        | BEQ         | "0110" | 111 (NA) | "011"    |
|        | BNE         | "0111" | 111 (NA) | "011"    |
|        | FOR         | "1000" | 111 (NA) | x        |
| J-Type | JMP Offset  | "0001" | "000"    | x        |
|        | CALL Offset | "0001" | "001"    | x        |
|        | RET         | "0001" | "010"    | x        |

Table 3.2.2.1 – ALU Control Signals truth table

### 3.2.3 PC Control Unit

After finishing processing each instruction, the next instruction has to be fetched from the instruction memory in the IF stage, the PC register's input is connected to a 5x1 mux to provide different PCs updates based on the instruction type, and that's the PC control Unit role, in which it takes the Opcode, Function, and the IR\_Type to generate an output that controls the 5x1 mux as follows:

|        | Instruction | Opcode | Func  | IR_type    | Zero | PC_Ctrl                                |
|--------|-------------|--------|-------|------------|------|----------------------------------------|
| Rtype  | And         |        |       |            |      |                                        |
|        | Add         |        |       |            |      |                                        |
|        | Sub         | x      | x     | 0 (R-type) | x    | "011"                                  |
|        | SLL         |        |       |            |      |                                        |
|        | SRL         |        |       |            |      |                                        |
| I-type | ANDI        | "0010" |       | 1(I-type)  | x    |                                        |
|        | ADDI        | "0011" | x     | 1(I-type)  | x    |                                        |
|        | LW          | "0100" |       | 1(I-type)  | x    | "011"                                  |
|        | SW          | "0101" |       | 1(I-type)  | x    |                                        |
|        | BEQ         | "0110" |       | 1(I-type)  | 0    | "011"                                  |
|        | BNE         | "0111" | x     | 1(I-type)  | 1    | "001"                                  |
|        | FOR         | "1000" |       | 1(I-type)  | x    | "011" if iterations == 0<br>else "100" |
| J-Type | JMP Offset  | "0001" | "000" | 2 (J-type) | x    | "000"                                  |
|        | CALL Offset | "0001" | "001" | 2 (J-type) | x    | "000"                                  |
|        | RET         | "0001" | "010" | 2 (J-type) | x    | "010"                                  |

Table 3.2.3.1 – PC Control Signals truth table

## 4. RTL

### 4.1 PC module

```
1 module PC(
2     input wire clk,           // Clock signal
3     input wire [15:0] new_address, // 16-bit input address
4     output reg [15:0] pc_out // 16-bit output: Current PC value
5 );
6
7     initial begin
8         pc_out = 16'h0000;
9     end
10
11    initial begin
12        @(posedge clk);
13        pc_out = 16'h0000;
14        forever@(posedge clk)
15            pc_out <= new_address;
16    end
17
18
19 endmodule
```

This PC module models a **Program Counter** is a critical part of a processor responsible for managing the flow of instructions. It determines which instruction to fetch next and updates its value during normal execution, jumps, or branches.

### 4.2 Instruction memory

```
1 module instruction_memory (
2     input [15:0] address,      // 16-bit address input
3     input fetch_en,           // Enable signal
4     output reg [15:0] instruction // 16-bit instruction output
5 );
6
7     reg [15:0] instruction_array [0:255]; // 256 words of 16-bit instructions
8     `include "Constants.v"
9     initial begin
10
11         // R-type:
12         instruction_array[0] = {Constants.SLL, 3'b100, 3'b101, 3'b001, 3'b011}; // rd=1, rs=2, rt=3, func=0
13         instruction_array[1] = {Constants.SLL, 3'b100, 3'b101, 3'b101, 3'b001, 3'b011}; // rd=1, rs=2, rt=3, func=0
14         //instruction_array[1] = {Constants.ADD, 3'b001, 3'b010, 3'b011, 3'b000}; // rd=1, rs=2, rt=3, func=0
15         instruction_array[2] = {Constants.SLL, 3'b100, 3'b101, 3'b001, 3'b011}; // rd=4, rs=5, rt=6, func=3
16
17         // I-type:
18         instruction_array[3] = {Constants.SLL, 3'b100, 3'b101, 3'b001, 3'b011}; // rd=1, rs=2, rt=3, func=0
19
20         //instruction_array[3] = {Constants.ANDI, 3'b010, 3'b011, 6'b0000111}; // rs=2, rt=3, imm=7
21         instruction_array[4] = {Constants.SLL, 3'b100, 3'b101, 3'b001, 3'b011}; // rs=5, rt=6, imm=2
22
23         // J-type:
24         instruction_array[5] = {Constants.SLL, 3'b100, 3'b101, 3'b001, 3'b011}; // offset=10, func=0
25         instruction_array[6] = {Constants.CALL, 9'b00000100, 3'b001}; // offset=20, func=1
26
27     end
28
29     always @(*) begin
30         if (fetch_en) begin
31             instruction = instruction_array[address];
32         end else begin
33             instruction = 16'h0000;
34         end
35     end
36
37 endmodule
```

The instruction\_memory stores 256 preloaded 16-bit instructions. It outputs the instruction at a given address when fetch\_en is high; otherwise, it outputs 16'h0000. This module enables instruction fetching for processors, controlled by the fetch\_en signal.

### 4.3 IR\_Seg

```
1 module IR_Seg(
2     input wire [11:0] input_bits, // 12-bit input
3     input wire [1:0] IR_type,   // 2-bit IR type
4     output reg [2:0] rd,       // 3-bit destination register
5     output reg [2:0] rs,       // 3-bit source register
6     output reg [2:0] rt,       // 3-bit target register
7     output reg [2:0] func,    // 3-bit function code
8     output reg [5:0] imm,     // 6-bit immediate value
9     output reg [8:0] offset   // 9-bit offset value
10 );
11
12 always @(*) begin
13     if (IR_type == 2'b00) begin //Rtype
14         rd = input_bits[11:8];
15         rs = input_bits[8:6];
16         rt = input_bits[5:3];
17         func = input_bits[2:0];
18         imm = 6'b0;
19         offset = 9'b0;
20     end
21     else if (IR_type == 2'b01) begin //Itype
22         rd = 3'b0;
23         rs = input_bits[11:9];
24         rt = input_bits[8:6];
25         func = 3'b0;
26         imm = input_bits[5:0];
27         offset = 9'b0;
28     end
29     else if (IR_type == 2'b10) begin //Jtype
30         rd = 3'b0;
31         rs = 3'b0;
32         rt = 3'b0;
33         func = input_bits[2:0];
34         imm = 6'b0;
35         offset = input_bits[11:8];
36     end
37     else begin
38         rd = 3'b0;
39         rs = 3'b0;
40         rt = 3'b0;
41         func = 3'b0;
42         imm = 6'b0;
43         offset = 9'b0;
44     end
45 end
46 end
47
```

**Function:** Decodes the instruction based on its type (R-type, I-type, or J-type). **RTL:** Splits the 12-bit input\_bits into separate fields like rd, rs, rt, func, imm, or offset based on IR\_type. Default values are assigned for invalid types.

### 4.4 IR\_Reg

```
1 module IR_Reg(
2     input wire clk,           // Clock signal
3     input wire [15:0] instruction, // 16-bit instruction input
4     output reg [3:0] opcode,    // 4-bit opcode (MSBs of instruction)
5     output reg [11:0] rest      // Remaining 12 bits of the instruction
6 );
7
8 always @ (posedge clk) begin
9     opcode <= instruction[15:12]; // Extract the 4 MSBs as opcode
10    rest <= instruction[11:0];    // Extract the remaining 12 bits
11 end
12
13 endmodule
```

**Function:** Separates the opcode and the remaining bits of the instruction. **RTL:** Captures the 4 MSBs (opcode) and 12 LSBs (rest) from a 16-bit instruction on each positive clock edge.

## 4.5 Reg\_File

```
1 | module reg_file (
2 |   input clk,
3 |   input write_en,
4 |   input [2:0] RA, RB, RW,
5 |   input [15:0] bus_w,
6 |   output reg [15:0] bus_A, bus_B
7 | );
8 |
9 | reg [15:0] regArray [0:7] = `{
10 |   16'h0000, // 0
11 |   16'h000A, // 10
12 |   16'hFFF0, // -16 (two's complement of 16: 0xFFFF - 16 + 1 = 0xFFFF)
13 |   16'h0014, // 20
14 |   16'hFEC, // -20 (two's complement of 20: 0xFFFF - 20 + 1 = 0xFFEC)
15 |   16'h0010, // 16
16 |   16'hFFF6, // -10 (two's complement of 10: 0xFFFF - 10 + 1 = 0xFFFF)
17 |   16'h000B // 11
18 | };
19 |
20 |
21 |
22 |
23 |
24 |   always @(*) begin //The output is taken asynchronously
25 |     bus_A = regArray[RA];
26 |     bus_B = regArray[RB];
27 |   end
28 |
29 |   always @(posedge clk) begin
30 |
31 |     if (write_en)
32 |       regArray[RW] <= bus_w;
33 |
34 |   end
35 |
36 |
37 |
38 | endmodule
39 |
```

**Function:** Implements a register file with read and write capabilities. **RTL:** Uses asynchronous reads for bus\_A and bus\_B based on RA and RB and synchronous writes to registers when write\_en is active.

## 4.6 ALU

```
1 | module ALU (
2 |   input [2:0] alu_ctrl,
3 |   input signed [15:0] a, b,
4 |   output zero, overflow,neg,
5 |   output reg signed [15:0] alu_out
6 | );
7 |
8 |
9 | reg [1:0] carry;
10 | assign zero = (alu_out == 0); // zero flag
11 | assign overflow = (carry[0] ^ carry[1]); // overflow flag
12 | assign neg = alu_out [15]; // neg flag
13 |
14 | always @(*)begin
15 |
16 |   case (alu_ctrl)
17 |
18 |     3'd1 : alu_out = a & b; //AND
19 |     3'd2 : begin
20 |       {carry[0], alu_out[14:0]} = a[14:0] + b[14:0]; //ADD
21 |       {carry[1], alu_out[15]} = a[15] + b[15] + carry[0];
22 |     end
23 |
24 |     3'd3 : alu_out = a - b; // SUB
25 |     3'd4 : alu_out = a << b[3:0]; //SLL
26 |     3'd5 : alu_out = a >> b[3:0]; //SLR
27 |     default: alu_out = 16'b0;
28 |
29 |   endcase
30 |
31 | end
32 |
33 | endmodule
34 |
```

**Function:** Performs arithmetic and logic operations based on alu\_ctrl. **RTL:** Implements operations like AND, ADD, SUB, SLL, and SLR with flags (zero, overflow, and neg) for specific conditions.

## 4.7 Data\_Memory

```
1  module data_memory (
2     input [15:0] address, // 16-bit address input
3     input [15:0] data_in, // 16-bit data input
4     input mem_W,        // Memory write enable
5     input mem_R,        // Memory read enable
6     input clk,           // Clock signal
7     output reg [15:0] data_out // 16-bit data output
8 );
9
10
11 reg [15:0] memory [0:255]; // 256 words of 16-bit memory
12
13
14 initial begin
15     memory[0] = 16'h0000;
16     memory[1] = 16'h0034;
17     memory[2] = 16'h0078;
18     memory[3] = 16'h00BC;
19
20 end
21
22 // Read operation (combinational logic)
23 always @(*) begin
24     if (mem_R) begin
25         data_out = memory[address];
26     end else begin
27         // Default output value when not reading
28         data_out = 16'h0000;
29     end
30 end
31
32 // Write operation (synchronized with the clock)
33 always @ (posedge clk) begin
34     if (mem_W) begin
35         memory[address] <= data_in;
36     end
37 end
38
39 endmodule
40
41
```

**Function:** Simulates data memory with read and write functionality. **RTL:** Uses combinational logic for reads (mem\_R) and synchronous logic for writes (mem\_W) to a 256-word memory.

## 4.8 Branch Adder

```
1 module BranchAdder(
2     input wire [15:0] pc,      // 16-bit Program Counter
3     input wire [15:0] offset, // 16-bit Offset
4     output wire [15:0] branch_address // 16-bit Branch Address
5 );
6
7 assign branch_address = pc + offset; // Add PC and Offset
8
9 endmodule
10
```

**Function:** Calculates the branch target address. **RTL:** Adds the pc value and offset using simple addition logic

## 4.9 Extender

```
2 module extender (
3     input [5:0] imm,
4     input sign_ext,
5     output reg [15:0] extImm
6 );
7
8
9     always @(*) begin
10         if (sign_ext && imm[5])
11             extImm = {10'b1111111111, imm};
12         else if (sign_ext && ~imm[5])
13             extImm = {10'b0000000000, imm};
14
15     end
16
17 endmodule
18
```

**Function:** Extends a 6-bit immediate value to 16 bits (sign or zero extension). **RTL:** Uses conditional logic to perform sign extension or zero extension based on the sign\_ext input.

## 4.10 Reg

```
1 module Reg(
2     input wire clk,           // Clock signal
3     input wire [15:0] in,    // 16-bit input
4     output reg [15:0] out   // 16-bit output
5 );
6
7
8     always @ (posedge clk) begin
9         out <= in;
10    end
11
12 endmodule
13
14 module Reg1(
15     input wire en,
16     input wire clk,           // Clock signal
17     input wire [15:0] in,    // 16-bit input
18     output reg [15:0] out   // 16-bit output
19 );
20
21
22     always @ (posedge clk && en) begin
23         out <= in;
24    end
25
26 endmodule
27
```

**Function:** Both are 16-bit storage registers that update their output (out) on the positive clock edge (clk).

- **Reg Module:** Updates unconditionally on every clock cycle, always capturing the input (in).
- **Reg1 Module:** Updates conditionally, capturing the input (in) only when the enable signal (en) is high. If en is low, the register retains its current value

## 4.11 regMux2to1

```
1 module RegMux2to1(
2     input wire [2:0] in0,      // 3-bit input 0
3     input wire [2:0] in1,      // 3-bit input 1
4     input wire sel,           // 1-bit selector
5     output reg [2:0] out      // 3-bit output
6 );
7
8
9     always @(*) begin
10         case (sel)
11             1'b0: out = in0; //Rd
12             1'b1: out = in1; //Rt
13         endcase
14     end
15
16 endmodule
```

**Function:** A 2-to-1 multiplexer that selects between two 3-bit inputs (in0 and in1) based on a 1-bit selector signal (sel). **RTL:**

- If sel = 0, the output (out) is assigned the value of in0 (corresponding to Rd).
- If sel = 1, the output (out) is assigned the value of in1 (corresponding to Rt).

## 4.12 Mux5to1 and Mux2to1

```
1 module Mux5to1(
2     input wire [15:0] in0,    // 16-bit input 0
3     input wire [15:0] in1,    // 16-bit input 1
4     input wire [15:0] in2,    // 16-bit input 2
5     input wire [15:0] in3,    // 16-bit input 3
6     input wire [15:0] in4,    // 16-bit input 4
7     input wire [2:0] sel,     // 3-bit selector
8     output reg [15:0] out     // 16-bit output
9 );
10
11
12     always @(*) begin
13         case (sel)
14             3'b000: out = in0; // input 0 =Jump (pc[15:9] || 9 bit offset )
15             3'b001: out = in1; // input 1 = branch taken (pc = pc + extended (6 bit immediate) )
16             3'b010: out = in2; // input 2 = RET (pc= Return address)
17             3'b011: out = in3; // input 3 = Pc = pc+1
18             3'b100: out = in4; // input 4 = For
19             default: out = in3;
20         endcase
21     end
22
23 endmodule
```

---

```
1 module Mux2to1(
2     input wire [15:0] in0,    // 16-bit input 0
3     input wire [15:0] in1,    // 16-bit input 1
4     input wire sel,           // 1-bit selector
5     output reg [15:0] out     // 16-bit output
6 );
7
8
9     always @(*) begin
10         case (sel)
11             1'b0: out = in0;
12             1'b1: out = in1;
13         endcase
14     end
15
16 endmodule
```

**Function:** A 2-to-1 multiplexer that selects between two 16-bit inputs (in0 and in1) based on a 1-bit selector signal (sel). **RTL:**

1. If  $\text{sel} = 0$ , the output ( $\text{out}$ ) is assigned the value of  $\text{in}_0$ .
  2. If  $\text{sel} = 1$ , the output ( $\text{out}$ ) is assigned the value of  $\text{in}_1$ .

**Function:** A 5-to-1 multiplexer that selects one of five 16-bit inputs (in0 to in4) based on a 3-bit selector signal (sel). **RTL:** Depending on the value of sel:

- i. 3'b000: Output is in0 (Jump address, concatenation of pc[15:9] and 9-bit offset).
  - ii. 3'b001: Output is in1 (Branch target address, pc + extended immediate).
  - iii. 3'b010: Output is in2 (Return address stored in the PC).
  - iv. 3'b011: Output is in3 (Next PC, pc + 1).
  - v. 3'b100: Output is in4 (For loop address or other custom behavior).

Default: Outputs in3 (Next PC).

## 4.13 Pc Control

```

module pcControl (
    input [3:0] opcode,      // 4-bit opcode (not the primary control)
    input [2:0] func,        // 3-bit function
    input zero,             // Zero flag
    input [1:0] ir_type,    // 2-bit Instruction Register Type
    input iterations_zero, // flag if the iterations are zero
    output reg [2:0] pc_ctrl // 3-bit PC control signal
);

    always @(*) begin
        // Default pc_ctrl value
        pc_ctrl = 3'b001;
    end

    case (ir_type)
        //----- R-type instructions-----
        2'b001: begin
            pc_ctrl = 3'b011;
        end

        //----- I-type instructions-----
        2'b001: begin
            case (opcode)
                // ADDI, ADDI_LW, SW
                4'b0010_4'b0011_4'b0100_4'b0101: begin
                    pc_ctrl = 3'b011;
                end

                // BEQ...
                4'b0010_4'b0011: begin
                    if (zero) begin
                        pc_ctrl = 3'b001; // Branch taken
                    end else begin
                        pc_ctrl = 3'b011; // Branch not taken
                    end
                end
            end
        end

        //----- J-type instructions-----
        2'b001: begin
            case (func)
                4'b0000: begin // JR
                    pc_ctrl = 3'b001;
                end
                default: begin
                    pc_ctrl = 3'b001; // Default case
                end
            endcase
        end

        default: begin
            pc_ctrl = 3'b001; // Default case
        end
    endcase
endmodule

```

**Function:** Generates the program counter (PC) control signal (pc\_ctrl) based on the instruction type, opcode, function code, zero flag, and loop iterations flag. **RTL:**

- For **R-type instructions** ( $ir\_type = 2'b00$ ): Default PC increment ( $pc\_ctrl = 3'b011$ ).
  - For **I-type instructions** ( $ir\_type = 2'b01$ ):
    - ✓ ALU operations (ANDI, ADDI, LW, SW): Default PC increment.

- ✓ **BEQ**: Branch taken (`pc_ctrl = 3'b001`) if zero = 1; otherwise, increment.
  - ✓ **BNE**: Branch taken if zero = 0; otherwise, increment.
  - ✓ **FOR**: Loop execution based on iterationsZero.
- For **J-type instructions** (`ir_type = 2'b10`):
- ✓ **RET**: Set PC to return address (`pc_ctrl = 3'b010`).
  - ✓ Default: PC increment.
- **Default case**: Increment PC by default (`pc_ctrl = 3'b011`).

## 4.14 ALU Control

```

3 module ALUcontrol (
4
5   input [3:0] opcode,
6   input [2:0] func,
7   output reg [2:0] ALU_ctrl
8 );
9
10 parameter
11   AND = 3'b001,
12   ADD = 3'b010,
13   SUB = 3'b011,
14   SLL = 3'b100,
15   SRL = 3'b101;
16
17 always @(*) begin
18
19   if (opcode == 4'b0000) begin
20
21     case (func)
22       //----- R-type instructions -----
23       3'b000: ALU_ctrl = AND; // AND
24       3'b001: ALU_ctrl = ADD; //ADD
25       3'b010: ALU_ctrl = SUB; //SUB
26       3'b011: ALU_ctrl = SLL; //SLL
27       3'b100: ALU_ctrl = SRL; //SRL
28       default: ALU_ctrl = 3'bx; //NONE
29     endcase
30   end
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
      else begin
        case (opcode)
          //----- I-type instructions -----
          4'b0010: ALU_ctrl = AND;// ANDI
          4'b0011: ALU_ctrl = ADD;//ADDI
          4'b0100: ALU_ctrl = ADD;//LW
          4'b0101: ALU_ctrl = ADD;//SW
          4'b0110: ALU_ctrl = SUB;//BEQ
          4'b0111: ALU_ctrl = SUB;//BNE
          default: ALU_ctrl = 3'bx;//NONE
        endcase
      end
    endmodule

```

**Function:** Decodes the opcode and func fields to generate the appropriate ALU control signal (ALU\_ctrl) for both R-type and I-type instructions. **RTL:**

- **R-type instructions** (`opcode = 4'b0000`):

Decodes func to set operations like AND, ADD, SUB, SLL, and SRL.

- **I-type instructions:**

Decodes opcode for operations like ANDI, ADDI, LW, SW, BEQ, and BNE.

Sets ALU\_ctrl to ADD or SUB as appropriate.

- **Default:**

Outputs 3'bx for undefined or unsupported instructions.

## 4.15 Main Control

**Function:** Decodes the opcode and func fields to generate control signals for fetching, writing, memory operations, ALU source selection and more. **RTL Behavior:**

1. **R-Type Instructions (opcode = 4'b0000):**

- ✓ Fetch enabled, ir\_type = 0 (R-type).
- ✓ Writes to Rd register with ALU results (write\_en = 1, write\_src = 1).
- ✓ No memory operations.

2. **J-Type Instructions (opcode = 4'b0001):**

- ✓ Fetch enabled, ir\_type = 2 (J-type).
- ✓ Control based on func: **CALL**: Saves return address (return\_en = 1). **JMP**: Fetch continues without updates. **RET**: Disables fetch (fetch\_en = 0).

3. **I-Type Instructions:**

- ✓ Fetch enabled, ir\_type = 1 (I-type).
- ✓ **ANDI, ADDI**: Writes ALU results to Rt with appropriate extension (write\_src = 1, ext\_op depends on the operation).
- ✓ **LW**: Reads from memory to Rt (mem\_r = 1, write\_src = 0).
- ✓ **SW**: Writes Rt data to memory (mem\_w = 1).
- ✓ **BEQ, BNE**: Branches conditionally based on zero flag (pc\_ctrl set in other modules).
- ✓ **FOR**: Iterates based on iterationsZero flag.

4. **Default Case:**

- ✓ Outputs undefined control signals (1'bx) for invalid or unsupported instructions.

## 4.16 Iterations

```
1 module Iterations(
2     input wire clk,           // Clock signal
3     input wire [15:0] in,    // 16-bit input
4     input wire Return_enable, // Enable signal
5     output reg [15:0] out,   // 16-bit output
6     output reg Zero_iterations // Flag output: 1 if iterations reach zero
7 );
8
9 always @(posedge clk) begin
10     if (Return_enable) begin
11         if (out <= in) // Load new value when enabled
12             out = in;
13         else if (out != 0) begin
14             out = out - 1; // Decrement value if not zero
15         end
16     end
17     else begin
18         if (out == 0) begin
19             Zero_iterations = 1;
20         end
21     end
22 end
23 endmodule
```

**Function:** Tracks the number of iterations for loops and sets a flag (Zero\_iterations) when the counter reaches zero.

## 4.17 Performance registers

```

3 module PerformanceRegisters (
4     input clk,
5     input reset,
6     input fetch_en,
7     input [3:0] opcode,
8     input [2:0] func,
9     output reg [15:0] total_instructions,
10    output reg [15:0] total_loads,
11    output reg [15:0] total_stores,
12    output reg [15:0] total_alu,
13    output reg [15:0] total_controls,
14    output reg [15:0] clock_cycles
15 );
16
17 initial begin
18     total_instructions = 0;
19     total_loads = 0;
20     total_stores = 0;
21     total_alu = 0;
22     total_controls = 0;
23     clock_cycles = 0;
24 end
25
26 always @ (posedge clk) begin
27     if (reset) begin
28         total_instructions <= 0;
29         total_loads <= 0;
30         total_stores <= 0;
31         total_alu <= 0;
32         total_controls <= 0;
33         clock_cycles <= 0;
34     end
35     else begin
36         total_instructions <= total_instructions + 1;
37         total_loads <= total_loads + 1;
38         total_stores <= total_stores + 1;
39         total_alu <= total_alu + 1;
40         total_controls <= total_controls + 1;
41         clock_cycles <= clock_cycles + 1;
42     end
43 end
44
45 endmodule

```

**Function:** Tracks various performance metrics of a processor, such as the total number of instructions executed, loads, stores, ALU operations, control instructions, and clock cycles.

## 5. Main modules simulation

### 5.1 ALU Waves



#### → Operations tested:

```
initial begin
    $monitor("alu_ctrl=%d, a=%d, b=%d, alu_out=%d, zero=%b, overflow=%b, neg=%b",
             alu_ctrl, a, b, alu_out, zero, overflow, neg);

    //AND operation
    alu_ctrl = 3'd1; a = 16'd5; b = 16'd3; #10;

    //ADD operation
    alu_ctrl = 3'd2; a = 16'd7; b = 16'd4; #10;
    alu_ctrl = 3'd2; a = 16'sd50; b = 16'sd20; #10;

    //SUBTRACT operation
    alu_ctrl = 3'd3; a = 16'd15; b = 16'd10; #10;
    alu_ctrl = 3'd3; a = 16'sd25; b = 16'sd30; #10;

    //Logical Shift Left
    alu_ctrl = 3'd4; a = 16'b0000_0000_0000_1000; b = 2; #10;

    //Logical Shift Right
    alu_ctrl = 3'd5; a = 16'b0000_1111_0000_0000; b = 2; #10;

    //Default case
    alu_ctrl = 3'd6; a = 16'd8; b = 16'd2; #10;

    $finish;
end
```

## 5.2 Reg\_file Waves



### → Tested Cases:

```

69
70 initial begin
71
72     $monitor("Time=%0t | RA=%d | RB=%d | RW=%d | bus_w=%h | bus_A=%h | bus_B=%h",
73             $time, RA, RB, RW, bus_w, bus_A, bus_B);
74
75     //Initial read (no write)
76     write_en = 0;
77     RA = 3'd1; // Read register 1
78     RB = 3'd2; // Read register 2
79     #10;
80
81     // Write to register 3
82     write_en = 1;
83     RW = 3'd3;
84     bus_w = 16'h1234;
85     #10;
86
87     //Disable write and read back register 3
88     write_en = 0;
89     RA = 3'd3;
90     RB = 3'd4; // Read register 4
91     #10;
92
93     // Write to register 5
94     write_en = 1;
95     RW = 3'd5;
96     bus_w = 16'h5678;
97     #10;
98
99     //Disable write and read back register 5
.00    write_en = 0;
.01    RA = 3'd5;
.02    RB = 3'd6; // Read register 6
.03    #10;
.04
.05
.06
.07
.08
|      $finish;
end

```

## 5.3 Instruction memory Waves



### → Tested Cases:

```
60 // Test sequence
61 initial begin
62     $monitor("Time=%0t | Address=%d | Fetch Enable=%b | Instruction=%h",
63             $time, address, fetch_en, instruction);
64
65     // Initialize inputs
66     fetch_en = 0;
67     address = 16'd0;
68
69     // Test 1: Fetch instruction at address 0
70     #10 fetch_en = 1;
71     address = 16'd0;
72     #10;
73
74     // Test 2: Fetch instruction at address 1
75     address = 16'd1;
76     #10;
77
78     // Test 3: Fetch instruction at address 2
79     address = 16'd2;
80     #10;
81
82     // Test 4: Disable fetch and check output
83     fetch_en = 0;
84     address = 16'd3;
85     #10;
86
87     // Test 5: Fetch instruction at address 4
88     fetch_en = 1;
89     address = 16'd4;
90     #10;
91
92     // Test 6: Fetch instruction at address 6
93     address = 16'd6;
94     #10;
95
96     // Finish simulation
97     $finish;
end
```

## 5.4 Data memory Waves



### → Tested Cases:

```

73 initial begin
74   $monitor("Time=%0t | Address=%d | Data In=%h | Mem Write=%b | Mem Read=%b | Data Out=%h",
75             $time, address, data_in, mem_W, mem_R, data_out);
76
77   // Initialize inputs
78   address = 16'd0;
79   data_in = 16'h0000;
80   mem_W = 0;
81   mem_R = 0;
82
83   // Test 1: Read from address 0
84   mem_R = 1;
85   address = 16'd0;
86   #10;
87
88   // Test 2: Write to address 4
89   mem_R = 0;
90   mem_W = 1;
91   address = 16'd4;
92   data_in = 16'h00FF;
93   #10;
94
95   // Test 3: Read from address 4
96   mem_W = 0;
97   mem_R = 1;
98   address = 16'd4;
99   #10;
00
01   // Test 4: Write to address 5
02   mem_R = 0;
03   mem_W = 1;
04   address = 16'd5;
05   data_in = 16'h00AA;
06   #10;
07
08   // Test 5: Read from address 5
09   mem_W = 0;
10   mem_R = 1;
11   address = 16'd5;
12   #10;
13

```

## 6. Simulation and testing

### 6.1 Instruction Set

A Set of ten dependent and independent instructions of different types have been implemented in the instruction memory to test the execution of the processor , the instructions implemented are viewed in the figure down below :

```
-----  
initial begin  
// R-type instructions:  
instruction_array[0] = {Constants.ADD, 3'b000, 3'b000, 3'b000, 3'b001};  
  
instruction_array[1] = {Constants.SW , 3'b010, 3'b001, 6'b000010};  
instruction_array[2] = {Constants.LW, 3'b011, 3'b101, 6'b000000};  
instruction_array[3] = {Constants.ADD, 3'b101, 3'b011, 3'b101, 3'b001};  
  
// I-type instructions:  
instruction_array[4] = {Constants.ADDI, 3'b010, 3'b101, 6'b000011};  
instruction_array[5] = {Constants.BEQ, 3'b111, 3'b110, 6'b000010};  
  
// J-type instructions:  
instruction_array[6] = {Constants.ADDI, 3'b010, 3'b011, 6'b000010};  
instruction_array[7] = {4'b0001, 9'b000001001, 3'b001};  
instruction_array[8] = {Constants.SUB, 3'b110, 3'b010, 3'b111, 3'b010};  
instruction_array[9] = {Constants.SLL, 3'b101, 3'b011, 3'b101, 3'b011};  
instruction_array[10] = {4'b0001, 9'b000001001, 3'b010};  
end
```

➔ The Values of the Registers initially are :

```
reg [15:0] regArray [0:7] = '{  
|  
    16'h0000, //R0  
    16'h000A, //R1  
    16'h0002, //R2  
    16'h0004, //R3  
    16'h0003, //R4  
    16'h0000, //R5  
    16'h000B, //R6  
    16'h000B //R7
```

→ Thus, The Instructions sequentially are clarified to be :

### 0. Dummy Instruction

1. SW R1 , 2(R2)
2. LW R5, 0 (R3)
3. ADD R5,R3,R5
4. ADDI R5,R2, 3
5. BEQ R7, R6 , 2
6. ADDI R3,R2, 2
7. Call 9
8. SUB R6, R2, R7
9. SLL R5,R3,R5
10. RET

## 6.2 Test Bench Structure

After implementing the instructions, all these instructions were tested by implementing a test bench, as viewed down below:

```
module datapath_tb2;  
  
// Declare testbench variables  
reg clk;  
wire [15:0] current_pc, instruction_out, next_pc, pc_out1, JTA, Bus_w, B, A, regA, regB, extimm, alu_opernad2;  
wire signed [15:0] alu_output, result_buffer;  
wire [3:0] opcode;  
wire [2:0] alu_ctrl, pc_ctrl;  
wire [11:0] reg1_rest;  
wire [1:0] IR_type;  
wire [2:0] func, Rd, Rt, Rs, Rw;  
wire [5:0] imm;  
wire [8:0] offset;  
wire zero, write_src, fetch_enable, dst_reg, write_enable, ext_op, alu_src, mem_W, mem_R, return_en, zero_iterations;  
  
// Instantiate the module under test (MUT)  
datapath uut (  
    .clk(clk)  
)  
  
// Generate clock signal with a period of 10 time units (5ns high, 5ns low)  
initial begin  
    clk = 0;  
    forever #5 clk = ~clk; // 10 time unit clock period  
end  
  
// Stimulus and sequential monitoring  
initial begin  
    #57; // Skip the dummy instruction  
  
    // Loop through multiple instructions  
    repeat (9) begin  
        // Fetch stage
```

```

#11; // Move to Decode

// Decode stage
$display("[DECODE] Time: %0t", $time);
$display(" IR_type: %b | Rd: %b | Rs: %b | Rt: %b | Func: %b | Imm: %b | Offset: %b",
        uut.IR_type, uut.Rd, uut.Rs, uut.Rt, uut.func, uut.imm, uut.offset);
$display(" Bus_A: %h | Bus_B: %h | RegA: %h | RegB: %h | ExtImm: %h",
        uut.A, uut.B, uut.regA, uut.regB, uut.extimm);

// Execute stage ((for specific instruction types)
if (uut.opcode != 4'b0001) begin
    #6;
    $display("[EXECUTE] Time: %0t", $time);
    $display(" ALU_Output: %h | ALU_Op: %h | Operand1: %h | Operand2: %h | Src: %h",
            uut.alu_output, uut.alu_ctrl, uut.regA, uut.alu_opernad2, uut.alu_src);
    #5;
    $display(" Result_Buffer: %h", uut.result_buffer);
end

if (uut.opcode == 4'b0001 && uut.func == 3'b010) begin
    #11;
end

// Memory stage (for specific instruction types)
if (uut.opcode != 4'b0000 && uut.opcode != 4'b0010 && uut.opcode != 4'b1000 &&
    uut.opcode != 4'b0011 && uut.opcode != 4'b0111 && uut.opcode != 4'b0110 && uut.opcode != 4'b0001) begin
    #5;
    $display("[MEMORY] Time: %0t", $time);
    $display(" Mem_W: %b | Mem_R: %b | Data_in: %h | Data_out: %h | Address: %h",
            uut.mem_W, uut.mem_R, uut.regB, uut.memo_out, uut.result_buffer);
    #5;
end

// Write-back stage (for specific instruction types)
if (uut.opcode != 4'b0101 && uut.opcode != 4'b0111 && uut.opcode != 4'b0110 &&
    uut.opcode != 4'b1000 && uut.opcode != 4'b0001) begin
    #5;
    $display("[WRITEBACK]Time: %0t", $time);
    $display(" Write_en: %b | Bus_W: %h | Destination_Reg=R%0d", uut.write_enable, uut.Bus_w, uut.Rw);
    #5;
end

// Display Performance Registers
$display("\n==== Performance Registers ===");
$display("Total Instructions: %d", uut.total_instructions);
$display("Total Loads:      %d", uut.total_loads);
$display("Total Stores:     %d", uut.total_stores);
$display("Total ALU Ops:    %d", uut.total_alu);
$display("Total Controls:   %d", uut.total_controls);
$display("Clock Cycles:     %d", uut.clock_cycles);

// End the simulation
$finish;
end
endmodule

```

The simulation begins with a clock signal that toggles every 5 time units, driving the synchronous operations in the module. After an initial delay to skip a dummy instruction, a loop iterates through 9 instructions, emulating the pipeline stages. For each instruction, the testbench displays detailed information about the current state, including the program counter, instruction content, control signals, register values, ALU outputs, and memory interactions. Conditional checks determine whether certain stages (such as Execute or Memory) are skipped based on the opcode. After completing the loop, the testbench prints performance statistics, such as the total number of instructions executed, loads, stores, ALU operations, control instructions, and clock cycles.

### 6.3 Test Bench Execution Output

#### → Instruction One : SW R1 , 2(R2)



```
Console
° # 5:33 AM, Tuesday, January 14, 2025
° # Simulation has been initialized
° # Waveform file 'untitled.awc' connected to 'c:/My_Designs/arch/arch/src/wave.asdb'.
° run 300ns
° # KERNEL:
° # KERNEL: === Instruction 1 ===
° # KERNEL:
° # KERNEL: [FETCH] Time: 60 | PC: 0001 | Instruction: 5442
° # KERNEL: [DECODE] Time: 71
° # KERNEL: IR_type: 01 | Rd: 000 | Rs: 010 | Rt: 001 | Func: 000 | Imm: 000010 | Offset: 00000000
° # KERNEL: Bus_A: 0002 | Bus_B: 000a | RegA: 0002 | RegB: 000a | ExtImm: 0002
° # KERNEL: [EXECUTE] Time: 77
° # KERNEL: ALU_Output: 0004 | ALU_Op: 2 | Operand1: 0002 | Operand2: 0002 | Src: 0
° # KERNEL: Result_Buffer: 0004
° # KERNEL: [MEMORY] Time: 87
° # KERNEL: Mem_W: 1 | Mem_R: 0 | Data_in: 000a | Data_out: 0000 | Address: 0004
° # KERNEL:
> run 300ns
Ln 257, Col 28 INS
Activate Windows
Go to Settings to activate Windows.
7:00 AM 1/14/2025
```

The console output demonstrates the correct functioning of the datapath module across the stages.

During the Fetch stage, the instruction 5442 “SW R1 , 2(R2)” is fetched from PC = 0001. In the Decode stage, the instruction is correctly interpreted as an I-Type with the fields (IR\_type = 01, Rd = 000, Rs = 010, Rt = 001, Imm = 000010), and the appropriate register values (Bus\_A = 0002, Bus\_B = 000A, ExtImm = 0002) are routed.

The Execute stage performs the ALU operation (ALU\_Op = 2), producing the expected result of 0004 by adding operands 0002 and 0002. In the Memory stage, a write operation occurs with data (Data\_in = 000A) written to address 0004, as indicated by control signals (Mem\_W = 1, Mem\_R = 0), therefore a successful four stage store instruction has been executed.

## → Instruction Two: LW R5, 0 (R3):

To Ensure that the previous store operation was executed successfully, the same address written to , is loaded from in this instruction:



The screenshot shows a Windows command-line window titled "Console". The output of the command "run 300ns" is displayed, detailing the execution of an instruction. The instruction is identified as h'4740 (LW R5, 0 (R3)). The timeline shows the following stages: Fetch (Time: 100), Decode (Time: 111), Execute (Time: 117), Memory (Time: 127), and Writeback (Time: 137). In the Memory stage, a read operation retrieves the value 000A from address 0004. Finally, in the Writeback stage, the value 000A is written to register R5. The status bar at the bottom right indicates the date and time: 1/14/2025, 7:04 AM.

```
Console
◦ # KERNEL: === Instruction 2 ===
◦ # KERNEL: [FETCH]    Time: 100 | PC: 0002 | Instruction: 4740
◦ # KERNEL: [DECODE]   Time: 111
◦ # KERNEL:           IR_type: 01 | Rd: 000 | Rs: 011 | Rt: 101 | Func: 000 | Imm: 000000 | Offset: 000000000
◦ # KERNEL:           Bus_A: 0004 | Bus_B: 0000 | RegA: 0004 | RegB: 0000 | ExtImm: 0000
◦ # KERNEL: [EXECUTE]  Time: 117
◦ # KERNEL:           ALU_Output: 0004 | ALU_Op: 2 | Operand1: 0004 | Operand2: 0000 | Src: 0
◦ # KERNEL:           Result_Buffer: 0004
◦ # KERNEL: [MEMORY]   Time: 127
◦ # KERNEL:           Mem_W: 0 | Mem_R: 1 | Data_in: 0000 | Data_out: 000a | Address: 0004
◦ # KERNEL: [WRITEBACK]Time: 137
◦ # KERNEL:           Write_en: 1 | Bus_W: 000a | Destination_Reg=R5
◦ # KERNEL: === Instruction 3 ===
◦ # KERNEL:
> run 300ns
Ln 257, Col 28      INS
Activate Windows
Go to Settings to activate Windows.
Console
SEARCH FILE EXPLORER CHROME F1 FACEBOOK YOUTUBE PAINT SAFARI A 7:04 AM 1/14/2025
```

The load instruction (h'4740) LW R5, 0 (R3) is successfully executed as a multicycle operation, demonstrating proper retrieval of the value stored during the previous store instruction, since in

- the Fetch stage, the instruction is fetched at PC = 0002,
- and in the Decode stage, the base address (0004) from Rs and an offset of 0 are extracted.
- The Execute stage calculates the effective address (0004) using the base address and offset.
- In the Memory stage, a read operation retrieves the stored value (000A) from address 0004. Finally, in the Write-Back stage, the value (000A) is written to register R5 with write-back enabled. This successful execution confirms that the datapath correctly handles memory read and register update operations.

## → Instruction three : ADD R5,R3,R5



```
Console
° # KERNEL:           Write_en: 1 | Bus_W: 000a | Destination_Reg=R5
° # KERNEL:
° # KERNEL: === Instruction 3 ===
° # KERNEL:
° # KERNEL: [FETCH]   Time: 150 | PC: 0003 | Instruction: 0ae9
° # KERNEL: [DECODE]   Time: 161
° # KERNEL:           IR_type: 00 | Rd: 101 | Rs: 011 | Rt: 101 | Func: 001 | Imm: 000000 | Offset: 00000000
° # KERNEL:           Bus_A: 0004 | Bus_B: 000a | RegA: 0004 | RegB: 000a | ExtImm: 0000
° # KERNEL: [EXECUTE]  Time: 167
° # KERNEL:           ALU_Output: 000e | ALU_Op: 2 | Operand1: 0004 | Operand2: 000a | Src: 1
° # KERNEL:           Result_Buffer: 000e
° # KERNEL: [WRITEBACK]Time: 177
° # KERNEL:           Write_en: 1 | Bus_W: 000e | Destination_Reg=R5
° # KERNEL: === Instruction 4 ===
° # KERNEL:
° # KERNEL: [FETCH]   Time: 190 | PC: 0004 | Instruction: 3543
> run 300ns
```

The ADD R5, R3, R5 instruction (0A09) is successfully executed:

- In the Fetch stage, the instruction is fetched from PC = 0003.
- In the Decode stage, the fields are correctly interpreted as an R-Type ADD operation, with R3 (0004) and R5 (000A) being the same register loaded to in the previous section
- The Execute stage performs the addition in the ALU, producing the correct result (000E) for  $4 + 10 = 14$
- Finally, in the Write-Back stage, the result (000E) is written back to the destination register R1 with write-back enabled. The control signals and data flow through all stages operate seamlessly, confirming the success of the instruction execution

## → Instruction four : ADDI R5,R2,3



```
Console
◦ # KERNEL:           Write_en: 1 | Bus_W: 000e | Destination_Reg=R5
◦ # KERNEL:
◦ # KERNEL: === Instruction 4 ===
◦ # KERNEL:
◦ # KERNEL: [FETCH]   Time: 190 | PC: 0004 | Instruction: 3543
◦ # KERNEL: [DECODE]   Time: 201
◦ # KERNEL:           IR_type: 01 | Rd: 000 | Rs: 010 | Rt: 101 | Func: 000 | Imm: 000011 | Offset: 000000000
◦ # KERNEL:           Bus_A: 0002 | Bus_B: 000e | RegA: 0002 | RegB: 000e | ExtImm: 0003
◦ # KERNEL: [EXECUTE]  Time: 207
◦ # KERNEL:           ALU_Output: 0005 | ALU_Op: 2 | Operand1: 0002 | Operand2: 0003 | Src: 0
◦ # KERNEL:           Result_Buffer: 0005
◦ # KERNEL: [WRITEBACK]Time: 217
◦ # KERNEL:           Write_en: 1 | Bus_W: 0005 | Destination_Reg=R5
◦ # KERNEL:
◦ # KERNEL: === Instruction 5 ===
◦ # KERNEL:
◦ # KERNEL: [FETCH]   Time: 230 | PC: 0005 | Instruction: 6f82
> run 300ns
```

- In the Fetch stage, the instruction is fetched from PC = 0004.
- In the Decode stage, it is identified as an I-Type instruction where R2 (0002) is the source register, and the immediate value (0003) is extended.
- The Execute stage performs the addition operation in the ALU, adding R2 (0002) and the immediate value (0003), resulting in 0005.

In the Write-Back stage, the result (0005) is written back to the destination register R5 with write-back enabled. The correct decoding, computation, and register update confirm the successful execution of this I type instruction.

## → Instruction Five: BEQ R7, R6 , 2

```
Console
. # KERNEL: Result_Buffer: 0005
. # KERNEL: [WRITEBACK]Time: 217
. # KERNEL: Write_en: 1 | Bus_W: 0005 | Destination_Reg=R5
. # KERNEL:
. # KERNEL: === Instruction 5 ===
. # KERNEL:
. # KERNEL: [FETCH] Time: 230 | PC: 0005 | Instruction: 6f82
. # KERNEL: [DECODE] Time: 241
. # KERNEL: IR_type: 01 | Rd: 000 | Rs: 111 | Rt: 110 | Func: 000 | Imm: 000010 | Offset: 000000000
. # KERNEL: Bus_A: 000b | Bus_B: 000b | RegA: 000b | RegB: 000b | ExtImm: 0002
. # KERNEL: [EXECUTE] Time: 247
. # KERNEL: ALU_Output: 0000 | ALU_Op: 3 | Operand1: 000b | Operand2: 000b | Src: 1
. # KERNEL: Result_Buffer: 0000
. # KERNEL: === Instruction 7 ===
. # KERNEL:
. # KERNEL: [FETCH] Time: 260 | PC: 0007 | Instruction: 1049
> run 300ns
Console
```

The BEQ R7, R6, 2 instruction (6782) executes as a branch instruction where:

- In the Fetch stage, the instruction is fetched from PC = 0005.
- In the Decode stage, it is identified as an I-Type branch instruction, where R7 (000B) and R6 (000B) are the source registers to compare, and the immediate value (2) is extended to calculate the branch offset.
- In the Execute stage, the ALU determines that R7 is equal to R6 (000B == 000B), by subtracting the two numbers so the branch is taken.
- As a result, the PC changes to 0007 (current PC(5) + 2) based on the branch offset. Thus the next instruction will be instruction number 7 not 6

## → Instruction Six: ADDI R3,R2, 2 (Branched Over Skipped)

## → Instruction Seven: Call 9



The screenshot shows a Windows command-line window titled "Console". The output displays a trace of kernel-level operations, including memory bus cycles and instruction fetches. Key lines include:

```
o # KERNEL: Bus_A: 000b | Bus_B: 000b | RegA: 000b | RegB: 000b | ExtImm: 0002
o # KERNEL: [EXECUTE] Time: 247
o # KERNEL: ALU_Output: 0000 | ALU_Op: 3 | Operand1: 000b | Operand2: 000b | Src: 1
o # KERNEL: Result_Buffer: 0000
o # KERNEL:
o # KERNEL: === Instruction 7 ===
o # KERNEL:
o # KERNEL: [FETCH] Time: 260 | PC: 0007 | Instruction: 1049
o # KERNEL: [DECODE] Time: 271
o # KERNEL: IR_type: 10 | Rd: 000 | Rs: 000 | Rt: 000 | Func: 001 | Imm: 000000 | Offset: 000001001
o # KERNEL: Bus_A: 0000 | Bus_B: 0000 | RegA: 0000 | RegB: 0000 | ExtImm: 0002
o # KERNEL:
o # KERNEL: === Instruction 9 ===
> run 300ns
```

The status bar at the bottom right shows "Ln 257, Col 28", "INS", "7:36 AM", and the date "1/14/2025". A watermark for "Activate Windows" is visible in the background.

The CALL 9 instruction (1049) executes as a jump-and-link instruction:

- In the Fetch stage, the instruction is fetched from PC = 0007.
- In the Decode stage, it is identified as a J-Type instruction, with the immediate value (9) being the jump target address. Additionally, the return address (PC + 1 = 0008) is prepared to be saved in the return register.
- In the Execute stage, the effective jump address (9) is calculated, and the program counter (PC) is updated to 0009. The return address (0008) is stored for use in the return register when returning from the subroutine using RET.

## → Instruction Eight : SUB R6, R2, R7 (Jumped Over)

## → Instruction Nine: SLL R5,R3,R5



```
Console
o # KERNEL: Bus_A: 0000 | Bus_B: 0000 | RegA: 0000 | RegB: 0000 | ExtImm: 0002
o # KERNEL:
o # KERNEL: === Instruction 9 ===
o # KERNEL:
o # KERNEL: [FETCH] Time: 279 | PC: 0009 | Instruction: 0aeb
o # KERNEL: [DECODE] Time: 290
o # KERNEL: IR_type: 00 | Rd: 101 | Rs: 011 | Rt: 101 | Func: 011 | Imm: 000000 | Offset: 00000000
o # KERNEL: Bus_A: 0004 | Bus_B: 0005 | RegA: 0000 | RegB: 0000 | ExtImm: 0000
o # KERNEL: [EXECUTE] Time: 296
o # KERNEL: ALU_Output: 0080 | ALU_Op: 4 | Operand1: 0004 | Operand2: 0005 | Src: 1
o # KERNEL: Result_Buffer: 0080
o # KERNEL: [WRITEBACK]Time: 306
o # KERNEL: Write_en: 1 | Bus_W: 0080 | Destination_Reg=R5

> run 300ns
\ Console
```

Activate Windows  
Go to Settings to activate Windows.

Ln 257, Col 28 | INS |

7:41 AM 1/14/2025

The SLL R5, R3, R5 instruction (0AEB) executes successfully:

- In the Fetch stage, the instruction is fetched from PC = 0009. In the Decode stage, it is identified as an R-Type instruction, with R3 (0004) as the first source operand, R5 (0005) as the second source operand, and R5 as the destination register.
- The shift operation is specified by Func = 011. In the Execute stage, the ALU performs the shift-left logical operation, using R3 (0004) and the shift amount (0005), producing the result (0x0080).
- In the Write-Back stage, the result (0x0080) is written back to the destination register R5 with write-back enabled. This confirms the successful execution of the shift-left logical operation.

## → Instruction Ten: RET



The screenshot shows a software interface with a central console window displaying assembly-like code and status information. The code includes instructions for kernel operations, specifically related to the RET instruction (104A). It details the fetch, decode, and execute stages, mentioning the PC, instruction type (J-Type), and return register. Below the console is a toolbar with various icons, and at the bottom right is a system tray with date and time information.

```
Console
◦ # KERNEL:           Write_en: 1 | Bus_W: 0080 | Destination_Reg=R5
◦ # KERNEL:
◦ # KERNEL: === Instruction 10 ===
◦ # KERNEL:
◦ # KERNEL: [FETCH]   Time: 319 | PC: 000a | Instruction: 104a
◦ # KERNEL: [DECODE]  Time: 330
◦ # KERNEL:          IR_type: 10 | Rd: 000 | Rs: 000 | Rt: 000 | Func: 010 | Imm: 000000 | Offset: 000001001
◦ # KERNEL:          Bus_A: 0000 | Bus_B: 0000 | RegA: 0004 | RegB: 0080 | ExtImm: 0000
◦ # KERNEL:          Return register : 0008
◦ # KERNEL: |
◦ # KERNEL: === Instruction 8 ===
◦ # KERNEL:
◦ # KERNEL: [FETCH]   Time: 349 | PC: 0008 | Instruction: 0cba
> run 300ns
Console
```

Activate Windows  
Go to Settings to activate Windows.

Ln 268, Col 33      INS

11:53 AM      1/14/2025

- The RET instruction (104A) is successfully executed as a return-from-subroutine operation.
- In the Fetch stage, the instruction is fetched from PC = 000A.
- In the Decode stage, it is identified as a J-Type instruction with the return address (0008) retrieved from the return register.
- The Execute stage prepares to update the program counter (PC) to the return address (0008). As a result, control is transferred back to the instruction following the original CALL instruction.
- This confirms the successful execution of the return operation, ensuring proper subroutine functionality by jumping back to the stored return address

## → Instruction Eight : SUB R6, R2, R7 (Returned to)



```
Console
◦ # KERNEL: === Instruction 8 ===
◦ # KERNEL: [FETCH] Time: 349 | PC: 0008 | Instruction: 0cba
◦ # KERNEL: [DECODE] Time: 360
◦ # KERNEL: IR_type: 00 | Rd: 110 | Rs: 010 | Rt: 111 | Func: 010 | Imm: 000000 | Offset: 00000000
◦ # KERNEL: Bus_A: 0002 | Bus_B: 000b | RegA: 0000 | RegB: 0000 | ExtImm: 0000
◦ # KERNEL: [EXECUTE] Time: 366
◦ # KERNEL: ALU_Output: ffff7 | ALU_Op: 3 | Operand1: 0002 | Operand2: 000b | Src: 1
◦ # KERNEL: Result_Buffer: ffff7
◦ # KERNEL: [WRITEBACK]Time: 376
◦ # KERNEL: Write_en: 1 | Bus_W: ffff7 | Destination_Reg=R6
◦ # KERNEL:

> run 300ns
Console
```

The screenshot shows a Windows Command Prompt window titled "Console". The window displays a trace of the execution of an instruction. The trace starts with "# KERNEL: === Instruction 8 ===" and follows through the Fetch, Decode, Execute, and Writeback stages. In the Execute stage, it shows the ALU performing a subtraction operation (R2 - R7) resulting in FFF7. The Writeback stage shows the result being written back to the destination register R6. The command "run 300ns" is entered at the prompt. The status bar at the bottom right shows "Ln 268, Col 33", "INS", the date "1/14/2025", and the time "12:12 PM". A watermark for "Activate Windows" is visible in the background.

The SUB R6, R2, R7 instruction (0CBA) executes successfully after returning to it from the subroutine.

- In the Fetch stage, the instruction is fetched from PC = 0008.
- In the Decode stage, it is identified as an R-Type instruction with R2 (0002) and R7 (000B) as the source registers, and R6 as the destination register. The function code (010) specifies the subtraction operation. I
- In the Execute stage, the ALU performs the subtraction operation (R2 - R7), resulting in FFF7 (signed result of 2 - 11) which is -9.
- In the Write-Back stage, the result (FFF7) is written back to the destination register R6 with write-back enabled. This confirms the successful execution of the subtraction operation following the return from the subroutine.

## 6.4 Test Bench Execution Output after adding FOR

The Fourth instruction in the instruction file , was replaced by the FOR to ensure its successful execution, the new instruction Set :

### 0. Dummy Instruction

1. SW R1 , 2(R2)
2. LW R5, 0 (R3)
3. ADD R5,R3,R5
4. FOR R5,R2
5. BEQ R7, R6 , 2
6. ADDI R3,R2, 2
7. Call 9
8. SUB R6, R2, R7
9. SLL R5,R3,R5
10. RET

### → Execution Output:



- As observed from the previous waveform when the PC reached the FOR instruction (PC=4), the system branched to PC=2 , since Rs stores the loop target address, the address of the first instruction in the loop block, and in this case Rs = R5 = 2
  - After branching the Program continues executing until it reaches the FOR again (PC=4 ), to repeat the process multiple times until the number of iterations held in Rt= R3=3 , are decremented to zero , which are kept in the iterations Register and only accesses it when FOR is reached by an enable called iterations\_en as viewed in the previous waveform.

## 6.5 Performance Registers Output

After executing the instructions , the contents of the performance register files are displayed :

Console

```
• # KERNEL:           Write_en: 1 | Bus_W: fff7 | Destination_Reg=R6
• # KERNEL:
• # KERNEL: === Performance Registers ===
• # KERNEL: Total Instructions:    10
• # KERNEL: Total Loads:          1
• # KERNEL: Total Stores:         1
• # KERNEL: Total ALU Ops:        5
• # KERNEL: Total Controls:       3
• # KERNEL: Clock Cycles:        39
• # RUNTIME: Info: RUNTIME_0068 Processor.v (304): $finish called.
• # KERNEL: Time: 386 ps, Iteration: 0, Instance: /datapath_tb2, Process: @INITIAL#237_1@.
• # KERNEL: stopped at time: 386 ps
• # VSIM: Simulation has finished. There are no more test vectors to simulate.

> run 300ns
Console
```

From the previous figure;

- #### ➤ Total executed Instructions : 10

Dummy Instruction  
SW R1 , 2(R2)  
LW R5, 0 (R3)  
ADD R5,R3,R5  
ADDI R5,R2, 3  
BEQ R7, R6 , 2  
ADDI R3,R2, 2  
**Call 9**  
SUB R6, R2, R7  
SLL R5,R3,R5  
RET

In this Testbench , Only the  
instructions in red were executed

**-Which are 10 instructions**

- Total Loads : 1

**Dummy Instruction**

**SW R1 , 2(R2)**  
**LW R5, 0 (R3)**  
**ADD R5,R3,R5**  
**ADDI R5,R2, 3**  
**BEQ R7, R6 , 2**  
**ADDI R3,R2, 2**  
**Call 9**  
**SUB R6, R2, R7**  
**SLL R5,R3,R5**  
**RET**

Only one Load instruction

- Total Stores: 1

**Dummy Instruction**

**SW R1 , 2(R2)**  
**LW R5, 0 (R3)**  
**ADD R5,R3,R5**  
**ADDI R5,R2, 3**  
**BEQ R7, R6 , 2**  
**ADDI R3,R2, 2**  
**Call 9**  
**SUB R6, R2, R7**  
**SLL R5,R3,R5**  
**RET**

Only one Store instruction

- Total ALU ops : 5

**Dummy Instruction**

**SW R1 , 2(R2)**  
**LW R5, 0 (R3)**  
**ADD R5,R3,R5**  
**ADDI R5,R2, 3**  
**BEQ R7, R6 , 2**  
**ADDI R3,R2, 2 “wasn’t executed”**  
**Call 9**  
**SUB R6, R2, R7**  
**SLL R5,R3,R5**  
**RET**

Since the dummy instruction is

ADD R0,R0,R0

The Alu operations are 5

- Total control ops : 3

Dummy Instruction

SW R1 , 2(R2)

LW R5, 0 (R3)

ADD R5,R3,R5

ADDI R5,R2, 3

**BEQ R7, R6 , 2**

ADDI R3,R2, 2

**Call 9**

SUB R6, R2, R7

SLL R5,R3,R5

**RET**

Branch , call and return

The control operations are 3

- Total clock cycles : 39

Dummy Instruction :6 cycles

SW R1 , 2(R2) :4 cycles

LW R5, 0 (R3) :5 cycles

ADD R5,R3,R5 :4 cycles

ADDI R5,R2, 3 :4 cycles

BEQ R7, R6 , 2 :3 cycles

ADDI R3,R2, 2 :"not executed"

Call 9 :2 cycles

SUB R6, R2, R7. :4 cycles

SLL R5,R3,R5 :4 cycles

RET :3 cycles

Cycles = 5\*4+ 5+ 3+3+2+6 = 39 cycles

## 6.6 Multi-Cycle Scheduling

To Ensure that the processor in hand performs as a multi-cycle machine , the cycles of each instruction types is observed :



- **Store Instruction** at PC =1 , from the previous figure this instruction requires **four clock cycles**



Which is correct since this instruction requires 4 stages to execute:

- Fetch , Decode, Execute, Memory

- **Load Instruction** at PC =2, from the previous figure this instruction requires **five clock cycles**



Which is correct since this instruction requires 5 stages to execute:

- Fetch , Decode, Execute, Memory ,Write Back

- **Alu operations Instructions** at PC =3 (ADD), from the previous figure this instruction requires **4 clock cycles**



Which is correct since this instruction requires 4 stages to execute:

- Fetch, Decode, Execute, Memory

- **Branch Instruction** at PC =5 , from the previous figure this instruction requires **3 clock cycles**



Which is correct since this instruction requires 3 stages to execute:

- Fetch , Decode, Execute

- **J- type Instructions** at PC =7, from the previous figure this instruction requires **2 clock cycles**



Which is correct since this instruction requires 2 stages to execute:

- Fetch , Decode

## 7. Conclusion

To conclude, this project successfully demonstrates the design and verification of a 16-bit multicycle RISC processor. The processor includes a 16-bit program counter (PC), eight 16-bit general-purpose registers, a 16-bit return register (RR) for function calls, and performance registers to monitor program execution metrics. The architecture uses word-addressable memories, with separate physical memories for data and instructions, and supports R-, I-, and J-type instructions.

The multicycle design executes each instruction through distinct steps: fetch, decode, execute, memory access, and write-back. Boolean equations and control signals were developed to ensure the proper functioning of both the datapath and control path. Verification was achieved through a comprehensive testbench for the primary structural module and waveform simulations for each module.

Active-HDL was employed for RTL design and simulation, while Canva assisted the creation of detailed datapath diagrams. This report highlights the processor's design, implementation, and testing, providing valuable insights through supporting diagrams and test cases, ultimately validating the functionality of the proposed architecture.

## **8. References**

- 1] <https://www.geeksforgeeks.org/multi-cycle-data-path-and-control/>