

VIETNAM NATIONAL UNIVERSITY - HO CHI MINH CITY  
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY  
Faculty of Computer Science and Engineering



**LAB 3**  
**INTRODUCTION TO SYSTEM ON CHIP**  
**LABORATORY REPORT**  
Instructor: Pham Kieu Nhat Anh

Class: CC01  
Name: Hoang Thuy Tram  
Student ID: 2353202

Ho Chi Minh City, December 2025

# Contents

| Section | Title |
|---------|-------|
| **1** | **Github** |
| **2** | **Overview** |
| **3** | **RTL implementation** |
| **4** | **Testbench** |
| 4.1 | RegFile testbench |
| 4.2 | Processor testbench |
| **5** | **Testbench results** |
| 5.1 | RegFile testbench results |
| 5.2 | Processor testbench results |
| **6** | **Simulation waveform** |
| 6.1 | RegFile simulation waveform |
| 6.2 | Processor simulation waveform |
| **7** | **Timing closure and Resource utilization** |
| 7.1 | Timing closure |
| 7.2 | Resource utilization |

# 1 Github

The Lab 3 project files (RTL source code, testbenches, simulation scripts) are located in the Lab03 folder. They are available at the following repository: <https://github.com/trxmhoang/SoC.git>

# 2 Overview

The `DatapathSingleCycle.v` file contains the Verilog design of a 32-bit single-cycle RISC-V processor. The processor implements the RV32I base integer instruction set together with the multiply and divide instructions of the RV32M extension. The architecture is single-cycle: every instruction, whether a simple addition or a memory access, completes within one clock cycle. The design consists of four main modules:

- `RegFile`: The register file contains 32 general-purpose registers.
  - The RISC-V architecture specifies 32 registers (`x0` to `x31`), each 32 bits wide. The `RegFile` module implements this storage.
  - In RISC-V, register `x0` is hardwired to zero. The module therefore ignores any write whose destination register is `x0`, and reads of `x0` always return zero.
  - Writing to registers happens on the `posedge clk`, ensuring state updates are synchronized.
  - Reading is combinational: the outputs change as soon as `rs1` or `rs2` changes, so the datapath can use register values within the same clock cycle.
- `DatapathSingleCycle`: The core logic that handles instruction decoding, ALU operations and control flow. It is the “brain” of the processor.
  - The 32-bit instruction received from instruction memory is sliced into specific fields. This splitting allows the processor to immediately identify the source registers (`rs1`, `rs2`), the destination register (`rd`) and the operation codes (`opcode`, `funct3`, `funct7`).
  - RISC-V instructions pack immediate values (constants) differently depending on the instruction type (I, S, B, J). The code extracts these bits and performs sign extension: it replicates the most significant bit (MSB) of the immediate into the upper bits of the 32-bit word, so negative constants are handled correctly in arithmetic operations.
  - The core logic is implemented in a large combinational `always @(*)` block, which acts as the control unit. Based on `inst_opcode`, it selects the ALU operation (add, subtract, XOR, etc.), the memory access (loads and stores via `OpLoad` and `OpStore`) and the control flow (whether the next Program Counter `pcNext` is PC + 4 or a jump/branch target, based on the comparison results).
- `MemorySingleCycle`: A unified memory module for both instructions and data.
  - Physically, the memory is a Von Neumann memory (a single array holds both instructions and data), but it behaves like a Harvard architecture because it exposes separate instruction and data ports, which single-cycle execution requires.
  - While the memory array is 32 bits wide, RISC-V addresses memory by byte. The code handles this with `store_we_to_dmem`, a 4-bit write-enable mask with one bit per byte. This allows writing individual bytes (SB), half-words (SH) or full words (SW) without corrupting the neighboring bytes of the same word.

- `Processor`: The top-level module that connects the datapath and the memory.
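The byte-lane handling described above can be checked against a small software model. The Python sketch below is a hypothetical golden model (the function names `store_lanes`, `apply_store`, `sign_extend` and `load_value` are illustrative, not part of the lab code); it mirrors the RTL in Section 3, where stores shift `rs2` into its byte lane and build a 4-bit write mask, and loads shift the aligned word right before sign- or zero-extending it.

```python
# Hypothetical golden model of sub-word memory access (not part of the lab code).
# Stores shift the data into its byte lane(s) and build a 4-bit write-enable mask;
# loads shift the aligned word right, then sign- or zero-extend it.

MASK32 = 0xFFFF_FFFF


def store_lanes(addr: int, rs2: int, funct3: int) -> tuple[int, int]:
    """Return (store_data_to_dmem, store_we_to_dmem) for sb/sh/sw."""
    offset = addr & 0b11                       # byte offset inside the word
    data = (rs2 << (offset * 8)) & MASK32      # move the value into its lane(s)
    if funct3 == 0b000:                        # sb: one byte
        we = (0b0001 << offset) & 0xF
    elif funct3 == 0b001:                      # sh: two bytes
        we = (0b0011 << offset) & 0xF
    elif funct3 == 0b010:                      # sw: all four bytes
        we = 0b1111
    else:                                      # not a store width
        we = 0
    return data, we


def apply_store(word: int, data: int, we: int) -> int:
    """Update only the bytes whose enable bit is set, as MemorySingleCycle does."""
    for lane in range(4):
        if (we >> lane) & 1:
            byte_mask = 0xFF << (lane * 8)
            word = (word & ~byte_mask) | (data & byte_mask)
    return word & MASK32


def sign_extend(value: int, bits: int) -> int:
    """Replicate bit (bits-1) into the upper bits of a 32-bit word."""
    sign = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return ((value ^ sign) - sign) & MASK32


def load_value(addr: int, word: int, funct3: int) -> int:
    """Return rd_data for lb/lh/lw/lbu/lhu given the aligned word from dmem."""
    shifted = word >> ((addr & 0b11) * 8)      # load_data in the RTL
    if funct3 == 0b000:                        # lb
        return sign_extend(shifted, 8)
    if funct3 == 0b001:                        # lh
        return sign_extend(shifted, 16)
    if funct3 == 0b010:                        # lw
        return shifted & MASK32
    if funct3 == 0b100:                        # lbu
        return shifted & 0xFF
    if funct3 == 0b101:                        # lhu
        return shifted & 0xFFFF
    return 0


# Example: sb of 0xAB to byte address 0x102 only touches bits 23:16.
data, we = store_lanes(0x102, 0xAB, 0b000)
word = apply_store(0x1122_3344, data, we)
print(hex(word))                            # 0x11ab3344
print(hex(load_value(0x102, word, 0b000)))  # 0xffffffab (lb sign-extends)
```

Because the mask is truncated to 4 bits, a misaligned `sh` at offset 3 only writes the top byte, exactly as the 4-bit `store_we_to_dmem` shift does in hardware.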

In this design, the `cla` (carry-lookahead adder) module from Lab 2 implements the `addi`, `add` and `sub` instructions for fast addition and subtraction. Other additions, such as incrementing the PC or computing branch targets, use the `+` operator. The `divu_1iter` and `divider_unsigned` modules from Lab 1 replace the `/` and `%` operators for the division and remainder instructions.
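A single adder can serve both `add` and `sub` because, in two's complement, `a - b = a + ~b + 1`; the RTL does this by inverting the second operand (`cla_op_b`) and setting the carry-in (`cla_cin`) for `sub`. The Python sketch below illustrates the idea; `cla_model` is a hypothetical stand-in for the Lab 2 `cla` module, assumed to be a 32-bit adder with a carry-in whose carry-out is discarded.

```python
# Hypothetical model: `cla_model` stands in for the Lab 2 `cla` module,
# assumed here to be a 32-bit adder with a carry-in.
MASK32 = 0xFFFF_FFFF


def cla_model(a: int, b: int, cin: int) -> int:
    """32-bit sum a + b + cin, carry-out discarded."""
    return (a + b + cin) & MASK32


def add_sub(rs1: int, op_b: int, is_sub: bool) -> int:
    """Mirror cla_op_b / cla_cin: a - b is computed as a + ~b + 1."""
    b = (~op_b & MASK32) if is_sub else op_b
    cin = 1 if is_sub else 0
    return cla_model(rs1, b, cin)


print(add_sub(5, 3, True))              # 2
print(hex(add_sub(3, 5, True)))         # 0xfffffffe (-2 in two's complement)
print(add_sub(10, 0xFFFF_FFFF, False))  # 9 (addi with imm = -1)
```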

The logic is designed so that `pcCurrent` updates exactly once per rising clock edge. Between two rising edges, the signals propagate through:

- Instruction memory fetch
- Register read
- ALU execution
- Data memory access
- Register write back
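At the end of this propagation, `pcNext` is either PC + 4 or a jump/branch target. The Python sketch below models that selection under stated assumptions: the `kind` strings are hypothetical labels for the instruction class, and `imm` is assumed to be the already sign-extended 32-bit immediate. `take_branch` follows the same signed/unsigned comparison table as the RTL, and `jalr` clears bit 0 of the target as the RISC-V specification requires.

```python
# Hypothetical sketch of the pcNext selection; `kind` labels are illustrative.
MASK32 = 0xFFFF_FFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x


def take_branch(funct3: int, rs1: int, rs2: int) -> bool:
    """Same decision table as the take_branch case statement in the RTL."""
    if funct3 == 0b000:
        return rs1 == rs2                          # beq
    if funct3 == 0b001:
        return rs1 != rs2                          # bne
    if funct3 == 0b100:
        return to_signed(rs1) < to_signed(rs2)     # blt
    if funct3 == 0b101:
        return to_signed(rs1) >= to_signed(rs2)    # bge
    if funct3 == 0b110:
        return rs1 < rs2                           # bltu
    if funct3 == 0b111:
        return rs1 >= rs2                          # bgeu
    return False


def next_pc(kind: str, pc: int, rs1: int = 0, rs2: int = 0,
            funct3: int = 0, imm: int = 0) -> int:
    """pcNext for one instruction; `imm` is the 32-bit sign-extended immediate."""
    if kind == "jal":
        return (pc + imm) & MASK32
    if kind == "jalr":
        return (rs1 + imm) & MASK32 & ~1            # clear bit 0
    if kind == "branch" and take_branch(funct3, rs1, rs2):
        return (pc + imm) & MASK32
    return (pc + 4) & MASK32                        # default: next instruction


minus_8 = (-8) & MASK32
print(hex(next_pc("branch", 0x100, rs1=1, rs2=1, funct3=0b000, imm=minus_8)))  # 0xf8
print(hex(next_pc("branch", 0x100, rs1=1, rs2=2, funct3=0b000, imm=minus_8)))  # 0x104
print(hex(next_pc("jalr", 0x100, rs1=0x201)))                                   # 0x200
```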

This design is conceptually simple and easy to debug, and its CPI (Cycles Per Instruction) is exactly 1. However, the clock period must be long enough for the slowest instruction (usually a load, which goes through every stage listed above), so a single-cycle processor is typically slower than a pipelined one in absolute execution time.
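As a first-order model (standard textbook reasoning, not a measurement of this design), the clock period must cover the whole path of the slowest instruction through the five stages listed above, and total execution time then follows the classic performance equation with CPI = 1:

$$
T_{\text{clk}} \ge t_{\text{imem}} + t_{\text{RF,read}} + t_{\text{ALU}} + t_{\text{dmem}} + t_{\text{RF,write}},
\qquad
T_{\text{exec}} = N_{\text{inst}} \times \text{CPI} \times T_{\text{clk}} = N_{\text{inst}} \times T_{\text{clk}}
$$

Pipelining keeps CPI close to 1 while shortening $T_{\text{clk}}$ to roughly the delay of the slowest stage plus pipeline-register overhead.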

# 3 RTL implementation

The Verilog code below implements a functional subset of the 32-bit RISC-V instruction set. By splitting the design into Datapath, Control (embedded in the Datapath) and Memory, the system achieves a clear separation of concerns. Because instruction decoding is purely combinational, the processor reacts to a new instruction immediately within the same clock cycle, as the single-cycle architecture requires.
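To make the field slicing and sign extension concrete, the Python sketch below is a hypothetical reference model (the helpers `bits`, `sext` and `decode` are illustrative, not part of the lab code) that reproduces the `inst_*` fields and the `imm_*_sext` values computed by the RTL. The test encodings are standard RV32I instructions, so the same values could be used to cross-check the Verilog decoder.

```python
# Hypothetical reference model of the instruction decoder (not part of the lab code).
MASK32 = 0xFFFF_FFFF


def bits(x: int, hi: int, lo: int) -> int:
    """Extract x[hi:lo] like a Verilog part-select."""
    return (x >> lo) & ((1 << (hi - lo + 1)) - 1)


def sext(value: int, width: int) -> int:
    """Sign-extend a `width`-bit value to 32 bits (replicate the MSB)."""
    sign = 1 << (width - 1)
    return ((value ^ sign) - sign) & MASK32


def decode(inst: int) -> dict:
    """Split an instruction into fields and sign-extended immediates."""
    fields = {
        "opcode": bits(inst, 6, 0),
        "rd": bits(inst, 11, 7),
        "funct3": bits(inst, 14, 12),
        "rs1": bits(inst, 19, 15),
        "rs2": bits(inst, 24, 20),
        "funct7": bits(inst, 31, 25),
    }
    imm_i = bits(inst, 31, 20)
    imm_s = (bits(inst, 31, 25) << 5) | bits(inst, 11, 7)
    imm_b = ((bits(inst, 31, 31) << 12) | (bits(inst, 7, 7) << 11)
             | (bits(inst, 30, 25) << 5) | (bits(inst, 11, 8) << 1))
    imm_j = ((bits(inst, 31, 31) << 20) | (bits(inst, 19, 12) << 12)
             | (bits(inst, 20, 20) << 11) | (bits(inst, 30, 21) << 1))
    fields.update(
        imm_i=sext(imm_i, 12),
        imm_s=sext(imm_s, 12),
        imm_b=sext(imm_b, 13),
        imm_j=sext(imm_j, 21),
    )
    return fields


print(hex(decode(0xFFF00093)["imm_i"]))  # 0xffffffff (addi x1, x0, -1)
print(hex(decode(0xFE000EE3)["imm_b"]))  # 0xfffffffc (beq x0, x0, -4)
```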

```
`timescale 1ns / 1ns

// registers are 32 bits in RV32
`define REG_SIZE 31

// RV opcodes are 7 bits
`define OPCODE_SIZE 6

// Don't forget your previous ALUs
/* `include "divider_unsigned.v"
`include "cla.v" */

module RegFile (
    input      [        4:0] rd,
    input      [`REG_SIZE:0] rd_data,
    input      [        4:0] rs1,
    output reg [`REG_SIZE:0] rs1_data,
    input      [        4:0] rs2,
    output reg [`REG_SIZE:0] rs2_data,
    input                    clk,
    input                    we,
    input                    rst
);
  localparam NumRegs = 32;
  reg [`REG_SIZE:0] regs[0:NumRegs-1];
  integer i;

  // Synchronous write (asynchronous reset); writes to x0 are ignored
  always @(posedge clk or posedge rst) begin
    if (rst) begin
      for (i = 0; i < NumRegs; i = i + 1) begin
        regs[i] <= 0;
      end
    end else if (we && (rd != 0)) begin
      regs[rd] <= rd_data;
    end
  end

  // Combinational read; x0 always reads as zero
  always @(*) begin
    rs1_data = (rs1 == 0) ? 32'd0 : regs[rs1];
    rs2_data = (rs2 == 0) ? 32'd0 : regs[rs2];
  end
endmodule

module DatapathSingleCycle (
    input                    clk,
    input                    rst,
    output reg               halt,
    output     [`REG_SIZE:0] pc_to_imem,
    input      [`REG_SIZE:0] inst_from_imem,
    // addr_to_dmem is a read-write port
    output reg [`REG_SIZE:0] addr_to_dmem,
    input      [`REG_SIZE:0] load_data_from_dmem,
    output reg [`REG_SIZE:0] store_data_to_dmem,
    output reg [        3:0] store_we_to_dmem
);

  // components of the instruction
  wire [           6:0] inst_funct7;
  wire [           4:0] inst_rs2;
  wire [           4:0] inst_rs1;
  wire [           2:0] inst_funct3;
  wire [           4:0] inst_rd;
  wire [`OPCODE_SIZE:0] inst_opcode;

  // split R-type instruction - see section 2.2 of RiscV spec
  assign {inst_funct7, inst_rs2, inst_rs1, inst_funct3, inst_rd, inst_opcode} = inst_from_imem;

  // setup for I, S, B & J type instructions
  // I - short immediates and loads
  wire [11:0] imm_i;
  assign imm_i = inst_from_imem[31:20];
  wire [ 4:0] imm_shamt = inst_from_imem[24:20];

  // S - stores
  wire [11:0] imm_s;
  assign imm_s = {inst_funct7, inst_rd};

  // B - conditionals
  wire [12:0] imm_b;
  assign {imm_b[12], imm_b[10:1], imm_b[11], imm_b[0]} = {inst_funct7, inst_rd, 1'b0};

  // J - unconditional jumps
  wire [20:0] imm_j;
  assign {imm_j[20], imm_j[10:1], imm_j[11], imm_j[19:12], imm_j[0]} = {inst_from_imem[31:12], 1'b0};

  wire [`REG_SIZE:0] imm_i_sext = {{20{imm_i[11]}}, imm_i[11:0]};
  wire [`REG_SIZE:0] imm_s_sext = {{20{imm_s[11]}}, imm_s[11:0]};
  wire [`REG_SIZE:0] imm_b_sext = {{19{imm_b[12]}}, imm_b[12:0]};
  wire [`REG_SIZE:0] imm_j_sext = {{11{imm_j[20]}}, imm_j[20:0]};

  // opcodes - see section 19 of RiscV spec
  localparam [`OPCODE_SIZE:0] OpLoad    = 7'b00_000_11;
  localparam [`OPCODE_SIZE:0] OpStore   = 7'b01_000_11;
  localparam [`OPCODE_SIZE:0] OpBranch  = 7'b11_000_11;
  localparam [`OPCODE_SIZE:0] OpJalr    = 7'b11_001_11;
  localparam [`OPCODE_SIZE:0] OpMiscMem = 7'b00_011_11;
  localparam [`OPCODE_SIZE:0] OpJal     = 7'b11_011_11;

  localparam [`OPCODE_SIZE:0] OpRegImm  = 7'b00_100_11;
  localparam [`OPCODE_SIZE:0] OpRegReg  = 7'b01_100_11;
  localparam [`OPCODE_SIZE:0] OpEnviron = 7'b11_100_11;

  localparam [`OPCODE_SIZE:0] OpAuipc   = 7'b00_101_11;
  localparam [`OPCODE_SIZE:0] OpLui     = 7'b01_101_11;

  wire inst_lui    = (inst_opcode == OpLui);
  wire inst_auipc  = (inst_opcode == OpAuipc);
  wire inst_jal    = (inst_opcode == OpJal);
  wire inst_jalr   = (inst_opcode == OpJalr);

  wire inst_beq    = (inst_opcode == OpBranch) & (inst_from_imem[14:12] == 3'b000);
  wire inst_bne    = (inst_opcode == OpBranch) & (inst_from_imem[14:12] == 3'b001);
  wire inst_blt    = (inst_opcode == OpBranch) & (inst_from_imem[14:12] == 3'b100);
  wire inst_bge    = (inst_opcode == OpBranch) & (inst_from_imem[14:12] == 3'b101);
  wire inst_bltu   = (inst_opcode == OpBranch) & (inst_from_imem[14:12] == 3'b110);
  wire inst_bgeu   = (inst_opcode == OpBranch) & (inst_from_imem[14:12] == 3'b111);

  wire inst_lb     = (inst_opcode == OpLoad) & (inst_from_imem[14:12] == 3'b000);
  wire inst_lh     = (inst_opcode == OpLoad) & (inst_from_imem[14:12] == 3'b001);
  wire inst_lw     = (inst_opcode == OpLoad) & (inst_from_imem[14:12] == 3'b010);
  wire inst_lbu    = (inst_opcode == OpLoad) & (inst_from_imem[14:12] == 3'b100);
  wire inst_lhu    = (inst_opcode == OpLoad) & (inst_from_imem[14:12] == 3'b101);

  wire inst_sb     = (inst_opcode == OpStore) & (inst_from_imem[14:12] == 3'b000);
  wire inst_sh     = (inst_opcode == OpStore) & (inst_from_imem[14:12] == 3'b001);
  wire inst_sw     = (inst_opcode == OpStore) & (inst_from_imem[14:12] == 3'b010);

  wire inst_addi   = (inst_opcode == OpRegImm) & (inst_from_imem[14:12] == 3'b000);
  wire inst_slti   = (inst_opcode == OpRegImm) & (inst_from_imem[14:12] == 3'b010);
  wire inst_sltiu  = (inst_opcode == OpRegImm) & (inst_from_imem[14:12] == 3'b011);
  wire inst_xori   = (inst_opcode == OpRegImm) & (inst_from_imem[14:12] == 3'b100);
  wire inst_ori    = (inst_opcode == OpRegImm) & (inst_from_imem[14:12] == 3'b110);
  wire inst_andi   = (inst_opcode == OpRegImm) & (inst_from_imem[14:12] == 3'b111);

  wire inst_slli   = (inst_opcode == OpRegImm) & (inst_from_imem[14:12] == 3'b001) & (inst_from_imem[31:25] == 7'd0);
  wire inst_srli   = (inst_opcode == OpRegImm) & (inst_from_imem[14:12] == 3'b101) & (inst_from_imem[31:25] == 7'd0);
  wire inst_srai   = (inst_opcode == OpRegImm) & (inst_from_imem[14:12] == 3'b101) & (inst_from_imem[31:25] == 7'b0100000);

  wire inst_add    = (inst_opcode == OpRegReg) & (inst_from_imem[14:12] == 3'b000) & (inst_from_imem[31:25] == 7'd0);
  wire inst_sub    = (inst_opcode == OpRegReg) & (inst_from_imem[14:12] == 3'b000) & (inst_from_imem[31:25] == 7'b0100000);
  wire inst_sll    = (inst_opcode == OpRegReg) & (inst_from_imem[14:12] == 3'b001) & (inst_from_imem[31:25] == 7'd0);
  wire inst_slt    = (inst_opcode == OpRegReg) & (inst_from_imem[14:12] == 3'b010) & (inst_from_imem[31:25] == 7'd0);
  wire inst_sltu   = (inst_opcode == OpRegReg) & (inst_from_imem[14:12] == 3'b011) & (inst_from_imem[31:25] == 7'd0);
  wire inst_xor    = (inst_opcode == OpRegReg) & (inst_from_imem[14:12] == 3'b100) & (inst_from_imem[31:25] == 7'd0);
  wire inst_srl    = (inst_opcode == OpRegReg) & (inst_from_imem[14:12] == 3'b101) & (inst_from_imem[31:25] == 7'd0);
  wire inst_sra    = (inst_opcode == OpRegReg) & (inst_from_imem[14:12] == 3'b101) & (inst_from_imem[31:25] == 7'b0100000);
  wire inst_or     = (inst_opcode == OpRegReg) & (inst_from_imem[14:12] == 3'b110) & (inst_from_imem[31:25] == 7'd0);
  wire inst_and    = (inst_opcode == OpRegReg) & (inst_from_imem[14:12] == 3'b111) & (inst_from_imem[31:25] == 7'd0);

  wire inst_mul    = (inst_opcode == OpRegReg) & (inst_from_imem[31:25] == 7'd1) & (inst_from_imem[14:12] == 3'b000);
  wire inst_mulh   = (inst_opcode == OpRegReg) & (inst_from_imem[31:25] == 7'd1) & (inst_from_imem[14:12] == 3'b001);
  wire inst_mulhsu = (inst_opcode == OpRegReg) & (inst_from_imem[31:25] == 7'd1) & (inst_from_imem[14:12] == 3'b010);
  wire inst_mulhu  = (inst_opcode == OpRegReg) & (inst_from_imem[31:25] == 7'd1) & (inst_from_imem[14:12] == 3'b011);
  wire inst_div    = (inst_opcode == OpRegReg) & (inst_from_imem[31:25] == 7'd1) & (inst_from_imem[14:12] == 3'b100);
  wire inst_divu   = (inst_opcode == OpRegReg) & (inst_from_imem[31:25] == 7'd1) & (inst_from_imem[14:12] == 3'b101);
  wire inst_rem    = (inst_opcode == OpRegReg) & (inst_from_imem[31:25] == 7'd1) & (inst_from_imem[14:12] == 3'b110);
  wire inst_remu   = (inst_opcode == OpRegReg) & (inst_from_imem[31:25] == 7'd1) & (inst_from_imem[14:12] == 3'b111);

  wire inst_ecall  = (inst_opcode == OpEnviron) & (inst_from_imem[31:7] == 25'd0);
  wire inst_fence  = (inst_opcode == OpMiscMem);

  // program counter
  reg [`REG_SIZE:0] pcNext, pcCurrent;
  always @(posedge clk) begin
    if (rst) begin
      pcCurrent <= 32'd0;
    end else begin
      pcCurrent <= pcNext;
    end
  end
  assign pc_to_imem = pcCurrent;

  // cycle/instruction counters
  reg [`REG_SIZE:0] cycles_current, num_inst_current;
  always @(posedge clk) begin
    if (rst) begin
      cycles_current   <= 0;
      num_inst_current <= 0;
    end else begin
      cycles_current   <= cycles_current + 1;
      num_inst_current <= num_inst_current + 1;
    end
  end

  // NOTE: don't rename your RegFile instance as the tests expect it to be 'rf'
  wire [`REG_SIZE:0] rs1_data;
  wire [`REG_SIZE:0] rs2_data;
  reg  [`REG_SIZE:0] rd_data;
  reg                rf_we;

  RegFile rf (
      .clk     (clk),
      .rst     (rst),
      .we      (rf_we),
      .rd      (inst_rd),
      .rd_data (rd_data),
      .rs1     (inst_rs1),
      .rs2     (inst_rs2),
      .rs1_data(rs1_data),
      .rs2_data(rs2_data)
  );

  // Shared CLA (Lab 2) for addi/add/sub: sub is computed as rs1 + ~rs2 + 1
  wire [31:0] cla_res, cla_op_b, cla_op_b_raw;
  wire        cla_cin;

  assign cla_op_b_raw = (inst_opcode == OpRegImm) ? imm_i_sext : rs2_data;
  assign cla_op_b     = inst_sub ? ~cla_op_b_raw : cla_op_b_raw;
  assign cla_cin      = inst_sub ? 1'b1 : 1'b0;

  cla u_cla (
      .a  (rs1_data),
      .b  (cla_op_b),
      .cin(cla_cin),
      .sum(cla_res)
  );

  // Unsigned divider (Lab 1): signed operands are converted to magnitudes first
  wire [31:0] div_op_a, div_op_b;
  wire        is_signed_div;
  assign is_signed_div = inst_div | inst_rem;
  assign div_op_a = (is_signed_div & rs1_data[31]) ? (~rs1_data + 1) : rs1_data;
  assign div_op_b = (is_signed_div & rs2_data[31]) ? (~rs2_data + 1) : rs2_data;

  wire [31:0] div_quot_raw, div_rem_raw;
  divider_unsigned u_div (
      .dividend (div_op_a),
      .divisor  (div_op_b),
      .quotient (div_quot_raw),
      .remainder(div_rem_raw)
  );

  // Fix up signs and handle the divide-by-zero and overflow corner cases
  reg [31:0] div_res;
  always @(*) begin
    if (inst_div || inst_divu) begin
      if (rs2_data == 0) begin
        div_res = 32'hFFFF_FFFF;
      end else if (inst_div && (rs1_data == 32'h8000_0000) && (rs2_data == 32'hFFFF_FFFF)) begin
        div_res = 32'h8000_0000;
      end else if (is_signed_div && (rs1_data[31] ^ rs2_data[31])) begin
        div_res = ~div_quot_raw + 32'd1;
      end else begin
        div_res = div_quot_raw;
      end
    end else begin  // rem or remu
      if (rs2_data == 0) begin
        div_res = rs1_data;
      end else if (inst_rem && (rs1_data == 32'h8000_0000) && (rs2_data == 32'hFFFF_FFFF)) begin
        div_res = 32'd0;
      end else if (is_signed_div && (rs1_data[31])) begin
        div_res = ~div_rem_raw + 32'd1;
      end else begin
        div_res = div_rem_raw;
      end
    end
  end

  // 64-bit products for mulh (signed x signed), mulhsu (signed x unsigned), mulhu
  wire [63:0] mul_u, mul_s, mul_su;
  assign mul_u  = {32'b0, rs1_data} * {32'b0, rs2_data};
  assign mul_s  = $signed(rs1_data) * $signed(rs2_data);
  assign mul_su = $signed(rs1_data) * $signed({1'b0, rs2_data});

  reg take_branch;
  always @(*) begin
    case (inst_funct3)
      3'b000:  take_branch = (rs1_data == rs2_data);                    // beq
      3'b001:  take_branch = (rs1_data != rs2_data);                    // bne
      3'b100:  take_branch = ($signed(rs1_data) < $signed(rs2_data));   // blt
      3'b101:  take_branch = ($signed(rs1_data) >= $signed(rs2_data));  // bge
      3'b110:  take_branch = (rs1_data < rs2_data);                     // bltu
      3'b111:  take_branch = (rs1_data >= rs2_data);                    // bgeu
      default: take_branch = 1'b0;
    endcase
  end

  // Align the loaded word so the addressed byte/half-word sits in the low bits
  reg [31:0] load_data;
  always @(*) begin
    case (addr_to_dmem[1:0])
      2'b00: load_data = load_data_from_dmem;
      2'b01: load_data = load_data_from_dmem >> 8;
      2'b10: load_data = load_data_from_dmem >> 16;
      2'b11: load_data = load_data_from_dmem >> 24;
    endcase
  end

  reg illegal_inst;

  always @(*) begin
    illegal_inst       = 1'b0;
    pcNext             = pcCurrent + 4;
    rf_we              = 0;
    rd_data            = 0;
    halt               = 0;
    addr_to_dmem       = 0;
    store_data_to_dmem = 0;
    store_we_to_dmem   = 0;

    case (inst_opcode)
      OpLui: begin
        // lui: place the 20-bit immediate in the upper bits
        rf_we   = 1;
        rd_data = {inst_from_imem[31:12], 12'b0};
      end

      OpAuipc: begin
        rf_we   = 1;
        rd_data = pcCurrent + {inst_from_imem[31:12], 12'b0};
      end

      OpJal: begin
        rf_we   = 1;
        rd_data = pcCurrent + 4;
        pcNext  = pcCurrent + imm_j_sext;
      end

      OpJalr: begin
        rf_we   = 1;
        rd_data = pcCurrent + 4;
        pcNext  = (rs1_data + imm_i_sext) & ~32'd1;
      end

      OpBranch: begin
        if (take_branch) begin
          pcNext = pcCurrent + imm_b_sext;
        end
      end

      OpLoad: begin
        rf_we        = 1;
        addr_to_dmem = rs1_data + imm_i_sext;
        case (inst_funct3)
          3'b000: rd_data = {{24{load_data[7]}}, load_data[7:0]};    // lb
          3'b001: rd_data = {{16{load_data[15]}}, load_data[15:0]};  // lh
          3'b010: rd_data = load_data;                              // lw
          3'b100: rd_data = {24'b0, load_data[7:0]};                // lbu
          3'b101: rd_data = {16'b0, load_data[15:0]};               // lhu
        endcase
      end

      OpStore: begin
        addr_to_dmem       = rs1_data + imm_s_sext;
        store_data_to_dmem = rs2_data << (addr_to_dmem[1:0] * 8);
        case (inst_funct3)
          3'b000: store_we_to_dmem = 4'b0001 << addr_to_dmem[1:0];  // sb
          3'b001: store_we_to_dmem = 4'b0011 << addr_to_dmem[1:0];  // sh
          3'b010: store_we_to_dmem = 4'b1111;                       // sw
        endcase
      end

      OpRegImm: begin
        rf_we = 1;
        case (inst_funct3)
          3'b000: rd_data = cla_res;                                                    // addi
          3'b010: rd_data = ($signed(rs1_data) < $signed(imm_i_sext)) ? 32'd1 : 32'd0;  // slti
          3'b011: rd_data = (rs1_data < imm_i_sext) ? 32'd1 : 32'd0;                    // sltiu
          3'b100: rd_data = rs1_data ^ imm_i_sext;                                      // xori
          3'b110: rd_data = rs1_data | imm_i_sext;                                      // ori
          3'b111: rd_data = rs1_data & imm_i_sext;                                      // andi
          3'b001: rd_data = rs1_data << imm_shamt;                                      // slli
          3'b101: begin
            if (inst_funct7 == 7'd0) begin
              rd_data = rs1_data >> imm_shamt;                                          // srli
            end else begin
              rd_data = $signed(rs1_data) >>> imm_shamt;                                // srai
            end
          end
        endcase
      end

      OpRegReg: begin
        rf_we = 1;
        if (inst_funct7 == 7'd1) begin
          case (inst_funct3)
            3'b000: rd_data = rs1_data * rs2_data;  // mul
            3'b001: rd_data = mul_s[63:32];         // mulh
            3'b010: rd_data = mul_su[63:32];        // mulhsu
            3'b011: rd_data = mul_u[63:32];         // mulhu
            3'b100, 3'b101, 3'b110, 3'b111:
                    rd_data = div_res;              // div, divu, rem, remu
          endcase
        end else begin
          case (inst_funct3)
            3'b000: rd_data = cla_res;                                                  // add, sub
            3'b001: rd_data = rs1_data << rs2_data[4:0];                                // sll
            3'b010: rd_data = ($signed(rs1_data) < $signed(rs2_data)) ? 32'd1 : 32'd0;  // slt
            3'b011: rd_data = (rs1_data < rs2_data) ? 32'd1 : 32'd0;                    // sltu
            3'b100: rd_data = rs1_data ^ rs2_data;                                      // xor
            3'b101: begin
              if (inst_funct7[5]) rd_data = $signed(rs1_data) >>> rs2_data[4:0];        // sra
              else rd_data = rs1_data >> rs2_data[4:0];                                 // srl
            end
            3'b110: rd_data = rs1_data | rs2_data;                                      // or
            3'b111: rd_data = rs1_data & rs2_data;                                      // and
          endcase
        end
      end

      OpEnviron: begin
        if (inst_from_imem[31:7] == 25'd0) halt = 1'b1;  // ecall
      end

      default: begin
        illegal_inst = 1'b1;
      end
    endcase
  end
endmodule

/* A memory module that supports 1-cycle reads and writes, with one read-only port
 * and one read+write port.
 */
module MemorySingleCycle #(
    parameter NUM_WORDS = 512
) (
    input                    rst,                  // rst for both imem and dmem
    input                    clock_mem,            // clock for both imem and dmem
    input      [`REG_SIZE:0] pc_to_imem,           // must always be aligned to a 4B boundary
    output reg [`REG_SIZE:0] inst_from_imem,       // the value at memory location pc_to_imem
    input      [`REG_SIZE:0] addr_to_dmem,         // must always be aligned to a 4B boundary
    output reg [`REG_SIZE:0] load_data_from_dmem,  // the value at memory location addr_to_dmem
    input      [`REG_SIZE:0] store_data_to_dmem,   // the value to be written to addr_to_dmem, controlled by store_we_to_dmem
    // Each bit determines whether to write the corresponding byte of
    // store_data_to_dmem to memory location addr_to_dmem.
    // E.g., 4'b1111 will write 4 bytes. 4'b0001 will write only the least-significant byte.
    input      [        3:0] store_we_to_dmem
);

  // memory is arranged as an array of 4B words
  reg [`REG_SIZE:0] mem_array[0:NUM_WORDS-1];

  // preload instructions to mem_array
  always @(posedge rst) begin
    $readmemh("mem_initial_contents.hex", mem_array);
  end

  localparam AddrMsb = $clog2(NUM_WORDS) + 1;
  localparam AddrLsb = 2;

  always @(posedge clock_mem) begin
    inst_from_imem <= mem_array[{pc_to_imem[AddrMsb:AddrLsb]}];
  end

  always @(negedge clock_mem) begin
    if (store_we_to_dmem[0]) begin
      mem_array[addr_to_dmem[AddrMsb:AddrLsb]][7:0] <= store_data_to_dmem[7:0];
    end
    if (store_we_to_dmem[1]) begin
      mem_array[addr_to_dmem[AddrMsb:AddrLsb]][15:8] <= store_data_to_dmem[15:8];
    end
    if (store_we_to_dmem[2]) begin
      mem_array[addr_to_dmem[AddrMsb:AddrLsb]][23:16] <= store_data_to_dmem[23:16];
    end
    if (store_we_to_dmem[3]) begin
      mem_array[addr_to_dmem[AddrMsb:AddrLsb]][31:24] <= store_data_to_dmem[31:24];
    end
    // dmem is "read-first": read returns value before the write
    load_data_from_dmem <= mem_array[{addr_to_dmem[AddrMsb:AddrLsb]}];
  end
endmodule

/*
 * Relationship between clock_proc and clock_mem: clock_mem is phase-shifted
 * 90° from clock_proc. You could think of one proc cycle being broken down
 * into 3 parts. During part 1 (which starts @posedge clock_proc) the current
 * PC is sent to the imem. In part 2 (starting @posedge clock_mem) we read from
 * imem. In part 3 (starting @negedge clock_mem) we read/write memory and
 * prepare register/PC updates, which occur at @posedge clock_proc.
 */
module Processor (
    input  clock_proc,
    input  clock_mem,
    input  rst,
    output halt
);

  wire [`REG_SIZE:0] pc_to_imem, inst_from_imem, mem_data_addr, mem_data_loaded_value, mem_data_to_write;
  wire [        3:0] mem_data_we;

  // This wire is set by cocotb to the name of the currently-running test,
  // to make it easier to see what is going on in the waveforms.
  wire [(8*32)-1:0] test_case;

  MemorySingleCycle #(
      .NUM_WORDS(8192)
  ) memory (
      .rst                (rst),
      .clock_mem          (clock_mem),
      // imem is read-only
      .pc_to_imem         (pc_to_imem),
      .inst_from_imem     (inst_from_imem),
      // dmem is read-write
      .addr_to_dmem       (mem_data_addr),
      .load_data_from_dmem(mem_data_loaded_value),
      .store_data_to_dmem (mem_data_to_write),
      .store_we_to_dmem   (mem_data_we)
  );

  DatapathSingleCycle datapath (
      .clk                (clock_proc),
      .rst                (rst),
      .pc_to_imem         (pc_to_imem),
      .inst_from_imem     (inst_from_imem),
      .addr_to_dmem       (mem_data_addr),
      .store_data_to_dmem (mem_data_to_write),
      .store_we_to_dmem   (mem_data_we),
      .load_data_from_dmem(mem_data_loaded_value),
      .halt               (halt)
  );
endmodule
```

Program 1: RTL implementation of the DatapathSingleCycle module.
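The multiply-high and divide/remainder paths in Program 1 contain the trickiest corner cases, so a software golden model is convenient when writing directed tests for them. The Python sketch below is a hypothetical reference (the functions `mulh_family` and `div_rem` are illustrative, not part of the lab code). It follows the RISC-V M-extension rules that the `div_res` block implements: division by zero returns all ones as the quotient and the dividend as the remainder, the signed overflow case `0x8000_0000 / -1` returns `0x8000_0000` with remainder 0, and a signed remainder takes the sign of the dividend.

```python
# Hypothetical golden model for the M-extension results (not part of the lab code).
MASK32 = 0xFFFF_FFFF
INT_MIN = 0x8000_0000


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & INT_MIN else x


def mulh_family(funct3: int, a: int, b: int) -> int:
    """Upper 32 bits for mulh (001), mulhsu (010) and mulhu (011)."""
    if funct3 == 0b001:
        prod = to_signed(a) * to_signed(b)    # mul_s
    elif funct3 == 0b010:
        prod = to_signed(a) * b               # mul_su: rs2 treated as unsigned
    else:
        prod = a * b                          # mul_u
    return (prod >> 32) & MASK32              # Python >> is arithmetic on negatives


def div_rem(funct3: int, a: int, b: int) -> int:
    """div (100), divu (101), rem (110), remu (111) with RISC-V corner cases."""
    signed = funct3 in (0b100, 0b110)
    is_div = funct3 in (0b100, 0b101)
    if b == 0:                                    # division by zero
        return MASK32 if is_div else a
    if signed and a == INT_MIN and b == MASK32:   # overflow: INT_MIN / -1
        return INT_MIN if is_div else 0
    if not signed:
        return (a // b) if is_div else (a % b)
    sa, sb = to_signed(a), to_signed(b)
    q = abs(sa) // abs(sb)                        # unsigned divide on magnitudes
    r = abs(sa) % abs(sb)
    if is_div:
        return (-q if (sa < 0) != (sb < 0) else q) & MASK32
    return (-r if sa < 0 else r) & MASK32         # remainder takes the dividend's sign


print(hex(div_rem(0b100, (-7) & MASK32, 2)))     # 0xfffffffd (-3)
print(hex(div_rem(0b110, (-7) & MASK32, 2)))     # 0xffffffff (-1)
print(hex(div_rem(0b101, 7, 0)))                 # 0xffffffff
print(hex(mulh_family(0b011, MASK32, MASK32)))   # 0xfffffffe
```

Computing magnitudes first and fixing the sign afterwards is the same strategy the RTL uses around the Lab 1 unsigned divider.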

# 4 Testbench

## 4.1 RegFile testbench

This testbench verifies the correctness of the `RegFile` module. It relies on two main tasks:

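- Task `wr` simulates a synchronous write: one time unit after a positive clock edge it drives `we`, `rd` and `rd_data`, so the write is captured on the following positive edge, and then it deasserts `we`.
- Task `check` samples the read port `rs1_data` one time unit after a negative clock edge (`check_all` does the same for both `rs1_data` and `rs2_data`). Sampling away from the positive edge ensures the combinational outputs of the DUT are stable before comparison.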

The testbench runs four tests to validate the module:

- Test 1 (x0 verification): To confirm that register x0 is read-only and hardwired to zero.
- Test 2 (Basic read/write sanity check): To confirm basic storage capability.
- Test 3 (Iterative register access): To verify address decoding and reset behavior for all registers. The test generates a random 32-bit integer, which is then written to the register and immediately verified.
- Test 4 (Write all and read back): To ensure that writing to one register does not corrupt data in others. The test first fills all registers and then reads them back sequentially in a loop.
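The same four scenarios can be expressed against a small Python golden model of the register file, which is a convenient way to reason about the expected values before running the Verilog testbench. The sketch below is hypothetical: the class `RegFileModel`, the seed and the sample value `0xDEAD_BEEF` are illustrative and are not taken from the actual testbench. It captures the properties being checked: `x0` ignores writes, a write becomes visible after the clock edge, reset clears every register, and writes do not disturb other registers.

```python
import random

MASK32 = 0xFFFF_FFFF


class RegFileModel:
    """Behavioural model of RegFile: 32 x 32-bit registers, x0 hardwired to 0."""

    def __init__(self) -> None:
        self.regs = [0] * 32

    def reset(self) -> None:
        self.regs = [0] * 32

    def clock(self, we: bool, rd: int, rd_data: int) -> None:
        """One rising clock edge: write only if enabled and rd != 0."""
        if we and rd != 0:
            self.regs[rd] = rd_data & MASK32

    def read(self, rs: int) -> int:
        """Combinational read port."""
        return 0 if rs == 0 else self.regs[rs]


def run_scenarios(seed: int = 1) -> RegFileModel:
    rng = random.Random(seed)
    rf = RegFileModel()
    rf.reset()
    # Test 1: writes to x0 are ignored
    for value in (0xFFFF_FFFF, 0x1234_5678):
        rf.clock(True, 0, value)
        assert rf.read(0) == 0
    # Test 2: basic write/read on x1 (illustrative value)
    rf.clock(True, 1, 0xDEAD_BEEF)
    assert rf.read(1) == 0xDEAD_BEEF
    # Test 3: every register starts at 0 after reset, then holds a random value
    rf.reset()
    for r in range(1, 32):
        assert rf.read(r) == 0
        value = rng.getrandbits(32)
        rf.clock(True, r, value)
        assert rf.read(r) == value
    # Test 4: fill all registers, then read them all back
    expected = [0] + [rng.getrandbits(32) for _ in range(31)]
    for r in range(1, 32):
        rf.clock(True, r, expected[r])
    assert [rf.read(r) for r in range(32)] == expected
    return rf


print(run_scenarios().read(0))  # 0
```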

```
`timescale 1ns / 1ps
module tb_reg;
reg clk, rst, we;
reg [4:0] rs1, rs2, rd;
reg [31:0] rd_data;
wire [31:0] rs1_data, rs2_data;

integer i, pass, fail;
reg [31:0] wr_val [0:31];
reg [31:0] exp_val;

RegFile dut (
    .clk      (clk),
    .rst      (rst),
    .we       (we),
    .rs1      (rs1),
    .rs2      (rs2),
    .rd       (rd),
    .rd_data  (rd_data),
    .rs1_data (rs1_data),
    .rs2_data (rs2_data)
);

// Print an 80-character "-" divider
task divi;
    $display ("%0s", {80{"-"}});
endtask

// Print an 80-character "=" banner line
task br;
    $display ("%0s", {80{"="}});
endtask

// Print a message framed by banner lines
task msg (input [700:0] txt);
begin
    br();
    $display ("%0s", txt);
    br();
end
endtask

// 10 ns clock
initial begin
    clk = 0;
    forever #5 clk = ~clk;
end

// Hold reset for two rising edges, then release it with writes disabled
task setup;
begin
    rst = 1;
    repeat (2) @(posedge clk);
    #1;
    rst = 0;
    we = 0;
    @(posedge clk);
end
endtask

// Drive a one-cycle write of val into register reg_num
task wr (input [4:0] reg_num, input [31:0] val);
begin
    divi();
    $display ("[WRITE] Time = %0t | rd = 0x%2h, Data = 0x%2h",
              $time, reg_num, val);
    divi();

    @(posedge clk);
    #1;
    we = 1;
    rd = reg_num;
    rd_data = val;

    @(posedge clk);
    #1;
    we = 0;
end
endtask

// Read reg_num through port rs1 and compare it with exp
task check (input [4:0] reg_num, input [31:0] exp);
begin
    rs1 = reg_num;
    @(negedge clk);
    #1;
    if (rs1_data === exp) begin
        $display ("[PASS] Time = %0t | rs1 = 0x%2h, Expected = 0x%8h, Output = 0x%8h",
                  $time, reg_num, exp, rs1_data);
        pass = pass + 1;
    end else begin
        $display ("[FAIL] Time = %0t | rs1 = 0x%2h, Expected = 0x%8h, Output = 0x%8h",
                  $time, reg_num, exp, rs1_data);
        fail = fail + 1;
    end
end
endtask

// Read reg1 through rs1 and reg2 through rs2 in the same cycle and compare both
task check_all (input [4:0] reg1, input [31:0] exp1, input [4:0] reg2,
                input [31:0] exp2);
begin
    rs1 = reg1;
    rs2 = reg2;
    @(negedge clk);
    #1;

    if (rs1_data === exp1) begin
        $display ("[PASS] Time = %0t | rs1 = 0x%2h, Expected = 0x%8h, Output = 0x%8h",
                  $time, reg1, exp1, rs1_data);
        pass = pass + 1;
    end else begin
        $display ("[FAIL] Time = %0t | rs1 = 0x%2h, Expected = 0x%8h, Output = 0x%8h",
                  $time, reg1, exp1, rs1_data);
        fail = fail + 1;
    end

    if (rs2_data === exp2) begin
        $display ("[PASS] Time = %0t | rs2 = 0x%2h, Expected = 0x%8h, Output = 0x%8h",
                  $time, reg2, exp2, rs2_data);
        pass = pass + 1;
    end else begin
        $display ("[FAIL] Time = %0t | rs2 = 0x%2h, Expected = 0x%8h, Output = 0x%8h",
                  $time, reg2, exp2, rs2_data);
        fail = fail + 1;
    end
end
endtask

// Check that both read ports return exp (0 after reset);
// !== also flags X/Z outputs, which != would let through as PASS
task init (input [4:0] reg1, input [4:0] reg2, input [31:0] exp);
begin
    @(negedge clk);
    #1;
    if (rs1_data !== exp || rs2_data !== exp) begin
        $display ("[FAIL] Time = %0t | rs1 = 0x%2h and rs2 = 0x%2h should be initialized to 0",
                  $time, reg1, reg2);
        fail = fail + 1;
    end else begin
        $display ("[PASS] Time = %0t | rs1 = 0x%2h and rs2 = 0x%2h initialized to 0",
                  $time, reg1, reg2);
        pass = pass + 1;
    end
end
endtask

initial begin
    pass = 0;
    fail = 0;

    msg ("TEST 1: WRITE AND READ x0");
    setup();
    check (5'd0, 32'd0);

    wr (5'd0, 32'hFFFF_FFFF);   // writes to x0 must be ignored
    check (5'd0, 32'd0);

    wr (5'd0, 32'h1234_5678);
    check (5'd0, 32'd0);

    msg ("TEST 2: WRITE AND READ x1");
    setup();
    wr (5'd1, 32'h1234_5678);
    check (5'd1, 32'h1234_5678);

    msg ("TEST 3: CHECK ALL REGISTERS");
    setup();
    for (i = 1; i < 32; i = i + 1) begin
        rs1 = i[4:0];
        rs2 = i[4:0];
        init (rs1, rs2, 32'd0);

        exp_val = $urandom();
        wr (i[4:0], exp_val);
        check_all (i[4:0], exp_val, i[4:0], exp_val);
    end

    msg ("TEST 4: WRITE ALL AND READ BACK");
    setup();
    wr_val[0] = 32'd0; // x0

    for (i = 1; i < 32; i = i + 1) begin
        wr_val[i] = $urandom();
        wr (i[4:0], wr_val[i]);
    end

    // Read x(i) on rs1 and x(i+1) on rs2
    for (i = 1; i < 31; i = i + 1) begin
        check_all (i[4:0], wr_val[i], i + 1, wr_val[i + 1]);
    end

    msg ("SUMMARY");
    $display ("TOTAL TESTS : %0d", pass + fail);
    $display ("TOTAL PASSED: %0d", pass);
    $display ("TOTAL FAILED: %0d", fail);

    if (fail == 0)
        $display ("=====> ALL TESTS PASSED");
    else if (pass == 0)
        $display ("=====> ALL TESTS FAILED");
    else
        $display ("=====> SOME TESTS FAILED");

    #100;
    $finish;
end
endmodule
```

Program 2: Test bench implementation for the RegFile module.
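Reset checks like the one in `init` depend on Verilog's four-state comparison rules. The logical inequality `!=` returns X when X or Z bits make the relation ambiguous. An `if` whose condition is X or Z executes its `else` branch. As a result, a check written as `if (rs1_data != 0) FAIL else PASS` reports PASS for an uninitialized, all-X register. The case inequality `!==` compares X and Z literally and always returns 0 or 1. The hypothetical Python functions below model both operators on equal-width bit strings (MSB first) to illustrate the difference. They are not part of the lab code.

```python
def logical_ne(a: str, b: str) -> str:
    """Model Verilog '!=' on two equal-width 4-state values made of '0', '1', 'x', 'z'."""
    # If some bit position holds two known, different values, the result is certain
    for bit_a, bit_b in zip(a, b):
        if bit_a in "01" and bit_b in "01" and bit_a != bit_b:
            return "1"
    # Otherwise any x/z bit makes the relation ambiguous
    if any(bit in "xz" for bit in a + b):
        return "x"
    return "0"


def case_ne(a: str, b: str) -> str:
    """Model Verilog '!==': x and z are compared literally, so the result is 0 or 1."""
    return "1" if a != b else "0"


def if_takes_true_branch(cond: str) -> bool:
    """An 'if' runs its true branch only for a known 1; x or z selects the else branch."""
    return cond == "1"


# An uninitialized (all-x) register compared against zero
print(if_takes_true_branch(logical_ne("xxxx", "0000")))  # False: the else (PASS) branch runs
print(if_takes_true_branch(case_ne("xxxx", "0000")))     # True: the FAIL branch runs
```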

## 4.2 Processor testbench

The `tb_single` module is the top-level integration test. It validates the interaction between the Datapath, the Control unit and the Memory by executing real RISC-V machine code. It then checks that the resulting register state matches the behavior defined by the ISA specification.
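The hexadecimal words passed to `load_inst` follow the standard RV32I instruction formats. The hypothetical Python helpers below pack the fields of the R-, I- and B-type formats and can be used to regenerate or double-check those words. `r_type`, `i_type` and `b_type` are illustrative names, and the helpers do not validate register numbers or immediate ranges. As one example, the BLT word `0x00a4c463` used in Test 2 corresponds to a branch offset of 8; an offset of 12 would encode as `0x00a4c663`.

```python
# Standard RV32I major opcodes used below
OP = 0b0110011       # R-type register-register (also used by the M extension)
OP_IMM = 0b0010011   # I-type register-immediate
BRANCH = 0b1100011   # B-type conditional branches


def r_type(funct7: int, rs2: int, rs1: int, funct3: int, rd: int, opcode: int = OP) -> int:
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def i_type(imm: int, rs1: int, funct3: int, rd: int, opcode: int = OP_IMM) -> int:
    # Keep the two's-complement low 12 bits of a signed immediate
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def b_type(offset: int, rs2: int, rs1: int, funct3: int) -> int:
    # 13-bit signed byte offset; bit 0 is always 0 and is not stored
    imm = offset & 0x1FFF
    return (
        (((imm >> 12) & 0x1) << 31)     # imm[12]
        | (((imm >> 5) & 0x3F) << 25)   # imm[10:5]
        | (rs2 << 20)
        | (rs1 << 15)
        | (funct3 << 12)
        | (((imm >> 1) & 0xF) << 8)     # imm[4:1]
        | (((imm >> 11) & 0x1) << 7)    # imm[11]
        | BRANCH
    )


print(f"{i_type(10, 0, 0b000, 1):08x}")             # 00a00093  (ADDI x1, x0, 10)
print(f"{r_type(0b0000001, 1, 1, 0b000, 13):08x}")  # 021086b3  (MUL x13, x1, x1)
print(f"{b_type(8, 10, 9, 0b100):08x}")             # 00a4c463  (BLT x9, x10, 8)
```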

The testbench generates two clock signals to simulate the timing constraints of a single-cycle architecture:

- `clock_proc`: The main system clock, with a 10 ns period. Register writes and PC updates occur on its rising edge.
- `clock_mem`: The memory clock. It has the same period as `clock_proc` but is delayed by 2.5 ns, which is a 90-degree phase shift.
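The 90-degree figure follows directly from the delays in the two clock `initial` blocks of the testbench. `clock_proc` starts at 0 and toggles every 5 ns. `clock_mem` is first driven to 0 at 2.5 ns and then also toggles every 5 ns. The short Python sketch below only reproduces that arithmetic (edge times and phase offset). It does not model how the Memory module uses `clock_mem`.

```python
PERIOD_NS = 10.0                  # both clocks toggle every 5 ns
HALF_PERIOD_NS = PERIOD_NS / 2


def rising_edges(start_low_ns: float, count: int) -> list:
    """Rising-edge times of a clock driven to 0 at start_low_ns and toggled every half period."""
    return [start_low_ns + HALF_PERIOD_NS + k * PERIOD_NS for k in range(count)]


proc_edges = rising_edges(0.0, 3)   # clock_proc = 0 at t = 0
mem_edges = rising_edges(2.5, 3)    # clock_mem = 0 at t = 2.5 ns
phase_deg = (mem_edges[0] - proc_edges[0]) / PERIOD_NS * 360

print(proc_edges)   # [5.0, 15.0, 25.0]
print(mem_edges)    # [7.5, 17.5, 27.5]
print(phase_deg)    # 90.0
```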

The testbench uses the `load_inst` task to write 32-bit machine code directly into the `mem_array` of the instantiated Memory module. The task converts each byte address to a word index with `addr >> 2`. Validating a CPU requires inspecting its general-purpose registers. The `check` task therefore uses hierarchical references (`dut.datapath.rf.regs[...]`) to read the internal register file directly. Because x0 is hardwired to zero, only 31 registers (x1-x31) can hold results, which limits how many instructions a single program can verify. The testbench is therefore split into two tests:

- Test 1 covers ALU operations such as ADD, XOR and SRAI; multiplication and division; the load/store path (SW followed by LW); a taken BEQ; and a JAL that skips an instruction, which verifies that the PC can update non-sequentially.
- Test 2 resets the processor and reuses the registers. It covers:
  - the upper-half multiplications (MULH, MULHSU, MULHU);
  - sub-word stores (SB, SH) that are read back with LB, LH, LBU and LHU, which verifies that the memory masking logic correctly sign- or zero-extends 8- and 16-bit values;
  - the remaining conditional branches (BNE, BLT, BGE, BLTU, BGEU);
  - a JALR whose target address is computed at runtime, which verifies the register-based jump logic.
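The expected values for the multiply-high and sub-word memory checks in Test 2 can be reproduced with a short software model. The Python sketch below is hypothetical, and its function names are illustrative rather than part of the lab code. It represents register values as unsigned 32-bit integers. It models memory as a little-endian, byte-addressed dictionary, which matches the RISC-V byte order.

```python
MASK32 = 0xFFFF_FFFF


def to_signed(value: int, bits: int = 32) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mulh(a: int, b: int) -> int:
    # signed x signed, upper 32 bits of the 64-bit product
    return ((to_signed(a) * to_signed(b)) >> 32) & MASK32


def mulhsu(a: int, b: int) -> int:
    # signed rs1 x unsigned rs2, upper 32 bits
    return ((to_signed(a) * (b & MASK32)) >> 32) & MASK32


def mulhu(a: int, b: int) -> int:
    # unsigned x unsigned, upper 32 bits
    return (((a & MASK32) * (b & MASK32)) >> 32) & MASK32


def store(mem: dict, addr: int, value: int, size: int) -> None:
    """SB/SH/SW: write the low `size` bytes of value, least significant byte first."""
    for i in range(size):
        mem[addr + i] = (value >> (8 * i)) & 0xFF


def load(mem: dict, addr: int, size: int, signed: bool) -> int:
    """LB/LH/LW (signed) and LBU/LHU (unsigned): read bytes and extend to 32 bits."""
    value = sum(mem.get(addr + i, 0) << (8 * i) for i in range(size))
    return (to_signed(value, 8 * size) & MASK32) if signed else value


x1, x2, x6 = 0xFFFF_FFFB, 3, 0xFFFF_FE98
mem = {}
store(mem, 100, x6, 1)   # SB x6, 100(x0)
store(mem, 104, x6, 2)   # SH x6, 104(x0)
print(hex(mulh(x1, x2)), hex(mulhsu(x1, x2)), hex(mulhu(x1, x2)))  # 0xffffffff 0xffffffff 0x2
print(hex(load(mem, 100, 1, True)), hex(load(mem, 105, 1, False)))  # 0xffffff98 0xfe
```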

```
`timescale 1ns / 1ps
module tb_single;
reg clock_proc, clock_mem, rst;
wire halt;
integer i, test, timeout, pass, fail;

Processor dut (
    .clock_proc (clock_proc),
    .clock_mem  (clock_mem),
    .rst        (rst),
    .halt       (halt)
);

task br;
    $display ("%0s", {80{"="}});
endtask

task msg (input [700:0] txt);
    begin
        br();
        $display ("%0s", txt);
        br();
    end
endtask

// Write one instruction word; addr is a byte address, mem_array is word-indexed
task load_inst (input [31:0] addr, input [31:0] inst);
    begin
        dut.memory.mem_array[addr >> 2] = inst;
    end
endtask

// Compare register x<reg_num> with exp through a hierarchical reference
// (txt holds up to 32 characters)
task check (input [255:0] txt, input [4:0] reg_num, input [31:0] exp);
    begin
        if (dut.datapath.rf.regs[reg_num] === exp) begin
            $display ("[PASS] Time = %0t | %s | Output x%0d = 0x%8h",
                      $time, txt, reg_num, dut.datapath.rf.regs[reg_num]);
            pass = pass + 1;
        end else begin
            $display ("[FAIL] Time = %0t | %s | Expect x%0d = 0x%8h | Output x%0d = 0x%8h",
                      $time, txt, reg_num, exp, reg_num, dut.datapath.rf.regs[reg_num]);
            fail = fail + 1;
        end
    end
endtask

// Processor clock: 10 ns period
initial begin
    clock_proc = 0;
    forever #5 clock_proc = ~clock_proc;
end

// Memory clock: same period, delayed by 2.5 ns (90-degree phase shift)
initial begin
    #2.5 clock_mem = 0;
    forever #5 clock_mem = ~clock_mem;
end

initial begin
    msg ("TEST BEGIN");
    rst = 1;
    timeout = 0;
    pass = 0;
    fail = 0;
    test = 0;
    #20;

    msg ("TEST 1");
    load_inst (32'h00, 32'h00a00093); // ADDI x1, x0, 10
    load_inst (32'h04, 32'hffb00113); // ADDI x2, x0, -5
    load_inst (32'h08, 32'h002081b3); // ADD x3, x1, x2 (5)
    load_inst (32'h0C, 32'h40208233); // SUB x4, x1, x2 (15)
    load_inst (32'h10, 32'h0020f2b3); // AND x5, x1, x2 (10)
    load_inst (32'h14, 32'h0020e333); // OR x6, x1, x2 (-5)
    load_inst (32'h18, 32'h0020c3b3); // XOR x7, x1, x2 (-15)
    load_inst (32'h1C, 32'h00209413); // SLLI x8, x1, 2 (40)
    load_inst (32'h20, 32'h40115493); // SRAI x9, x2, 1 (-3)
    load_inst (32'h24, 32'h00115513); // SRLI x10, x2, 1 (0x7FFF_FFFD)
    load_inst (32'h28, 32'h001125b3); // SLT x11, x2, x1 (1)
    load_inst (32'h2C, 32'h00113633); // SLTU x12, x2, x1 (0)
    load_inst (32'h30, 32'h021086b3); // MUL x13, x1, x1 (100)
    load_inst (32'h34, 32'h0220c733); // DIV x14, x1, x2 (-2)
    load_inst (32'h38, 32'h0220e7b3); // REM x15, x1, x2 (0)
    load_inst (32'h3C, 32'h02115833); // DIVU x16, x2, x1 (0x1999_9999)
    load_inst (32'h40, 32'h06302223); // SW x3, 100(x0)
    load_inst (32'h44, 32'h06402903); // LW x18, 100(x0)
    load_inst (32'h48, 32'h00108463); // BEQ x1, x1, 8
    load_inst (32'h4C, 32'h06300993); // ADDI x19, x0, 99 (skipped by BEQ)
    load_inst (32'h50, 32'h00100993); // ADDI x19, x0, 1
    load_inst (32'h54, 32'h00800a6f); // JAL x20, 8
    load_inst (32'h58, 32'h06300a93); // ADDI x21, x0, 99 (skipped by JAL)
    load_inst (32'h5C, 32'h00200a93); // ADDI x21, x0, 2
    load_inst (32'h60, 32'h12345b37); // LUI x22, 0x12345
    load_inst (32'h68, 32'h0050cb93); // XORI x23, x1, 5
    load_inst (32'h6C, 32'h00f0fc13); // ANDI x24, x1, 0xF
    load_inst (32'h70, 32'hf006ec93); // ORI x25, x13, 0xF00 (imm sign-extended to 0xFFFF_FF00)
    load_inst (32'h74, 32'h0283ad13); // SLTI x26, x7, 0x28
    load_inst (32'h78, 32'hff143d93); // SLTIU x27, x8, 0xFF1 (imm sign-extended to 0xFFFF_FFF1)
    load_inst (32'h7C, 32'h01309e33); // SLL x28, x1, x19
    load_inst (32'h80, 32'h01335eb3); // SRL x29, x6, x19
    load_inst (32'h84, 32'h41335f33); // SRA x30, x6, x19
    load_inst (32'h8C, 32'h02337fb3); // REMU x31, x6, x3
    load_inst (32'h90, 32'h00000073); // HALT (ECALL)

    // Release reset and run until HALT or a 1000 ns timeout
    #10 rst = 0;
    #1;

    while (!halt && timeout < 1000) begin
        #10 timeout = timeout + 10;
    end

    if (test == 0) begin
        check ("ADDI x1, x0, 10", 1, 10);
        check ("ADDI x2, x0, -5", 2, -5);
        check ("ADD x3, x1, x2", 3, 5);
        check ("SUB x4, x1, x2", 4, 15);
        check ("AND x5, x1, x2", 5, 10);
        check ("OR x6, x1, x2", 6, -5);
        check ("XOR x7, x1, x2", 7, -15);
        check ("SLLI x8, x1, 2", 8, 40);
        check ("SRAI x9, x2, 1", 9, -3);
        check ("SRLI x10, x2, 1", 10, 32'h7fff_fffd);
        check ("SLT x11, x2, x1", 11, 1);
        check ("SLTU x12, x2, x1", 12, 0);
        check ("MUL x13, x1, x1", 13, 100);
        check ("DIV x14, x1, x2", 14, -2);
        check ("REM x15, x1, x2", 15, 0);
        check ("DIVU x16, x2, x1", 16, 32'h1999_9999);
        check ("LW x18, 100(x0)", 18, 5);
        check ("ADDI x19, x0, 1", 19, 1);
        check ("JAL x20, 8", 20, 32'h58); // return address (PC + 4)
        check ("ADDI x21, x0, 2", 21, 2);
        check ("LUI x22, 0x12345", 22, 32'h1234_5000);
        check ("XORI x23, x1, 5", 23, 32'hf);
        check ("ANDI x24, x1, 0xF", 24, 32'ha);
        check ("ORI x25, x13, 0xFF00", 25, 32'hffff_ff64);
        check ("SLTI x26, x7, 0x28", 26, 1);
        check ("SLTIU x27, x8, 0xFFFF_FFF1", 27, 1);
        check ("SLL x28, x1, x19", 28, 20);
        check ("SRL x29, x6, x19", 29, 32'h7FFF_FFFD);
        check ("SRA x30, x6, x19", 30, 32'hFFFF_FFFD);
        check ("REMU x31, x6, x3", 31, 1);

        if (halt) begin
            $display ("[PASS] HALT asserted.");
            pass = pass + 1;
        end else begin
            $display ("[FAIL] HALT is missing.");
            fail = fail + 1;
        end
    end

    msg ("TEST 2 (REUSED REGISTERS)");
    // Re-assert reset before loading the second program
    @(negedge clock_proc);
    rst = 1;
    test = 1;
    timeout = 0;
    #20;

    load_inst (32'h00, 32'hffb00093); // ADDI x1, x0, -5 (0xFFFF_FFFB)
    load_inst (32'h04, 32'h00300113); // ADDI x2, x0, 3
    load_inst (32'h08, 32'h022091b3); // MULH x3, x1, x2
    load_inst (32'h0C, 32'h0220a233); // MULHSU x4, x1, x2
    load_inst (32'h10, 32'h0220b2b3); // MULHU x5, x1, x2
    load_inst (32'h14, 32'he9800313); // ADDI x6, x0, -360 (imm 0xE98, sign-extended to 0xFFFF_FE98)
    load_inst (32'h18, 32'h06600223); // SB x6, 100(x0)
    load_inst (32'h1C, 32'h06601423); // SH x6, 104(x0)
    load_inst (32'h20, 32'h06400383); // LB x7, 100(x0)
    load_inst (32'h24, 32'h06801403); // LH x8, 104(x0)
    load_inst (32'h28, 32'h00831463); // BNE x6, x8, 8 (not taken: x6 == x8)
    load_inst (32'h2C, 32'h06904483); // LBU x9, 105(x0)
    load_inst (32'h30, 32'h06805503); // LHU x10, 104(x0)
    load_inst (32'h34, 32'h00a4c463); // BLT x9, x10, 8 (taken: 0x34 -> 0x3C)
    load_inst (32'h38, 32'h00a4c5b3); // XOR x11, x9, x10 (skipped)
    load_inst (32'h3C, 32'h00a4e5b3); // OR x11, x9, x10 (overwritten by the ADD below)
    load_inst (32'h40, 32'h00a485b3); // ADD x11, x9, x10
    load_inst (32'h44, 32'h00535463); // BGE x6, x5, 8 (not taken: -360 < 2)
    load_inst (32'h48, 32'h0062e663); // BLTU x5, x6, 12 (taken: 0x48 -> 0x54)
    load_inst (32'h4C, 32'h00a4e633); // OR x12, x9, x10 (skipped by BLTU, reached later via BGEU)
    load_inst (32'h50, 32'h08200767); // JALR x14, 130(x0) (target 0x82)
    load_inst (32'h54, 32'h0004e633); // OR x12, x9, x0
    load_inst (32'h58, 32'h06c02c23); // SW x12, 120(x0)
    load_inst (32'h5C, 32'h07802683); // LW x13, 120(x0)
    load_inst (32'h60, 32'hfe9576e3); // BGEU x10, x9, -20 (taken: 0x60 -> 0x4C)
    load_inst (32'h82, 32'h00000073); // HALT (ECALL), stored in the word at 0x80 (0x82 >> 2)

    #10 rst = 0;
    #1;
    while (!halt && timeout < 1000) begin
        #10 timeout = timeout + 10;
    end

    if (test) begin
        check ("ADDI x1, x0, 0xFFFF_FFFB", 1, 32'hFFFF_FFFB);
        check ("ADDI x2, x0, 3", 2, 32'h3);
        check ("MULH x3, x1, x2", 3, 32'hFFFF_FFFF);
        check ("MULHSU x4, x1, x2", 4, 32'hFFFF_FFFF);
        check ("MULHU x5, x1, x2", 5, 32'h2);
        check ("ADDI x6, x0, 0xABCD_FE98", 6, 32'hFFFF_FE98);
        check ("LB x7, 100(x0)", 7, 32'hFFFF_FF98);
        check ("LH x8, 104(x0)", 8, 32'hFFFF_FE98);
        check ("LBU x9, 105(x0)", 9, 32'h0000_00FE);
        check ("LHU x10, 104(x0)", 10, 32'h0000_FE98);
        check ("ADD x11, x9, x10", 11, 32'h0000_FF96);
        check ("OR x12, x9, x10", 12, 32'h0000_FEFE);
        check ("LW x13, 120(x0)", 13, 32'h0000_00FE);
        check ("JALR x14, 130(x0)", 14, 32'h54);

        if (halt) begin
            $display ("[PASS] HALT asserted.");
            pass = pass + 1;
        end else begin
            $display ("[FAIL] HALT is missing.");
            fail = fail + 1;
        end
    end

    msg ("SUMMARY");
    $display ("TOTAL TESTS : %0d", pass + fail);
    $display ("TOTAL PASSED: %0d", pass);
    $display ("TOTAL FAILED: %0d", fail);

    if (fail == 0)
        $display ("=====> ALL TESTS PASSED");
    else if (pass == 0)
        $display ("=====> ALL TESTS FAILED");
    else
        $display ("=====> SOME TESTS FAILED");

    msg ("TEST END");

    #100;
    $finish;
end
endmodule
```

Program 3: Test bench implementation for the Processor module.
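Several Test 1 expected values depend on details that are easy to get wrong by hand:

- logical versus arithmetic right shifts;
- the sign extension of 12-bit immediates: the ORI and SLTIU words encode the immediates `0xF00` and `0xFF1`, which become `0xFFFF_FF00` and `0xFFFF_FFF1`;
- signed division, which RISC-V rounds toward zero.

The hypothetical Python helpers below recompute those values as an independent cross-check. Division by zero and signed overflow are deliberately not modeled.

```python
MASK32 = 0xFFFF_FFFF


def s32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x


def sext12(imm: int) -> int:
    """Sign-extend a 12-bit I-type immediate to 32 bits."""
    imm &= 0xFFF
    return (imm | 0xFFFF_F000) if imm & 0x800 else imm


def srl(a: int, shamt: int) -> int:
    # Logical shift: zeros are shifted in
    return (a & MASK32) >> (shamt & 31)


def sra(a: int, shamt: int) -> int:
    # Arithmetic shift: Python's >> on a negative int keeps the sign
    return (s32(a) >> (shamt & 31)) & MASK32


def div(a: int, b: int) -> int:
    # RISC-V DIV rounds toward zero, whereas Python's // rounds toward -infinity
    q = abs(s32(a)) // abs(s32(b))
    return (-q if (s32(a) < 0) != (s32(b) < 0) else q) & MASK32


def rem(a: int, b: int) -> int:
    # RISC-V REM: the remainder takes the sign of the dividend
    r = abs(s32(a)) % abs(s32(b))
    return (-r if s32(a) < 0 else r) & MASK32


def divu(a: int, b: int) -> int:
    return (a & MASK32) // (b & MASK32)


def remu(a: int, b: int) -> int:
    return (a & MASK32) % (b & MASK32)


x1, x2 = 10, -5 & MASK32   # ADDI x1, x0, 10 / ADDI x2, x0, -5 (x6 also equals x2, x3 equals 5)
print(hex(srl(x2, 1)), hex(sra(x2, 1)))              # 0x7ffffffd 0xfffffffd  (SRLI x10 / SRAI x9)
print(hex(div(x1, x2)), hex(rem(x1, x2)))            # 0xfffffffe 0x0  (DIV x14 / REM x15)
print(hex(divu(x2, x1)), hex(remu(x2, 5)))           # 0x19999999 0x1  (DIVU x16 / REMU x31)
print(hex(sext12(0xF00) | 100), hex(sext12(0xFF1)))  # 0xffffff64 0xfffffff1  (ORI x25 / SLTIU imm)
```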

## 5 Testbench results

Every check in both testbenches passed. This confirms that the RegFile and Processor modules behave as intended for the scenarios exercised by the testbenches, and that the implemented RTL meets the expected functionality within that coverage.

### 5.1 RegFile testbench results

```
# =====
# TEST 1: WRITE AND READ x0
# =====
# [PASS] Time = 31000 | rs1 = 0x00, Expected = 0x00000000, Output = 0x00000000
# -----
# [WRITE] Time = 31000 | rd = 0x00, Data = 0xffffffff
# -----
# [PASS] Time = 51000 | rs1 = 0x00, Expected = 0x00000000, Output = 0x00000000
# -----
# [WRITE] Time = 51000 | rd = 0x00, Data = 0x12345678
# -----
# [PASS] Time = 71000 | rs1 = 0x00, Expected = 0x00000000, Output = 0x00000000
# =====
# TEST 2: WRITE AND READ x1
# =====
# -----
# [WRITE] Time = 95000 | rd = 0x01, Data = 0x12345678
# -----
# [PASS] Time = 121000 | rs1 = 0x01, Expected = 0x12345678, Output = 0x12345678
# =====
# TEST 3: CHECK ALL REGISTERS
# =====
# [PASS] Time = 151000 | rs1 = 0x01 and rs2 = 0x01 initialized to 0
# -----
# [WRITE] Time = 151000 | rd = 0x01, Data = 0x369d4b49
# -----
# [PASS] Time = 171000 | rs1 = 0x01, Expected = 0x369d4b49, Output = 0x369d4b49
# [PASS] Time = 171000 | rs2 = 0x01, Expected = 0x369d4b49, Output = 0x369d4b49
# [PASS] Time = 181000 | rs1 = 0x02 and rs2 = 0x02 initialized to 0
# -----
# [WRITE] Time = 181000 | rd = 0x02, Data = 0x2e53207a
# -----
# [PASS] Time = 201000 | rs1 = 0x02, Expected = 0x2e53207a, Output = 0x2e53207a
# [PASS] Time = 201000 | rs2 = 0x02, Expected = 0x2e53207a, Output = 0x2e53207a
# [PASS] Time = 211000 | rs1 = 0x03 and rs2 = 0x03 initialized to 0
# -----
# [WRITE] Time = 211000 | rd = 0x03, Data = 0x7cb0da7e
# -----
# [PASS] Time = 231000 | rs1 = 0x03, Expected = 0x7cb0da7e, Output = 0x7cb0da7e
# [PASS] Time = 231000 | rs2 = 0x03, Expected = 0x7cb0da7e, Output = 0x7cb0da7e
# [PASS] Time = 241000 | rs1 = 0x04 and rs2 = 0x04 initialized to 0
# -----
# [WRITE] Time = 241000 | rd = 0x04, Data = 0x6a8defda
# -----
...
# [PASS] Time = 291000 | rs2 = 0x05, Expected = 0xc7119e4e, Output = 0xc7119e4e
# [PASS] Time = 301000 | rs1 = 0x06 and rs2 = 0x06 initialized to 0
# -----
# [WRITE] Time = 301000 | rd = 0x06, Data = 0xf0b39a02
# -----
# [PASS] Time = 321000 | rs1 = 0x06, Expected = 0xf0b39a02, Output = 0xf0b39a02
# [PASS] Time = 321000 | rs2 = 0x06, Expected = 0xf0b39a02, Output = 0xf0b39a02
# [PASS] Time = 331000 | rs1 = 0x07 and rs2 = 0x07 initialized to 0
# -----
# [WRITE] Time = 331000 | rd = 0x07, Data = 0xb81d179b
# -----
# [PASS] Time = 351000 | rs1 = 0x07, Expected = 0xb81d179b, Output = 0xb81d179b
# [PASS] Time = 351000 | rs2 = 0x07, Expected = 0xb81d179b, Output = 0xb81d179b
# [PASS] Time = 361000 | rs1 = 0x08 and rs2 = 0x08 initialized to 0
# -----
# [WRITE] Time = 361000 | rd = 0x08, Data = 0xcaf481d9
# -----
# [PASS] Time = 381000 | rs1 = 0x08, Expected = 0xcaf481d9, Output = 0xcaf481d9
# [PASS] Time = 381000 | rs2 = 0x08, Expected = 0xcaf481d9, Output = 0xcaf481d9
# [PASS] Time = 391000 | rs1 = 0x09 and rs2 = 0x09 initialized to 0
# -----
# [WRITE] Time = 391000 | rd = 0x09, Data = 0x4aae874c
# -----
# [PASS] Time = 411000 | rs1 = 0x09, Expected = 0x4aae874c, Output = 0x4aae874c
# [PASS] Time = 411000 | rs2 = 0x09, Expected = 0x4aae874c, Output = 0x4aae874c
# [PASS] Time = 421000 | rs1 = 0xa and rs2 = 0xa initialized to 0
# -----
# [WRITE] Time = 421000 | rd = 0xa, Data = 0xd0bc7957
# -----
# [PASS] Time = 441000 | rs1 = 0xa, Expected = 0xd0bc7957, Output = 0xd0bc7957
# [PASS] Time = 441000 | rs2 = 0xa, Expected = 0xd0bc7957, Output = 0xd0bc7957
# [PASS] Time = 451000 | rs1 = 0xb and rs2 = 0xb initialized to 0
# -----
# [WRITE] Time = 451000 | rd = 0xb, Data = 0x6aa8df9b
# -----
# [PASS] Time = 471000 | rs1 = 0xb, Expected = 0x6aa8df9b, Output = 0x6aa8df9b
# [PASS] Time = 471000 | rs2 = 0xb, Expected = 0x6aa8df9b, Output = 0x6aa8df9b
# [PASS] Time = 481000 | rs1 = 0xc and rs2 = 0xc initialized to 0
# -----
# [WRITE] Time = 481000 | rd = 0xc, Data = 0xfd6562cf
# -----
# [PASS] Time = 501000 | rs1 = 0x0c, Expected = 0xfd6562cf, Output = 0xfd6562cf
# [PASS] Time = 501000 | rs2 = 0x0c, Expected = 0xfd6562cf, Output = 0xfd6562cf
# [PASS] Time = 511000 | rs1 = 0x0d and rs2 = 0x0d initialized to 0
# -----
# [WRITE] Time = 511000 | rd = 0x0d, Data = 0x25568ff3
# -----
# [PASS] Time = 531000 | rs1 = 0x0d, Expected = 0x25568ff3, Output = 0x25568ff3
# [PASS] Time = 531000 | rs2 = 0x0d, Expected = 0x25568ff3, Output = 0x25568ff3
# [PASS] Time = 541000 | rs1 = 0x0e and rs2 = 0x0e initialized to 0
# -----
# [WRITE] Time = 541000 | rd = 0x0e, Data = 0x66fa2046
# -----
# [PASS] Time = 561000 | rs1 = 0x0e, Expected = 0x66fa2046, Output = 0x66fa2046
# [PASS] Time = 561000 | rs2 = 0x0e, Expected = 0x66fa2046, Output = 0x66fa2046
# [PASS] Time = 571000 | rs1 = 0x0f and rs2 = 0x0f initialized to 0
# -----
# [WRITE] Time = 571000 | rd = 0x0f, Data = 0x41766910
# -----
# [PASS] Time = 591000 | rs1 = 0x0f, Expected = 0x41766910, Output = 0x41766910
# [PASS] Time = 591000 | rs2 = 0x0f, Expected = 0x41766910, Output = 0x41766910
# [PASS] Time = 601000 | rs1 = 0x10 and rs2 = 0x10 initialized to 0
# -----
# [WRITE] Time = 601000 | rd = 0x10, Data = 0x81103480
# -----
# [PASS] Time = 621000 | rs1 = 0x10, Expected = 0x81103480, Output = 0x81103480
# [PASS] Time = 621000 | rs2 = 0x10, Expected = 0x81103480, Output = 0x81103480
# [PASS] Time = 631000 | rs1 = 0x11 and rs2 = 0x11 initialized to 0
# -----
# [WRITE] Time = 631000 | rd = 0x11, Data = 0x2942e4d8
# -----
# [PASS] Time = 651000 | rs1 = 0x11, Expected = 0x2942e4d8, Output = 0x2942e4d8
# [PASS] Time = 651000 | rs2 = 0x11, Expected = 0x2942e4d8, Output = 0x2942e4d8
# [PASS] Time = 661000 | rs1 = 0x12 and rs2 = 0x12 initialized to 0
# -----
# [WRITE] Time = 661000 | rd = 0x12, Data = 0x546a052c
# -----
# [PASS] Time = 681000 | rs1 = 0x12, Expected = 0x546a052c, Output = 0x546a052c
# [PASS] Time = 681000 | rs2 = 0x12, Expected = 0x546a052c, Output = 0x546a052c
# [PASS] Time = 691000 | rs1 = 0x13 and rs2 = 0x13 initialized to 0
# -----
# [WRITE] Time = 691000 | rd = 0x13, Data = 0xe34c5b88
# -----
# [PASS] Time = 711000 | rs1 = 0x13, Expected = 0xe34c5b88, Output = 0xe34c5b88
# [PASS] Time = 711000 | rs2 = 0x13, Expected = 0xe34c5b88, Output = 0xe34c5b88
# [PASS] Time = 721000 | rs1 = 0x14 and rs2 = 0x14 initialized to 0
# -----
# [WRITE] Time = 721000 | rd = 0x14, Data = 0x3635d07
# -----
# [PASS] Time = 741000 | rs1 = 0x14, Expected = 0x03635d07, Output = 0x03635d07
# [PASS] Time = 741000 | rs2 = 0x14, Expected = 0x03635d07, Output = 0x03635d07
# [PASS] Time = 751000 | rs1 = 0x15 and rs2 = 0x15 initialized to 0
# -----
# [WRITE] Time = 751000 | rd = 0x15, Data = 0xc691eee3
# -----
# [PASS] Time = 771000 | rs1 = 0x15, Expected = 0xc691eee3, Output = 0xc691eee3
# [PASS] Time = 771000 | rs2 = 0x15, Expected = 0xc691eee3, Output = 0xc691eee3
# [PASS] Time = 781000 | rs1 = 0x16 and rs2 = 0x16 initialized to 0
# -----
# [WRITE] Time = 781000 | rd = 0x16, Data = 0x6e6773f3
# -----
# [PASS] Time = 801000 | rs1 = 0x16, Expected = 0x6e6773f3, Output = 0x6e6773f3
# [PASS] Time = 801000 | rs2 = 0x16, Expected = 0x6e6773f3, Output = 0x6e6773f3
# [PASS] Time = 811000 | rs1 = 0x17 and rs2 = 0x17 initialized to 0
# -----
# [WRITE] Time = 811000 | rd = 0x17, Data = 0xd0cd0050
# -----
# [PASS] Time = 831000 | rs1 = 0x17, Expected = 0xd0cd0050, Output = 0xd0cd0050
# [PASS] Time = 831000 | rs2 = 0x17, Expected = 0xd0cd0050, Output = 0xd0cd0050
# [PASS] Time = 841000 | rs1 = 0x18 and rs2 = 0x18 initialized to 0
# -----
# [WRITE] Time = 841000 | rd = 0x18, Data = 0x43c3b3e2
# -----
# [PASS] Time = 861000 | rs1 = 0x18, Expected = 0x43c3b3e2, Output = 0x43c3b3e2
# [PASS] Time = 861000 | rs2 = 0x18, Expected = 0x43c3b3e2, Output = 0x43c3b3e2
# [PASS] Time = 871000 | rs1 = 0x19 and rs2 = 0x19 initialized to 0
# -----
# [WRITE] Time = 871000 | rd = 0x19, Data = 0x77622873
# -----
# [PASS] Time = 891000 | rs1 = 0x19, Expected = 0x77622873, Output = 0x77622873
# [PASS] Time = 891000 | rs2 = 0x19, Expected = 0x77622873, Output = 0x77622873
# [PASS] Time = 901000 | rs1 = 0x1a and rs2 = 0x1a initialized to 0
# -----
# [WRITE] Time = 901000 | rd = 0x1a, Data = 0x9ef912bb
# -----
# [PASS] Time = 921000 | rs1 = 0x1a, Expected = 0x9ef912bb, Output = 0x9ef912bb
# [PASS] Time = 921000 | rs2 = 0x1a, Expected = 0x9ef912bb, Output = 0x9ef912bb
# [PASS] Time = 931000 | rs1 = 0x1b and rs2 = 0x1b initialized to 0
# -----
# [WRITE] Time = 931000 | rd = 0x1b, Data = 0x34c281fc
# -----
# [PASS] Time = 951000 | rs1 = 0x1b, Expected = 0x34c281fc, Output = 0x34c281fc
# [PASS] Time = 951000 | rs2 = 0x1b, Expected = 0x34c281fc, Output = 0x34c281fc
# [PASS] Time = 961000 | rs1 = 0x1c and rs2 = 0x1c initialized to 0
# -----
# [WRITE] Time = 961000 | rd = 0x1c, Data = 0xe659f7ae
# -----
# [PASS] Time = 981000 | rs1 = 0x1c, Expected = 0xe659f7ae, Output = 0xe659f7ae
# [PASS] Time = 981000 | rs2 = 0x1c, Expected = 0xe659f7ae, Output = 0xe659f7ae
# [PASS] Time = 991000 | rs1 = 0x1d and rs2 = 0x1d initialized to 0
# -----
# [WRITE] Time = 991000 | rd = 0x1d, Data = 0x713c11cf
# -----
# [PASS] Time = 1011000 | rs1 = 0x1d, Expected = 0x713c11cf, Output = 0x713c11cf
# [PASS] Time = 1011000 | rs2 = 0x1d, Expected = 0x713c11cf, Output = 0x713c11cf
# [PASS] Time = 1021000 | rs1 = 0x1e and rs2 = 0x1e initialized to 0
# -----
# [WRITE] Time = 1021000 | rd = 0x1e, Data = 0x87401cb6
# -----
# [PASS] Time = 1041000 | rs1 = 0x1e, Expected = 0x87401cb6, Output = 0x87401cb6
# [PASS] Time = 1041000 | rs2 = 0x1e, Expected = 0x87401cb6, Output = 0x87401cb6
# [PASS] Time = 1051000 | rs1 = 0x1f and rs2 = 0x1f initialized to 0
# -----
# [WRITE] Time = 1051000 | rd = 0x1f, Data = 0xa24d13c7
# -----
# [PASS] Time = 1071000 | rs1 = 0x1f, Expected = 0xa24d13c7, Output = 0xa24d13c7
# [PASS] Time = 1071000 | rs2 = 0x1f, Expected = 0xa24d13c7, Output = 0xa24d13c7
# =====
# TEST 4: WRITE ALL AND READ BACK
# =====
# -----
# [WRITE] Time = 1095000 | rd = 0x01, Data = 0x4c8e7537
# -----
# -----
# [WRITE] Time = 1116000 | rd = 0x02, Data = 0xc969e31a
# -----
# -----
# [WRITE] Time = 1136000 | rd = 0x03, Data = 0xd837cd88
# -----
# -----
# [WRITE] Time = 1156000 | rd = 0x04, Data = 0x9f28cd7b
# -----
# -----
# [WRITE] Time = 1176000 | rd = 0x05, Data = 0x87f27d84
# -----
# -----
# [WRITE] Time = 1196000 | rd = 0x06, Data = 0x9162bd0c
# -----
# -----
# [WRITE] Time = 1216000 | rd = 0x07, Data = 0x613fb4ba
# -----
# -----
# [WRITE] Time = 1236000 | rd = 0x08, Data = 0xa75a26b9
# -----
# -----
# [WRITE] Time = 1256000 | rd = 0x09, Data = 0x97befd48
# -----
# -----
# [WRITE] Time = 1276000 | rd = 0x0a, Data = 0x74871bf9
# -----
# -----
# [WRITE] Time = 1296000 | rd = 0x0b, Data = 0x9b731590
# -----
# -----
# [WRITE] Time = 1316000 | rd = 0x0c, Data = 0x22e8e590
# -----
# -----
# [WRITE] Time = 1336000 | rd = 0x0d, Data = 0xd66099fd
# -----
# -----
# [WRITE] Time = 1356000 | rd = 0x0e, Data = 0xba542fdb
# -----
# -----
# [WRITE] Time = 1376000 | rd = 0x0f, Data = 0x8bb8376e
# -----
# -----
# [WRITE] Time = 1396000 | rd = 0x10, Data = 0x1603d7b9
# -----
# -----
# [WRITE] Time = 1416000 | rd = 0x11, Data = 0x781c5ba3
# -----
# -----
# [WRITE] Time = 1436000 | rd = 0x12, Data = 0x2381cb4e
# -----
# -----
# [WRITE] Time = 1456000 | rd = 0x13, Data = 0x671e6b6f
# -----
# -----
# [WRITE] Time = 1476000 | rd = 0x14, Data = 0x2ebaab9a
# -----
# -----
# [WRITE] Time = 1496000 | rd = 0x15, Data = 0xc3391889
# -----
# -----
# [WRITE] Time = 1516000 | rd = 0x16, Data = 0xcfef6f7b5
# -----
# -----
# [WRITE] Time = 1536000 | rd = 0x17, Data = 0x6be10d7b
# -----
# -----
# [WRITE] Time = 1556000 | rd = 0x18, Data = 0x720de158
# -----
# -----
# [WRITE] Time = 1576000 | rd = 0x19, Data = 0xdcd928e
# -----
# -----
# [WRITE] Time = 1596000 | rd = 0x1a, Data = 0x327638a8
# -----
# -----
# [WRITE] Time = 1616000 | rd = 0x1b, Data = 0xdb062489
# -----
# -----
# [WRITE] Time = 1636000 | rd = 0x1c, Data = 0x879f10ac
# -----
# -----
# [WRITE] Time = 1656000 | rd = 0x1d, Data = 0x38674ea5
# -----
# -----
# [WRITE] Time = 1676000 | rd = 0x1e, Data = 0x5a2a5c21
# -----
# -----
# [WRITE] Time = 1696000 | rd = 0x1f, Data = 0xd5f6a9d
# -----
# [PASS] Time = 1721000 | rs1 = 0x01, Expected = 0x4c8e7537, Output = 0x4c8e7537
# [PASS] Time = 1721000 | rs2 = 0x02, Expected = 0xc969e31a, Output = 0xc969e31a
# [PASS] Time = 1731000 | rs1 = 0x02, Expected = 0xc969e31a, Output = 0xc969e31a
# [PASS] Time = 1731000 | rs2 = 0x03, Expected = 0xd837cd88, Output = 0xd837cd88
# [PASS] Time = 1741000 | rs1 = 0x03, Expected = 0xd837cd88, Output = 0xd837cd88
# [PASS] Time = 1741000 | rs2 = 0x04, Expected = 0x9f28cd7b, Output = 0x9f28cd7b
# [PASS] Time = 1751000 | rs1 = 0x04, Expected = 0x9f28cd7b, Output = 0x9f28cd7b
# [PASS] Time = 1751000 | rs2 = 0x05, Expected = 0x87f27d84, Output = 0x87f27d84
# [PASS] Time = 1761000 | rs1 = 0x05, Expected = 0x87f27d84, Output = 0x87f27d84
# [PASS] Time = 1761000 | rs2 = 0x06, Expected = 0x9162bd0c, Output = 0x9162bd0c
# [PASS] Time = 1771000 | rs1 = 0x06, Expected = 0x9162bd0c, Output = 0x9162bd0c
# [PASS] Time = 1771000 | rs2 = 0x07, Expected = 0x613fb4ba, Output = 0x613fb4ba
# [PASS] Time = 1781000 | rs1 = 0x07, Expected = 0x613fb4ba, Output = 0x613fb4ba
# [PASS] Time = 1781000 | rs2 = 0x08, Expected = 0xa75a26b9, Output = 0xa75a26b9
# [PASS] Time = 1791000 | rs1 = 0x08, Expected = 0xa75a26b9, Output = 0xa75a26b9
# [PASS] Time = 1791000 | rs2 = 0x09, Expected = 0x97befd48, Output = 0x97befd48
# [PASS] Time = 1801000 | rs1 = 0x09, Expected = 0x97befd48, Output = 0x97befd48
# [PASS] Time = 1801000 | rs2 = 0x0a, Expected = 0x74871bf9, Output = 0x74871bf9
# [PASS] Time = 1811000 | rs1 = 0x0a, Expected = 0x74871bf9, Output = 0x74871bf9
# [PASS] Time = 1811000 | rs2 = 0x0b, Expected = 0x9b731590, Output = 0x9b731590
# [PASS] Time = 1821000 | rs1 = 0x0b, Expected = 0x9b731590, Output = 0x9b731590
# [PASS] Time = 1821000 | rs2 = 0x0c, Expected = 0x22e8e590, Output = 0x22e8e590
# [PASS] Time = 1831000 | rs1 = 0x0c, Expected = 0x22e8e590, Output = 0x22e8e590
# [PASS] Time = 1831000 | rs2 = 0x0d, Expected = 0xd66099fd, Output = 0xd66099fd
# [PASS] Time = 1841000 | rs1 = 0x0d, Expected = 0xd66099fd, Output = 0xd66099fd
# [PASS] Time = 1841000 | rs2 = 0x0e, Expected = 0xba542fdb, Output = 0xba542fdb
# [PASS] Time = 1851000 | rs1 = 0x0e, Expected = 0xba542fdb, Output = 0xba542fdb
# [PASS] Time = 1851000 | rs2 = 0x0f, Expected = 0x8bb8376e, Output = 0x8bb8376e
# [PASS] Time = 1861000 | rs1 = 0x0f, Expected = 0x8bb8376e, Output = 0x8bb8376e
# [PASS] Time = 1861000 | rs2 = 0x10, Expected = 0x1603d7b9, Output = 0x1603d7b9
# [PASS] Time = 1871000 | rs1 = 0x10, Expected = 0x1603d7b9, Output = 0x1603d7b9
# [PASS] Time = 1871000 | rs2 = 0x11, Expected = 0x781c5ba3, Output = 0x781c5ba3
# [PASS] Time = 1881000 | rs1 = 0x11, Expected = 0x781c5ba3, Output = 0x781c5ba3
# [PASS] Time = 1881000 | rs2 = 0x12, Expected = 0x2381cb4e, Output = 0x2381cb4e
# [PASS] Time = 1891000 | rs1 = 0x12, Expected = 0x2381cb4e, Output = 0x2381cb4e
# [PASS] Time = 1891000 | rs2 = 0x13, Expected = 0x671e6b6f, Output = 0x671e6b6f
# [PASS] Time = 1901000 | rs1 = 0x13, Expected = 0x671e6b6f, Output = 0x671e6b6f
# [PASS] Time = 1901000 | rs2 = 0x14, Expected = 0x2ebaab9a, Output = 0x2ebaab9a
# [PASS] Time = 1911000 | rs1 = 0x14, Expected = 0x2ebaab9a, Output = 0x2ebaab9a
# [PASS] Time = 1911000 | rs2 = 0x15, Expected = 0xc3391889, Output = 0xc3391889
# [PASS] Time = 1921000 | rs1 = 0x15, Expected = 0xc3391889, Output = 0xc3391889
# [PASS] Time = 1921000 | rs2 = 0x16, Expected = 0xcfef6f7b5, Output = 0xcfef6f7b5
# [PASS] Time = 1931000 | rs1 = 0x16, Expected = 0xcfef6f7b5, Output = 0xcfef6f7b5
# [PASS] Time = 1931000 | rs2 = 0x17, Expected = 0x6be10d7b, Output = 0x6be10d7b
# [PASS] Time = 1941000 | rs1 = 0x17, Expected = 0x6be10d7b, Output = 0x6be10d7b
# [PASS] Time = 1941000 | rs2 = 0x18, Expected = 0x720de158, Output = 0x720de158
# [PASS] Time = 1951000 | rs1 = 0x18, Expected = 0x720de158, Output = 0x720de158
# [PASS] Time = 1951000 | rs2 = 0x19, Expected = 0x0dcd928e, Output = 0x0dcd928e
# [PASS] Time = 1961000 | rs1 = 0x19, Expected = 0x0dcd928e, Output = 0x0dcd928e
# [PASS] Time = 1961000 | rs2 = 0x1a, Expected = 0x327638a8, Output = 0x327638a8
# [PASS] Time = 1971000 | rs1 = 0x1a, Expected = 0x327638a8, Output = 0x327638a8
# [PASS] Time = 1971000 | rs2 = 0x1b, Expected = 0xdb062489, Output = 0xdb062489
# [PASS] Time = 1981000 | rs1 = 0x1b, Expected = 0xdb062489, Output = 0xdb062489
# [PASS] Time = 1981000 | rs2 = 0x1c, Expected = 0x879f10ac, Output = 0x879f10ac
# [PASS] Time = 1991000 | rs1 = 0x1c, Expected = 0x879f10ac, Output = 0x879f10ac
# [PASS] Time = 1991000 | rs2 = 0x1d, Expected = 0x38674ea5, Output = 0x38674ea5
# [PASS] Time = 2001000 | rs1 = 0x1d, Expected = 0x38674ea5, Output = 0x38674ea5
# [PASS] Time = 2001000 | rs2 = 0x1e, Expected = 0x5a2a5c21, Output = 0x5a2a5c21
# [PASS] Time = 2011000 | rs1 = 0x1e, Expected = 0x5a2a5c21, Output = 0x5a2a5c21
# [PASS] Time = 2011000 | rs2 = 0x1f, Expected = 0x0d5f6a9d, Output = 0x0d5f6a9d
# =====
# SUMMARY
# =====
# TOTAL TESTS : 157
# TOTAL PASSED: 157
# TOTAL FAILED: 0
# =====> ALL TESTS PASSED
# ** Note: $finish : ../../tb/tb_reg.v(186)
#    Time: 2111 ns  Iteration: 0  Instance: /tb_reg
```

Simulation 1: Simulation results of the RegFile module.

## 5.2 Processor testbench results

```

# =====
# =====
# TEST BEGIN
# =====
# =====
# TEST 1
# =====
# [PASS] Time = 371 | ADDI x1, x0, 10 | Output x1 = 0x0000000a
# [PASS] Time = 371 | ADDI x2, x0, -5 | Output x2 = 0xfffffff
# [PASS] Time = 371 | ADD x3, x1, x2 | Output x3 = 0x00000005
# [PASS] Time = 371 | SUB x4, x1, x2 | Output x4 = 0x0000000f
# [PASS] Time = 371 | AND x5, x1, x2 | Output x5 = 0x0000000a
# [PASS] Time = 371 | OR x6, x1, x2 | Output x6 = 0xfffffff
# [PASS] Time = 371 | XOR x7, x1, x2 | Output x7 = 0xfffffff1
# [PASS] Time = 371 | SLLI x8, x1, 2 | Output x8 = 0x00000028
# [PASS] Time = 371 | SRAI x9, x2, 1 | Output x9 = 0xfffffff
# [PASS] Time = 371 | SRLI x10, x2, 1 | Output x10 = 0x7fffffd
# [PASS] Time = 371 | SLT x11, x2, x1 | Output x11 = 0x00000001
# [PASS] Time = 371 | SLTU x12, x2, x1 | Output x12 = 0x00000000
# [PASS] Time = 371 | MUL x13, x1, x1 | Output x13 = 0x00000064
# [PASS] Time = 371 | DIV x14, x1, x2 | Output x14 = 0
# [PASS] Time = 371 | REM x15, x1, x2 | Output x15 = 0
# [PASS] Time = 371 | DIVU x16, x2, x1 | Output x16 = 0
# [PASS] Time = 371 | LW x18, 100(x0) | Output x18 = 0
# [PASS] Time = 371 | ADDI x19, x0, 1 | Output x19 = 0
# [PASS] Time = 371 | JAL x20, 8 | Output x20 = 0
# [PASS] Time = 371 | ADDI x21, x0, 2 | Output x21 = 0
# [PASS] Time = 371 | LUI x22, 0x12345 | Output x22 = 0
# [PASS] Time = 371 | XORI x23, x1, 5 | Output x23 = 0
# [PASS] Time = 371 | ANDI x24, x1, 0xF | Output x24 = 0
# [PASS] Time = 371 | ORI x25, x13, 0xFF00 | Output x25 = 0
# [PASS] Time = 371 | SLTI x26, x7, 0x28 | Output x26 = 0
# [PASS] Time = 371 | SLTIU x27, x8, 0xFFFF_FFF1 | Output x27 = 0
# [PASS] Time = 371 | SLL x28, x1, x19 | Output x28 = 0
# [PASS] Time = 371 | SRL x29, x6, x19 | Output x29 = 0
# [PASS] Time = 371 | SRA x30, x6, x19 | Output x30 = 0
# [PASS] Time = 371 | REMU x31, x6, x3 | Output x31 = 0
# [PASS] HALT asserted.
# =====
# TEST 2 (REUSED REGISTERS)
# =====
# [PASS] Time = 651 | ADDI x1, x0, 0xFFFF_FFFB | Output x1 = 0xfffffff
# [PASS] Time = 651 | ADDI x2, x0, 3 | Output x2 = 0x00000003
# [PASS] Time = 651 | MULH x3, x1, x2 | Output x3 = 0xfffffff
# [PASS] Time = 651 | MULHSU x4, x1, x2 | Output x4 = 0xfffffff
# [PASS] Time = 651 | MULHU x5, x1, x2 | Output x5 = 0x00000002
# [PASS] Time = 651 | ADDI x6, x0, 0xABCD_FE98 | Output x6 = 0xfffffff
# [PASS] Time = 651 | LB x7, 100(x0) | Output x7 = 0xfffffff
# [PASS] Time = 651 | LH x8, 104(x0) | Output x8 = 0xfffffe98
# [PASS] Time = 651 | LBU x9, 105(x0) | Output x9 = 0x000000fe
# [PASS] Time = 651 | LHU x10, 104(x0) | Output x10 = 0x0000fe98
# [PASS] Time = 651 | ADD x11, x9, x10 | Output x11 = 0x0000ff96
# [PASS] Time = 651 | OR x12, x9, x10 | Output x12 = 0x0000fefef
# [PASS] Time = 651 | LW x13, 120(x0) | Output x13 = 0x000000fe
# [PASS] Time = 651 | JALR x14, 130(x0) | Output x14 = 0x00000054
# [PASS] HALT asserted.
# =====
# SUMMARY
# =====
# TOTAL TESTS : 46
# TOTAL PASSED: 46
# TOTAL FAILED: 0
# =====> ALL TESTS PASSED
# =====
# TEST END
# =====
# ** Note: $finish : ../../tb/tb_single.v(230)
#    Time: 751 ns Iteration: 0 Instance: /tb_single

```

Simulation 2: Testbench results of the Processor module.

## 6 Simulation waveform

### 6.1 RegFile simulation waveform

The screenshot in Figure 1 presents a portion of the waveform from Test 3. In this test, a random value is written to the destination register. The waveform confirms that the same value can be correctly read from both source registers, rs1 and rs2.



Figure 1

Figure 1 shows that the write enable (`we`) signal is asserted at the positive clock edge at 185141 ps. At that moment, the destination register `rd` is 0x2 and the write data `rd_data` is 32'h2e53207a. The storage array is updated synchronously on the positive clock edge, and the read ports are combinational, so `rs1` and `rs2` can read the new value as soon as it has been written. Both `rs1_data` and `rs2_data` show 32'h2e53207a at the next clock edge. This behavior confirms that the Register File module writes and reads data correctly.
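The write-then-read sequence in Figure 1 can be mimicked with a small Python behavioral model. This is only a sketch under stated assumptions, not the lab's Verilog. The class name `RegFile` and the method `posedge` are hypothetical, and writes to x0 are assumed to be ignored, as RV32I requires.

```python
MASK32 = 0xFFFFFFFF


class RegFile:
    """Behavioral sketch of a 32 x 32-bit register file (hypothetical names)."""

    def __init__(self) -> None:
        self.regs = [0] * 32

    def read(self, rs1: int, rs2: int) -> tuple:
        # Combinational read ports: return whatever the array holds right now.
        return self.regs[rs1], self.regs[rs2]

    def posedge(self, we: bool, rd: int, rd_data: int) -> None:
        # Synchronous write: the array only changes at a rising clock edge.
        # Assumption: writes to x0 are ignored so x0 always reads as zero.
        if we and rd != 0:
            self.regs[rd] = rd_data & MASK32


rf = RegFile()
print(rf.read(2, 2))                    # before the edge: old contents of x2
rf.posedge(we=True, rd=2, rd_data=0x2E53207A)
print([hex(v) for v in rf.read(2, 2)])  # after the edge: new value on both ports
```

Because `read` has no clock of its own, both ports see the new value as soon as `posedge` has updated the array. This matches the combinational-read behavior described above.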

## 6.2 Processor simulation waveform



Figure 2

As Figure 2 shows, the first 30 ns are used to load instructions into the memory array while reset is asserted. Once reset is released at $t = 30\text{ ns}$, instruction execution begins at PC = 0x00.

The first instruction is 0x00a00093 (ADDI x1, x0, 10). The instruction fetch occurs on the positive edge of `clock_mem`, when the instruction is read from instruction memory into the `inst_from_imem` signal. The processor then decodes this instruction combinationally: `inst_rd` becomes 1 (x1), `inst_rs1` becomes 0 (x0), and the immediate value is extracted as 10. ADDI is an I-type instruction, so the operation does not use `inst_rs2`, but the decoder still extracts bits [24:20] from the instruction encoding. The register file performs a combinational read, so `rs1_data` immediately reflects the value of x0 (which is 0). On the next positive edge of `clock_proc`, the result (0 + 10 = 10) is written to register x1, and the PC increments to 0x04.
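To make the field extraction concrete, the following Python sketch slices an RV32I instruction word at the fixed bit positions the decoder uses. It is an illustrative reference model, not the lab's Verilog. The helper names `sign_extend` and `decode` are hypothetical.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def decode(inst: int) -> dict:
    """Split a 32-bit RV32I instruction word into its fixed-position fields."""
    return {
        "opcode": inst & 0x7F,                  # bits [6:0]
        "rd": (inst >> 7) & 0x1F,               # bits [11:7]
        "funct3": (inst >> 12) & 0x7,           # bits [14:12]
        "rs1": (inst >> 15) & 0x1F,             # bits [19:15]
        "rs2": (inst >> 20) & 0x1F,             # bits [24:20], extracted for every format
        "funct7": (inst >> 25) & 0x7F,          # bits [31:25]
        "imm_i": sign_extend(inst >> 20, 12),   # I-type immediate, bits [31:20]
    }


fields = decode(0x00A00093)  # ADDI x1, x0, 10
print(fields["rd"], fields["rs1"], fields["imm_i"], fields["rs2"])
```

For 0x00A00093, the sketch gives rd = 1, rs1 = 0 and an immediate of 10. The rs2 field also comes out as 10. In an I-type instruction, bits [24:20] hold the low five bits of the immediate, so `inst_rs2` carries a value even though ADDI never uses it.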

A later instruction, 0x002081b3 at PC = 0x08 (ADD x3, x1, x2), shows more of the datapath at work. The decoder extracts `inst_rd` = 3 (x3), `inst_rs1` = 1 (x1), and `inst_rs2` = 2 (x2). The register file performs combinational reads, outputting `rs1_data` = 0xA (10 decimal) and `rs2_data` = 0xFFFFFFFB (-5 in two's complement). These values are fed into the CLA (Carry Lookahead Adder) module, which computes 10 + (-5) = 5. The `cla_res` signal shows 0x00000005, and this value propagates to `rd_data`. On the next positive edge of `clock_proc`, the value 5 is written to register x3.
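The carry-lookahead idea behind the CLA module can be sketched in Python. This is a flat, bit-level sketch: each carry is expanded directly from generate and propagate terms instead of waiting for the previous carry to ripple in. The real module may group bits hierarchically, and that structure is not shown here. Only the arithmetic result is meant to match, and the function name `cla_add32` is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def cla_add32(a: int, b: int, cin: int = 0) -> int:
    """32-bit addition whose carries come from generate/propagate terms."""
    a &= MASK32
    b &= MASK32
    g = [(a >> i) & (b >> i) & 1 for i in range(32)]    # generate: a_i AND b_i
    p = [((a >> i) | (b >> i)) & 1 for i in range(32)]  # propagate: a_i OR b_i
    carries = [cin & 1]
    for i in range(32):
        # Lookahead form: c[i+1] = g[i] | p[i]g[i-1] | ... | p[i]...p[0]cin,
        # evaluated from g/p directly rather than from the previous carry.
        c, chain = 0, 1
        for j in range(i, -1, -1):
            c |= chain & g[j]
            chain &= p[j]
        carries.append(c | (chain & carries[0]))
    total = 0
    for i in range(32):
        total |= (((a >> i) ^ (b >> i) ^ carries[i]) & 1) << i
    return total


print(hex(cla_add32(10, -5)))  # ADD x3, x1, x2 with x1 = 10 and x2 = -5
```

The same adder can also subtract, computing a + ~b + 1. For example, `cla_add32(10, ~(-5), 1)` gives 15, the SUB x4 result in the log. This section does not show whether the lab's datapath actually implements SUB this way.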

The key timing relationship is: instruction fetch happens on the positive edge of `clock_mem`, combinational decoding and ALU computation occur immediately, and register writeback happens on the next positive edge of `clock_proc`. Memory operations (loads/stores) use the negative edge of `clock_mem` for data access to avoid conflicts with instruction fetch.



Figure 3

Figure 3 shows the unsigned division performed by instruction 0x02115833 (DIVU x16, x2, x1), which executes around 180 ns. The `rs2_data` signal shows 0x0000000A (10 decimal), the value of register x1, and `rs1_data` shows 0xFFFFFFFB (4,294,967,291 as an unsigned number), the value of register x2. DIVU uses the format DIVU rd, rs1, rs2 and computes $rd = rs1 / rs2$ as an unsigned integer division that discards the remainder. This instruction therefore calculates $x2 / x1 = \lfloor 4{,}294{,}967{,}291 / 10 \rfloor = 429{,}496{,}729$, which is 0x19999999. The `divider_unsigned` module executes this unsigned division, and the result appears on the `div_res` signal as 0x19999999. The `rd_data` signal also reflects this value, confirming that 0x19999999 will be written to register x16 on the next positive edge of `clock_proc`.
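The quotient above can be reproduced with restoring long division, which produces one quotient bit per step. The Python sketch below assumes that `divider_unsigned` follows this bit-by-bit scheme. The scheme is common for combinational dividers, but this section does not confirm it. The function reproduces only the arithmetic, not the module's ports.

```python
MASK32 = 0xFFFFFFFF


def divider_unsigned(dividend: int, divisor: int) -> tuple:
    """Restoring long division of two 32-bit unsigned values.

    Returns (quotient, remainder). With divisor == 0 the loop naturally
    yields quotient 0xFFFFFFFF and remainder == dividend, which is what
    the RISC-V spec requires for DIVU/REMU by zero.
    """
    dividend &= MASK32
    divisor &= MASK32
    quotient, remainder = 0, 0
    for i in range(31, -1, -1):
        # Bring down the next dividend bit, most significant bit first.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        # If the divisor fits, subtract it and set this quotient bit.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << i
    return quotient, remainder


q, r = divider_unsigned(0xFFFFFFFB, 10)  # DIVU x16, x2, x1
print(hex(q), r)
```

For these operands, the quotient matches `div_res` (0x19999999). The remainder is 1, which is what REMU would return for the same operands.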

At approximately 188 ns, a store instruction 0x06302223 (SW x3, 100(x0)) executes. The `rf_we` (register file write enable) signal is deasserted because store instructions do not write to any register, unlike arithmetic or load instructions which require register writeback. For the SW instruction format (S-type), rs1 provides the base address, rs2 provides the source data, and the immediate provides the offset. In this case, `inst_rs2` = 3 (x3) with `rs2_data` = 0x00000005, `inst_rs1` = 0 (x0) with `rs1_data` = 0x00000000, and the offset is 100. The effective address is calculated as  $0 + 100 = 100$  (decimal) = 0x00000064 (hex), which appears on `mem_data_addr`. The value 0x00000005 from x3 appears on `mem_data_to_write`, and this data is written to memory address 100 on the negative edge of `clock_mem`.
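S-type instructions split the immediate across two fields, so the offset of 100 has to be reassembled before the address can be formed. A Python sketch of that step follows. The helper names are hypothetical and the code is a reference model, not the lab's decoder.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def s_type_fields(inst: int) -> tuple:
    """Return (rs1, rs2, offset) for an S-type (store) instruction word."""
    rs1 = (inst >> 15) & 0x1F
    rs2 = (inst >> 20) & 0x1F
    # imm[11:5] sits in bits [31:25] and imm[4:0] in bits [11:7].
    imm = (((inst >> 25) & 0x7F) << 5) | ((inst >> 7) & 0x1F)
    return rs1, rs2, sign_extend(imm, 12)


def effective_address(base: int, offset: int) -> int:
    """Base register value plus sign-extended offset, wrapped to 32 bits."""
    return (base + offset) & MASK32


rs1, rs2, offset = s_type_fields(0x06302223)  # SW x3, 100(x0)
print(rs1, rs2, offset, hex(effective_address(0, offset)))
```

For 0x06302223, this gives rs1 = 0, rs2 = 3 and an offset of 100. The effective address is therefore 0 + 100 = 0x64, the value observed on `mem_data_addr`.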

Subsequently, when instruction 0x06402903 (LW x18, 100(x0)) executes, the processor reads from the same memory location. The effective address calculation uses `rs1_data` = 0x00000000 (from x0) plus offset 100, resulting in address 0x00000064 on `mem_data_addr`. The `mem_data_loaded_value` signal shows 0x00000005, which is the data previously stored. The `inst_rd` field specifies x18 as the destination register, so this loaded value will be written to register x18 on the next `clock_proc` positive edge.
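LW returns the whole word, but the narrower loads in Test 2 (LB, LH, LBU, LHU) must select part of it and then sign- or zero-extend that part. The Python sketch below assumes a little-endian, word-organized data memory with naturally aligned accesses. The function name `load_value` is hypothetical, and the lab's Verilog may structure this step differently.

```python
MASK32 = 0xFFFFFFFF

# funct3 encodings of the RV32I load instructions
LB, LH, LW, LBU, LHU = 0b000, 0b001, 0b010, 0b100, 0b101


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def load_value(word: int, addr: int, funct3: int) -> int:
    """Extract and extend the addressed part of an aligned 32-bit word.

    Assumes little-endian byte order and naturally aligned accesses.
    """
    byte = (word >> ((addr & 0x3) * 8)) & 0xFF   # byte at offset 0..3
    half = (word >> ((addr & 0x2) * 8)) & 0xFFFF  # halfword at offset 0 or 2
    if funct3 == LB:
        return sign_extend(byte, 8) & MASK32
    if funct3 == LH:
        return sign_extend(half, 16) & MASK32
    if funct3 == LW:
        return word & MASK32
    if funct3 == LBU:
        return byte
    if funct3 == LHU:
        return half
    raise ValueError(f"not a load funct3: {funct3:#05b}")


word = 0x1234FE98  # hypothetical memory word at address 104
print(hex(load_value(word, 104, LH)), hex(load_value(word, 105, LBU)),
      hex(load_value(word, 104, LHU)))
```

With the made-up word 0x1234FE98 at address 104, LH at 104, LBU at 105 and LHU at 104 return 0xFFFFFE98, 0xFE and 0xFE98. These match the values reported by the corresponding Test 2 checks. The upper halfword of the real memory word cannot be recovered from the log.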



Figure 4

In Figure 4, at approximately 368 ns, the HALT instruction (0x00000073) is executed at PC = 0x90. This causes the halt signal to assert high, indicating successful completion of Test 1. The processor stops fetching and executing further instructions when halt is asserted. At 380 ns, the testbench reasserts the rst (reset) signal to reinitialize the processor state for Test 2. During this reset phase, the program counter returns to 0x0, and all 32 registers in the register file are cleared to 0x0. The testbench then loads a new sequence of instructions into instruction memory before releasing reset to begin Test 2 execution.

## 7 Timing closure and Resource utilization

### 7.1 Timing closure

#### Design Timing Summary

| Setup                                      | Hold                               | Pulse Width                                       |
|--------------------------------------------|------------------------------------|---------------------------------------------------|
| Worst Negative Slack (WNS): -106.309 ns    | Worst Hold Slack (WHS): -0.239 ns  | Worst Pulse Width Slack (WPWS): 3.500 ns          |
| Total Negative Slack (TNS): -104255.298 ns | Total Hold Slack (THS): -54.937 ns | Total Pulse Width Negative Slack (TPWS): 0.000 ns |
| Number of Failing Endpoints: 1023          | Number of Failing Endpoints: 1023  | Number of Failing Endpoints: 0                    |
| Total Number of Endpoints: 2046            | Total Number of Endpoints: 2046    | Total Number of Endpoints: 1024                   |

**Timing constraints are not met.**

Figure 5: Timing summary.

Based on the implementation results from Vivado, the timing summary is as follows:

- Clock (clock\_proc) constraint: 125 MHz (Period: 8.000 ns)
- Worst negative slack (WNS): -106.309 ns
- Conclusion: The design has not achieved Timing closure (Timing constraints are not met).

The negative WNS indicates that the signal propagation delay exceeds the clock period constraint. This is expected for a single-cycle processor architecture. In a single-cycle design, the critical path is long because it must traverse several stages within one clock cycle: instruction memory fetch, register file read, ALU/divider execution, data memory access and write-back.

Based on the WNS, the minimum clock period required for the design to function correctly is:

$$T_{min} = T_{constraint} - WNS = 8.000 - (-106.309) = 114.309\ \text{ns}$$

The estimated maximum operating frequency (Fmax) is:

$$F_{max} = \frac{1}{T_{min}} = \frac{1}{114.309\ \text{ns}} \approx 8.75\ \text{MHz}$$
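The same estimate can be scripted so that it is easy to recompute after a new implementation run. Below is a small Python sketch; the function name `timing_estimate` is hypothetical, and it applies only the two formulas above.

```python
def timing_estimate(period_ns: float, wns_ns: float) -> tuple:
    """Return (T_min in ns, F_max in MHz) from a clock constraint and its WNS."""
    t_min_ns = period_ns - wns_ns   # a negative WNS lengthens the required period
    f_max_mhz = 1000.0 / t_min_ns   # 1 / (1 ns) = 1000 MHz
    return t_min_ns, f_max_mhz


t_min, f_max = timing_estimate(8.000, -106.309)
print(f"T_min = {t_min:.3f} ns, F_max = {f_max:.2f} MHz")
```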

## 7.2 Resource utilization



Figure 6: Utilization summary.

The design fits comfortably within the available resources of the Arty Z7-20 board. Slice LUTs (Look-Up Tables) dominate the utilization, while flip-flop usage is comparatively small. This is consistent with a single-cycle architecture, which relies heavily on large combinational logic blocks (such as the ALU, the multiplier, and especially the divider) to complete each instruction in one cycle instead of spreading the work across pipeline registers.