



UNIVERSITI  
TEKNOLOGI  
PETRONAS

## Design of a 16-bit CPU

**EEM5023: Advanced Digital Systems Design**

**January 2025**

Date: 7 April 2025

| Name                     | Student ID | Submodules                       | Semester                             |
|--------------------------|------------|----------------------------------|--------------------------------------|
| Yeap Wei Shen            | 20000375   | IMEM, RF,<br>Top-level<br>design | 4 <sup>th</sup> Year 3 <sup>rd</sup> |
| Tharshen A/L Subramaniam | 20001074   | ALU, MUX,<br>Datapath            | 4 <sup>th</sup> Year 2 <sup>nd</sup> |
| Tang Jin Hang            | 20001336   | DMEM, PC,<br>Controller          | 4 <sup>th</sup> Year 2 <sup>nd</sup> |

## **TABLE OF CONTENTS**

|            |                                    |    |
|------------|------------------------------------|----|
| Chapter 1: | Introduction .....                 | 2  |
| Chapter 2: | Objectives .....                   | 2  |
| Chapter 3: | Methodology.....                   | 3  |
| 3.1        | Top Level Design.....              | 3  |
| 3.1.1      | Datapath.....                      | 4  |
| 3.1.2      | Controller.....                    | 8  |
| 3.2        | Arithmetic Logic Unit (ALU) .....  | 15 |
| 3.3        | Instruction Memory (IMEM) .....    | 17 |
| 3.4        | Data Memory (DMEM).....            | 19 |
| 3.5        | Register File (RF) .....           | 22 |
| 3.6        | Program Counter (PC).....          | 24 |
| 3.7        | Multixplexers (MUX).....           | 25 |
| Chapter 4: | Results and Discussion .....       | 27 |
| 4.1        | Top Level Design.....              | 27 |
| 4.2        | Arithmetic Logic Unit (ALU) .....  | 32 |
| 4.3        | Instruction Memory (IMEM) .....    | 35 |
| 4.4        | Data Memory (DMEM).....            | 37 |
| 4.5        | Register File (RF) .....           | 40 |
| 4.6        | Program Counter (PC).....          | 44 |
| 4.7        | Multiplexers (MUX).....            | 46 |
| Chapter 5: | Conclusion & Recommendations ..... | 48 |

## **CHAPTER 1: INTRODUCTION**

A central processing unit (CPU) is known as the brain of the computer which is responsible for executing instructions and processing data. A 16-bit CPU is a computer architecture which uses 16-bit integers for memory addresses or 2 octets wide. The arithmetic unit (ALU) are also based on 16-bit registers. From the equation  $N = 2^x$ , where x is the number of bits and N is the number of possible values. For a 16-bit CPU it has integer representation of between 0 to 65535 unsigned numbers or -32768 to 32767 representations in two's complement.

A CPU is generally made up of multiple sub-modules such as the controller, program counter (PC), instruction memory (IMEM), register file (RF), data memory (DMEM) and the arithmetic logic unit (ALU). The sub-modules are connected through the datapath and the controlled by the signals from the controller.

## **CHAPTER 2: OBJECTIVES**

To design a 16-bit CPU by integrating smaller building blocks (i.e. ALU, DMEM, IMEM, RF sub-modules) and an instruction decoder as a complete CPU datapath. Then to integrate the datapath with the corresponding controller or the specified instruction set. The sub-modules are to be built first followed by integration into a working CPU.

## CHAPTER 3: METHODOLOGY

### 3.1 Top Level Design

To bring together the Datapath and Controller black boxes, the top-level design was declared which connects all the input and output pins of the Datapath to the controller and vice versa. The top-level design calls all the module required in the CPU such as the Controller, Datapath, ALU, DMEM, IMEM, MUX, PC, and RF modules. The top-level design has the clock (clk) and reset (rst) as the input, while having LEDs of 8-bit size as the output.

The code for the top-level design is as shown in below:

```
'timescale 1ns / 1ps
`include "control.sv"
`include "datapath.sv"
`include "alu.sv"
`include "dmem.sv"
`include "mux.sv"
`include "pc.sv"
`include "rom.sv"
`include "rf.sv"

module TopLevel (
    input logic clk,           // Clock input
    input logic reset,         // Reset input
    output logic [7:0] LEDs   // LEDs output from RF module (for debugging or status)
);

// Internal signals for connecting controller and datapath
logic [3:0] opcode; // Opcode fetched from IMEM
logic Zero, Negative, Overflow;

// Control signals from controller
logic pc_load;
logic i_mem_oe;
logic rf_mux_sel;
logic rf_write_en;
logic alu_mux_sel;
logic [3:0] alu_opcode;
logic d_mem_rw_;
logic d_mem_cs;
logic data_out_mux;
logic branch;
logic jump;

// Additional control signals for datapath
logic ctrl_output_to_mux_rf_selected_rd_input;
logic ctrl_output_to_mux_alu_selected_rt_input;
logic ctrl_output_to_mux_rf_selected_data_in_input;
logic [3:0] ctrl_output_to_alu_input_opcode;
logic ctrl_output_to_mux_branch;
logic ctrl_output_to_mux_jump;
```

```

// Instantiate the datapath module
Datapath dp_inst (
    .clk(clk),
    .reset(reset),
    .pc_load(pc_load),           // Connected to controller
    .write_en(rf_write_en),      // Connected to controller
    .RW_(d_mem_rw_),            // Connected to controller
    .CS(d_mem_cs),              // Connected to controller
    .OE(i_mem_oe),               // Connected to controller
    .ctrl_output_to_mux_rf_selected_rd_input(rf_mux_sel), // Connected to controller
    .ctrl_output_to_mux_alu_selected_rt_input(alu_mux_sel), // Connected to controller
    .ctrl_output_to_mux_rf_selected_data_in_input(data_out_mux), // Connected to controller
    .ctrl_output_to_alu_input_opcode(alu_opcode), // Connected to controller
    .ctrl_output_to_mux_branch(branch), // Connected to controller
    .ctrl_output_to_mux_jump(jump), // Connected to controller
    .imem_output_to_control_input_opcode(opcode), // Opcode output to controller
    .Zero(Zero),                  // Zero flag to controller
    .Negative(Negative),          // Negative flag to controller
    .Overflow(Overflow),          // Overflow flag to controller
    .LEDs(LEDs)                  // Debugging output
);

// Instantiate the controller module
controller ctrl_inst (
    .clk(clk),
    .rst(reset),
    .opcode(opcode), // Opcode comes from IMEM in Datapath
    .Zero(Zero),                  // Zero flag to controller
    .Negative(Negative),          // Negative flag to controller
    .Overflow(Overflow),          // Overflow flag to controller
    .pc_load(pc_load),
    .i_mem_oe(i_mem_oe),
    .rf_mux_sel(rf_mux_sel),
    .rf_write_en(rf_write_en),
    .alu_mux_sel(alu_mux_sel),
    .alu_opcode(alu_opcode),
    .d_mem_rw_(d_mem_rw_),
    .d_mem_cs(d_mem_cs),
    .data_out_mux(data_out_mux),
    .branch(branch),
    .jump(jump)
);

```

endmodule

### 3.1.1 Datapath

The Datapath module is the backbone of the CPU as it interconnects the major components like the Program Counter (PC), Instruction Memory (IMEM), Register File (RF), Arithmetic Logic Unit (ALU), Data Memory (DMEM), and all relevant multiplexers (MUXes) that control data flow. The entire design was based on the single-cycle CPU architecture diagram shown in Figure 1.0, which served as the primary reference throughout implementation.

The Datapath starts at the PC, which sends a 16-bit address to IMEM. IMEM then outputs a 16-bit instruction based on that address. This instruction is sliced into 3 parts which are opcode, rs, rt, and rd. These are used to determine the source and destination registers in RF. The RF then outputs two 16-bit values from the selected registers, which are used as inputs to the ALU or sent to DMEM, depending on the instruction type.

For operations involving immediate values such as shift or branch, a sign extension is applied to the 4-bit immediate from the instruction to convert it into a 16-bit operand. This is achieved in the Datapath through bit-concatenation logic.

A total of five MUXes are placed in key spots to handle variations between instruction types:

- One MUX chooses between rt and rd as the destination register.
- Another MUX selects between register value or immediate for ALU's second input.
- The third MUX selects between ALU or DMEM output to write back into RF.
- Two more MUXes are used for updating the PC based on whether it's a normal increment, branch, or jump.

The ALU receives its inputs and performs operations such as arithmetic, logic, and shifts. It outputs both a result and status flags which are Zero, Negative, and Overflow, which are crucial for conditional branches. For memory operations, the ALU result is used as the address for DMEM access, and depending on control signals, data is either read from or written to memory.

At the end of the cycle, the result which is either from ALU or DMEM is written back into RF if write\_en is high. The rd destination register is chosen using a MUX depending on whether the instruction is R-type or I-type.

This modular Datapath design supports all instruction types in the ADSD-RISC ISA (R-type, S-type, B-type, L-type, J-type) while maintaining clean separation between control signals and data lines, making it easier to debug and extend.



Figure 1.0 Block diagram of a single cycle microarchitecture of ADSD-RISC with the controller module

The design code for Datapath is shown below:

```
1 //`timescale 1ns / 1ps
2 module Datapath(
3     input clk,
4     input reset,
5     input pc_load,
6     input write_en,
7     input RW_,
8     input CS,
9     input OE,
10    input ctrl_output_to_mux_rf_selected_rd_input,
11    input ctrl_output_to_mux_alu_selected_rt_input,
12    input ctrl_output_to_mux_rf_selected_data_in_input,
13    input [3:0] ctrl_output_to_alu_input_opcode,
14    input ctrl_output_to_mux_branch,
15    input ctrl_output_to_mux_jump,
16    output [3:0] imem_output_to_control_input_opcode,
17    output Zero,
18    output Negative,
19    output Overflow,
20    output [7:0] LEDs
21 );
22
23    wire [15:0] pc_output_to_imem_input;
24    wire [15:0] imem_output_instruction_register_IR;
25    wire [15:0] rf_output_rs_to_alu_input;
26    wire [15:0] rf_output_rt_to_alu_input;
27    wire [15:0] alu_output_to_rf_input;
28    wire [15:0] dmem_output_to_rf_input;
29    wire [15:0] imem_output_to_alu_input;
30    wire [15:0] alu_output_to_dmem_addr;
31    wire [3:0] rf_selected_rd;
32    wire [15:0] alu_selected_rt, rf_selected_data_in;
33    wire [15:0] mux_branch_output_to_mux_jump_input;
34    wire [15:0] mux_jump_output_to_pc_input;
35
36    assign imem_output_to_alu_input =
37        {{12{imem_output_instruction_register_IR[3:0]}},
38
39        imem_output_instruction_register_IR[3:0]};
40
41    assign imem_output_to_control_input_opcode =
42        imem_output_instruction_register_IR[15:12];
43
44    shortint addition_pc_output_plus_2,
45    addition_pc_output_plus_2_plus_imem_output_to_alu_input;
46
47    assign addition_pc_output_plus_2 = pc_output_to_imem_input + 16'b01;
48
49    assign addition_pc_output_plus_2_plus_imem_output_to_alu_input =
50    addition_pc_output_plus_2 + imem_output_to_alu_input;
51
52 // Instantiate Program Counter (PC)
53 pc PC (
54     .pc_load(pc_load),
55     .alu_output_to_rf_input(mux_jump_output_to_pc_input),
56     .reset(reset),
57     .clk(clk),
58     .pc_output_to_imem_input(pc_output_to_imem_input)
59 );
60
61 // Instantiate Instruction Memory (IMEM)
62 IMEM ROM (
63     .pc_output_to_imem_input(pc_output_to_imem_input),
64     .OE(OE),
65     .imem_output_to_rf_input(imem_output_instruction_register_IR)
66 );
67
```

```

68 // Instantiate Register File (RF)
69 RF RegisterFile (
70     .imem_output_to_rf_input_rs(imem_output_instruction_register_IR[11:8]), // rs register
71     .imem_output_to_rf_input_rt(imem_output_instruction_register_IR[7:4]), // rt register
72     .imem_output_to_rf_input_rd(rf_selected_rd), // rd register
73     .alu_output_to_rf_input(rf_selected_data_in), // Data to RF from DMEM
74     .write_en(write_en),
75     .rst(reset),
76     .clk(clk),
77     .rf_output_rs_to_alu_input(rf_output_rs_to_alu_input),
78     .rf_output_rt_to_alu_input(rf_output_rt_to_alu_input),
79     .leds(LEDs)
80 );
81
82 // Instantiate ALU
83 ALU alu (
84     .rf_output_rs_to_alu_input(rf_output_rs_to_alu_input),
85     .rf_output_rt_to_alu_input(alu_selected_rt),
86     .imem_output_to_rf_input_opcode(ctrl_output_to_alu_input_opcode),
87     .alu_output_to_rf_input(alu_output_to_rf_input),
88     .Zero(Zero),
89     .Negative(Negative),
90     .Overflow(Overflow)
91 );
92
93 // Instantiate Data Memory (DMEM)
94 dmem DataMemory (
95     .clk(clk),
96     .alu_output_to_dmem_addr(alu_output_to_rf_input), // Address from ALU
97     .rf_output_rt_to_alu_input(rf_output_rt_to_alu_input), // Data input
98     .RW_(RW_), // Read/Write control
99     .CS_(CS), // Chip Select
100    .dmem_output_to_rf_input(dmem_output_to_rf_input) // Data out to RF
101 );
102
103 mux_4bit_2to1 mux_rf_selected_rd (
104     .I0(imem_output_instruction_register_IR[7:4]), // Option 0: Bits [7:4]
105     .I1(imem_output_instruction_register_IR[3:0]), // Option 1: Bits [3:0]
106     .S(ctrl_output_to_mux_rf_selected_rd_input), // Select signal
107     .Y(rf_selected_rd) // Output of MUX
108 );
109
110 mux_16bit_2to1 mux_alu_selected_rt (
111     .I0(rf_output_rt_to_alu_input),
112     .I1(imem_output_to_alu_input),
113     .S(ctrl_output_to_mux_alu_selected_rt_input),
114     .Y(alu_selected_rt)
115 );
116
117 mux_16bit_2to1 mux_rf_selected_data_in (
118     .I0(dmem_output_to_rf_input),
119     .I1(alu_output_to_rf_input),
120     .S(ctrl_output_to_mux_rf_selected_data_in_input),
121     .Y(rf_selected_data_in)
122 );
123
124 mux_16bit_2to1 mux_branch (
125     .I0(addition_pc_output_plus_2),
126     .I1(addition_pc_output_plus_2_plus_imem_output_to_alu_input),
127     .S(ctrl_output_to_mux_branch),
128     .Y(mux_branch_output_to_mux_jump_input)
129 );
130
131 mux_16bit_2to1 mux_jump (
132     .I0(mux_branch_output_to_mux_jump_input),
133     .I1({{4{imem_output_instruction_register_IR[11]}}, imem_output_instruction_register_IR[11:0]}),
134     .S(ctrl_output_to_mux_jump),
135     .Y(mux_jump_output_to_pc_input)
136 );
137
138
139 endmodule
140

```

### 3.1.2 Controller

In order to coordinate the operations of different processor components, including the Program Counter (PC), Register File (RF), Arithmetic Logic Unit (ALU), Instruction Memory (I-MEM), and Data Memory (D-MEM), the Controller (CTRL) subsystem was implemented as a finite-state logic unit. In order to regulate the datapath behaviours during instruction execution, this module decodes the instruction opcode and keeps an eye on the status flags (Zero, Negative, Overflow).

Combinational logic within an “always @(\*)” block was used in the design of the CTRL module to guarantee a prompt response to any changes in the input. The following inputs are given to it:

- opcode: a 4-bit signal representing the instruction operation.
- Zero, Negative, and Overflow: status flags from the ALU.
- clk and rst: system clock and active-low reset signal.

Upon reset ( $\text{rst} = 0$ ), all control signals are deactivated, ensuring a safe initial state. When reset is inactive ( $\text{rst} = 1$ ), the controller performs opcode decoding using a case statement. Each opcode corresponds to a specific instruction type and accordingly sets the necessary control signals such as:

- $\text{pc\_load}$ : enables loading of the Program Counter.
- $\text{i\_mem\_oe}$ : enables output of IMEM.
- $\text{rf\_write\_en}$ : enables writing to the Register File.
- $\text{alu\_opcode}$ : selects the ALU operation.
- $\text{rf\_mux\_sel}$  and  $\text{alu\_mux\_sel}$ : control the source of data for the RF and ALU.
- $\text{data\_out\_mux}$ : control the input of RF through selection between output of DMEM or output of ALU.
- $\text{d\_mem\_cs}$ ,  $\text{d\_mem\_rw}$ : control the chip select and read/write for data memory.
- branch, jump: handle control flow changes based on ALU flags.

Next, the controller supports various instruction types including:

- **R-type** (e.g., ADD, SUB, AND, OR) – use register operands and perform ALU operations.

- **S-type** (e.g., SHIFT, ROTATE, NOT, ADDI) – include immediate value operations or bit manipulations.
- **B-type** (e.g., BEQ, BGT, BLT) – conditional branch instructions, dependent on ALU flags like Zero or Negative.
- **L-type** (LD, ST) – memory access operations involving load and store instructions.
- **J-type** (JUMP) – unconditional jumps handled by setting the jump signal high.

To prevent unexpected behaviours, extra effort was taken to specify default values for all control signals. By verifying ALU flags and asserting the branch signal appropriately, conditional branching is put into practice. For the processor's control flow and sequential instruction execution, the controller makes sure that the signals are properly coordinated. Thus, easy debugging, extension for new instructions, and integration with the rest of the datapath are made possible by this modular and opcode-driven methodology.

The block diagram of a single cycle microarchitecture of ADSD-RISC with the controller module is shown as below:



Figure 2.0: Block diagram of a single cycle microarchitecture of ADSD-RISC with the controller module

The codes for the CTRL, controller module is shown as below:

```
1 `timescale 1ns / 1ps
2 module controller (
3     input clk,
4     input rst,
5     input wire [3:0] opcode,    // 4-bit opcode from instruction
6     input Zero,
7     input Negative,
8     input Overflow,
9     output reg      pc_load,
10    output reg      i_mem_oe,
11    output reg      rf_mux_sel, //mux before going into rf
12    output reg      rf_write_en, //write_en for rf
13    output reg      alu_mux_sel, //mux before going into alu
14    output reg [3:0] alu_opcode, // ALU operation selection
15    output reg      d_mem_rw_, //rw_ for dmem
16    output reg      d_mem_cs, //cs for dmem
17    output reg      data_out_mux, //data_out from dmem into mux
18    output reg      branch,   // Branch signal
19    output reg      jump     // Jump signal
20 );
21
22 always @(*) begin
23     // Default values
24     if (!rst) begin
25         pc_load = 1'b0;
26         i_mem_oe = 1'b0;
27         rf_mux_sel = 1'b1;
28         rf_write_en = 1'b0;
29         alu_mux_sel = 1'b0;
30         alu_opcode = 4'b0000;
31         d_mem_rw_ = 1'b0;
32         d_mem_cs = 1'b0;
33         data_out_mux = 1'b1;
34         branch = 1'b0;
35         jump = 1'b0;
36     end
37
38     else begin
39         pc_load = 1'b0;
40         i_mem_oe = 1'b1;
41         rf_mux_sel = 1'b1;
42         rf_write_en = 1'b0;
43         alu_mux_sel = 1'b0;
44         alu_opcode = 4'b0000;
45         d_mem_rw_ = 1'b0;
46         d_mem_cs = 1'b0;
47         data_out_mux = 1'b1;
48         branch = 1'b0;
49         jump = 1'b0;
```

```

51 case (opcode)
52     // R-type instructions (add, sub, or, and)
53     4'b0000: begin //Add
54         pc_load = 1'b1;
55         i_mem_oe = 1'b1;
56         rf_mux_sel = 1'b1;
57         rf_write_en = 1'b1;
58         alu_mux_sel = 1'b0;
59         alu_opcode = 4'b0000;
60         d_mem_rw_ = 1'b0;
61         d_mem_cs = 1'b0;
62         data_out_mux = 1'b1;
63         branch = 1'b0;
64         jump = 1'b0;
65     end
66
67     4'b0001: begin //Sub
68         pc_load = 1'b1;
69         i_mem_oe = 1'b1;
70         rf_mux_sel = 1'b1;
71         rf_write_en = 1'b1;
72         alu_mux_sel = 1'b0;
73         alu_opcode = 4'b0001;
74         d_mem_rw_ = 1'b0;|
75         d_mem_cs = 1'b0;
76         data_out_mux = 1'b1;
77         branch = 1'b0;
78         jump = 1'b0;
79     end
80
81     4'b0010: begin //Or
82         pc_load = 1'b1;
83         i_mem_oe = 1'b1;
84         rf_mux_sel = 1'b1;
85         rf_write_en = 1'b1;
86         alu_mux_sel = 1'b0;
87         alu_opcode = 4'b0010;
88         d_mem_rw_ = 1'b0;
89         d_mem_cs = 1'b0;
90         data_out_mux = 1'b1;
91         branch = 1'b0;
92         jump = 1'b0;
93     end
94
95     4'b0011: begin //And
96         pc_load = 1'b1;
97         i_mem_oe = 1'b1;
98         rf_mux_sel = 1'b1;
99         rf_write_en = 1'b1;
100        alu_mux_sel = 1'b0;
101        alu_opcode = 4'b0011;
102        d_mem_rw_ = 1'b0;
103        d_mem_cs = 1'b0;
104        data_out_mux = 1'b1;
105        branch = 1'b0;
106        jump = 1'b0;
107    end
108

```

```

109 // S-type (shift, rotate, NOT, addi)
110 4'b0100: begin //shift left
111    pc_load = 1'b1;
112    i_mem_oe = 1'b1;
113    rf_mux_sel = 1'b0;
114    rf_write_en = 1'b1;
115    alu_mux_sel = 1'b1;
116    alu_opcode = 4'b0100;
117    d_mem_rw_ = 1'b0;
118    d_mem_cs = 1'b0;
119    data_out_mux = 1'b1;
120    branch = 1'b0;
121    jump = 1'b0;
122  end
123
124 4'b0101: begin //shift right
125    pc_load = 1'b1;
126    i_mem_oe = 1'b1;
127    rf_mux_sel = 1'b0;
128    rf_write_en = 1'b1;
129    alu_mux_sel = 1'b1;
130    alu_opcode = 4'b0101;
131    d_mem_rw_ = 1'b0;
132    d_mem_cs = 1'b0;
133    data_out_mux = 1'b1;
134    branch = 1'b0;
135    jump = 1'b0;
136  end
137
138 4'b0110: begin //rotate left
139    pc_load = 1'b1;
140    i_mem_oe = 1'b1;
141    rf_mux_sel = 1'b0;
142    rf_write_en = 1'b1;
143    alu_mux_sel = 1'b1;
144    alu_opcode = 4'b0110;
145    d_mem_rw_ = 1'b0;
146    d_mem_cs = 1'b0;
147    data_out_mux = 1'b1;
148    branch = 1'b0;
149    jump = 1'b0;
150  end
151
152 4'b0111: begin //rotate right
153    pc_load = 1'b1;
154    i_mem_oe = 1'b1;
155    rf_mux_sel = 1'b0;
156    rf_write_en = 1'b1;
157    alu_mux_sel = 1'b1;
158    alu_opcode = 4'b0111;
159    d_mem_rw_ = 1'b0;
160    d_mem_cs = 1'b0;
161    data_out_mux = 1'b1;
162    branch = 1'b0;
163    jump = 1'b0;
164  end

```

```

166 4'b1000: begin //not
167   pc_load = 1'b1;
168   i_mem_oe = 1'b1;
169   rf_mux_sel = 1'b0;
170   rf_write_en = 1'b1;
171   alu_mux_sel = 1'b1;
172   alu_opcode = 4'b1000;
173   d_mem_rw_ = 1'b0;
174   d_mem_cs = 1'b0;
175   data_out_mux = 1'b1;
176   branch = 1'b0;
177   jump = 1'b0;
178 end
179
180 4'b1111: begin //addi
181   pc_load = 1'b1;
182   i_mem_oe = 1'b1;
183   rf_mux_sel = 1'b0;
184   rf_write_en = 1'b1;
185   alu_mux_sel = 1'b1;
186   alu_opcode = 4'b1111; |
187   d_mem_rw_ = 1'b0;
188   d_mem_cs = 1'b0;
189   data_out_mux = 1'b1;
190   branch = 1'b0;
191   jump = 1'b0;
192 end
193
194 // B-type (Branch instructions)
195 4'b1001: begin //Beq
196   pc_load = 1'b1;
197   i_mem_oe = 1'b1;
198   rf_mux_sel = 1'b0;
199   rf_write_en = 1'b0;
200   alu_mux_sel = 1'b0;
201   alu_opcode = 4'b0001;
202   d_mem_rw_ = 1'b0;
203   d_mem_cs = 1'b0;
204   data_out_mux = 1'b0;
205   // branch = 1'b0;
206   jump = 1'b0;
207   if (Zero) branch = 1'b1;
208 end
209
210 4'b1010: begin //Blt
211   pc_load = 1'b1;
212   i_mem_oe = 1'b1;
213   rf_mux_sel = 1'b0;
214   rf_write_en = 1'b0;
215   alu_mux_sel = 1'b0;
216   alu_opcode = 4'b0001;
217   d_mem_rw_ = 1'b0;
218   d_mem_cs = 1'b0;
219   data_out_mux = 1'b0;
220   // branch = 1'b0;
221   jump = 1'b0;
222   if (Negative) branch = 1'b1;
223 end

```

```

225      4'b1011; begin //Bgt
226          pc_load = 1'b1;
227          i_mem_oe = 1'b1;
228          rf_mux_sel = 1'b0;
229          rf_write_en = 1'b0;
230          alu_mux_sel = 1'b0;
231          alu_opcode = 4'b0001;
232          d_mem_rw_ = 1'b0;
233          d_mem_cs = 1'b0;
234          data_out_mux = 1'b0;
235          branch = 1'b0;
236          jump = 1'b0;
237          if (!Zero && !Negative) branch = 1'b1;
238      end
239
240
241      // L-type (Load and Store)
242      4'b1100; begin //Ld
243          pc_load = 1'b1;
244          i_mem_oe = 1'b1;
245          rf_mux_sel = 1'b0;
246          rf_write_en = 1'b1;
247          alu_mux_sel = 1'b1;
248          alu_opcode = 4'b0000;
249          d_mem_rw_ = 1'b1; //read
250          d_mem_cs = 1'b1;
251          data_out_mux = 1'b0;
252          branch = 1'b0;
253          jump = 1'b0;
254      end
255
256      4'b1101; begin //St
257          pc_load = 1'b1;
258          i_mem_oe = 1'b1;
259          rf_mux_sel = 1'bx; //dont care
260          rf_write_en = 1'b0;
261          alu_mux_sel = 1'b1;
262          alu_opcode = 4'b0000;
263          d_mem_rw_ = 1'b0; //write
264          d_mem_cs = 1'b1;
265          data_out_mux = 1'bx; //dont care
266          branch = 1'b0;
267          jump = 1'b0;
268      end
269
270      // J-type (Jump)
271      4'b1110; begin
272          pc_load = 1'b1;
273          i_mem_oe = 1'b1;
274          rf_mux_sel = 1'bx; //dont care
275          rf_write_en = 1'b0;
276          alu_mux_sel = 1'b0; |
277          alu_opcode = 4'b0000;
278          d_mem_rw_ = 1'b0;
279          d_mem_cs = 1'b0;
280          data_out_mux = 1'bx; //dont care
281          branch = 1'b0;
282
283      endcase
284  end
285 end
286
287
288
289 endmodule

```

### 3.2 Arithmetic Logic Unit (ALU)

The Arithmetic Logic Unit (ALU) is responsible for performing all the arithmetic and logical operations in the CPU. It operates entirely based on the 4-bit opcode passed from the instruction, and the design strictly follows the operation definitions shown in Table 1.0, which lists all R-type and S-type instructions for the ADSD-RISC ISA.

The ALU takes two 16-bit inputs from the Register File (rs and rt), processes them according to the opcode, and outputs a 16-bit result to be written back to the destination register. Internally, both operands are treated as signed integers to properly handle arithmetic overflow during add, sub, and addi instructions.

For overflow detection:

- Addition overflow is detected when both operands have the same sign, but the result flips sign.
- Subtraction overflow occurs when the operands have opposite signs, and the result matches the subtrahend's sign. This is implemented using bitwise logic checks on the sign bits (MSB) of the operands and result.

Logical instructions like and, or, and not are performed directly using Verilog's bitwise operators. For shift and rotate instructions, only the lowest 4 bits of the rt input are used as the shift amount, as required by the instruction format.

The ALU also sets three status flags used by the controller for branching:

- Zero: Set when the result is zero.
- Negative: Set when the result is negative (MSB = 1).
- Overflow: Set when an arithmetic overflow is detected.

Each opcode maps to one operation, and a case statement is used to implement the logic cleanly inside an “always @(\*)” block. This ensures the ALU behaves as a purely combinational block as it reacts instantly to input changes, without any clock dependency.

| Opcode                         | Instruction        | RTL Operation                                           |
|--------------------------------|--------------------|---------------------------------------------------------|
| R-type (Register Instructions) |                    |                                                         |
| 0000                           | add rs, rt, rd     | $rd \leftarrow rs + rt$                                 |
| 0001                           | sub rs, rt, rd     | $rd \leftarrow rs - rt$                                 |
| 0010                           | or rs, rt, rd      | $rd \leftarrow rs   rt$ (bitwise OR)                    |
| 0011                           | and rs, rt, rd     | $rd \leftarrow rs \& rt$ (bitwise AND)                  |
| S-type (Shift Instructions)    |                    |                                                         |
| 0100                           | shl rs, rt, #imm4  | $rt \leftarrow$ arithmetic shift left rs by #imm4 bits  |
| 0101                           | shr rs, rt, #imm4  | $rt \leftarrow$ arithmetic shift right rs by #imm4 bits |
| 0110                           | rol rs, rt         | $rt \leftarrow \{rs[14:0], rs[15]\}$                    |
| 0111                           | ror rs, rt         | $rt \leftarrow \{rs[0], rs[15:1]\}$                     |
| 1000                           | not rs, rt         | $rt \leftarrow \sim rs$ (bitwise negation)              |
| 1111                           | addi rs, rt, #imm4 | $rt \leftarrow rs + \{12\{imm4[3], imm4\}\}$            |

Table 1.0 List of R-type and S-type instructions and their RTL operations

This ALU design is modular, compact, and easily extendable, which makes it suitable for integration in the single-cycle CPU architecture we built.

The design code for ALU is shown below:

```

1 //`timescale 1ns / 1ps
2
3 module ALU(
4   input [15:0] rf_output_rs_to_alu_input, // ALU gets rs from RF
5   input [15:0] rf_output_rt_to_alu_input, // ALU gets rt from RF
6   input [3:0] imem_output_to_rf_input_opcode, // Opcode from IMEM (instruction)
7   output reg [15:0] alu_output_to_rf_input, // ALU result goes to RF
8   output reg Zero, Negative, Overflow
9 );
10 reg signed [15:0] signed_rs, signed_rt, signed_Result;
11
12 always @(*) begin
13   // Default values
14   Zero = 0;
15   Negative = 0;
16   Overflow = 0;
17   signed_rs <= rf_output_rs_to_alu_input;
18   signed_rt <= rf_output_rt_to_alu_input;
19
20   case (imem_output_to_rf_input_opcode)
21     4'b0000: begin // ADD
22       signed_Result = signed_rs + signed_rt;
23       Overflow = ((signed_rs[15] == signed_rt[15]) && (signed_Result[15] != signed_rs[15]));
24       alu_output_to_rf_input = signed_Result;
25     end
26     4'b0001: begin // SUB
27       signed_Result = signed_rs - signed_rt;
28       Overflow = ((signed_rs[15] != signed_rt[15]) && (signed_Result[15] != signed_rs[15]));
29       alu_output_to_rf_input = signed_Result;
30     end
31     4'b0010: alu_output_to_rf_input = signed_rs | signed_rt; // OR
32     4'b0011: alu_output_to_rf_input = signed_rs & signed_rt; // AND
33     4'b0100: alu_output_to_rf_input = signed_rs << rf_output_rt_to_alu_input[3:0]; // SHL (Arithmetic Shift Left)
34     4'b0101: alu_output_to_rf_input = $signed(rf_output_rs_to_alu_input) >> rf_output_rt_to_alu_input[3:0]; // SHR (Arithmetic Shift Right)
35     4'b0110: alu_output_to_rf_input = {rf_output_rs_to_alu_input[14:0], rf_output_rs_to_alu_input[15]}; // ROL (Rotate Left)
36     4'b0111: alu_output_to_rf_input = {rf_output_rs_to_alu_input[0], rf_output_rs_to_alu_input[15:1]}; // ROR
37     4'b1000: alu_output_to_rf_input = ~rf_output_rs_to_alu_input; // NOT
38     4'b1111: alu_output_to_rf_input = signed_rs + {12{rf_output_rt_to_alu_input[3]}}, rf_output_rt_to_alu_input[3:0]; // ADDI (Sign-extended)
39     default: alu_output_to_rf_input = 16'b0;
40   endcase
41
42   // Zero flag
43   Zero = (alu_output_to_rf_input == 16'b0) ? 1'b1 : 1'b0;
44
45   // Negative flag
46   Negative = alu_output_to_rf_input[15];
47 end
48 endmodule
49

```

### 3.3 Instruction Memory (IMEM)

The Instruction Memory (IMEM) submodule in a 16-bit CPU architecture is required by the system to store program and data to function properly. The module is implemented as a double-byte addressable where each address contains 16-bit. This submodule is designed to interface with the Program Counter (PC) and Register File (RF) modules.

The IMEM is designed to have 1 kilobyte of space starting at address 0x0000. This allows for 512 words of 16-bit width to be stored. The Instruction Memory module was developed in System Verilog and designed to support 16-bit data width that is double-byte addressable and has a 16-bit input address (addr) and a 16-bit data output (data\_out). The module has a 1-bit output enable (OE) input which allows the 16-bit values to be present at the output. When OE=0, the output of the module is set to high impedance state (16 ‘bz).

The registers were defined to be from 0 to 511, which has 512 possible values of 16-bits. This equals to 1 KB of ROM space. The program data was programmed into the ROM in the design of ROM. For testing of the ROM module, eight different ROM values were defined. The ROM is asynchronous and does not depend on clock cycle to update the data. The “always @” block was used with the addr input and OE input as the trigger to update the data\_out value. When OE is 1, the data\_out would be the value defined by the address given in addr input. Else the data\_out will be high impedance. The block diagram of the IMEM module is as shown in Figure 3.0.



Figure 3.0: IMEM Block Diagram

The code for IMEM, Instruction Memory is as below:

```
module IMEM (
    input [15:0] pc_output_to_imem_input,           // 16-bit address input
    input OE,                                         // Output Enable
    output reg [15:0] imem_output_to_rf_input        // 16-bit data output
);

// Memory array: 512 words, each word is 16 bits (2 bytes)
reg [15:0] rom [0:511]; // 512 words, each 16 bits

// Initializing ROM with hardcoded binary values (simulation purposes)
initial begin
    // Hardcode the ROM values for simulation (this replaces the file read)
    rom[0] = 16'b1010000101100010; // ROM[0] = 0xA1B2 (little-endian)
    rom[1] = 16'b1100001111010100; // ROM[1] = 0xC3D4 (little-endian)
    rom[2] = 16'b0101111101101110; // ROM[2] = 0x5F7E (little-endian)
    rom[3] = 16'b0111110110001100; // ROM[3] = 0x7D8C (little-endian)
    rom[4] = 16'b1010000101100010; // ROM[0] = 0xA1B2 (little-endian)
    rom[5] = 16'b1100001111010100; // ROM[1] = 0xC3D4 (little-endian)
    rom[6] = 16'b0101111101101110; // ROM[2] = 0x5F7E (little-endian)
    rom[7] = 16'b0111110110001100;
    // ... continue initializing for all required memory addresses
    // Ensure to initialize all 512 entries as needed
end

// Always block to access memory
always @ (pc_output_to_imem_input or OE) begin
    if (OE == 1) begin
        // Read 16-bit word from the given address (no need for byte combining)
        imem_output_to_rf_input = rom[pc_output_to_imem_input];
    end else begin
        // High impedance state when OE is 0
        imem_output_to_rf_input = 16'bz;
    end
end
endmodule
```

### 3.4 Data Memory (DMEM)

The Data Memory (DMEM) subsystem in a 16-bit CPU architecture is responsible for storing and retrieving data during program execution. This module is implemented as a byte-addressable memory unit capable of writing and reading operations. Read operations perform when the control signal RW\_1 and CS is 1 regardless of the clock edge whereas write operations perform when it is synchronized to clock edge. This submodule is designed to interface with both Arithmetic Logic Unit (ALU) and Register File (RF).

With 512 words of 16-bit width and a memory space of 1KB, the Data Memory module is developed in Verilog and designed to support 16-bit data width. In order to maintain compatibility with the rest of the CPU architecture, the memory is designed to be byte-addressable while retaining 16-bit data ports. Next, the internal memory is implemented as a 512-element register array, each 16 bits wide. Thus, a total of 1KB of memory is provided by this structure, allowing aligned word access and supporting a 16-bit CPU instruction/data width.

The interface signals used in the Data Memory subsystem is shown as below:

- clk (input): The system clock used to synchronize write operations.
- alu\_output\_to\_dmem\_addr (input): A 16-bit address generated by the ALU, used to index into the memory.
- rf\_output\_rt\_to\_alu\_input (input): A 16-bit data line in from the Register File to be written into memory.
- RW\_ (input): A control signal indicating the memory operation (Read = 1, Write = 0).
- CS (input): Chip Select signal that enables or disables the memory block.
- dmem\_output\_to\_rf\_input (output): The 16-bit output data line feeding back into the Register File.

For the read mechanism, the “always @(\*)” block is used to handle it asynchronously. The memory only outputs data when CS is active and RW\_ is high. The corresponding data is obtained and driven onto the output when the address is within boundaries. If not, a value of zero is assigned by default. To prevent bus contention, the output switches to a high-impedance (z) state in all other situations, such as writing operations or chip deactivation.

For the write mechanism, the operations performed are synced with the clock's positive edge. The information from the Register File is written into the RAM at the designated location while CS is active, RW\_ is low, and the address is valid.

Not only that, but the subsystem also implements basic bounds checking for both read and write operations. With minimum error control to prevent unauthorized memory access, addresses outside of the specified range ( $alu\_output\_to\_dmem\_addr \geq 511$ ) are ignored or result in a default zero during reads.

The block diagram of the DMEM, Data Memory is shown as below:



Figure 4.0: DMEM Block Diagram

The codes of DMEM, Data Memory is shown as below:

```
1 `timescale 1ns / 1ps
2 module dmem (
3     input wire clk,                      // Clock input
4     input wire [15:0] alu_output_to_dmem_addr,    // 16-bit address
5     input wire [15:0] rf_output_rt_to_alu_input,   // 16-bit data_in
6     input wire RW_,                      // Read/Write control (1 for read, 0 for write)
7     input wire CS,                      // Chip select
8     output reg [15:0] dmem_output_to_rf_input    // 16-bit data_out
9 );
10
11 // Memory array (1KB = 1024 bytes = 512 16-bit words)
12 // Implement as byte addressable but with 16-bit ports
13 reg [15:0] memory [0:511];      // 1KB byte-addressable memory
14
15 // Tri-state output control
16 always @(*) begin
17     if (!CS) begin
18         // High impedance state when chip select is inactive
19         dmem_output_to_rf_input = 16'bz;
20     end
21     else if (RW_) begin
22         // Read operation
23         if (alu_output_to_dmem_addr < 511) begin
24             // Little-endian byte ordering (like RISC-V)
25             dmem_output_to_rf_input = memory[alu_output_to_dmem_addr];
26         end
27         else begin
28             // Address beyond memory space
29             dmem_output_to_rf_input = 16'b0;
30         end
31     end
32     else begin
33         // During write operation, output is high impedance
34         dmem_output_to_rf_input = 16'bz;
35     end
36 end
37
38 // Write operation (synchronized to clock)
39 always @ (posedge clk) begin
40     if (CS && !RW_ && alu_output_to_dmem_addr < 511) begin
41         // Little-endian byte ordering
42         memory[alu_output_to_dmem_addr] = rf_output_rt_to_alu_input;
43     end
44 end
45
46 endmodule
```

### 3.5 Register File (RF)

The register file (RF) is required for the CPU to store values temporarily. The RF module was designed with a sixteen 16-bit general purpose registers with read and write capabilities. The RF module is also responsible for updating the LEDs on the FPGA as stores in register reg[15]. The RF module has 6 inputs and 2 outputs, where the read operation of rs and rt are asynchronous, while the register write operation to rd is synchronous to the clock. The inputs and outputs are as defined below:

- Inputs :
  - rs and rt input: 4-bit register select for source register
  - rd input: 4-bit register select for destination register
  - data\_in: 16-bit data input
  - write\_en: 1-bit write enable for rd register
  - rst (reset): 1-bit reset input which clears all register contents
- Outputs:
  - rs\_data\_out and rt\_data\_out: 16-bit data outputs

Apart from the defined input and outputs, there are two special registers defined in the RF module. The first is register 0 (r0) which is a read-only register with zero values. Register 0 content should not be written to. The next is register 15 (r15) which is used to display output on the LEDs of the FPGA. r15 is an 8-bit register as the FPGA has 8 LEDs from LED7 to LED0.

The register output rs\_data\_out and rt\_data\_out was assigned to the register address from input rs and rt source destination. The always @ block was used with the module defined as change when positive edge of a clock or positive edge of reset. If the rst input is high, all register values in RF will be written with 0 values. When write\_en is high, and the destination address (rd) is not register 0, the 16-bit value at data\_in will be written to the destination register. The block diagram of the RF module is as shown in Figure 5.0.



Figure 5.0: RF Module Block Diagram

The code for the RF (Register File) is as shown below:

```

module
RF(imem_output_to_rf_input_rs,imem_output_to_rf_input_rt,imem_output_to_rf_input_rd,alu_output_to_rf_input,write_en,rst,clk,rf_out
put_rs_to_alu_input,rf_output_rt_to_alu_input,leds);
  input [3:0] imem_output_to_rf_input_rs; //4-bit rs input
  input [3:0] imem_output_to_rf_input_rt; //4-bit rt input
  input [3:0] imem_output_to_rf_input_rd; //4-bit destination register
  input [15:0] alu_output_to_rf_input; //16-bit data input
  input write_en, rst; //1-bit control signal
  input clk; //clock signal

  output [7:0]leds; //8-bit output for LED control at r15
  output [15:0] rf_output_rs_to_alu_input;
  output [15:0] rf_output_rt_to_alu_input; //16-bit output for rs and rt

  // Declare 16 registers, each 16 bits wide
  reg [15:0] registers [15:0];

  // Asynchronous read for rs and rt
  assign rf_output_rs_to_alu_input = registers[imem_output_to_rf_input_rs];
  assign rf_output_rt_to_alu_input = registers[imem_output_to_rf_input_rt];

  // LEDs connected to the least significant 8 bits of register 15 (r15)
  assign leds = registers[15][7:0]; // Use r15[7:0] for controlling LEDs

  // Synchronous write operation
  always @(posedge clk or posedge rst) begin
    registers[0] <= 16'b0; // set r0 is always 0
    if (rst) begin
      // Reset all registers to 0 when reset is high, except r0
      //registers[0] <= 16'b0; // r0 is always zero no need to reset
      registers[1] <= 16'b0;
      registers[2] <= 16'b0;
      registers[3] <= 16'b0;
      registers[4] <= 16'b0;
      registers[5] <= 16'b0;
      registers[6] <= 16'b0;
      registers[7] <= 16'b0;
      registers[8] <= 16'b0;
      registers[9] <= 16'b0;
      registers[10] <= 16'b0;
      registers[11] <= 16'b0;
      registers[12] <= 16'b0;
      registers[13] <= 16'b0;
      registers[14] <= 16'b0;
      registers[15] <= 16'b0;
    end
    else if (write_en) begin
      // Write data to registers, except for r0 (which is read-only and always 0)
      if (imem_output_to_rf_input_rd != 4'b0000) begin
        registers[imem_output_to_rf_input_rd] <= alu_output_to_rf_input; // Write data to register rd, unless it's r0
      end
      // Note: r0 will always stay 0, no matter what is written to it.
    end
  end
endmodule

```

### 3.6 Program Counter (PC)

A key component of the CPU's instruction fetch step is the Program Counter (PC) subsystem. It communicates directly with the Instruction Memory (IMEM) module and records the address of the subsequent instruction to be carried out. With its synchronous update mechanism and asynchronous reset capabilities, the PC module offers reliability and adaptability while a program is running. The address of the subsequent instruction is stored in a 16-bit register implemented by the Verilog-written PC module. It offers a reset mechanism to initialize the program counter at system startup or reset events, as well as control logic for conditional updates based on a `pc_load` signal.

The interface signals used in the Program Counter subsystem is shown as below:

- `clk` (input): The system clock signal, which synchronizes updates to the program counter.
- `reset` (input): An active-low asynchronous reset signal used to initialize the PC to zero.
- `pc_load` (input): A control signal used to enable or disable loading a new address into the PC.
- `alu_output_to_rf_input` (input): A 16-bit input value, typically coming from the ALU, used as the new address.
- `pc_output_to_imem_input` (output): The 16-bit output representing the current instruction address, which is fed to the Instruction Memory.

The `reg` data type is used to implement the program counter as a 16-bit register. The value of this register is updated only on the positive edge of the system clock or the negative edge of the reset signal. Next, to guarantee that the CPU initializes to a known state independent of the clock, the PC module employs an asynchronous reset signal (`reset`). The program counter is instantly cleared and set to zero when `reset` is deasserted (logic low). Then, the PC register is updated with the new value supplied by the ALU when the system is not in reset and the `pc_load` signal is asserted (logic high). This enables the program's execution flow to be changed by external logic (such as jump or branch instructions).

The block diagram of the PC, Program Counter is shown as below:



Figure 6.0: PC Block Diagram

The codes of PC, Program Counter is shown as below:

```

1 `timescale 1ns / 1ps
2
3 module pc(pc_load, alu_output_to_rf_input, reset, clk, pc_output_to_imem_input);
4   input pc_load, reset, clk;
5   input [15:0] alu_output_to_rf_input; //16 bits data_in
6   output reg [15:0] pc_output_to_imem_input; //16 bits data_out
7
8   always @(posedge clk or negedge reset)
9     begin
10       if (!reset)
11         pc_output_to_imem_input <= 16'b0; // Reset the output to zero
12       else if (pc_load)
13         pc_output_to_imem_input <= alu_output_to_rf_input; //Load data_in to data_out
14     end
15 endmodule

```

### 3.7 Multiplexers (MUX)

Multiplexers (MUXes) are used throughout the datapath to handle multiple instruction formats and enable dynamic routing of data based on control signals from the controller. Our CPU design uses two main types of MUXes which are 4-bit 2-to-1 MUX and 16-bit 2-to-1 MUX. These modules are used repeatedly at different locations in the Datapath for tasks like register selection, immediate selection, write-back source selection, and PC update logic.

Each MUX has two input lines (I0 and I1) and one select line (S). When S = 0, the output Y is assigned to I0. When S = 1, the output switches to I1. This basic logic is implemented using an “always @(\*)” block, ensuring the modules are purely combinational and respond immediately to input changes.

The 4-bit MUX is used to choose between the rt and rd fields of the instruction when selecting the destination register in the Register File. This is important because R-type instructions use rd as the destination, while I-type instructions typically write to rt.

The 16-bit MUX is used in multiple spots:

- To select between a register value and an immediate value for ALU input.
- To choose between the ALU output and Data Memory output for writing back into the RF.
- For PC update logic: choosing between PC+2, branch targets, or jump addresses.

All these MUXes are controlled by dedicated signals from the controller:

- ctrl\_output\_to\_mux\_rf\_selected\_rd\_input
- ctrl\_output\_to\_mux\_alu\_selected\_rt\_input
- ctrl\_output\_to\_mux\_rf\_selected\_data\_in\_input
- ctrl\_output\_to\_mux\_branch
- ctrl\_output\_to\_mux\_jump

This MUX-based architecture provides flexibility while keeping the design modular and clean. Instead of creating custom logic paths for every instruction type, the Datapath relies on a few well-placed multiplexers to handle instruction decoding in a scalable and easy-to-debug way.

The design codes for 4-bit MUX (mux\_4bit\_2to1) and 16-bit MUX (mux\_16bit\_2to1) are shown below:

```

1 //`timescale 1ns / 1ps
2 module mux_4bit_2to1 (
3     input [3:0] I0,      // First 4-bit input
4     input [3:0] I1,      // Second 4-bit input
5     input S,           // Select signal
6     output reg [3:0] Y // 4-bit output
7 );
8     always @(*) begin
9         if (S == 0)
10            Y = I0;    // Select I0 when S=0
11        else
12            Y = I1;    // Select I1 when S=1
13     end
14 endmodule
15
16 module mux_16bit_2to1 (
17     input [15:0] I0,    // First 16-bit input
18     input [15:0] I1,    // Second 16-bit input
19     input S,           // Select signal
20     output reg [15:0] Y // 16-bit output
21 );
22     always @(*) begin
23         if (S == 0)
24            Y = I0;    // Select I0 when S=0
25        else
26            Y = I1;    // Select I1 when S=1
27     end
28 endmodule
29

```

## CHAPTER 4: RESULTS AND DISCUSSION

### 4.1 Top Level Design

The top-level design testbench was designed to test the functionality of the 16-bit CPU design which utilises the program stored in the ROM to perform a task. The top-level design testbench validates the functionality of the datapath and controller with regards to all the submodules used for the CPU design.

The testbench used is as follows:

```
'timescale 1ns / 1ps

module TopLevel_tb;

// Testbench signals
logic clk;
logic reset;
logic [7:0] LEDs;

// Instantiate the TopLevel module
TopLevel uut (
    .clk(clk),
    .reset(reset),
    .LEDs(LEDs)
);

// Clock generation
always begin
    #5 clk = ~clk; // Clock period of 10ns (100MHz)
end

// Stimulus for the testbench
initial begin
    // Initialize clock and reset
    clk = 0;
    reset = 0;

    // Apply reset
    #10 reset = 0; // Apply reset
    #10 reset = 1; // Release reset

    $display("Starting test...");

    // Wait for execution to proceed
    #200;

    // Monitor output
    $display("Final LED Output: %b", LEDs);

    // End of test
    $finish;
end

// Monitor function to display values of important signals at every time step
initial begin
    $monitor("Time = %t | clk = %b | reset = %b | opcode = %b | LEDs = %b | PC = %d | R1 = %b | R2 = %b | R5 = %b | R15 = %b | addr = %b | data_in = %b | RW_ = %b | CS = %b | data_out = %b | mux_output = %b",
        $time, clk, reset, uut.dp_inst.imem_output_to_control_input_opcode, LEDs,
        uut.dp_inst.PC.pc_output_to_imem_input,
        uut.dp_inst.RegisterFile.registers[1], uut.dp_inst.RegisterFile.registers[2],
        uut.dp_inst.RegisterFile.registers[5],
        uut.dp_inst.RegisterFile.registers[15],
        uut.dp_inst.DataMemory.alu_output_to_dmem_addr,
        uut.dp_inst.DataMemory.rf_output_rt_to_alu_input,
        uut.dp_inst.DataMemory.RW_,
        uut.dp_inst.DataMemory.CS,
        uut.dp_inst.DataMemory.dmem_output_to_rf_input,
        uut.dp_inst.mux_rf_selected_data_in.Y
    );
end

// initial begin
//     $dumpfile("dump.vcd");
//     $dumpvars;
// end

endmodule
```

## ROM 1 Contents:

```
// initialize the rom to a known assembler program
initial
begin
    rom[0] = 16'b1111_0000_0101_0001; // addi r0, r5 , 0001 //r5=1, to be used for decrementing r1
    rom[1] = 16'b1111_0000_0010_0111; // addi r0, r2 , 0111 //r2=111 for LED display pattern
    rom[2] = 16'b1111_0000_0001_0111; // addi r0, r1 , 0111 //r1=7
    rom[3] = 16'b0000_0000_0010_1111; // add r0, r2 , r15 //r15=r2 write to LED port
    rom[4] = 16'b1001_0000_0001_1100; // beq r0, r1 , -4 //if (r1==0), goto 2
    rom[5] = 16'b0001_0001_0101_0001; // sub r1, r5, r1 //r1=r1-1
    rom[6] = 16'b0100_0010_0010_0001; // shl r2, r2, 1 //r2<<1
    rom[7] = 16'b1110_0000_0000_0011; // jmp 3 //if (r1==0), goto 2
    rom[8] = 16'b0000_0000_0000_0000; // unused
    rom[9] = 16'b0000_0000_0000_0000;
    rom[10] = 16'b0000_0000_0000_0000;
    rom[11] = 16'b1110_0000_0000_0000; // jump to 0 in case it ever comes here.
    rom[12] = 16'b0000_0000_0000_0000;
    rom[13] = 16'b0000_0000_0000_0000;
    rom[14] = 16'b0000_0000_0000_0000;
    rom[15] = 16'b0000_0000_0000_0000;
end
```

The results from the top-level design with ROM 1:

```
Time = 0 | clk = 0 | reset = 0 | opcode = zzzz | LEDs = 00000000 | PC = 0 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr =xxxxxxxxxxxxxx
Time = 5000 | clk = 1 | reset = 0 | opcode = zzzz | LEDs = 00000000 | PC = 0 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr =xxxxxxxxxxxxxx
Time = 10000 | clk = 0 | reset = 0 | opcode = zzzz | LEDs = 00000000 | PC = 0 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr =xxxxxxxxxxxxxx
Time = 15000 | clk = 1 | reset = 0 | opcode = zzzz | LEDs = 00000000 | PC = 0 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr =xxxxxxxxxxxxxx
Starting test...
Time = 20000 | clk = 0 | reset = 1 | opcode = 1111 | LEDs = 00000000 | PC = 0 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 0000000000000001
Time = 25000 | clk = 1 | reset = 1 | opcode = 1111 | LEDs = 00000000 | PC = 1 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 0000000000000111
Time = 30000 | clk = 0 | reset = 0 | opcode = 1111 | LEDs = 00000000 | PC = 1 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 0000000000000111
Time = 35000 | clk = 1 | reset = 1 | opcode = 1111 | LEDs = 00000000 | PC = 2 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 0000000000000111
Time = 40000 | clk = 0 | reset = 1 | opcode = 1111 | LEDs = 00000000 | PC = 2 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 0000000000000111
Time = 45000 | clk = 1 | reset = 1 | opcode = 0000 | LEDs = 00000000 | PC = 3 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 0000000000000111
Time = 50000 | clk = 1 | reset = 0 | opcode = 0000 | LEDs = 00000000 | PC = 3 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 0000000000000111
Time = 55000 | clk = 1 | reset = 1 | opcode = 1001 | LEDs = 00000011 | PC = 4 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 11111111111111001
Time = 60000 | clk = 0 | reset = 1 | opcode = 1001 | LEDs = 00000011 | PC = 4 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 11111111111111001
Time = 65000 | clk = 1 | reset = 1 | opcode = 0001 | LEDs = 00000011 | PC = 5 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 0000000000000100
Time = 70000 | clk = 0 | reset = 1 | opcode = 0001 | LEDs = 00000011 | PC = 5 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 0000000000000100
Time = 75000 | clk = 1 | reset = 1 | opcode = 0100 | LEDs = 00000011 | PC = 6 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 0000000000000100
Time = 80000 | clk = 0 | reset = 1 | opcode = 0100 | LEDs = 00000011 | PC = 6 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 0000000000000100
Time = 85000 | clk = 1 | reset = 1 | opcode = 1110 | LEDs = 00000011 | PC = 7 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 0000000000000000
Time = 90000 | clk = 0 | reset = 1 | opcode = 1110 | LEDs = 00000011 | PC = 7 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 0000000000000000
Time = 95000 | clk = 1 | reset = 1 | opcode = 0000 | LEDs = 00000011 | PC = 3 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 0000000000000110
Time = 100000 | clk = 0 | reset = 1 | opcode = 0000 | LEDs = 00000011 | PC = 3 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 0000000000000110
Time = 105000 | clk = 1 | reset = 1 | opcode = 1001 | LEDs = 00001100 | PC = 4 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 11111111111111010
Time = 110000 | clk = 0 | reset = 1 | opcode = 1001 | LEDs = 00001100 | PC = 4 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 11111111111111010
Time = 115000 | clk = 1 | reset = 1 | opcode = 0001 | LEDs = 00001100 | PC = 5 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 0000000000000001
Time = 120000 | clk = 0 | reset = 1 | opcode = 0001 | LEDs = 00001100 | PC = 5 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 0000000000000001
Time = 125000 | clk = 1 | reset = 1 | opcode = 0100 | LEDs = 00001100 | PC = 6 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 0000000000000000
Time = 130000 | clk = 0 | reset = 1 | opcode = 0100 | LEDs = 00001100 | PC = 6 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 0000000000000000
Time = 135000 | clk = 1 | reset = 1 | opcode = 1110 | LEDs = 00001100 | PC = 7 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 0000000000000000
Time = 140000 | clk = 0 | reset = 1 | opcode = 1110 | LEDs = 00001100 | PC = 7 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 0000000000000000
Time = 145000 | clk = 1 | reset = 1 | opcode = 0000 | LEDs = 00001100 | PC = 3 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 0000000000000100
Time = 150000 | clk = 0 | reset = 1 | opcode = 0000 | LEDs = 00001100 | PC = 3 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 0000000000000100
Time = 155000 | clk = 1 | reset = 1 | opcode = 1001 | LEDs = 00001100 | PC = 4 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 1111111111111101
Time = 160000 | clk = 0 | reset = 1 | opcode = 1001 | LEDs = 00001100 | PC = 4 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 1111111111111101
Time = 165000 | clk = 1 | reset = 1 | opcode = 0000 | LEDs = 00001100 | PC = 5 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 0000000000000001
Time = 170000 | clk = 0 | reset = 1 | opcode = 0001 | LEDs = 00011100 | PC = 5 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 0000000000000000
Time = 175000 | clk = 1 | reset = 1 | opcode = 0100 | LEDs = 00011100 | PC = 6 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 0000000000000100
Time = 180000 | clk = 0 | reset = 1 | opcode = 0100 | LEDs = 00011100 | PC = 6 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 0000000000000100
Time = 185000 | clk = 1 | reset = 1 | opcode = 1110 | LEDs = 00011100 | PC = 7 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 0000000000000000
Time = 190000 | clk = 0 | reset = 1 | opcode = 1110 | LEDs = 00011100 | PC = 7 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 0000000000000000
Time = 195000 | clk = 1 | reset = 1 | opcode = 0000 | LEDs = 00011100 | PC = 3 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 0000000000000000
Time = 200000 | clk = 0 | reset = 1 | opcode = 0000 | LEDs = 00011100 | PC = 3 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 0000000000000000
Time = 205000 | clk = 1 | reset = 1 | opcode = 1001 | LEDs = 00011100 | PC = 4 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 1111111111111100
Time = 210000 | clk = 0 | reset = 1 | opcode = 1001 | LEDs = 00011100 | PC = 4 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 1111111111111100
Time = 215000 | clk = 1 | reset = 1 | opcode = 0001 | LEDs = 00011100 | PC = 5 | R1 = 0000000000000000 | R2 = 0000000000000000 | R5 = 0000000000000000 | R15 = 0000000000000000 | addr = 0000000000000001
```

The results from the test bench shows that the CPU is functioning as expected. The output can be observed from LEDs which shows the binary number being shifted to the left one bit at a time. This testbench code is based on the ROM program saved in the design and the operation of the program can only be changed by changing the values in ROM. The operation of the CPU is based on the opcode obtained from the 16-bit ROM value and the controller and datapath will change outputs based on the input from the ROM value. All signals were displayed in the testbench to observe the changes in any signals which aids in troubleshooting the CPU functions.

Sample output from the FPGA's LEDs are as shown below for the first ROM program.



Figure 7.0: ROM Program 1 from FPGA at the start of the program



Figure 8.0: ROM Program 1 from FPGA after moving to the left

The speed of which the LED shifts to the left can be edited from the clock divider code as shown below. The first ROM program was tested when the clk divider output was set to bit 22. The speed of the LED was then increased by four times by changing the output bit to be 18. The output was observed to shift significantly faster.

```
// Creating a slow clock from 50 MHz DE0-Nano clock
module clk_divider (
    input wire clk, // 50MHz input clock output wire clk_slow // LED output
    output wire clk_slow //led output
);
// create a binary counter
reg [31:0] cnt; // 32-bit counter
initial begin
    cnt <= 32'h00000000; // start at zero
end
always @ (posedge clk) begin
    cnt <= cnt + 1; // count up
end
//assign LED to 25th bit of the counter to blink the LED at a few Hz
assign clk_slow = cnt[22];
endmodule
```

The CPU design was then tested with ROM 2 program to ensure the system functions as expected. The new ROM program shifts the LED to the left before shifting to the right. ROM 2 program is as shown below:

```
// initialize the rom to a known assembler program
// for byte-addressable big-endian memory
initial
begin
    rom[0] = 16'b1111_0000_0101_0001; // addi r0, r5 , 0001 //r5=1, for decrementing r1
    rom[1] = 16'b1000_0000_1110_xxxx; // not r14, r0 , xxxx //r14=0xffff, for LED display
    rom[2] = 16'b1111_0000_0010_0111; // addi r0, r2 , 0111 //r2=111, for LED display
    rom[3] = 16'b1111_0000_0001_0111; // addi r0, r1 , 0111 //r1=7, count=7
    rom[4] = 16'b0011_1110_0010_1111; // and r14, r2 , r15 //r15=r2 write to LED port using AND
    rom[5] = 16'b0000_0000_0000_0000; // sw      //
    rom[6] = 16'b1001_0000_0001_0100; // beq r0, r1 , 4      //if (r1==0), goto 2
    rom[7] = 16'b0001_0001_0101_0001; // sub r1, r5, r1      //r1--
    rom[8] = 16'b0100_0010_0010_0001; // shl r2, r2, 1      //r2<<1
    rom[9] = 16'b1110_0000_0000_0100; // jmp 4          //goto 2
    rom[10] = 16'b0000_0000_0000_0000; // unused
    rom[11] = 16'b1111_0000_0011_0110; // addi r0, r3, 0110 //r3=6, count = 7
    rom[12] = 16'b0010_0000_0010_1111; // or r0, r2, r15      // write to LED port using OR
    rom[13] = 16'b1011_0001_0011_0011; // bgt r1, r3, 3
    rom[14] = 16'b0000_0001_0101_0001; // add r1, r5, r1      // r1++
    rom[15] = 16'b0101_0010_0010_0001; // shr r2, r2, 1      //r2>>1
    rom[16] = 16'b1110_0000_0000_1100; // jmp 12
    rom[17] = 16'b1110_0000_0000_0011; // jmp 3
    rom[18] = 16'b0000_0000_0000_0000; // unused
end
```

The code was then deployed on the FPGA and the LEDs are observed to move to the left before shifting to the right as shown below:



Figure 9.0: FPGA LEDs with ROM 2 Program at the start



Figure 10.0: FPGA LEDs with ROM 2 Program as it shifts left



Figure 11.0: FPGA LEDs with ROM 2 program as it shifts to the end on the left



Figure 12.0: FPGA LEDs with ROM 2 Program as it shifts back to the right

The output observed for both ROM 1 and ROM 2 programs demonstrates the functionality of each sub-module in the CPU which enables the operations as defined by the opcode to be executed successfully.

## 4.2 Arithmetic Logic Unit (ALU)

The design of ALU was tested with the testbench below:

```

1 `timescale 1ns / 1ps
2
3 module ALU_tb;
4   // Inputs
5   reg [15:0] rf_output_rs_to_alu_input;
6   reg [15:0] rf_output_rt_to_alu_input;
7   reg [3:0] imem_output_to_rf_input_opcode;
8
9   // Outputs
10  wire [15:0] alu_output_to_rf_input;
11  wire Zero, Negative, Overflow;
12
13  // Signed versions of inputs and outputs
14  reg signed [15:0] signed_rs, signed_rt, signed_result;
15
16  // Instantiate the ALU module
17  ALU uut (
18    .rf_output_rs_to_alu_input(rf_output_rs_to_alu_input),
19    .rf_output_rt_to_alu_input(rf_output_rt_to_alu_input),
20    .imem_output_to_rf_input_opcode(imem_output_to_rf_input_opcode),
21    .alu_output_to_rf_input(alu_output_to_rf_input),
22    .Zero(Zero),
23    .Negative(Negative),
24    .Overflow(Overflow)
25  );
26
27  // Continuous assignments for signed versions
28  always @(*) begin
29    signed_rs = rf_output_rs_to_alu_input;
30    signed_rt = rf_output_rt_to_alu_input;
31    signed_result = alu_output_to_rf_input;
32  end
33
34  // Test sequence
35  initial begin
36    $monitor("Time = %05t | rs = %b (%d) | rt = %b (%d) | Opcode = %b | Result = %b (%d) | Zero = %b | Negative = %b | Overflow = %b",
37      $time, rf_output_rs_to_alu_input, signed_rs,
38      rf_output_rt_to_alu_input, signed_rt,
39      imem_output_to_rf_input_opcode,
40      alu_output_to_rf_input, signed_result,
41      Zero, Negative, Overflow);
42
43    // Test ADD (Overflow case)
44    rf_output_rs_to_alu_input = 16'hFFFF;
45    rf_output_rt_to_alu_input = 16'h0001;
46    imem_output_to_rf_input_opcode = 4'b0000; #10;
47
48    // Test SUB (Overflow case)
49    rf_output_rs_to_alu_input = 16'h8000;
50    rf_output_rt_to_alu_input = 16'h0001;
51    imem_output_to_rf_input_opcode = 4'b0001; #10;
52
53    // Test OR
54    rf_output_rs_to_alu_input = 16'h00F0;
55    rf_output_rt_to_alu_input = 16'h000F;
56    imem_output_to_rf_input_opcode = 4'b0010; #10;
57
58    // Test AND
59    rf_output_rs_to_alu_input = 16'hFFFF;
60    rf_output_rt_to_alu_input = 16'h0F0F;
61    imem_output_to_rf_input_opcode = 4'b0011; #10;
62
63    // Test Shift Left
64    rf_output_rs_to_alu_input = 16'h0001;
65    rf_output_rt_to_alu_input = 16'h0004;
66    imem_output_to_rf_input_opcode = 4'b0100; #10;
67
68    // Test Shift Right
69    rf_output_rs_to_alu_input = 16'h8000;
70    rf_output_rt_to_alu_input = 16'h0004;
71    imem_output_to_rf_input_opcode = 4'b0101; #10;
72
73    // Test Rotate Left
74    rf_output_rs_to_alu_input = 16'h8001;
75    imem_output_to_rf_input_opcode = 4'b0110; #10;
76
77    // Test Rotate Right
78    rf_output_rs_to_alu_input = 16'h8001;
79    imem_output_to_rf_input_opcode = 4'b0111; #10;
80
81    // Test NOT
82    rf_output_rs_to_alu_input = 16'hFFFF;
83    imem_output_to_rf_input_opcode = 4'b1000; #10;
84
85    // Test ADDI (Sign-extended immediate addition)
86    rf_output_rs_to_alu_input = 16'h0005;
87    rf_output_rt_to_alu_input = 16'h000F;
88    imem_output_to_rf_input_opcode = 4'b1111; #10;
89
90    // Extra Tests:
91    // Test ADD (Negative result case)
92    rf_output_rs_to_alu_input = 16'b0000000000000000;
93    rf_output_rt_to_alu_input = 16'b1000000000000000;
94    imem_output_to_rf_input_opcode = 4'b0000; #10;
95
96    // Test SUB (Negative result case)
97    rf_output_rs_to_alu_input = 16'b1111111111111111;
98    rf_output_rt_to_alu_input = 16'b0111111111111111;
99    imem_output_to_rf_input_opcode = 4'b0001; #10;
100
101   $finish;
102 end
103 endmodule
104

```

The ALU was tested using a custom testbench as shown above, which ran through all instruction types defined under the R-type and S-type categories, including arithmetic, logical, shift, rotate, and immediate operations. The results from the simulation are shown below:

```

Time = 00000 | rs = 0111111111111111 (- 32767) | rt = 0000000000000001 (    1) | Opcode = 0000 | Result = 1000000000000000 (-32768) | Zero = 0 | Negative = 1 | Overflow = 1
Time = 10000 | rs = 1000000000000000 (-32768) | rt = 0000000000000001 (    1) | Opcode = 0001 | Result = 0111111111111111 (- 32767) | Zero = 0 | Negative = 0 | Overflow = 1
Time = 20000 | rs = 0000000000000000 ( 240) | rt = 0000000000000011 (   15) | Opcode = 0010 | Result = 0000000001111111 (- 255) | Zero = 0 | Negative = 0 | Overflow = 0
Time = 30000 | rs = 1111111111111111 (- -1) | rt = 00000111000001111 ( 3855) | Opcode = 0011 | Result = 00001111000001111 (- 3855) | Zero = 0 | Negative = 0 | Overflow = 0
Time = 40000 | rs = 0000000000000001 (    1) | rt = 0000000000000000 (    0) | Opcode = 0100 | Result = 0000000000000000 (    0) | Zero = 0 | Negative = 0 | Overflow = 0
Time = 50000 | rs = 1000000000000000 (-32768) | rt = 0000000000000000 (    0) | Opcode = 0101 | Result = 1111100000000000 (-2048) | Zero = 0 | Negative = 1 | Overflow = 0
Time = 60000 | rs = 1000000000000001 (-32767) | rt = 0000000000000000 (    0) | Opcode = 0110 | Result = 0000000000000001 (-    3) | Zero = 0 | Negative = 0 | Overflow = 0
Time = 70000 | rs = 1000000000000001 (-32767) | rt = 0000000000000000 (    0) | Opcode = 0111 | Result = 1100000000000000 (-16384) | Zero = 0 | Negative = 1 | Overflow = 0
Time = 80000 | rs = 1111111111111111 (- -1) | rt = 0000000000000000 (    0) | Opcode = 1000 | Result = 0000000000000000 (    0) | Zero = 0 | Negative = 0 | Overflow = 0
Time = 90000 | rs = 000000000000000101 (    5) | rt = 0000000000000001111 (   15) | Opcode = 1100 | Result = 0000000000000000 (    0) | Zero = 0 | Negative = 0 | Overflow = 0
Time = 100000 | rs = 0000000000000000 (    0) | rt = 1000000000000000 (-32768) | Opcode = 0000 | Result = 1000000000000000 (-32768) | Zero = 0 | Negative = 1 | Overflow = 0
Time = 110000 | rs = 1111111111111111 (- -1) | rt = 0111111111111111 (- 32767) | Opcode = 0001 | Result = 1000000000000000 (-32768) | Zero = 0 | Negative = 1 | Overflow = 0
testbench.sv:101: $finish called at 120000 (1ps)
Done

```

From the testbench output, we can confirm that the ALU performs correctly across a variety of scenarios:

#### 1. ADD and SUB (with overflow):

- For ADD:  $32767 + 1$  correctly triggered overflow since the result  $(-32768)$  exceeded the max positive 16-bit signed value.
- For SUB:  $-32768 - 1$  resulted in  $32767$ , causing an overflow which is the correct behaviour for two's complement.

#### 2. Logical operations (OR, AND, NOT):

- These behaved exactly as expected. OR and AND produced the correct bitwise results, as  $240 \text{ OR } 15$  gave  $255$ , and  $-1 \text{ AND } 3855$  returned  $3855$ .
- NOT  $-1$  gave  $0$ , as expected for bitwise inversion of all 1s.

#### 3. Shift operations:

- SHL ( $1 << 4$ ) gave  $16$ , which is correct for arithmetic left shift.
- SHR of  $-32768$  by  $4$  produced  $-2048$  which confirmed that the arithmetic right shift preserved the sign bit.

#### 4. Rotate operations:

- ROL of  $-32767$  gave  $-3$ , showing the MSB was rotated into LSB.
- ROR of the same value gave  $-16384$ , which matches the expected bit wrap-around.

#### 5. ADDI (sign-extended immediate):

- $5 + 15$  gave  $4$ , because the 4-bit immediate  $1111$  was sign-extended to  $-1$ . This confirms that sign-extension logic worked properly.

#### 6. Edge case testing:

- $0 + -32768$  resulted in  $-32768$ , and the Negative flag was correctly set.
- $-1 - 32767$  gave  $-32768$ , verifying Negative flag edge case.

Status flags were set exactly as they should be for each operation:

- Zero flag = 1 only when the result was zero.
- Negative flag = 1 for negative results (MSB = 1).
- Overflow flag = 1 only in overflow scenarios, it did not trigger falsely on logic or shift ops.

Overall, the ALU is functioning reliably under all tested conditions. It supports the full range of required operations and handles edge cases like signed overflow and immediate sign-extension correctly. The testbench verified both functional correctness and flag logic for integration with branching and control flow instructions.

### 4.3 Instruction Memory (IMEM)

The design of IMEM above was tested with the testbench as follows:

```

module testbench;

// Declare the signals that connect to the IMEM module
reg [15:0] pc_output_to_imem_input;           // 16-bit address input
reg OE;                                         // Output Enable signal
wire [15:0] imem_output_to_rf_input;           // 16-bit data output

// Instantiate the IMEM module (Unit Under Test)
IMEM uut (
    .pc_output_to_imem_input(pc_output_to_imem_input), // Connect the addr input to the testbench signal
    .OE(OE),                                         // Connect the OE input to the testbench signal
    .imem_output_to_rf_input(imem_output_to_rf_input)   // Connect the data_out output to the testbench signal
);

// Initial block for test stimulus
initial begin
    // Display header
    $display("Time\taddr\t\t\t\tOE\tdata_out");

    // Initialize the simulation signals
    pc_output_to_imem_input = 16'b0;      // Start with address 0
    OE = 0;                            // Initially disable Output Enable (OE = 0)

    // Apply some test cases
    #10; // Wait for 10 time units

    // Test Case 1: OE = 1, address 0 (Expect ROM[0] and ROM[1] combined as big-endian)
    OE = 1; // Enable output
    pc_output_to_imem_input = 16'b0000000000000000; // Address 0
    #10; // Wait for 10 time units
    $display("%0t\t%b\t%b\t%b", $time, pc_output_to_imem_input, OE, imem_output_to_rf_input);

    // Test Case 2: OE = 1, address 1 (Expect ROM[1] and ROM[2] combined as big-endian)
    pc_output_to_imem_input = 16'b0000000000000001; // Address 1
    #10;
    $display("%0t\t%b\t%b\t%b", $time, pc_output_to_imem_input, OE, imem_output_to_rf_input);

    // Test Case 3: OE = 1, address 2 (Expect ROM[2] and ROM[3] combined as big-endian)
    pc_output_to_imem_input = 16'b0000000000000010; // Address 2
    #10;
    $display("%0t\t%b\t%b\t%b", $time, pc_output_to_imem_input, OE, imem_output_to_rf_input);

    // Test Case 4: OE = 0, address 3 (OE disabled, no valid output)
    OE = 0; // Disable output
    pc_output_to_imem_input = 16'b0000000000000011; // Address 3
    #10;
    $display("%0t\t%b\t%b\t%b", $time, pc_output_to_imem_input, OE, imem_output_to_rf_input);

    // Test Case 5: OE = 1, address 4 (Expect ROM[4] and ROM[5] combined as big-endian)
    OE = 1; // Enable output again
    pc_output_to_imem_input = 16'b00000000000000100; // Address 4
    #10;
    $display("%0t\t%b\t%b\t%b", $time, pc_output_to_imem_input, OE, imem_output_to_rf_input);

    // Test Case 6: OE = 1, address 5 (Expect ROM[5] and ROM[6] combined as big-endian)
    pc_output_to_imem_input = 16'b00000000000000101; // Address 5
    #10;
    $display("%0t\t%b\t%b\t%b", $time, pc_output_to_imem_input, OE, imem_output_to_rf_input);

    // Test Case 7: OE = 1, address 6 (Expect ROM[6] and ROM[7] combined as big-endian)
    pc_output_to_imem_input = 16'b00000000000000110; // Address 6
    #10;
    $display("%0t\t%b\t%b\t%b", $time, pc_output_to_imem_input, OE, imem_output_to_rf_input);

    // Test Case 8: OE = 0, address 7 (OE disabled, no valid output)
    OE = 0; // Disable output again
    pc_output_to_imem_input = 16'b00000000000000111; // Address 7
    #10;
    $display("%0t\t%b\t%b\t%b", $time, pc_output_to_imem_input, OE, imem_output_to_rf_input);

    // End of the simulation
    $finish;
end

// Monitor the data_out for debugging purposes (optional)
/*initial begin
    $monitor("At time %t, addr = %h, OE = %b, data_out = %h", $time, addr, OE, data_out);
end*/
endmodule

```

The test results of the testbench are as shown below:

| Time                                               | addr                 | OE | data_out          |
|----------------------------------------------------|----------------------|----|-------------------|
| 20                                                 | 00000000000000000000 | 1  | 1010000101100010  |
| 30                                                 | 00000000000000000001 | 1  | 1100001111010100  |
| 40                                                 | 00000000000000000010 | 1  | 0101111101101110  |
| 50                                                 | 00000000000000000011 | 0  | zzzzzzzzzzzzzzzzz |
| 60                                                 | 0000000000000000100  | 1  | 1010000101100010  |
| 70                                                 | 0000000000000000101  | 1  | 1100001111010100  |
| 80                                                 | 0000000000000000110  | 1  | 0101111101101110  |
| 90                                                 | 0000000000000000111  | 0  | zzzzzzzzzzzzzzzzz |
| \$finish called from file "testbench.sv", Line 72. |                      |    |                   |
| \$finish at simulation time 90                     |                      |    |                   |

This testbench for ROM operates based on setting the output enabler (OE) to 1 to allow the 16-bit ROM value to be read at the output pins, while OE to 0 which has no output at the pins. The ROM function was as designed as in test case 4 and 8 where OE=0 the output data\_out is high impedance of z. Each test cases increments the source address (addr) by 1 bit to observe the values stores in the ROM.

## 4.4 Data Memory (DMEM)

The proposed design above is verified through the testbench shown below:

```
1 module dmem_tb();
2   // Testbench signals
3   reg clk;
4   reg [15:0] alu_output_to_dmem_addr;
5   reg [15:0] rf_output_rt_to_alu_input;
6   reg RW_;
7   reg CS;
8   wire [15:0] dmem_output_to_rf_input;
9
10  // Instantiate the DMEM module
11  dmem dut (
12    .clk(clk),
13    .alu_output_to_dmem_addr(alu_output_to_dmem_addr),
14    .rf_output_rt_to_alu_input(rf_output_rt_to_alu_input),
15    .RW_(RW_),
16    .CS(CS),
17    .dmem_output_to_rf_input(dmem_output_to_rf_input)
18  );
19
20  // Clock generation
21  initial begin
22    clk = 0;
23    forever #5 clk = ~clk; // 100MHz clock (10ns period)
24  end
25
26  // Test stimulus
27  initial begin
28    // Initialize signals
29    alu_output_to_dmem_addr = 16'h0000;
30    rf_output_rt_to_alu_input = 16'h0000;
31    RW_ = 1'b1;
32    CS = 1'b0;
33
34    // Wait for 100ns to ensure stable initialization
35    #100;
36
37    // Test Case 1: Write to address 0x0000
38    $display("Test Case 1: Write to address 0x0000");
39    CS = 1'b1;
40    RW_ = 1'b0;
41    alu_output_to_dmem_addr = 16'h0000;
42    rf_output_rt_to_alu_input = 16'hABCD;
43    @(posedge clk);
44    #1; // Wait for write to complete
45
46    // Test Case 2: Read from address 0x0000
47    $display("Test Case 2: Read from address 0x0000");
48    RW_ = 1'b1;
49    #5; // Wait for read to stabilize
50    if (dmem_output_to_rf_input === 16'hABCD)
51      $display("PASS: Read data matches written data");
52    else
53      $display("FAIL: Read data %h doesn't match written data %h", dmem_output_to_rf_input, 16'hABCD);
54
```

```

55 // Test Case 3: Write to address 0x0002
56 $display("\nTest Case 3: Write to address 0x0002");
57 RW_ = 1'b0;
58 alu_output_to_dmem_addr = 16'h0002;
59 rf_output_rt_to_alu_input = 16'h1234;
60 @(posedge clk);
61 #1;
62
63 // Test Case 4: Read from address 0x0002
64 $display("Test Case 4: Read from address 0x0002");
65 RW_ = 1'b1;
66 #5;
67 if (dmem_output_to_rf_input === 16'h1234)
68     $display("PASS: Read data matches written data");
69 else
70     $display("FAIL: Read data %h doesn't match written data %h", dmem_output_to_rf_input, 16'h1234);
71
72 // Test Case 5: Test chip select (CS) functionality
73 $display("\nTest Case 5: Testing chip select");
74 CS = 1'b0;
75 #5;
76 if (dmem_output_to_rf_input === 16'bz)
77     $display("PASS: Output is high impedance when CS=0");
78 else
79     $display("FAIL: Output is not high impedance when CS=0");
80
81 // Test Case 6: Test out-of-range address
82 $display("\nTest Case 6: Testing out-of-range address");
83 CS = 1'b1;
84 alu_output_to_dmem_addr = 16'h0400; // Beyond 1KB
85 #5;
86 if (dmem_output_to_rf_input === 16'h0000)
87     $display("PASS: Out-of-range address returns zero");
88 else
89     $display("FAIL: Out-of-range address doesn't return zero");
90
91 // Test Case 7: Test consecutive memory locations
92 $display("\nTest Case 7: Testing consecutive memory locations");
93 // Write to consecutive addresses
94 RW_ = 1'b0;
95 alu_output_to_dmem_addr = 16'h0004;
96 rf_output_rt_to_alu_input = 16'h5678;
97 @(posedge clk);
98 #1;
99 alu_output_to_dmem_addr = 16'h0006;
100 rf_output_rt_to_alu_input = 16'h9ABC;
101 @(posedge clk);
102 #1;

104 // Read back and verify
105 RW_ = 1'b1;
106 alu_output_to_dmem_addr = 16'h0004;
107 #5;
108 if (dmem_output_to_rf_input === 16'h5678)
109     $display("PASS: First consecutive read correct");
110 else
111     $display("FAIL: First consecutive read incorrect");
112
113 alu_output_to_dmem_addr = 16'h0006;
114 #5;
115 if (dmem_output_to_rf_input === 16'h9ABC)
116     $display("PASS: Second consecutive read correct");
117 else
118     $display("FAIL: Second consecutive read incorrect");
119
120 // End simulation
121 #100;
122 $display("\nTestbench completed");
123 $finish;
124 end
125
126 // Optional: Generate VCD file for waveform viewing
127 // initial begin
128 //     $dumpfile("dmem_tb.vcd");
129 //     $dumpvars(0, dmem_tb);
130 // end
131
132 endmodule

```

The results of testbench are shown as below:

```
design.sv:24: warning: @* is sensitive to all 1024 words in array 'memory'.
design.sv:24: warning: @* is sensitive to all 1024 words in array 'memory'.
Test Case 1: Write to address 0x0000
Test Case 2: Read from address 0x0000
PASS: Read data matches written data

Test Case 3: Write to address 0x0002
Test Case 4: Read from address 0x0002
PASS: Read data matches written data

Test Case 5: Testing chip select
PASS: Output is high impedance when CS=0

Test Case 6: Testing out-of-range address
PASS: Out-of-range address returns zero

Test Case 7: Testing consecutive memory locations
PASS: First consecutive read correct
PASS: Second consecutive read correct

Testbench completed
testbench.sv:123: $finish called at 256 (1s)
Done
```

A thorough testbench that replicated different memory operations was used to confirm the DMEM (Data Memory) module's functionality. In the first test, the value 0xABCD was written to address 0x0000, and then the same address was read. The outcome demonstrated that the memory writes and reads processes were operating as intended as the data read matched the value that had been previously written. In the second test scenario, an identical operation was performed with address 0x0002 and value 0x1234, which likewise yielded the right data when read. These results demonstrate that the memory can reliably store and retrieve information at particular addresses.

Apart from the fundamental read/write capabilities, the chip select (CS) signal was examined. The output of the memory module appropriately reached a high impedance (Z) condition when the CS was deasserted (set to 0). In order to prevent the memory from interfering with other data bus components while it is not selected, this behaviour is necessary. Memory protection against out-of-bound access was another crucial feature that was examined. The output returned a default value of 0x0000 when an invalid address (0x0400) exceeding the

1KB memory limit was accessed. This shows that the memory module manages illegal memory access correctly and prevents unexpected behaviour.

Lastly, the testbench confirmed that successive memory regions were operating as intended. Addresses 0x0004 and 0x0006 received data values 0x5678 and 0x9ABC, respectively. The memory module enables sequential memory operations without any data overlap or corruption, as demonstrated by the accurate values returned by subsequent read operations from these places.

Overall, the DMEM module successfully completed every test scenario. It showed dependable handling of out-of-range addresses, efficient isolation via chip select control, accurate data storage and retrieval, and appropriate support for sequential memory access. These findings support the module's accuracy and dependability, allowing it to be incorporated into the processor's overall architecture.

#### **4.5 Register File (RF)**

The design of RF above was tested using the following testbench to validate the module functions.

```

module tb_RF;
    // Declare testbench signals
    reg [3:0] imem_output_to_rf_input_rs;
    reg [3:0] imem_output_to_rf_input_rt;
    reg [3:0] imem_output_to_rf_input_rd;           // Register select inputs (4-bit)
    reg [15:0] alu_output_to_rf_input;              // Data input (16-bit)
    reg write_en, rst, clk;                         // Write enable, reset, clock
    wire [15:0] rf_output_rs_to_alu_input;
    wire [15:0] rf_output_rt_to_alu_input;          // Register file outputs (16-bit)
    wire [7:0] leds;                               // LED output from r15[7:0] (8-bit)

    // Instantiate the RF module (the unit under test)
    RF uut (
        .imem_output_to_rf_input_rs(imem_output_to_rf_input_rs),
        .imem_output_to_rf_input_rt(imem_output_to_rf_input_rt),
        .imem_output_to_rf_input_rd(imem_output_to_rf_input_rd),
        .alu_output_to_rf_input(alu_output_to_rf_input),
        .write_en(write_en),
        .rst(rst),
        .clk(clk),
        .rf_output_rs_to_alu_input(rf_output_rs_to_alu_input),
        .rf_output_rt_to_alu_input(rf_output_rt_to_alu_input),
        .leds(leds)
    );
    // Clock generation: 50 MHz clock
    always begin
        #10 clk = ~clk; // Toggle clock every 10 time units
    end

    // Initial block to set up the test
    initial begin
        // Initialize signals
        clk = 0;
        rst = 0;
        write_en = 0;
        imem_output_to_rf_input_rs = 0;
        imem_output_to_rf_input_rt = 0;
        imem_output_to_rf_input_rd = 0;
        alu_output_to_rf_input = 0;

        // Apply reset
        #5 rst = 1; // Assert reset
        #10 rst = 0; // Deassert reset
        $display("Testbench reset complete");

        // Test 1: Write to a normal register (e.g., r1)
        #10;
        write_en = 1; // Enable write
        imem_output_to_rf_input_rd = 4'b0001; // Select register r1
        alu_output_to_rf_input = 16'b0001001000110100; // Data to write (binary)
        #10; // Wait for one clock cycle
        write_en = 0; // Disable write
        $display("Test 1: Write to r1 complete");
    end

```

```

// Test 2: Read from r1
#10;
imem_output_to_rf_input_rs = 4'b0001;           // Select register r1
imem_output_to_rf_input_rt = 4'b0001;           // Select register r0 (for testing)
#10;
// Expected: rs_data_out = 16'b00001001000110100, rt_data_out = 16'b0000000000000000 (r0 is always 0)
$display("Test 2: Read from r1 complete");

// Test 3: Write to r0 (should not change)
#10;
write_en = 1;
imem_output_to_rf_input_rd = 4'b0000;           // Select register r0
alu_output_to_rf_input = 16'b1010101010101010; // Data to write (binary)
#10;
write_en = 0;        // Disable write
imem_output_to_rf_input_rs = 4'b0000;           // Read from r0
#10;
// Expected: rs_data_out = 16'b0000000000000000 (r0 should always return 0)
$display("Test 3: Write to r0 complete");

// Test 4: Write to r15 (affects LEDs)
#10;
write_en = 1;
imem_output_to_rf_input_rd = 4'b1111;           // Select register r15
alu_output_to_rf_input = 16'b0000000000001111; // Data to write (binary) (LEDs should be 8'b00001111)
#10;
//write_en = 0;// Wait for write to complete
#10
imem_output_to_rf_input_rs = 4'b1111;           // Read from r15 (check LEDs)
imem_output_to_rf_input_rt = 4'b1111;
#10;                                            // Wait for LED read
write_en = 0;

// Expected: leds = 8'b11111111 (r15[7:0] should control LEDs)
$display("Test 4: Write to r15 complete. LEDs should be 00001111.");

// Test 5: Check LEDs with a different value in r15
#10;
write_en = 1;
imem_output_to_rf_input_rd = 4'b1111;           // Select register r15
alu_output_to_rf_input = 16'b0001001100100100; // Data to write (binary)
#10;
write_en = 0;
imem_output_to_rf_input_rs = 4'b1111;           // Read from r15 (check LEDs again)
#10;
// Expected: leds = 8'b000110100 (r15[7:0] = 8'b000110100, controlling LEDs)
$display("Test 5: LEDs in r15 updated to 00110100.");

// Test 6: Apply reset and verify registers
#10;
rst = 1; // Assert reset
#10;
rst = 0; // Deassert reset
#10;
imem_output_to_rf_input_rs = 4'b0001; // Read from r1
imem_output_to_rf_input_rs = 4'b1111;
#10;
// Expected: rs_data_out = 16'b00001001000110100 (r1 should hold previous value)
imem_output_to_rf_input_rs = 4'b0000; // Read from r0
#10;
// Expected: rs_data_out = 16'b0000000000000000 (r0 should always return 0)
$display("Test 6: Reset applied, registers cleared except r0.");

// Test 7: Write to a register and check output
#10;
write_en = 1;
imem_output_to_rf_input_rd = 4'b0010;           // Select register r2
alu_output_to_rf_input = 16'b0101011001111000; // Data to write (binary)
#10;
write_en=0;
#10;
imem_output_to_rf_input_rs = 4'b0010;           // Read from r2
imem_output_to_rf_input_rt = 4'b0010;
#10;
// Expected: rs_data_out = 16'b0101011001111000 (r2 should hold written value)
$display("Test 7: Write to r2 complete.");

// End the simulation after some time
#10;
$finish;
end

// Monitor the output signals for debugging (displaying in binary)
initial begin
$monitor("At time %0t, \tdata_in=%b, rd=%b, rs = %b, rt = %b, rs_data_out = %b, rt_data_out = %b, leds = %b",
$time, alu_output_to_rf_input, imem_output_to_rf_input_rd, imem_output_to_rf_input_rs,
imem_output_to_rf_input_rt, rf_output_rs_to_alu_input, rf_output_rt_to_alu_input, leds);
end

//show wave form
// to show waveform
initial begin
$dumpfile("dump.vcd");
$dumpvars;
end

endmodule

```

The testbench results are as follows:

```
At time 0,      data_in=0000000000000000, rd=0000, rs = 0000, rt = 0000, rs_data_out = xxxxxxxxxxxxxxxx, rt_data_out = xxxxxxxxxxxxxxxx, leds = xxxxxxxx
At time 5,      data_in=0000000000000000, rd=0000, rs = 0000, rt = 0000, rs_data_out = 0000000000000000, rt_data_out = 0000000000000000, leds = 00000000
Testbench reset complete
At time 25,     data_in=0001001000110100, rd=0001, rs = 0000, rt = 0000, rs_data_out = 0000000000000000, rt_data_out = 0000000000000000, leds = 00000000
Test 1: Write to r1 complete
At time 45,     data_in=0001001000110100, rd=0001, rs = 0001, rt = 0001, rs_data_out = 0001001000110100, rt_data_out = 0001001000110100, leds = 00000000
Test 2: Read from r1 complete
At time 65,     data_in=1010101010101010, rd=0000, rs = 0001, rt = 0001, rs_data_out = 0001001000110100, rt_data_out = 0001001000110100, leds = 00000000
At time 75,     data_in=1010101010101010, rd=0000, rs = 0000, rt = 0001, rs_data_out = 0000000000000000, rt_data_out = 0001001000110100, leds = 00000000
Test 3: Write to r0 complete
At time 95,     data_in=0000000000001111, rd=1111, rs = 0000, rt = 0001, rs_data_out = 0000000000000000, rt_data_out = 0001001000110100, leds = 00000000
At time 110,    data_in=0000000000001111, rd=1111, rs = 0000, rt = 0001, rs_data_out = 0000000000000000, rt_data_out = 0001001000110100, leds = 00001111
At time 115,    data_in=0000000000001111, rd=1111, rs = 1111, rt = 1111, rs_data_out = 0000000000001111, rt_data_out = 0000000000001111, leds = 00001111
Test 4: Write to r15 complete. LEDs should be 00001111.
At time 135,   data_in=0001001100100100, rd=1111, rs = 1111, rt = 1111, rs_data_out = 0000000000001111, rt_data_out = 0000000000001111, leds = 00001111
Test 5: LEDs in r15 updated to 00110100.
At time 165,   data_in=0001001100100100, rd=1111, rs = 1111, rt = 1111, rs_data_out = 0000000000000000, rt_data_out = 0000000000000000, leds = 00000000
At time 195,   data_in=0001001100100100, rd=1111, rs = 0000, rt = 1111, rs_data_out = 0000000000000000, rt_data_out = 0000000000000000, leds = 00000000
Test 6: Reset applied, registers cleared except r0.
At time 215,   data_in=0101011001111000, rd=0010, rs = 0000, rt = 1111, rs_data_out = 0000000000000000, rt_data_out = 0000000000000000, leds = 00000000
At time 235,   data_in=0101011001111000, rd=0010, rs = 0010, rt = 0010, rs_data_out = 0000000000000000, rt_data_out = 0000000000000000, leds = 00000000
Test 7: Write to r2 complete.
$finish called from file "testbench.sv", line 136.
$finish at simulation time          255
```

When write\_en is set to 1, the RF writes the 16-bit input into the address rd. This is shown in Test case 1 where the data\_in was saved and successfully read at rs and rt in test case 2. Test case 3 test the module if writing to r0 was possible, where the result shows that r0 remains as 0 as it is a special register.

Test case 4 shows the operation of the RF module updating the values to LED successfully. Case 6 test the reset function of the module and shows that all values saved in the registers were cleared and set to zero. This shows the RF module functions are as designed according to the specifications given.

## 4.6 Program Counter (PC)

The proposed design above is verified through the testbench shown below:

```
1 module pc_tb;
2   reg pc_load, reset, clk;
3   reg [15:0] alu_output_to_rf_input;
4   wire [15:0] pc_output_to_imem_input;
5
6   // Instantiate the program counter module
7   pc DUT (pc_load, alu_output_to_rf_input, reset, clk, pc_output_to_imem_input);
8
9   // Clock generation (period = 20 time units)
10  always #10 clk = ~clk;
11
12  // Task to check expected output
13  task check_output(input [15:0] expected);
14    begin
15      #1; // Small delay to allow output to settle
16      if (pc_output_to_imem_input === expected)
17          $display("TEST PASSED at time %0t | Expected: %d | Got: %d", $time, expected, pc_output_to_imem_input);
18      else
19          $display("TEST FAILED at time %0t | Expected: %d | Got: %d", $time, expected, pc_output_to_imem_input);
20    end
21  endtask
22
23 initial begin
24   // Initialize signals
25   clk = 0; reset = 1; pc_load = 0;
26   alu_output_to_rf_input = 16'd5;
27
28   #15 reset = 0; // Release reset
29   #20 check_output(16'd0); // Expecting 0 after reset
30
31   // Test Case 1: Load value 5
32   #20 pc_load = 1;
33   #20 pc_load = 0;
34   #10 check_output(16'd5); // Expecting 5 after loading
35
36   // Test Case 2: Load new value 10
37   #20 alu_output_to_rf_input = 16'd10;
38   #20 pc_load = 1;
39   #20 pc_load = 0;
40   #10 check_output(16'd10); // Expecting 10 after loading
41
42   // Test Case 3: Reset PC
43   #20 reset = 1;
44   #20 reset = 0;
45   #10 check_output(16'd0); // Expecting 0 after reset
46
47   #20 $finish; // End simulation
48 end
49
50 // Monitor values
51 initial begin
52   $monitor("Time = %0t | Reset = %b | PC Load = %b | Input = %d | PC Output = %d",
53           $time, reset, pc_load, alu_output_to_rf_input, pc_output_to_imem_input);
54 end
55
56 endmodule
```

The results of testbench are shown as below:

```

Time = 0 | Reset = 1 | PC Load = 0 | Input =      5 | PC Output =      0
Time = 15 | Reset = 0 | PC Load = 0 | Input =      5 | PC Output =      0
TEST PASSED at time 36 | Expected:      0 | Got:      0
Time = 56 | Reset = 0 | PC Load = 1 | Input =      5 | PC Output =      0
Time = 70 | Reset = 0 | PC Load = 1 | Input =      5 | PC Output =      5
Time = 76 | Reset = 0 | PC Load = 0 | Input =      5 | PC Output =      5
TEST PASSED at time 87 | Expected:      5 | Got:      5
Time = 107 | Reset = 0 | PC Load = 0 | Input =     10 | PC Output =      5
Time = 127 | Reset = 0 | PC Load = 1 | Input =     10 | PC Output =      5
Time = 130 | Reset = 0 | PC Load = 1 | Input =     10 | PC Output =     10
Time = 147 | Reset = 0 | PC Load = 0 | Input =     10 | PC Output =     10
TEST PASSED at time 158 | Expected:     10 | Got:     10
Time = 178 | Reset = 1 | PC Load = 0 | Input =     10 | PC Output =      0
Time = 198 | Reset = 0 | PC Load = 0 | Input =     10 | PC Output =      0
TEST PASSED at time 209 | Expected:      0 | Got:      0
testbench.sv:47: $finish called at 229 (1s)

```

**Done**

The Program Counter (PC) module's functionality, including reset, loading, and output behavior, was extensively examined using a specialized testbench. The reset functionality was confirmed in the first section of the simulation. The counter was appropriately initialized at simulation time 0 since the reset signal was asserted and the PC output correctly stayed at 0. The output was examined and verified to still be 0 as anticipated following the deassert of the reset. This confirmed that the computer could properly initialize after being reset.

In the first test case, the PC was told to use the ALU's input to load a value of 5. The PC output appropriately updated to 5 once the pc\_load signal, which had been asserted to permit the loading, was deasserted. This showed that when the load was enabled, the PC was able to latch and retain the value supplied by the ALU input. The identical loading process was used for the second test case, but the input value was modified to 10. The PC successfully updated to the new value after asserting and deasserting the pc\_load signal, demonstrating that it can successfully overwrite its old value with a new input when necessary. In order to implement control flow instructions like leaps and branches during program execution, this functionality is essential. Finally, the final test case re-validated the reset functionality. The PC output instantly went back to 0 when the reset signal was asserted once more after loading the number 10. This demonstrated that, irrespective of the value that was previously saved, the PC consistently resets its state.

To conclude, every test case has passed, and during the simulation, the PC output matched the anticipated outputs. These findings confirm that the program counter is

functioning properly and reliably, enabling initialization, value updates, and resets as needed by a processor's instruction control logic.

## 4.7 Multiplexers (MUX)

The design of the MUXes was tested with the testbench below:

```

1 `timescale 1ns / 1ps
2
3 module mux_tb;
4 // Test signals for 4-bit MUX
5 reg [3:0] I0_4bit, I1_4bit;
6 reg S_4bit;
7 wire [3:0] Y_4bit;
8
9 // Test signals for 16-bit MUX
10 reg [15:0] I0_16bit, I1_16bit;
11 reg S_16bit;
12 wire [15:0] Y_16bit;
13
14 // Instantiate 4-bit MUX
15 mux_4bit_2to1 uut_4bit (
16     .I0(I0_4bit),
17     .I1(I1_4bit),
18     .S(S_4bit),
19     .Y(Y_4bit)
20 );
21
22 // Instantiate 16-bit MUX
23 mux_16bit_2to1 uut_16bit (
24     .I0(I0_16bit),
25     .I1(I1_16bit),
26     .S(S_16bit),
27     .Y(Y_16bit)
28 );
29
30 // Test sequence
31 initial begin
32     $monitor("%nTime = %05t\n4-bit MUX -> S = %b | I0 = %b | I1 = %b | Y = %b\n16-bit MUX -> S = %b | I0 = %h | I1 = %h | Y = %h\n",
33         $time, S_4bit, I0_4bit, I1_4bit, Y_4bit,
34         S_16bit, I0_16bit, I1_16bit, Y_16bit);
35
36     // Test case 1: S=0, should select I0
37     I0_4bit = 4'b0001; I1_4bit = 4'b1110; S_4bit = 0;
38     I0_16bit = 16'h1234; I1_16bit = 16'h4321; S_16bit = 0;
39     #10;
40
41     // Test case 2: S=1, should select I1
42     S_4bit = 1; S_16bit = 1;
43     #10;
44
45     // End simulation
46     $finish;
47 end
48 endmodule

```

The output of the testbench is as below:

```

Time = 00000
4-bit MUX -> S = 0 | I0 = 0001 | I1 = 1110 | Y = 0001
16-bit MUX -> S = 0 | I0 = 1234 | I1 = 4321 | Y = 1234

Time = 10000
4-bit MUX -> S = 1 | I0 = 0001 | I1 = 1110 | Y = 1110
16-bit MUX -> S = 1 | I0 = 1234 | I1 = 4321 | Y = 4321

testbench.sv:46: $finish called at 20000 (1ps)
Done

```

The 4-bit and 16-bit multiplexers were tested using a simple testbench to verify correct switching behaviour based on the select signal S. The test results, as shown above, confirm that both MUXes work correctly for all basic cases.

The simulation ran two test scenarios:

Case 1 ( $S = 0$ ): The output Y was equal to input I0 for both 4-bit and 16-bit MUXes.

- 4-bit:  $I0 = 0001, I1 = 1110 \rightarrow \text{Output} = 0001$
- 16-bit:  $I0 = 0x1234, I1 = 0x4321 \rightarrow \text{Output} = 0x1234$

Case 2 ( $S = 1$ ): The output Y correctly switched to input I1.

- 4-bit:  $I0 = 0001, I1 = 1110 \rightarrow \text{Output} = 1110$
- 16-bit:  $I0 = 0x1234, I1 = 0x4321 \rightarrow \text{Output} = 0x4321$

These results prove that the select line logic is functioning correctly and that there are no data corruption or switching delays in the combinational MUX blocks. The “always @(\*)” sensitivity ensures that changes in any of the inputs or the select line immediately reflect in the output without waiting for a clock cycle, which is expected for these types of control elements.

These MUXes are used throughout the Datapath for tasks such as selecting between rt and rd, choosing immediate vs register values, and determining whether the output to the Register File comes from the ALU or DMEM. With this test confirming their functionality, we can confidently rely on these MUXes to correctly handle all data flow control paths inside the CPU.

## CHAPTER 5: CONCLUSION & RECOMMENDATIONS

This project successfully met the objective of designing and implementing a working 16-bit single-cycle CPU based on the ADSD-RISC instruction set architecture. All core submodules, including the Arithmetic Logic Unit (ALU), Datapath, Controller, Instruction Memory (IMEM), Data Memory (DMEM), Register File (RF), Program Counter (PC), and Multiplexers (MUX), were developed individually, tested thoroughly, and then integrated into a complete system. The design was validated through both simulation and deployment on an FPGA board, where the CPU was able to perform instruction fetch, decode, execution, and memory operations correctly. The output from the test programs running on the FPGA's LEDs confirmed that the CPU executed control flow and data manipulation instructions as intended.

The modular approach used in this project made testing and debugging more manageable. Each module was designed to be reusable and easy to modify, which allowed for smoother integration and testing during system development. The ALU demonstrated correct behaviour for all supported instruction types, including accurate flag setting for Zero, Negative, and Overflow. The Datapath design ensured proper communication between modules while supporting all five instruction types in the ADSD-RISC ISA. The Controller generated the appropriate control signals in response to different opcodes, allowing the CPU to respond correctly to logic, arithmetic, branching, and memory operations.

For future improvements, one potential enhancement is to implement pipelining. This would increase the overall speed and throughput of the processor but would also introduce additional complexity, such as data and control hazard management. A hazard detection and forwarding unit could be added to deal with instruction dependencies more effectively. Another recommendation is to implement interrupt handling, which would make the CPU more suitable for embedded systems and real-time applications. Finally, enhancing the testbench with automated checking and waveform validation scripts would help streamline future debugging and regression testing.

In conclusion, this project provided valuable hands-on experience in digital design, Verilog programming, and CPU architecture. The team successfully built a functional processor capable of handling a wide range of instructions. The design is stable and provides a strong foundation for further extension or optimization in more advanced system-on-chip or pipelined processor designs.