

# **PROJECT REPORT**

## **32-Bit 5-Stage Pipelined MIPS Processor**

### **(Phase 1: RTL Architecture)**

**Author:** Mina S. ElHanash

Junior, Electronic and Communication Engineering

**Institution:** Ain Shams University, Faculty of Engineering

**Date:** February 2026

## Table of Contents

|                                                                       |    |
|-----------------------------------------------------------------------|----|
| ABSTRACT .....                                                        | 4  |
| 1.0 INTRODUCTION .....                                                | 5  |
| 1.1 Background on MIPS Architecture .....                             | 5  |
| 1.2 Project Objectives .....                                          | 6  |
| 2.0 PROJECT SPECIFICATIONS.....                                       | 7  |
| 3.0 DESIGN METHODOLOGY .....                                          | 10 |
| 3.1 Phase 1 of the Design: Single-Cycle Architecture Validation ..... | 10 |
| 3.1.1 Program counter (pc) next-state logic expansion .....           | 10 |
| 3.1.2 Decoupled data memory addressing .....                          | 11 |
| 3.1.3 Extended register file destination routing .....                | 12 |
| 3.2 Phase 2: Pipelined Architecture and Hazard Management.....        | 13 |
| 4.0 VERILOG CODES AND MODULE HIERARCHY .....                          | 14 |
| 4.1 Program Counter.....                                              | 14 |
| 4.2 MUX 2x1 32-Bits .....                                             | 15 |
| 4.3 Instruction Memory.....                                           | 16 |
| 4.4 Adder .....                                                       | 17 |
| 4.5 IF ID Stage Register .....                                        | 18 |
| 4.6 Register File .....                                               | 19 |
| 4.7 Sign Extender.....                                                | 20 |
| 4.8 Control Unit.....                                                 | 21 |
| 4.9 Hazard Detection Unit .....                                       | 23 |
| 4.10 ID EX Stage Register.....                                        | 24 |
| 4.11 ALU .....                                                        | 28 |
| 4.12 ALU Control Unit .....                                           | 29 |
| 4.13 MUX 4x1 32-Bits .....                                            | 30 |
| 4.14 MUX 4x1 5-Bits .....                                             | 30 |
| 4.15 Forwarding Unit.....                                             | 31 |
| 4.16 EX MEM Stage Register .....                                      | 32 |

|                                        |    |
|----------------------------------------|----|
| 4.17 Data Memory .....                 | 36 |
| 4.18 MEM WB Stage Register .....       | 37 |
| 4.19 Top Module .....                  | 39 |
| 5.0 SYNTHESIS AND IMPLEMENTATION ..... | 51 |
| 6.0 CONCLUSION.....                    | 56 |

## **ABSTRACT**

This report details the Phase 1 design, synthesis, and implementation of a 32-bit, 5-stage pipelined MIPS processor. The core was developed using Verilog HDL with a focus on structural microarchitecture, incorporating full hazard detection and data forwarding to resolve Read-After-Write (RAW) and control hazards. The design was synthesized using Xilinx Vivado targeting a Spartan-7 FPGA (xc7s50csga324-1). Post-implementation timing analysis confirms the processor successfully meets a 100 MHz clock constraint, achieving a Maximum Clock Frequency (Fmax) of 105.7 MHz. This completed RTL architecture establishes the foundation for Phase 2, which will focus on functional verification using SystemVerilog and the Universal Verification Methodology (UVM).

# 1.0 INTRODUCTION

## 1.1 Background on MIPS Architecture

The MIPS (Microprocessor without Interlocked Pipelined Stages) architecture is a cornerstone of modern digital design, widely recognized as one of the most elegant implementations of Reduced Instruction Set Computer (RISC) principles. Developed in the early 1980s, the MIPS ISA was designed with a strict load/store architecture, meaning all arithmetic and logical operations are performed exclusively on data held within the processor's internal registers, while memory is accessed solely through dedicated load and store instructions. This deliberate simplicity, combined with a fixed 32-bit instruction length, allows the hardware to decode and execute instructions with high efficiency.

To maximize instruction throughput, the MIPS architecture inherently supports pipelining. By dividing instruction execution into five distinct, sequential stages-Instruction Fetch (IF), Instruction Decode (ID), Execute (EX), Memory Access (MEM), and Write Back (WB)-the processor can process multiple instructions concurrently. While pipelining dramatically increases the theoretical execution rate by ensuring the CPU completes one instruction per clock cycle under ideal conditions, it introduces significant microarchitectural challenges known as hazards.

Hazards occur when the pipeline must be stalled to prevent incorrect execution. **Structural hazards** arise from hardware resource conflicts, **data hazards** (such as Read-After-Write) occur when an instruction depends on the result of a preceding, uncompleted instruction, and **control hazards** are caused by branch instructions altering the flow of the program counter. A sophisticated MIPS implementation must move beyond basic interlocked stalling and incorporate dynamic hardware solutions, such as data forwarding and branch prediction, to maintain pipeline efficiency.

## 1.2 Project Objectives

The primary objective of this project is to architect, synthesize, and validate a fully functional 32-bit, 5-stage pipelined MIPS processor at the Register Transfer Level (RTL). Rather than relying on high-level behavioral modeling, this project focuses on structural, synthesizable microarchitecture, serving as a comprehensive exercise in digital logic design and physical hardware implementation.

The specific objectives of Phase 1 include:

- **Gate-Level RTL Implementation:** To develop a complete Verilog HDL codebase that accurately models the MIPS data path and control logic, strictly adhering to synthesizable coding guidelines.
- **Dynamic Hazard Resolution:** To implement an active Forwarding Unit capable of bypassing the register file to resolve Read-After-Write (RAW) data hazards dynamically. Additionally, to construct a Hazard Detection Unit that injects hardware bubbles (stalls) only when data forwarding is mathematically insufficient, such as during Load-Use hazards.
- **Physical Synthesis and Benchmarking:** To move the design beyond idealized software simulation by targeting a modern Xilinx Spartan-7 FPGA (xc7s50csga324-1). This includes analyzing the physical mapping of logic gates to extract real-world hardware utilization metrics (Slice LUTs and Registers) and performing static timing analysis to verify the processor achieves a minimum Maximum Clock Frequency (Fmax) of 100 MHz.
- **Establishing a Verification Baseline:** To create a mathematically sound and physically proven RTL core (Phase 1) that will serve as the Design Under Test (DUT) for Phase 2. The ultimate objective of this multi-phase project is to subject this synthesizable core to rigorous functional verification using SystemVerilog and the Universal Verification Methodology (UVM) to achieve complete code and functional coverage.

# 2.0 PROJECT SPECIFICATIONS

Course Project (Major Task)

Fall 2025

## Aim

This project aims to design, simulate, and synthesize a modified single-cycle and pipelined RISC-style processor in Verilog on an FPGA that supports a subset of the RISC instruction set. The project includes the architecture form, where the students will focus on developing a tailored architecture to perform a specific function.

The *processor* should support:

- the arithmetic and logic operations: **add, sub, and, or, andi, addi, slt**
- Memory-reference: **lw, sw**
- jumping and branching: **j, beq**
- In addition to three new instructions: **jmn, swi, PMC**

where the format and description of each instruction is as follows:

|        | 6-Bits | 5-Bits | 5-Bits | 5-Bits | 5-Bits    | 6-Bits |
|--------|--------|--------|--------|--------|-----------|--------|
| I-type | op     | rs     | rt     |        | immediate |        |
| J-type | op     | ---    |        |        | address   |        |
| R-type | op     | rs     | rt     | rd     | ---       | func   |

|                        |                                                                                                                                                                                                                                                                  |
|------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| <b>pmc rt, imm(rs)</b> | # Perform two operations:<br>➢ <b>PC = Memory[R[rt]]</b> ; set the value of PC to the loaded data from memory location <b>Memory[R[rt]]</b><br>➢ <b>Memory[R[rs] + imm] = PC + 4</b> ; store the new value of PC to a memory location <b>Memory[R[rs] + imm]</b> |
| <b>jmn imm(rs)</b>     | #Indirect Jump. PC = <b>Memory[imm + R[rs]]</b> , I-format instruction will cause the processor to jump to the address stored in the word at memory location <b>imm + R[rs]</b> .                                                                                |
| <b>swi rt, imm(rs)</b> | #I-format instruction will perform two operations:<br>➢ it stores the contents of <b>R[rt]</b> at the memory address <b>(R[rs] + imm)</b> then, increments <b>R[rs]</b> by the immediate <b>imm</b> .                                                            |

## Phase 1 — ALU & Register File

Design and implement the ALU and register file:

- ALU ops: add, sub, and, or, andi, addi, slt, maybe logical immediates.
- Register file: 32x32 registers, two read ports, one write port, synchronous write, async read.

## Deliverables

- RTL code (fully commented)
- Testbench and waveform screenshots

## Phase 2 — Single-Cycle Processor

Integrate ALU and register file into a single-cycle CPU (fetch → decode → execute → mem → writeback) that executes the proposed instruction subset.

### Specs

- Minimal ISA mentioned above.
- Memories: instruction memory (initial content from assembled test programs), data RAM (read/write).

### Verification

- Use assembly test programs saved in Imem or load from a hex file.
- Check expected memory and register states after program execution.
- Create self-checking testbenches that compare expected registers/memory.

### Synthesis

- Run the CAD tool to produce a flattened gate netlist.

### Deliverables

- Single-cycle RTL + testbenches, simulation traces, synthesized netlist.
- Synthesis and timing reports describing datapath and control signals.

## Phase 3 — Pipelined Processor

Convert a single-cycle CPU into a 5-stage pipelined CPU: IF, ID, EX, MEM, WB.

### Architecture

- Add pipeline registers between stages and Split control and datapath accordingly
- Manage instruction fetch PC update, branch resolution (choose branch in EX or ID — pick one and document)

### Verification & Simulation

- Use longer assembly test programs that reveal hazards (e.g., dependent instruction sequences)
- Create a self-checking harness that runs for  $N$  cycles and compares the final register/memory content.

### Synthesis

- Synthesize pipeline; check timing reports using FPGA flow

### Deliverables

- RTL with pipeline registers, test programs that show pipeline behavior, simulation waveforms
- Proof of correct output despite pipeline (before hazard handling)
- Synthesis and timing reports describing datapath and control signals.

#### **Phase 4 — Hazard Detection & Forwarding (Stalling & Bypass)**

Implement the hazard detection unit (HDU) and forwarding (bypass) unit to eliminate or reduce stalls caused by data hazards; implement control for load-use stall.

##### **Verification**

- Write specific test programs with sequences requiring forwarding and load-use:
  - e.g., ADD R1, R2, R3 followed by SUB R4, R1, R5 (forwarding from EX)
  - LW R1, 0(R2) followed by ADD R3, R1, R4 (load-use: one cycle stall required)
- Self-checking testbenches verify register contents at known cycle counts
- Provide waveforms showing forwarding mux selection and inserted bubble
- **Corner cases**
  - Multi-cycle memory/IO (if implemented)
  - Branch + load interactions
  - Structural hazards if you reuse memory ports — avoid by design or document

##### **Deliverables**

- Hazard and forwarding RTL, testbenches with annotated waveforms, final reports describing the algorithm, synthesis and verification

## 3.0 DESIGN METHODOLOGY

The processor was developed utilizing a top-down, modular design methodology, structured across iterative phases to isolate and verify complexity at each step.

### 3.1 Phase 1 of the Design: Single-Cycle Architecture Validation

The initial phase of the design focused on developing a fundamental single-cycle MIPS microarchitecture. This approach established the baseline data path and validated the core control logic, ensuring that all operations executed correctly within a single clock period.



Figure 1: MIPS 32 Single-Cycle Data-Path [1].

During this stage, the core was modified to accommodate the project's specific custom instructions.

#### 3.1.1 Program counter (pc) next-state logic expansion

The instruction fetch mechanism was modified by expanding the PC input routing into a three-stage multiplexer network to support advanced control flow. The first stage resolves conditional branches by evaluating a logical AND between the control unit's branch signal and the ALU's zero flag. The second stage resolves standard unconditional jump targets. The critical modification occurs in the third stage, which introduces a custom data path allowing the Program Counter to be loaded directly from the Data Memory output. This structural addition enables specialized instructions, such as indirect jumps or returning directly from a memory-stored address.



Figure 2: Added pmc MUX.

### 3.1.2 Decoupled data memory addressing

The standard memory interface was restructured to decouple the read and write address computations. In a traditional MIPS architecture, the ALU result strictly dictates the memory address. To support custom instructions, dedicated multiplexers were introduced at the address ports. The read address can now be dynamically sourced from either the ALU result or directly from the second register operand (Read Data 2). Concurrently, the write address can be sourced from the ALU result or the PC+4 path. This allows the architecture to execute custom operations, such as pushing the current program counter directly into memory or performing register-direct memory accesses without routing through the ALU.



Figure 3: Dual port data memory with 2 MUXs to select the read address and the write data.

### 3.1.3 Extended register file destination routing

The write-back stage logic was modified by upgrading the standard 2-to-1 write destination multiplexer into a 3-to-1 multiplexer. While traditional MIPS limits the destination register to either the rt field (instruction[20:16]) for I-type or the rd field (instruction[15:11]) for R-type instructions, this architectural change introduces a third selection path. It allows the rs field (instruction[25:21]) to act as the write destination. This modification specifically accommodates custom instructions designed to compute a result and directly overwrite the first source register.



Figure 4: Expanded write address MUX.

### 3.2 Phase 2: Pipelined Architecture and Hazard Management

Following the validation of the single-cycle core, the architecture was partitioned and upgraded into a high-performance 5-stage pipelined design. To maintain data integrity across concurrent instruction executions, the following structural components were integrated:

- **Data Path:** Registers separate each of the five pipeline stages (IF/ID, ID/EX, EX/MEM, MEM/WB) to hold intermediate calculations and control signals.
- **Control Unit:** A hardwired main control unit decodes the 6-bit opcode in the ID stage and propagates the control signals through the pipeline registers to the EX, MEM, and WB stages.
- **Forwarding Unit:** To prevent stalling during RAW hazards, the forwarding unit monitors the destination registers of instructions in the EX/MEM and MEM/WB stages. If a dependent instruction enters the EX stage, the multiplexers bypass the Register File and forward the computed data directly to the ALU inputs.
- **Hazard Detection Unit:** If a Load-Use hazard occurs (e.g., an LW instruction followed immediately by an instruction that needs the loaded data), forwarding alone is insufficient. The Hazard Detection unit freezes the PC and IF/ID registers and flushes the ID/EX register (inserting a hardware bubble) for one clock cycle.



Figure 5: MIPS Pipeline CPU Architecture [2].

## 4.0 VERILOG CODES AND MODULE HIERARCHY

### 4.1 Program Counter

```
program_counter.v
1  module program_counter (
2    input [31:0] next_pc,
3    input rst, clk, overflow_flag, write_enable,
4    output reg [31:0] current_pc
5  );
6
7  always @(posedge clk or posedge rst) // asynchronous reset, PC will reset regardless of the clk
8  begin
9    if (rst)
10      begin
11        |   current_pc <= 32'd0;
12      end
13    else if (overflow_flag)
14      begin
15        |   current_pc <= current_pc;
16      end
17    else if (write_enable) begin
18      |   current_pc <= next_pc;
19    end
20    else
21      begin
22        |   current_pc <= current_pc;
23      end
24  end
25
26 endmodule
```

## 4.2 MUX 2x1 32-Bits

```
 mux_2to1_32bits.v
1  module mux_2to1_32bits (
2    |   input [31:0] input_1, input_2,
3    |   input select,
4    |   output [31:0] mux_out
5  );
6    |   assign mux_out = select?input_2:input_1;
7
8
9  endmodule
```

### 4.3 Instruction Memory

```
instruction_memory.v
1  module instruction_memory (
2      input [31:0] pc,
3      output [31:0] instruction
4  );
5      wire [31:0] pc_by_4; // as address 1 in instruction memory is equal to "pc = 4"
6      // in other words, MIPS is byte aligned and each word is 4 bytes
7
8      reg [31:0] instruction_memory_registers [0:255];
9
10     assign pc_by_4 = pc >> 2;
11     assign instruction = instruction_memory_registers [pc_by_4];
12
13     initial // load "instructions.hex" into the instruction memory
14     begin
15         $readmemh("instructions.hex", instruction_memory_registers );
16     end
17
18 endmodule
```

#### 4.4 Adder

```
≡ adder.v
1 ∵ module adder (
2   |   input [31:0] input_1, input_2,
3   |   output [31:0] adder_output
4 ∵ );
5   |
6   |   assign adder_output = input_1 + input_2;
7
8 endmodule
```

## 4.5 IF ID Stage Register

```
if_id_stage.v
1  module if_id_stage (
2    input clk, rst,
3    input write_enable, flush,
4    input [31:0] instruction_in, pc_in, pc_plus_4_in,
5    output reg [31:0] instruction_out, pc_out, pc_plus_4_out
6  );
7
8  always @ (posedge clk or posedge rst) begin
9    if (rst) begin
10      instruction_out <= 32'd0;
11      pc_plus_4_out <= 32'd0;
12      pc_out <= 32'd0;
13    end
14    else if (flush) begin
15      instruction_out <= 32'd0;
16      pc_plus_4_out <= 32'd0;
17      pc_out <= 32'd0;
18    end
19    else if (write_enable) begin
20      instruction_out <= instruction_in;
21      pc_plus_4_out <= pc_plus_4_in;
22      pc_out <= pc_in;
23    end
24    else begin
25      instruction_out <= instruction_out;
26      pc_plus_4_out <= pc_plus_4_out;
27      pc_out <= pc_out;
28    end
29  end
30
31 endmodule
```

## 4.6 Register File

```
register_file.v
1  module register_file (
2    input [4:0] read_address_1, read_address_2, write_address,
3    input [31:0] write_data,
4    input reg_write, clk, rst,
5    output [31:0] reg_out_1, reg_out_2
6  );
7  reg [31:0] registers [0:31]; // 32 register each with width of 32 bit
8  integer i; // used in the for loop for clearing the content of the registers
9
10
11  always @(posedge clk)
12    begin
13      if (rst) // reset the content of all register to zero
14        begin
15          for (i = 1 ; i<32 ; i = i+1 ) begin
16              registers[i] <= 32'd0;
17          end
18        end
19      else if (reg_write && (write_address!= 5'd0)) // write if it is not the 0th register
20        begin
21            registers[write_address] <= write_data;
22        end
23    end
24
25  // hard wire the 0th register to zero
26  assign reg_out_1 = (read_address_1 == 5'd0) ? 32'd0 : registers[read_address_1];
27  assign reg_out_2 = (read_address_2 == 5'd0) ? 32'd0 : registers[read_address_2];
28
29 endmodule
```

## 4.7 Sign Extender

```
sign_extender.v
1 module sign_extender (
2     input [15:0] in,
3     output [31:0] out
4 );
5     assign out = {{16{in[15]}}, in};
6 endmodule
```

## 4.8 Control Unit

```
control_unit.v
1  module control_unit (
2    input [5:0] op_code,
3    output reg [1:0] register_destination, alu_op,
4    output reg jump, branch, memory_read, memory_write, memory_to_register, alu_source,
5    output reg reg_write, pc_control, memory_write_source, memory_read_source
6  );
7  always @(*) begin
8
9    // initialize everything to zero
10   register_destination = 2'b00;
11   alu_op = 2'b00;
12   jump = 1'b0;
13   branch = 1'b0;
14   memory_read = 1'b0;
15   memory_write = 1'b0;
16   memory_to_register = 1'b0;
17   alu_source = 1'b0;
18   reg_write = 1'b0;
19   pc_control = 1'b0;
20   memory_write_source = 1'b0;
21   memory_read_source = 1'b0;
22
23   case (op_code)
24
25     6'b000000: begin // all R-type instructions
26       alu_op = 2'b10;
27       reg_write = 1'b1;
28       register_destination = 2'b01;
29     end
30
31     6'b100011: begin // lw instruction
32       alu_source = 1'b1;
33       memory_read = 1'b1;
34       memory_to_register = 1'b1;
35       reg_write = 1'b1;
36     end
37
38     6'b101011: begin // sw instruction
39       memory_write = 1'b1;
40       alu_source = 1'b1;
41     end
42
43     6'b000100: begin // branch instruction
44       branch = 1'b1;
45       alu_op = 2'b01;
46     end
```

## 4.8 Control Unit (Cont.)

```
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
```

```
6'b001000: begin // addi instruction
    alu_source = 1'b1;
    reg_write = 1'b1;
    alu_op = 2'b00; // force add, it is 00 by default but i add it anyways to show that we treat addi as a normal add instruction
end

6'b001100: begin // andi instruction
    alu_source = 1'b1;
    alu_op = 2'b11;
    reg_write = 1'b1;
end

6'b000010: begin // jump instruction
    jump = 1'b1;
end

// custom instructions

6'b110000: begin // jump mem indirect instruction
    alu_source = 1'b1;
    memory_read = 1'b1;
    alu_op = 2'b00; // force add
    pc_control = 1'b1;
end

6'b110001: begin // store and increment instruction
    alu_source = 1'b1;
    register_destination = 2'b10;
    reg_write = 1'b1;
    memory_write = 1'b1;
end

6'b110010: begin // program mem copy instruction
    alu_source = 1'b1;
    pc_control = 1'b1;
    memory_read = 1'b1;
    memory_write = 1'b1;
    memory_write_source = 1'b1;
    memory_read_source = 1'b1;
end

endcase
end
endmodule
```

## 4.9 Hazard Detection Unit

```
≡ hazard_detection_unit.v
1  module hazard_detection_unit (
2    input [4:0] id_rs, id_rt, ex_rt,
3    input mem_read,
4    output reg hazard_flag, if_id_write_enable, pc_write_enable
5  );
6    always @(*) begin
7      hazard_flag = 1'b0;
8      if_id_write_enable = 1'b1;
9      pc_write_enable = 1'b1;
10
11     if (mem_read && ( (id_rs == ex_rt) || (id_rt == ex_rt) ) ) begin
12       hazard_flag = 1'b1;
13       if_id_write_enable = 1'b0;
14       pc_write_enable = 1'b0;
15     end
16
17   end
18 endmodule
```

## 4.10 ID EX Stage Register

```
≡ id_ex_stage.v
1  module id_ex_stage (
2
3      //*****
4      //inputs
5      //*****
6
7      // system signals
8      input clk, rst,
9      input flush,
10
11     // data signals
12     input [31:0] pc_in, pc_plus_4_in, reg_file_out_1_in, reg_file_out_2_in, sign_extended_in,
13     input [4:0] reg_rs_address_in, reg_rt_address_in, reg_rd_address_in,
14     input [5:0] funct_in,
15
16     // control signals
17     input [1:0] register_destination_in, alu_op_in,
18     input branch_in, memory_read_in, memory_write_in, memory_to_register_in,
19     input alu_source_in, reg_write_in, pc_control_in, memory_write_source_in, memory_read_source_in,
20
21     //*****
22     // outputs
23     //*****
24
25     // data signals
26     output reg [31:0] pc_out, pc_plus_4_out, reg_file_out_1_out, reg_file_out_2_out, sign_extended_out,
27     output reg [4:0] reg_rs_address_out, reg_rt_address_out, reg_rd_address_out,
28     output reg [5:0] funct_out,
29
30     // control signals
31     output reg [1:0] register_destination_out, alu_op_out,
32     output reg branch_out, memory_read_out, memory_write_out, memory_to_register_out,
33     output reg alu_source_out, reg_write_out, pc_control_out, memory_write_source_out, memory_read_source_out
34 );

```

## 4.10 ID EX Stage Register (Cont.)

```
36  always @(posedge clk or posedge rst) begin
37      if (rst) begin
38          pc_out <= 32'd0;
39          pc_plus_4_out <= 32'd0;
40          reg_file_out_1_out <= 32'd0;
41          reg_file_out_2_out <= 32'd0;
42          sign_extended_out <= 32'd0;
43
44          reg_rs_address_out <= 5'd0;
45          reg_rt_address_out <= 5'd0;
46          reg_rd_address_out <= 5'd0;
47
48          funct_out <= 6'd0;
49
50          register_destination_out <= 2'd0;
51          alu_op_out <= 2'd0;
52
53          branch_out <= 1'd0;
54          memory_read_out <= 1'd0;
55          memory_write_out <= 1'd0;
56          memory_to_register_out <= 1'd0;
57          alu_source_out <= 1'd0;
58          reg_write_out <= 1'd0;
59          pc_control_out <= 1'd0;
60          memory_write_source_out <= 1'd0;
61          memory_read_source_out <= 1'd0;
62      end
```

## 4.10 ID EX Stage Register (Cont.)

```
63      |      else if (flush) begin
64      |      |      pc_out <= 32'd0;
65      |      |      pc_plus_4_out <= 32'd0;
66      |      |      reg_file_out_1_out <= 32'd0;
67      |      |      reg_file_out_2_out <= 32'd0;
68      |      |      sign_extended_out <= 32'd0;
69
70      |      |      reg_rs_address_out <= 5'd0;
71      |      |      reg_rt_address_out <= 5'd0;
72      |      |      reg_rd_address_out <= 5'd0;
73
74      |      |      funct_out <= 6'd0;
75
76      |      |      register_destination_out <= 2'd0;
77      |      |      alu_op_out <= 2'd0;
78
79      |      |      branch_out <= 1'd0;
80      |      |      memory_read_out <= 1'd0;
81      |      |      memory_write_out <= 1'd0;
82      |      |      memory_to_register_out <= 1'd0;
83      |      |      alu_source_out <= 1'd0;
84      |      |      reg_write_out <= 1'd0;
85      |      |      pc_control_out <= 1'd0;
86      |      |      memory_write_source_out <= 1'd0;
87      |      |      memory_read_source_out <= 1'd0;
88      |      end
```

## 4.10 ID EX Stage Register (Cont.)

```
89      else begin
90          pc_out <= pc_in;
91          pc_plus_4_out <= pc_plus_4_in;
92          reg_file_out_1_out <= reg_file_out_1_in;
93          reg_file_out_2_out <= reg_file_out_2_in;
94          sign_extended_out <= sign_extended_in;
95
96          reg_rs_address_out <= reg_rs_address_in;
97          reg_rt_address_out <= reg_rt_address_in;
98          reg_rd_address_out <= reg_rd_address_in;
99
100         funct_out <= funct_in;
101
102         register_destination_out <= register_destination_in;
103         alu_op_out <= alu_op_in;
104
105         branch_out <= branch_in;
106         memory_read_out <= memory_read_in;
107         memory_write_out <= memory_write_in;
108         memory_to_register_out <= memory_to_register_in;
109         alu_source_out <= alu_source_in;
110         reg_write_out <= reg_write_in;
111         pc_control_out <= pc_control_in;
112         memory_write_source_out <= memory_write_source_in;
113         memory_read_source_out <= memory_read_source_in;
114     end
115
116 end
117
118 endmodule
```

## 4.11 ALU

```
aluv
1 module alu (
2     input [31:0] input_1, input_2,
3     input [2:0] alu_control,
4     output reg [31:0] alu_result,
5     output reg zero_flag,
6     output reg overflow_flag
7 );
8     always @(*)
9         begin
10             overflow_flag = 1'b0; // default to zero and change to 1 if an overflow is detected
11             case (alu_control)
12                 3'b000: begin
13                     alu_result = input_1+input_2; // add
14
15                     if ( (input_1[31] == input_2[31]) & (alu_result[31] != input_1[31]) )
16                         // if the inputs are +ve and the result is negative, then overflow, or visversa
17                         begin
18                             |   overflow_flag = 1'b1;
19                         end
20
21                 3'b001: begin
22                     alu_result = input_1-input_2; // sub
23
24                     if ( (input_1[31] != input_2[31]) & (alu_result[31] == input_2[31]) )
25                         // if +ve - -ve equal to -ve, then overflow
26                         // if -ve - +ve equal to +ve, then overflow
27                         begin
28                             |   overflow_flag = 1'b1;
29                         end
30
31                 3'b010: alu_result = input_1&input_2; // and
32                 3'b011: alu_result = input_1|input_2; // or
33                 3'b100: alu_result = (input_1 < input_2)?32'd1:32'd0; // slt
34                 default: alu_result = input_1+input_2; // default case is add to avoid latches
35             endcase
36         end
37
38     assign zero_flag = (alu_result == 32'd0) ? 1'b1 : 1'b0; // zero flag
39
40 endmodule
```

## 4.12 ALU Control Unit

```
alu_control_unit.v
1  module alu_control_unit (
2    input [1:0] alu_op,
3    input [5:0] funct,
4    output reg [2:0] alu_control_out
5  );
6
7  always @(*)
8    begin
9      case (alu_op)
10        2'b00: alu_control_out = 3'b000; // add, force add for lw and sw instructions
11        2'b01: alu_control_out = 3'b001; // sub, force sub for branch instruction
12        2'b10:
13          begin
14            case (funct)
15              6'b100000: alu_control_out = 3'b000; // add
16              6'b100010: alu_control_out = 3'b001; // sub
17              6'b100100: alu_control_out = 3'b010; // and
18              6'b100101: alu_control_out = 3'b011; // or
19              6'b101010: alu_control_out = 3'b100; // slt
20              default: alu_control_out = 3'b000; // add
21            endcase
22          end
23        2'b11: alu_control_out = 3'b010; //and, for andi instruction
24        default: alu_control_out = 3'b000; // add
25      endcase
26    end
27
28 endmodule
```

## 4.13 MUX 4x1 32-Bits

```
 mux_4to1_32bits.v
 1 module mux_4to1_32bits (
 2   input [31:0] input_1, input_2, input_3, input_4,
 3   input [1:0] select,
 4   output reg [31:0] mux_out
 5 );
 6
 7   always @(*) begin
 8     case (select)
 9       2'b00: mux_out = input_1; // default alu input
10       2'b01: mux_out = input_2; // destination register of ex_mem stage
11       2'b10: mux_out = input_3; // destination register of mem_wb stage
12       2'b11: mux_out = input_4; // 32'd0 for if the destination register was $0
13     default: mux_out = input_1; // default alu input
14   endcase
15 end
16 endmodule
```

## 4.14 MUX 4x1 5-Bits

```
 mux_4to1_5bits.v
 1 module mux_4to1_5bits (
 2   input [4:0] input_1, input_2, input_3, input_4,
 3   input [1:0] select,
 4   output reg [4:0] mux_out
 5 );
 6
 7   always @(*) begin
 8     case (select)
 9       2'b00: mux_out = input_1; // for I-type
10       2'b01: mux_out = input_2; // for R-type
11       2'b10: mux_out = input_3; // for swi (custom instruction)
12       2'b11: mux_out = input_4;
13     default: mux_out = input_2; // default will be like normal I-type
14   endcase
15 end
16 endmodule
```

## 4.15 Forwarding Unit

```
forwarding_unit.v
1  module forwarding_unit (
2    input [4:0] destination_register_of_1st_previous_instruction,
3    input [4:0] destination_register_of_2nd_previous_instruction,
4    input [4:0] source_register_1, source_register_2,
5    input reg_write_1st_instruction, reg_write_2nd_instruction,
6
7    output reg [1:0] alu_input_1, alu_input_2
8  );
9
10
11  always @(*) begin
12    if (source_register_1 == 5'd0) begin
13      | alu_input_1 = 2'b11; // if source is $0 register then pass 32'd0
14    end
15    else if ((source_register_1 == destination_register_of_1st_previous_instruction) && reg_write_1st_instruction) begin
16      | alu_input_1 = 2'b01; // pass the value from ex_mem registe
17    end
18    else if ((source_register_1 == destination_register_of_2nd_previous_instruction) && reg_write_2nd_instruction) begin
19      | alu_input_1 = 2'b10; // pass the value from mem_wb registe
20    end
21    else alu_input_1 = 2'b00; // pass the degault value form the single cycle (read data 1)
22  end
23
24  always @(*) begin
25    if (source_register_2 == 5'd0) begin
26      | alu_input_2 = 2'b11; // if source is $0 register then pass 32'd0
27    end
28    else if ((source_register_2 == destination_register_of_1st_previous_instruction) && reg_write_1st_instruction) begin
29      | alu_input_2 = 2'b01; // pass the value from ex_mem registe
30    end
31    else if ((source_register_2 == destination_register_of_2nd_previous_instruction) && reg_write_2nd_instruction) begin
32      | alu_input_2 = 2'b10; // pass the value from mem_wb registe
33    end
34    else alu_input_2 = 2'b00; // pass the degault value form the single cycle (alu_source mux output)
35  end
36 endmodule
```

## 4.16 EX MEM Stage Register

```
id_ex_stage.v
1 module id_ex_stage (
2     //*****
3     //inputs
4     //*****
5
6     // system signals
7     input clk, rst,
8     input flush,
9
10    // data signals
11    input [31:0] pc_in, pc_plus_4_in, reg_file_out_1_in, reg_file_out_2_in, sign_extended_in,
12    input [4:0] reg_rs_address_in, reg_rt_address_in, reg_rd_address_in,
13    input [5:0] funct_in,
14
15    // control signals
16    input [1:0] register_destination_in, alu_op_in,
17    input branch_in, memory_read_in, memory_write_in, memory_to_register_in,
18    input alu_source_in, reg_write_in, pc_control_in, memory_write_source_in, memory_read_source_in,
19
20    //*****
21    // outputs
22    //*****
23
24    // data signals
25    output reg [31:0] pc_out, pc_plus_4_out, reg_file_out_1_out, reg_file_out_2_out, sign_extended_out,
26    output reg [4:0] reg_rs_address_out, reg_rt_address_out, reg_rd_address_out,
27    output reg [5:0] funct_out,
28
29    // control signals
30    output reg [1:0] register_destination_out, alu_op_out,
31    output reg branch_out, memory_read_out, memory_write_out, memory_to_register_out,
32    output reg alu_source_out, reg_write_out, pc_control_out, memory_write_source_out, memory_read_source_out
33
34 );
```

## 4.16 EX MEM Stage Register (Cont.)

```
36      always @(posedge clk or posedge rst) begin
37          if (rst) begin
38              pc_out <= 32'd0;
39              pc_plus_4_out <= 32'd0;
40              reg_file_out_1_out <= 32'd0;
41              reg_file_out_2_out <= 32'd0;
42              sign_extended_out <= 32'd0;
43
44              reg_rs_address_out <= 5'd0;
45              reg_rt_address_out <= 5'd0;
46              reg_rd_address_out <= 5'd0;
47
48              funct_out <= 6'd0;
49
50              register_destination_out <= 2'd0;
51              alu_op_out <= 2'd0;
52
53              branch_out <= 1'd0;
54              memory_read_out <= 1'd0;
55              memory_write_out <= 1'd0;
56              memory_to_register_out <= 1'd0;
57              alu_source_out <= 1'd0;
58              reg_write_out <= 1'd0;
59              pc_control_out <= 1'd0;
60              memory_write_source_out <= 1'd0;
61              memory_read_source_out <= 1'd0;
62      end
```

#### 4.16 EX MEM Stage Register (Cont.)

```
63      |      else if (flush) begin
64      |      |      pc_out <= 32'd0;
65      |      |      pc_plus_4_out <= 32'd0;
66      |      |      reg_file_out_1_out <= 32'd0;
67      |      |      reg_file_out_2_out <= 32'd0;
68      |      |      sign_extended_out <= 32'd0;
69
70      |      |
71      |      |      reg_rs_address_out <= 5'd0;
72      |      |      reg_rt_address_out <= 5'd0;
73      |      |      reg_rd_address_out <= 5'd0;
74
75      |      |
76      |      |      funct_out <= 6'd0;
77
78      |      |
79      |      |      register_destination_out <= 2'd0;
80      |      |      alu_op_out <= 2'd0;
81
82      |      |
83      |      |      branch_out <= 1'd0;
84      |      |      memory_read_out <= 1'd0;
85      |      |      memory_write_out <= 1'd0;
86      |      |      memory_to_register_out <= 1'd0;
87      |      |      alu_source_out <= 1'd0;
88      |      |      reg_write_out <= 1'd0;
      |      |
      |      end
```

#### 4.16 EX MEM Stage Register (Cont.)

```
89      |     else begin
90      |     |     pc_out <= pc_in;
91      |     |     pc_plus_4_out <= pc_plus_4_in;
92      |     |     reg_file_out_1_out <= reg_file_out_1_in;
93      |     |     reg_file_out_2_out <= reg_file_out_2_in;
94      |     |     sign_extended_out <= sign_extended_in;
95
96      |     |     reg_rs_address_out <= reg_rs_address_in;
97      |     |     reg_rt_address_out <= reg_rt_address_in;
98      |     |     reg_rd_address_out <= reg_rd_address_in;
99
100     |     |     funct_out <= funct_in;
101
102     |     |     register_destination_out <= register_destination_in;
103     |     |     alu_op_out <= alu_op_in;
104
105     |     |     branch_out <= branch_in;
106     |     |     memory_read_out <= memory_read_in;
107     |     |     memory_write_out <= memory_write_in;
108     |     |     memory_to_register_out <= memory_to_register_in;
109     |     |     alu_source_out <= alu_source_in;
110     |     |     reg_write_out <= reg_write_in;
111     |     |     pc_control_out <= pc_control_in;
112     |     |     memory_write_source_out <= memory_write_source_in;
113     |     |     memory_read_source_out <= memory_read_source_in;
114
115     |     end
116
117
118   endmodule
```

## 4.17 Data Memory

```
data_memory.v
1  module data_memory (
2    input memory_read, memory_write, clk,
3    input [31:0] read_address, write_address, write_data,
4    output reg [31:0] output_data
5  );
6
7  wire [31:0] actual_read_address, actual_write_address;
8
9  // as each word (register) holds 4 bytes
10 // in other words, MIPS is byte aligned and each word is 4 bytes, so address if the alu result is 8, then this is the 2nd register and not the 8th
11 assign actual_read_address = read_address >> 2;
12 assign actual_write_address = write_address >> 2;
13
14 reg [31:0] data_memory_registers [0:1023];
15
16 initial // load "data.hex" into the data memory
17 begin
18   $readmemh("data.hex", data_memory_registers );
19 end
20
21 always @(*) begin
22
23   output_data = 32'd0; // default value to avoid latches
24
25   if (memory_read) begin
26     if (actual_read_address > 32'd1023) begin
27       | output_data = data_memory_registers[1023];
28     end
29     else begin
30       | output_data = data_memory_registers[actual_read_address];
31     end
32   end
33 end
34
35 always @ (posedge clk) begin // write is synchronous to avoid potential errors, as we must make sure that the write address is ready and stable before actually write in it
36   if (memory_write) begin
37     if (actual_write_address > 32'd1023) begin
38       | data_memory_registers[1023] <= write_data;
39     end
40     else begin
41       | data_memory_registers[actual_write_address] <= write_data;
42     end
43   end
44 end
45 endmodule
```

#### 4.18 MEM WB Stage Register

```
mem_wb_stage.v
1  module mem_wb_stage (
2
3      //*****
4      //inputs
5      //*****
6
7      input clk, rst,
8
9      input [31:0] pc_in,
10
11     // data signals
12     input [31:0] memory_data_in, alu_result_in,
13     input [4:0] register_destination_in,
14
15     // control signals
16     input memory_to_register_in, reg_write_in,
17     input overflow_flag_in,
18
19     //*****
20     //outputs
21     //*****
22
23     output reg [31:0] pc_out,
24
25     // data signals
26     output reg [31:0] memory_data_out, alu_result_out,
27     output reg [4:0] register_destination_out,
28
29     // control signals
30     output reg memory_to_register_out, reg_write_out,
31     output reg overflow_flag_out
32 );
```

## 4.18 MEM WB Stage Register (Cont.)

```
34      always @(posedge clk or posedge rst) begin
35          if (rst) begin
36              pc_out <= 32'd0;
37
38              memory_data_out <= 32'd0;
39              alu_result_out <= 32'd0;
40
41              register_destination_out <= 5'd0;
42
43              memory_to_register_out <= 1'd0;
44              reg_write_out <= 1'd0;
45
46              overflow_flag_out <= 1'd0;
47          end
48      else begin
49          pc_out <= pc_in;
50
51          memory_data_out <= memory_data_in;
52          alu_result_out <= alu_result_in;
53
54          register_destination_out <= register_destination_in;
55
56          memory_to_register_out <= memory_to_register_in;
57          reg_write_out <= reg_write_in;
58
59          overflow_flag_out <= overflow_flag_in;
60      end
61  end
62
63 endmodule
```

## 4.19 Top Module

```
≡ top_module.v
1 module top_module (
2     input clk, rst
3 );
4
5 //*****
6 //wires declaration
7 //*****
8
9 //*****
10 //if stage
11 //*****
12
13 wire [31:0] if_current_pc, if_instruction, if_current_pc_plus_4;
14
15 wire [31:0] if_branch_mux_output, if_jump_mux_output, if_pmc_mux_output;
16
17 //*****
18 //id stage
19 //*****
20
21 wire [31:0] id_instruction, id_pc, id_pc_plus_4;
22
23 // control signals
24 wire [1:0] id_register_destination, id_alu_op;
25 wire id_jump, id_branch, id_memory_read, id_memory_write, id_memory_to_register, id_alu_source;
26 wire id_reg_write, id_pc_control, id_memory_write_source, id_memory_read_source;
27
28 // register file signals
29 wire [31:0] id_register_out_1, id_register_out_2;
30
31 // sign extender signal
32 wire [31:0] id_sign_extended;
33
34 // hazard detection signals
35 wire id_hazard_flag, id_pc_write_enable, if_id_write_enable;
36
37 // jump signals
38 wire [31:0] id_jump_address;
39     wire [27:0] id_jump_imm_shifted_by_2;
```

## 4.19 Top Module (Cont.)

```
41 //*****
42 //ex stage
43 //*****
44
45 wire [31:0] ex_pc;
46
47 // MUXs inputs
48 wire [31:0] ex_mux_1_input_1, ex_mux_1_input_2, ex_mux_1_input_3, ex_mux_1_input_4;
49 wire [31:0] ex_mux_2_input_1, ex_mux_2_input_2, ex_mux_2_input_3, ex_mux_2_input_4;
50 wire [31:0] ex_alu_source_mux_input_1, ex_alu_source_mux_input_2;
51
52 // MUXs outputs
53 wire [31:0] ex_mux_1_output, ex_mux_2_output, ex_alu_source_mux_output;
54
55 // MUXs control signals
56 wire ex_alu_source;
57 wire [1:0] ex_alu_input_1_select, ex_alu_input_2_select;
58
59 // alu control unit signals
60 wire [1:0] ex_alu_op;
61 wire [2:0] ex_alu_control;
62 wire [5:0] ex_funct;
63
64 // register addresses
65 wire [4:0] ex_rt_address, ex_rd_address, ex_rs_address;
66 wire [4:0] ex_destination_register;
67
68 // alu signals
69 wire [31:0] ex_alu_result;
70 wire ex_zero_flag, ex_overflow_flag;
71
72 // branch signals
73 wire [31:0] ex_pc_plus_4, ex_imm, ex_imm_shifted_by_2;
74 wire [31:0] ex_branch_address;
75
76 // control signals
77 wire [1:0] ex_register_destination;
78 wire ex_branch, ex_memory_read, ex_memory_write, ex_memory_to_register;
79 wire ex_reg_write, ex_pc_control, ex_memory_write_source, ex_memory_read_source;
```

## 4.19 Top Module (Cont.)

```
81      //*****
82      //mem stage
83      //*****
84
85      wire [31:0] mem_pc;
86
87      // memory MUXs signals
88      wire [31:0] mem_alu_result, mem_register_file_output_2, mem_pc_plus_4;
89
90      // control signals
91      wire mem_branch_control;
92
93      // memory singals
94      wire [31:0] mem_memory_read_address, mem_memory_write_data;
95      wire [31:0] mem_memory_in_data, mem_memory_out_data;
96
97      // branch address
98      wire [31:0] mem_branch_address;
99
100     // control signals
101     wire mem_branch, mem_memory_read, mem_memory_write, mem_memory_to_register;
102     wire mem_reg_write, mem_pc_control, mem_memory_write_source, mem_memory_read_source;
103
104     wire [4:0] mem_destination_register;
105     wire mem_zero_flag, mem_overflow_flag;
106
107     //*****
108     //wb stage
109     //*****
110
111     wire [31:0] wb_pc;
112
113     // memory to register mux signals
114     wire [31:0] wb_memory_data, wb_alu_result, wb_register_file_data_wrtie;
115
116     // control signals
117     wire wb_memory_to_register;
118     wire wb_reg_write;
119     wire [4:0] wb_register_destination;
120
121     wire wb_overflow_flag;
```

## 4.19 Top Module (Cont.)

```
123 //*****
124 //some extra assigns
125 //*****
126
127 // jump address
128 assign id_jump_imm_shifted_by_2 = {id_instruction[25:0], 2'b00};
129 assign id_jump_address = {id_pc_plus_4[31:28], id_jump_imm_shifted_by_2};
130
131 // branch address
132 assign ex_imm_shifted_by_2 = {ex_alu_source_mux_input_2[29:0], 2'b00};
133
134 // 4th input for alu
135 assign ex_mux_1_input_4 = 32'd0;
136 assign ex_mux_2_input_4 = 32'd0;
137
138 // branch control
139 assign mem_branch_control = mem_branch & mem_zero_flag;
140
141 // flush signals
142 assign mem_stage_flush_trigger = mem_branch_control | mem_pc_control;
143 assign if_id_flush = id_jump | mem_stage_flush_trigger;
144 assign id_ex_flush = mem_stage_flush_trigger | id_hazard_flag;
```

## 4.19 Top Module (Cont.)

```
147     mux_2to1_32bits branch_mux (
148         .input_1(if_current_pc_plus_4), .input_2(mem_branch_address),
149         .select(mem_branch_control),
150         .mux_out(if_branch_mux_output)
151     );
152
153     mux_2to1_32bits jump_mux (
154         .input_1(if_branch_mux_output), .input_2(id_jump_address),
155         .select(id_jump),
156         .mux_out(if_jump_mux_output)
157     );
158
159     mux_2to1_32bits pmc_mux (
160         .input_1(if_jump_mux_output), .input_2(mem_memory_out_data),
161         .select(mem_pc_control),
162         .mux_out(if_pmc_mux_output)
163     );
164
165     program_counter program_counter_module (
166         .next_pc(if_pmc_mux_output),
167         .clk(clk), .rst(rst), .overflow_flag(wb_overflow_flag), .write_enable(id_pc_write_enable),
168         .current_pc(if_current_pc)
169     );
170
171     instruction_memory instruction_memory_module (
172         .pc(if_current_pc),
173         .instruction(if_instruction)
174     );
175
176     adder pc_adder_module (
177         .input_1(32'd4), .input_2(if_current_pc),
178         .adder_output(if_current_pc_plus_4)
179     );
180
181     if_id_stage if_id_stage_register (
182         .clk(clk), .rst(rst),
183         .write_enable(if_id_write_enable), .flush(if_id_flush),
184         .instruction_in(if_instruction), .pc_in(if_current_pc), .pc_plus_4_in(if_current_pc_plus_4),
185         .instruction_out(id_instruction), .pc_out(id_pc), .pc_plus_4_out(id_pc_plus_4)
186     );
```

## 4.19 Top Module (Cont.)

```
188     register_file register_file_module (
189         .read_address_1(id_instruction[25:21]), .read_address_2(id_instruction[20:16]),
190         .write_address(wb_register_destination), .write_data(wb_register_file_data_wrtie),
191         .reg_write(wb_reg_write), .clk(clk), .rst(rst),
192         .reg_out_1(id_register_out_1), .reg_out_2(id_register_out_2)
193     );
194
195     sign_extender sign_extender_module (
196         .in(id_instruction[15:0]), .out(id_sign_extended)
197     );
198
199     control_unit control_unit_module (
200         .op_code(id_instruction[31:26]),
201         .register_destination(id_register_destination), .alu_op(id_alu_op),
202         .jump(id_jump), .branch(id_branch), .memory_read(id_memory_read),
203         .memory_write(id_memory_write), .memory_to_register(id_memory_to_register),
204         .alu_source(id_alu_source), .reg_write(id_reg_write), .pc_control(id_pc_control),
205         .memory_write_source(id_memory_write_source), .memory_read_source(id_memory_read_source)
206     );
207
208     hazard_detection_unit hazard_detection_unit_module (
209         .id_rs(id_instruction[25:21]), .id_rt(id_instruction[20:16]), .ex_rt(ex_rt_address),
210         .mem_read(ex_memory_read),
211         .hazard_flag(id_hazard_flag), .if_id_write_enable(if_id_write_enable), .pc_write_enable(id_pc_write_enable)
212     );
```

## 4.19 Top Module (Cont.)

```
214     id_ex_stage id_ex_stage_register (
215         //*****
216         //inputs
217         //*****
218
219         // system signals
220         .clk(clk), .rst(rst),
221         .flush(id_ex_flush),
222
223         // data signals
224         .pc_in(id_pc), .pc_plus_4_in(id_pc_plus_4), .reg_file_out_1_in(id_register_out_1),
225         .reg_file_out_2_in(id_register_out_2), .sign_extended_in(id_sign_extended),
226         .reg_rs_address_in(id_instruction[25:21]), .reg_rt_address_in(id_instruction[20:16]),
227         .reg_rd_address_in(id_instruction[15:11]), .funct_in(id_instruction[5:0]),
228
229         // control signals
230         .register_destination_in(id_register_destination), .alu_op_in(id_alu_op),
231         .branch_in(id_branch), .memory_read_in(id_memory_read),
232         .memory_write_in(id_memory_write), .memory_to_register_in(id_memory_to_register),
233         .alu_source_in(id_alu_source), .reg_write_in(id_reg_write), .pc_control_in(id_pc_control),
234         .memory_write_source_in(id_memory_write_source), .memory_read_source_in(id_memory_read_source),
235
236         //*****
237         // outputs
238         //*****
239
240         // data signals
241         .pc_out(ex_pc), .pc_plus_4_out(ex_pc_plus_4), .reg_file_out_1_out(ex_mux_1_input_1),
242         .reg_file_out_2_out(ex_alu_source_mux_input_1), .sign_extended_out(ex_alu_source_mux_input_2),
243         .reg_rs_address_out(ex_rs_address), .reg_rt_address_out(ex_rt_address),
244         .reg_rd_address_out(ex_rd_address), .funct_out(ex_funct),
245
246         // control signals
247         .register_destination_out(ex_register_destination), .alu_op_out(ex_alu_op),
248         .branch_out(ex_branch), .memory_read_out(ex_memory_read),
249         .memory_write_out(ex_memory_write), .memory_to_register_out(ex_memory_to_register),
250         .alu_source_out(ex_alu_source), .reg_write_out(ex_reg_write), .pc_control_out(ex_pc_control),
251         .memory_write_source_out(ex_memory_write_source), .memory_read_source_out(ex_memory_read_source)
252     );
```

## 4.19 Top Module (Cont.)

```
254 adder branch_adder_module (
255     .input_1(ex_pc_plus_4), .input_2(ex_imm_shifted_by_2),
256     .adder_output(ex_branch_address)
257 );
258
259 mux_4to1_32bits alu_input_1_mux (
260     .input_1(ex_mux_1_input_1), .input_2(ex_mux_1_input_2),
261     .input_3(ex_mux_1_input_3), .input_4(ex_mux_1_input_4),
262     .select(ex_alu_input_1_select),
263     .mux_out(ex_mux_1_output)
264 );
265
266 mux_2to1_32bits alu_source_mux (
267     .input_1(ex_alu_source_mux_input_1), .input_2(ex_alu_source_mux_input_2),
268     .select(ex_alu_source),
269     .mux_out(ex_mux_2_input_1)
270 );
271
272 mux_4to1_32bits alu_input_2_mux (
273     .input_1(ex_mux_2_input_1), .input_2(ex_mux_2_input_2),
274     .input_3(ex_mux_2_input_3), .input_4(ex_mux_2_input_4),
275     .select(ex_alu_input_2_select),
276     .mux_out(ex_mux_2_output)
277 );
278
279 alu alu_module (
280     .input_1(ex_mux_1_output), .input_2(ex_mux_2_output),
281     .alu_control(ex_alu_control),
282     .alu_result(ex_alu_result),
283     .zero_flag(ex_zero_flag),
284     .overflow_flag(ex_overflow_flag)
285 );
286
287 alu_control_unit alu_control_unit_module (
288     .alu_op(ex_alu_op),
289     .funct(ex_funct),
290     .alu_control_out(ex_alu_control)
291 );
292
293 mux_4to1_5bits destination_register_mux (
294     .input_1(ex_rt_address), .input_2(ex_rd_address),
295     .input_3(ex_rs_address), .input_4(5'd0),
296     .select(ex_register_destination),
297     .mux_out(ex_destination_register)
298 );
```

## 4.19 Top Module (Cont.)

```
300    forwarding_unit forwarding_unit_module (
301        .destination_register_of_1st_previous_instruction(mem_destination_register),
302        .destination_register_of_2nd_previous_instruction(wb_register_destination),
303        .source_register_1(ex_rs_address), .source_register_2(ex_rt_address),
304        .reg_write_1st_instruction(mem_reg_write), .reg_write_2nd_instruction(wb_reg_write),
305
306        .alu_input_1(ex_alu_input_1_select), .alu_input_2(ex_alu_input_2_select)
307    );
308
309    ex_mem_stage ex_mem_stage_register (
310        //*****
311        //inputs
312        //*****
313
314        .clk(clk), .rst(rst), .flush(mem_stage_flush_trigger),
315        .branch_target_in(ex_branch_address), .pc_in(ex_pc), .pc_plus_4_in(ex_pc_plus_4),
316        .alu_result_in(ex_alu_result), .reg_file_out_2_in(ex_alu_source_mux_input_1),
317        .register_destination_in(ex_register_destination),
318        .zero_flag_in(ex_zero_flag), .overflow_flag_in(ex_overflow_flag),
319
320        // control signals
321        .branch_in(ex_branch), .memory_read_in(ex_memory_read),
322        .memory_write_in(ex_memory_write), .memory_to_register_in(ex_memory_to_register),
323        .reg_write_in(ex_reg_write), .pc_control_in(ex_pc_control),
324        .memory_write_source_in(ex_memory_write_source), .memory_read_source_in(ex_memory_read_source),
325
326        //*****
327        // outputs
328        //*****
329
330        .branch_target_out(mem_branch_address), .pc_out(mem_pc), .pc_plus_4_out(mem_pc_plus_4),
331        .alu_result_out(mem_alu_result), .reg_file_out_2_out(mem_register_file_output_2),
332        .register_destination_out(mem_destination_register),
333        .zero_flag_out(mem_zero_flag), .overflow_flag_out(mem_overflow_flag),
334
335        // control signals
336        .branch_out(mem_branch), .memory_read_out(mem_memory_read),
337        .memory_write_out(mem_memory_write), .memory_to_register_out(mem_memory_to_register),
338        .reg_write_out(mem_reg_write), .pc_control_out(mem_pc_control),
339        .memory_write_source_out(mem_memory_write_source), .memory_read_source_out(mem_memory_read_source)
340    );

```

## 4.19 Top Module (Cont.)

```
342     mux_2to1_32bits memory_read_source_mux (
343         .input_1(mem_alu_result), .input_2(mem_register_file_output_2),
344         .select(mem_memory_read_source),
345         .mux_out(mem_memory_read_address)
346     );
347
348     mux_2to1_32bits memory_data_write_source_mux (
349         .input_1(mem_register_file_output_2), .input_2(mem_pc_plus_4),
350         .select(mem_memory_write_source),
351         .mux_out(mem_memory_write_data)
352     );
353
354     data_memory data_memory_module (
355         .memory_read(mem_memory_read), .memory_write(mem_memory_write), .clk(clk),
356         .read_address(mem_memory_read_address), .write_address(mem_alu_result),
357         .write_data(mem_memory_write_data),
358         .output_data(mem_memory_out_data)
359     );
```

## 4.19 Top Module (Cont.)

```
361     mem_wb_stage mem_wb_stage_register (
362         //*****
363         //inputs
364         //*****
365
366         .clk(clk), .rst(rst),
367
368         .pc_in(mem_pc),
369
370         // data signals
371         .memory_data_in(mem_memory_out_data), .alu_result_in(mem_alu_result),
372         .register_destination_in(mem_destination_register),
373
374         // control signals
375         .memory_to_register_in(mem_memory_to_register), .reg_write_in(mem_reg_write),
376         .overflow_flag_in(mem_overflow_flag),
377
378         //*****
379         //outputs
380         //*****
381
382         .pc_out(wb_pc),
383
384         // data signals
385         .memory_data_out(wb_memory_data), .alu_result_out(wb_alu_result),
386         .register_destination_out(wb_register_destination),
387
388         // control signals
389         .memory_to_register_out(wb_memory_to_register), .reg_write_out(wb_reg_write),
390         .overflow_flag_out(wb_overflow_flag)
391     );
392
393     mux_2to1_32bits data_to_register_mux (
394         .input_1(wb_alu_result), .input_2(wb_memory_data),
395         .select(wb_memory_to_register),
396         .mux_out(wb_register_file_data_wrtie)
397     );

```

## 4.19 Top Module (Cont.)

```
399 // overflow monitoring block
400 always @(posedge clk) begin
401     if (wb_overflow_flag) begin
402         $display("-----");
403         $display("CRITICAL ERROR: Arithmetic Overflow Detected!");
404         $display("Processor Halted at Faulting PC = %h", wb_pc);
405         $display("-----");
406         $stop;
407     end
408 end
409
410 endmodule
```

## 5.0 SYNTHESIS AND IMPLEMENTATION

The RTL design was synthesized and implemented using Xilinx Vivado 2023. The target hardware was the **Xilinx Spartan-7 FPGA (xc7s50csga324-1)**. To accurately benchmark the total gate-level logic utilized by the architecture, a compiler directive (`* dont_touch = "true"`) was applied to all submodules. This forced the synthesis engine to map the Data Memory directly to logic slices (Flip-Flops) rather than dedicated Block RAM (BRAM).

### Implementation Results:

- **Slice LUTs (Look-Up Tables):** 11,333
- **Slice Registers (Flip-Flops):** 34,366
- **Worst Negative Slack (WNS):** +0.540 ns
- **Target Clock Constraint:** 10.000 ns (100 MHz)
- **Calculated Maximum Frequency (\$F\_{max}\$):** 105.7 MHz

The timing report confirms that all user-specified timing constraints were met with zero setup or hold violations.



Figure 6: Elaboration schematic (post-elaboration netlist view).



Figure 7: Elaboration schematic (Expanded View).



Figure 8: Synthesis schematic.



Figure 9: Synthesis schematic (Expanded).

| Name                                 | ^ 1 | Slice LUTs | Slice Registers | F7 Muxes | F8 Muxes | Bonded IOB | BUFGCTRL |
|--------------------------------------|-----|------------|-----------------|----------|----------|------------|----------|
| top_module                           |     | 11333      | 34366           | 4608     | 2176     | 2          | 1        |
| alu_module (alu)                     |     | 153        | 0               | 0        | 0        | 0          | 0        |
| data_memory_module (data_memory)     |     | 10046      | 32768           | 4352     | 2176     | 0          | 0        |
| ex_mem_stage_register (ex_mem_stage) |     | 90         | 180             | 0        | 0        | 0          | 0        |
| id_ex_stage_register (id_ex_stage)   |     | 97         | 194             | 0        | 0        | 0          | 0        |
| if_id_stage_register (if_id_stage)   |     | 49         | 96              | 0        | 0        | 0          | 0        |
| mem_wb_stage_register (mem_wb_stage) |     | 0          | 104             | 0        | 0        | 0          | 0        |
| register_file_module (register_file) |     | 607        | 992             | 256      | 0        | 0          | 0        |

Figure 10: Synthesis resource utilization (LUTs, FFs, BRAM, DSP slices).

# Design Timing Summary

| Setup                       | Hold     |                             | Pulse Width |                                         |
|-----------------------------|----------|-----------------------------|-------------|-----------------------------------------|
| Worst Negative Slack (WNS)  | 0.540 ns | Worst Hold Slack (WHS)      | 0.154 ns    | Worst Pulse Width Slack (WPWS)          |
| Total Negative Slack (TNS)  | 0.000 ns | Total Hold Slack (THS)      | 0.000 ns    | Total Pulse Width Negative Slack (TPWS) |
| Number of Failing Endpoints | 0        | Number of Failing Endpoints | 0           | Number of Failing Endpoints             |
| Total Number of Endpoints   | 68227    | Total Number of Endpoints   | 68227       | Total Number of Endpoints               |

All user specified timing constraints are met.

Figure 11: Synthesis timing report excerpt (critical path and Fmax)



Figure 12: Device view on the target FPGA.



Figure 13: Zoomed in Device View.

## 6.0 CONCLUSION

Phase 1 of the MIPS processor project was highly successful. The RTL architecture correctly implements a 5-stage pipeline with robust hardware-level hazard resolution. Synthesizing the design proved that the Verilog code is not just simulation-ready, but physically realizable, achieving a stable clock speed of over 100 MHz on a modern Spartan-7 architecture. With the RTL hardware fully implemented and benchmarked, the project is now prepared for Phase 2: rigorous functional verification utilizing a SystemVerilog UVM testbench.

## 7.0 References

- [1] Technical University of Cluj-Napoca, "Lab 10," [Online]. Available: <https://mihai.utcluj.ro/wp-content/uploads/ca/labs/Lab10.pdf>. (Accessed: Feb. 22, 2026).
- [2] Stack Overflow, "MIPS pipeline CPU architecture," [Online]. Available: <https://stackoverflow.com/questions/44331059/mips-pipeline-cpu-architecture>. (Accessed: Feb. 22, 2026).
- [3] D. A. Patterson and J. L. Hennessy, *Computer Organization and Design: The Hardware/Software Interface*, 5th ed. San Francisco, CA, USA: Morgan Kaufmann, 2013.
- [4] Xilinx, Inc., *Vivado Design Suite User Guide: Synthesis*, UG901, San Jose, CA, USA.
- [5] Xilinx, Inc., *Vivado Design Suite User Guide: Using Constraints*, UG903, San Jose, CA, USA.