



**University of Minho**  
School of Engineering

**Processor**  
GR0040  
**Group 4**

Master's in Industrial Electronics and Computers

André Martins — PG60192  
Mariana Martins — PG60211

Project Supervised by  
**Professor Adriano Tavares**

January 2026

# Contents

|          |                                                     |           |
|----------|-----------------------------------------------------|-----------|
| <b>1</b> | <b>Introduction</b>                                 | <b>6</b>  |
| 1.1      | Problem Statement . . . . .                         | 6         |
| 1.2      | Motivation . . . . .                                | 6         |
| 1.3      | Requirements . . . . .                              | 7         |
| 1.3.1    | Functional Requirements . . . . .                   | 7         |
| 1.3.2    | Non-Functional Requirements . . . . .               | 7         |
| 1.4      | Constraints . . . . .                               | 7         |
| 1.4.1    | Technical Constraints . . . . .                     | 8         |
| 1.4.2    | Non-Technical Constraints . . . . .                 | 8         |
| 1.5      | Gantt Diagram . . . . .                             | 8         |
| 1.6      | Theoretical Foundation . . . . .                    | 8         |
| 1.6.1    | Processor Design . . . . .                          | 8         |
| 1.6.2    | Hardware Description Languages (HDL) . . . . .      | 9         |
| 1.6.3    | Processor Datapath Theory . . . . .                 | 9         |
| 1.6.4    | Control Unit Theory . . . . .                       | 10        |
| 1.6.5    | Target Platform: Zybo Z7-10 . . . . .               | 10        |
| 1.6.6    | RISC Architecture Principles . . . . .              | 10        |
| 1.6.7    | Design and Verification Flow Using Vivado . . . . . | 10        |
| <b>2</b> | <b>Analysis</b>                                     | <b>13</b> |
| 2.1      | Instruction Set Architecture . . . . .              | 13        |
| 2.2      | Memory Organization . . . . .                       | 14        |
| 2.3      | Core Processor Module (GR0040) . . . . .            | 14        |
| 2.4      | Interrupt Controller Module (GR0041) . . . . .      | 16        |
| 2.5      | Complete System-on-Chip with Peripherals . . . . .  | 18        |
| 2.5.1    | Block RAM Implementation . . . . .                  | 18        |
| 2.5.2    | On-Chip Peripheral Bus Architecture . . . . .       | 18        |
| 2.5.3    | Integrated Peripherals . . . . .                    | 19        |
| <b>3</b> | <b>Design- Refactoring</b>                          | <b>20</b> |
| 3.1      | Assembler . . . . .                                 | 20        |
| 3.2      | Signal Naming Standardization . . . . .             | 21        |
| 3.3      | Instruction Set Architecture . . . . .              | 21        |
| 3.4      | Register File . . . . .                             | 23        |
| 3.5      | Memory Architecture Refinement . . . . .            | 24        |
| 3.6      | Datapath . . . . .                                  | 26        |
| 3.6.1    | Datapath Overview . . . . .                         | 26        |
| 3.6.2    | Internal Modules . . . . .                          | 27        |
| 3.6.3    | Data Flow Through the Datapath . . . . .            | 29        |
| 3.7      | Control Unit . . . . .                              | 30        |
| 3.7.1    | Control Signals . . . . .                           | 31        |
| 3.7.2    | Status Signals . . . . .                            | 32        |
| 3.7.3    | External Interface . . . . .                        | 32        |

|                     |                                                                   |           |
|---------------------|-------------------------------------------------------------------|-----------|
| 3.7.4               | Control Unit Operation . . . . .                                  | 33        |
| 3.8                 | Peripherals . . . . .                                             | 33        |
| 3.8.1               | Timer . . . . .                                                   | 33        |
| 3.8.2               | UART Peripheral . . . . .                                         | 34        |
| 3.8.3               | I <sup>2</sup> C Master Peripheral . . . . .                      | 40        |
| 3.9                 | Interrupt Controller Architecture . . . . .                       | 42        |
| 3.10                | Interrupt Vector Table (IVT) . . . . .                            | 43        |
| 3.11                | Implementation and Priority Logic . . . . .                       | 43        |
| 3.11.1              | Priority Encoding . . . . .                                       | 44        |
| 3.12                | Latch Avoidance . . . . .                                         | 46        |
| 3.12.1              | Understanding Latch Formation . . . . .                           | 46        |
| 3.13                | Removal of Internal Tri-State . . . . .                           | 48        |
| <b>4</b>            | <b>Implementation</b>                                             | <b>50</b> |
| 4.1                 | ISA refactored . . . . .                                          | 50        |
| 4.1.1               | Shift operations . . . . .                                        | 50        |
| 4.1.2               | RETI instruction . . . . .                                        | 51        |
| 4.2                 | Register file . . . . .                                           | 51        |
| 4.3                 | Memory . . . . .                                                  | 52        |
| 4.4                 | Datapath and Control Unit . . . . .                               | 52        |
| 4.4.1               | Datapath implementation and module instantiation . . . . .        | 53        |
| 4.5                 | Timer and Parallel I/O . . . . .                                  | 57        |
| 4.6                 | UART . . . . .                                                    | 59        |
| 4.7                 | I2C . . . . .                                                     | 60        |
| 4.8                 | Interrupt Controller . . . . .                                    | 63        |
| 4.9                 | Latch Avoidance . . . . .                                         | 67        |
| 4.9.1               | Original Code . . . . .                                           | 67        |
| 4.9.2               | Refactored Code . . . . .                                         | 67        |
| 4.10                | Removal of Internal Tri-State . . . . .                           | 67        |
| 4.10.1              | Original Code . . . . .                                           | 67        |
| 4.10.2              | Refactored Code and analysis . . . . .                            | 68        |
| 4.11                | System Validation and Testing . . . . .                           | 68        |
| 4.11.1              | Test Program . . . . .                                            | 68        |
| 4.11.2              | Testbench Design . . . . .                                        | 71        |
| 4.11.3              | Behavioral Simulation . . . . .                                   | 73        |
| 4.11.4              | Post-Synthesis Timing Simulation and Synthesized Design . . . . . | 74        |
| 4.11.5              | Post-Implementation Timing Simulation . . . . .                   | 75        |
| 4.11.6              | Hardware Validation . . . . .                                     | 81        |
| <b>Bibliography</b> |                                                                   | <b>85</b> |

# List of Figures

|      |                                                                                                                                                                                                                                                                                                                                                                                               |    |
|------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----|
| 1.1  | Gantt Diagram . . . . .                                                                                                                                                                                                                                                                                                                                                                       | 8  |
| 2.1  | GR0040 instruction formats: rr (register-register), ri (register-immediate), rri (register-register-immediate), i12 (12-bit immediate), and br (branch with condition and displacement) . . . . .                                                                                                                                                                                             | 13 |
| 2.2  | Complete GR0040 instruction set showing opcode assignments, formats, assembly syntax, and semantics. Notable features include interlocked instructions (imm, adc*, sbc*, *cmp*) and efficient immediate constant handling                                                                                                                                                                     | 14 |
| 2.3  | Memory address space distribution: 1KB on-chip BRAM (0x0000-0x03FF), unused address space (0x0400-0x7FFF), and memory-mapped peripherals (0x8000-0xFFFF) . . . . .                                                                                                                                                                                                                            | 14 |
| 2.4  | GR0040 core interface signals: instruction port (insn_ce, i_ad, insn, hit), data port (d_ad, rdy, sw, sb, do, lw, lb, data), and interrupt support (int_en) . . . . .                                                                                                                                                                                                                         | 15 |
| 2.5  | GR0041 interrupt wrapper interface, adding int_req input and zero_insn output to force instruction RAM output during interrupt insertion . . . . .                                                                                                                                                                                                                                            | 16 |
| 2.6  | Hierarchical relationship between GR0040 core and GR0041 interrupt wrapper, showing how the PIC (Programmable Interrupt Controller) mediates interrupt requests . . . . .                                                                                                                                                                                                                     | 17 |
| 2.7  | Interrupt handling mechanism: (1) interrupt detection and instruction insertion flow, showing how zero_insn forces BRAM output to 0x0000, which is then modified to 0x0002 (JAL r0,2(r0)) to call the interrupt handler; (2) interrupt service routine structure with prologue, handler code, and epilogue that preserves the interrupted instruction using r0 as temporary storage . . . . . | 17 |
| 2.8  | Complete system-on-chip architecture showing GR0041 processor with interrupt controller, dual $512 \times 8$ block RAMs (ramh, raml) for 1KB program/-data storage, ctrl_enc/ctrl_dec bus interface modules, and two peripherals (PARIO and TIMER/COUNTER) connected via the abstract control bus and tri-state data bus . . . . .                                                            | 19 |
| 3.1  | make result . . . . .                                                                                                                                                                                                                                                                                                                                                                         | 20 |
| 3.2  | Comparison between Assembly Instructions and their Hexadecimal Equivalent . . . . .                                                                                                                                                                                                                                                                                                           | 20 |
| 3.3  | Shift Operation Refactor . . . . .                                                                                                                                                                                                                                                                                                                                                            | 22 |
| 3.4  | L0gic behing shift operation refactoring . . . . .                                                                                                                                                                                                                                                                                                                                            | 22 |
| 3.5  | RETI instruction . . . . .                                                                                                                                                                                                                                                                                                                                                                    | 23 |
| 3.6  | Inputs and Outputs of Register file . . . . .                                                                                                                                                                                                                                                                                                                                                 | 24 |
| 3.7  | Refactored memory architecture with separate ROM and RA . . . . .                                                                                                                                                                                                                                                                                                                             | 25 |
| 3.8  | Specification of on-chip RAM and ROM . . . . .                                                                                                                                                                                                                                                                                                                                                | 25 |
| 3.9  | Complete datapath architecture showing all functional units and data flow paths . . . . .                                                                                                                                                                                                                                                                                                     | 26 |
| 3.10 | Control Unit architecture showing input instruction decoding and output control signals . . . . .                                                                                                                                                                                                                                                                                             | 30 |
| 3.11 | Refactored timer control register CR#0 at address 0x1000 . . . . .                                                                                                                                                                                                                                                                                                                            | 33 |
| 3.12 | Interrupt request register CR#1 at address 0x1002 . . . . .                                                                                                                                                                                                                                                                                                                                   | 34 |
| 3.13 | Counter initialization register CR#2 at address 0x1004 . . . . .                                                                                                                                                                                                                                                                                                                              | 34 |

|      |                                                                                                                                                                                                                                                                                                                                                |    |
|------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----|
| 3.14 | UART Frame . . . . .                                                                                                                                                                                                                                                                                                                           | 35 |
| 3.15 | UART TX Finite State machine . . . . .                                                                                                                                                                                                                                                                                                         | 38 |
| 3.16 | UART RX Finite State machine . . . . .                                                                                                                                                                                                                                                                                                         | 39 |
| 3.17 | Interrupt controller flowchart . . . . .                                                                                                                                                                                                                                                                                                       | 44 |
| 3.18 | Priority Encoding flowchart . . . . .                                                                                                                                                                                                                                                                                                          | 45 |
| 3.19 | ISRs flowchart for each interrupt . . . . .                                                                                                                                                                                                                                                                                                    | 46 |
| 3.20 | Glitch formation in combinational logic due to propagation delay differences.<br>The output Y experiences an unwanted transition ( $1 \rightarrow 0 \rightarrow 1$ ) when input B changes, demonstrating the hazard that can occur in incompletely specified logic. . . . .                                                                    | 47 |
| 3.21 | Karnaugh map showing latch inference when output depends on its previous value. The expression $Q^+(C, D, Q) = \overline{C} \cdot Q + C \cdot D$ contains the current state Q, requiring a storage element (latch) for implementation. The overlapping regions (e1 and e2) show how incomplete specification leads to state retention. . . . . | 47 |
| 4.1  | Example of a Shift Left Logical (SLL) instruction: the value 0003 is shifted left, resulting in 0006. . . . .                                                                                                                                                                                                                                  | 50 |
| 4.2  | RETI instruction working . . . . .                                                                                                                                                                                                                                                                                                             | 51 |
| 4.3  | Reconfiguration of initial counter value . . . . .                                                                                                                                                                                                                                                                                             | 58 |
| 4.4  | Test case using timer and parallel i/o . . . . .                                                                                                                                                                                                                                                                                               | 58 |
| 4.5  | UART echo test case . . . . .                                                                                                                                                                                                                                                                                                                  | 59 |
| 4.6  | I2C Communication Waveform showing successful data transmission . . . . .                                                                                                                                                                                                                                                                      | 61 |
| 4.7  | Priority Encoding test case . . . . .                                                                                                                                                                                                                                                                                                          | 66 |
| 4.8  | Behavioural Simulation of final test case . . . . .                                                                                                                                                                                                                                                                                            | 73 |
| 4.9  | Post-Synthesis Timing Simulation . . . . .                                                                                                                                                                                                                                                                                                     | 75 |
| 4.10 | Synthesized Design . . . . .                                                                                                                                                                                                                                                                                                                   | 75 |
| 4.11 | Pin assignment constraints showing the mapping of design signals to FPGA package pins. Input signals (i_clk, i_rst, i_rx_line, i_par_i) and output signals (o_tx_line, o_par_o) are assigned to specific pins with LVCMOS33 I/O standard at 3.3V supply voltage. . . . .                                                                       | 76 |
| 4.12 | Utilization Report . . . . .                                                                                                                                                                                                                                                                                                                   | 77 |
| 4.13 | Timing summary showing setup, hold, and pulse width analysis results. All timing constraints are met with positive slack values. . . . .                                                                                                                                                                                                       | 78 |
| 4.14 | Detailed timing report showing the critical path breakdown, including source and destination registers, path delays, and contributing delay components. .                                                                                                                                                                                      | 79 |
| 4.15 | Pin constraint verification showing no unconstrained output pins or timing violations. . . . .                                                                                                                                                                                                                                                 | 80 |
| 4.16 | Post-Implementation Timing Simulation . . . . .                                                                                                                                                                                                                                                                                                | 81 |
| 4.17 | FPGA before configuration . . . . .                                                                                                                                                                                                                                                                                                            | 82 |
| 4.18 | Program loaded in the FPGA and main finished: display turned on and no interrupts happened . . . . .                                                                                                                                                                                                                                           | 82 |
| 4.19 | Button 1 pressed, timer started running and led 0 toggled . . . . .                                                                                                                                                                                                                                                                            | 83 |
| 4.20 | Button 2 pressed, timer stop running and led 0 stayed in the same status; Character was received and echoed so led 1 toggled . . . . .                                                                                                                                                                                                         | 83 |
| 4.21 | Echoed character from FPGA . . . . .                                                                                                                                                                                                                                                                                                           | 83 |

# List of Tables

|      |                                                               |    |
|------|---------------------------------------------------------------|----|
| 3.1  | Signals of the <code>regfile16x16</code> module . . . . .     | 23 |
| 3.2  | UART Register Map . . . . .                                   | 35 |
| 3.3  | UART SBUF_RX Register Specification . . . . .                 | 36 |
| 3.4  | UART SBUF_TX Register Specification . . . . .                 | 36 |
| 3.5  | UART BAUDRATE_DIV Register Specification . . . . .            | 36 |
| 3.6  | UART SCON Register Specification . . . . .                    | 37 |
| 3.7  | I <sup>2</sup> C Register Map . . . . .                       | 40 |
| 3.8  | I <sup>2</sup> C I2C_DIVIDER Register Specification . . . . . | 40 |
| 3.9  | I <sup>2</sup> C I2C_DATA Register Specification . . . . .    | 41 |
| 3.10 | I <sup>2</sup> C I2C_CONTROL Register Specification . . . . . | 41 |
| 3.11 | GR0040 Interrupt Vector Table . . . . .                       | 43 |
| 4.1  | Preloaded registers and their usage . . . . .                 | 71 |

# Chapter 1

## Introduction

This report is based on the GR0040 System-on-Chip (SoC) implementation provided by the course instructor, which was developed by Jan Gray, from the Gray Research LLC. Serving as a functional reference design for educational purposes and although operating correctly, the existing codebase suffers from significant structural deficiencies. In particular, the design lacks proper separation of concerns, exhibits tight coupling between functional blocks, and provides limited modularity. These issues substantially hinder code readability, verification, debugging, and future extensibility.

### 1.1 Problem Statement

This project aims to understand the initial GR0040 SoC implementation and to refactor it into a well-structured and modular SoC design while preserving full functional equivalence. The refactoring effort must maintain strict compliance with the original GR0040 architecture and instruction set, ensuring that all existing behaviors remain unchanged. Additionally, any new improvement should be added, as well as new peripherals, such as UART.

This project focuses on analyzing the original implementation, identifying architectural and structural weaknesses, and redesigning the system using a clean, modular architecture. The result should be a robust and maintainable SoC design that supports easier verification, extension, and long-term maintenance, while serving as a solid foundation for future enhancements.

### 1.2 Motivation

The motivation for this project stems from both educational and practical considerations. From an educational perspective, the original GR0040 implementation provides limited insight into sound hardware design practices due to its poor modular structure and tightly integrated components. Refactoring the system offers an opportunity to apply key digital design principles, such as modularity, abstraction, and separation of datapath and control logic, in a realistic and non-trivial hardware system.

From a practical standpoint, a well-structured SoC design is essential for scalability and verification. As systems grow in complexity, poorly organized designs become increasingly difficult to debug, extend, and validate. By redesigning the GR0040 SoC with clean interfaces and clearly defined modules, this project aims to demonstrate how thoughtful architectural decisions can significantly improve development efficiency and system reliability.

Additionally, the integration of new peripherals, enhanced interrupt handling, and deployment on a real FPGA platform further motivates this work by bridging the gap between theoretical design and real-world hardware implementation. The final refactored system not only validates the correctness of the new architecture but also provides a reusable and extensible platform for future academic or research-oriented developments.

## 1.3 Requirements

The project requirements are divided into functional and non-functional requirements, addressing both system behavior and quality attributes of the final design.

### 1.3.1 Functional Requirements

Functional requirements define the expected behavior and capabilities of the system:

- Analyze and verify the functionality of the provided GR0040 and GR0041 processor core implementations.
- Refactor the existing SoC architecture while preserving full functional equivalence with the original design.
- Integrate the existing Timer and Parallel I/O peripherals with enhanced and standardized interrupt support.
- Design and implement a new Universal Asynchronous Receiver-Transmitter (UART) peripheral compatible with the system bus architecture.
- Design and implement an interrupt controller capable of managing interrupts from all peripherals in a scalable and efficient manner.
- Develop and deploy a demonstration program on the Zybo Z7-10 FPGA that validates correct processor operation, peripheral interaction, and interrupt handling.

### 1.3.2 Non-Functional Requirements

Non-functional requirements specify global quality attributes and design constraints:

- Enforce a clear separation of concerns through function-based and module-based design.
- Refactor the GR0040 processor core to include a strict separation between the datapath and control unit.
- Organize the project in a modular fashion, with each major component implemented in a dedicated Verilog source file.
- Ensure that post-synthesis and post-implementation simulation results match behavioral simulation outcomes.
- Maintain comprehensive documentation describing architectural decisions, module interfaces, and system integration.
- Adhere to Verilog HDL coding best practices to ensure readability, maintainability, and reusability.

## 1.4 Constraints

Project constraints define the technical, academic, and logistical limitations under which the system must be developed.

### 1.4.1 Technical Constraints

- Xilinx Vivado Design Suite must be used for synthesis, implementation, and simulation.
- The target hardware platform is the Zybo Z7-10 FPGA development board.
- All system components must be implemented exclusively using Verilog HDL.
- The design must remain fully compliant with the GR0040 instruction set architecture as specified in the reference documentation.
- On-chip block RAM resources must be used for program and data memory, in accordance with the original design specifications.

### 1.4.2 Non-Technical Constraints

- The project must be completed collaboratively by a team of two members.
- All project deliverables must be completed and submitted by the end of the academic semester.

## 1.5 Gantt Diagram

Figure 1.1 below illustrates the expected project timeline along with the key milestones to be achieved.



Figure 1.1: Gantt Diagram

## 1.6 Theoretical Foundation

This section presents the theoretical concepts and design principles that underpin the refactoring and enhancement of the GR0040 System-on-Chip (SoC). The discussion covers processor design fundamentals, hardware description languages, datapath and control unit theory and platform-specific considerations. These concepts form the basis for architectural decisions, implementation strategies, and verification methodologies adopted throughout the project.

### 1.6.1 Processor Design

A processor is a digital system responsible for executing instructions stored in memory and manipulating data according to a defined instruction set architecture (ISA). At a high level, a processor consists of a datapath, which performs arithmetic and data movement operations,

and a control unit, which orchestrates these operations by generating appropriate control signals.

Modern processor design emphasizes modularity, scalability, and clarity of functionality. In this project, the GR0040 processor core follows a reduced instruction set computing (RISC) paradigm, characterized by simple instructions, uniform instruction formats, and a load/store architecture. These characteristics simplify both the datapath and control logic, making the architecture suitable for educational purposes and hardware refactoring.

The refactoring process relies heavily on classical processor design methodologies, where each functional block (register file, arithmetic logic unit, program counter, memory interface, and control logic) is clearly defined and isolated. This separation enables easier debugging, verification, and future extension of the processor.

### 1.6.2 Hardware Description Languages (HDL)

Hardware Description Languages are formal languages used to model, simulate, and synthesize digital systems. Unlike software programming languages, HDLs describe hardware behavior and structure concurrently, allowing designers to model parallelism inherent in digital circuits.

Verilog HDL is used throughout this project due to its widespread adoption in industry and full support within the Xilinx Vivado Design Suite. Verilog supports multiple modeling styles:

- **Behavioral modeling**, used for describing functionality without explicitly defining hardware structure.
- **Register-transfer level (RTL) modeling**, which describes how data moves between registers on clock edges and is the primary modeling style used in this project.
- **Structural modeling**, which explicitly instantiates and connects lower-level modules.

Best practices in Verilog coding—such as the use of synchronous logic, non-blocking assignments for sequential circuits, and clearly defined module interfaces—are critical for ensuring synthesizability, timing correctness, and maintainability. These practices guide the refactoring of the original GR0040 implementation.

### 1.6.3 Processor Datapath Theory

The datapath is the portion of the processor responsible for performing computations and data transfers. It consists of interconnected components such as registers, multiplexers, arithmetic logic units (ALUs), shifters, and memory interfaces.

In a typical RISC datapath, instructions are executed through a sequence of stages, including instruction fetch, decode, execute, memory access, and write-back. Even in non-pipelined implementations, the logical separation of these stages remains conceptually important.

Key datapath components include:

- **Program Counter (PC)**, which holds the address of the current instruction.
- **Instruction Register**, which stores the fetched instruction.
- **Register File**, providing fast access to operands and storage for results.
- **Arithmetic Logic Unit (ALU)**, responsible for arithmetic and logical operations.
- **Multiplexers**, used to select between multiple data sources.

Separating the datapath from the control logic enables a cleaner architecture and simplifies both design verification and future enhancements. This principle is a central requirement of the project.

#### 1.6.4 Control Unit Theory

The control unit governs the operation of the datapath by generating control signals based on the current instruction and processor state. These signals determine actions such as register writes, ALU operation selection, memory access, and program counter updates.

Control units are typically implemented using either:

- **Hardwired control**, where combinational logic directly generates control signals.
- **Microprogrammed control**, where control signals are stored in a control memory.

The GR0040 processor employs a hardwired control unit, which is suitable for simpler RISC architectures. During refactoring, the control logic is explicitly separated from the datapath, improving readability and ensuring that control signal generation can be independently verified.

The control unit also plays a critical role in interrupt handling by coordinating context switching and peripheral interaction.

#### 1.6.5 Target Platform: Zynq Z7-10

The Zynq Z7-10 is an FPGA development board based on the Xilinx Zynq-7000 series, integrating programmable logic with an ARM processing system. In this project, only the programmable logic (PL) is utilized.

The board provides on-chip block RAM, clock management resources, and external interfaces suitable for SoC development. Its compatibility with Vivado enables accurate synthesis, timing analysis, and hardware validation of the refactored design.

#### 1.6.6 RISC Architecture Principles

Reduced Instruction Set Computing (RISC) architectures are designed to simplify processor implementation and improve execution efficiency. Core RISC principles include:

- Simple and fixed-length instructions.
- Load/store architecture, where memory is accessed only through dedicated instructions.
- A large set of general-purpose registers.
- Simplified control logic.

These principles guide the design of the GR0040 processor and strongly influence the refactoring strategy adopted in this project. By adhering to RISC fundamentals, the processor remains efficient, predictable, and easy to extend.

#### 1.6.7 Design and Verification Flow Using Vivado

The Xilinx Vivado Design Suite provides an integrated environment for RTL design, simulation, synthesis, implementation, and bitstream generation. The standard design flow followed in this project includes:

- RTL design and refactoring using Verilog HDL.
- Behavioral simulation to validate functional correctness.

- Synthesis to map RTL to FPGA primitives.
- Implementation (place and route).
- Static timing analysis and hardware validation on the Zybo Z7-10 board.

This flow ensures that the refactored SoC behaves consistently across simulation and hardware execution.

### From Verilog Code to Hardware

Although Verilog HDL resembles a programming language, it does not describe a sequence of executed instructions. Instead, Verilog specifies the structure and behavior of digital hardware. Each construct in the code corresponds to a physical hardware element or interconnection once synthesized.

Combinational logic is typically described using continuous assignments or `always @(*)` blocks. These constructs are interpreted by synthesis tools as Boolean logic, which is later mapped to lookup tables (LUTs) in the FPGA fabric. For example, arithmetic operations, multiplexers, and logic comparisons are implemented as networks of LUTs.

Sequential logic is described using clocked `always` blocks, commonly of the form `always @(posedge clk)`. When such a block assigns values to registers using non-blocking assignments, the synthesis tool infers flip-flops. Each variable assigned in this context corresponds to a physical storage element triggered by the specified clock edge.

As a result, the structure of the Verilog code directly determines the resulting hardware. Poorly structured code can lead to unintended latches, excessive logic, or timing issues, whereas well-structured RTL code results in predictable and efficient hardware implementations.

### Inference of Flip-Flops and Registers

Flip-flops are the fundamental storage elements in synchronous digital systems. In Verilog, flip-flops are inferred rather than explicitly instantiated. A flip-flop is inferred when the following conditions are met:

- The signal is assigned within a clocked `always` block.
- The block is sensitive to a clock edge (e.g., `posedge clk`).
- The signal is assigned in all possible execution paths of the block.

For example, a register in the processor datapath—such as the program counter or a general-purpose register—is inferred as a bank of flip-flops, with each bit mapped to a physical flip-flop on the FPGA. Reset logic, if present, is implemented using either synchronous or asynchronous reset ports of the flip-flops, depending on the coding style.

This inference mechanism allows designers to describe high-level behavior while relying on synthesis tools to generate the appropriate low-level hardware structures.

### Synthesis Process

Synthesis is the process of translating Verilog RTL code into a gate-level representation tailored to the target FPGA technology. During synthesis, the Vivado tool performs several key steps:

- **RTL elaboration:** The Verilog code is parsed, and all module hierarchies, parameters, and signal connections are resolved.
- **Logic inference:** Combinational logic, flip-flops, multiplexers, memories, and finite state machines are identified based on coding patterns.

- **Technology mapping:** The inferred logic is mapped to FPGA-specific primitives such as LUTs, flip-flops, block RAMs, and DSP slices.
- **Optimization:** Redundant logic is removed, Boolean expressions are simplified, and logic is reorganized to improve area and timing.

The output of synthesis is a technology-specific netlist that describes how the design will be implemented using the resources available on the Zybo Z7-10 FPGA.

### **Implementation: Place and Route**

Following synthesis, the design undergoes implementation, which consists primarily of placement and routing:

- **Placement:** Each synthesized logic element (LUTs, flip-flops, block RAMs) is assigned to a specific physical location on the FPGA fabric.
- **Routing:** Electrical connections between placed elements are established using the FPGA's programmable interconnect network.

During this phase, Vivado considers timing constraints, resource availability, and routing congestion. The tool attempts to minimize critical path delays while ensuring all connections can be successfully routed.

Clock networks receive special treatment to minimize skew and jitter. Dedicated clock routing resources are used to distribute the clock signal uniformly across the design.

### **Bitstream Generation and Hardware Execution**

Once timing closure is achieved, Vivado generates a configuration bitstream. This bitstream programs the FPGA by defining the state of its programmable logic and routing resources.

After configuration, the FPGA behaves as a custom hardware implementation of the Verilog design. The processor core, peripherals, memories, and interconnect operate concurrently, executing instructions and responding to events in real time.

This transformation—from high-level Verilog code to physical hardware—highlights the importance of correct RTL design, as errors in the code directly translate into faulty hardware behavior.

# Chapter 2

## Analysis

This chapter presents the original GR0040 RISC processor architecture designed by Jan Gray, which serves as the foundation for our implementation. The architecture is specifically optimized for FPGA implementation, making efficient use of Xilinx Virtex-II resources such as block RAMs, LUTs, and tri-state buffers.

The GR0040 is a 16-bit non-pipelined RISC processor featuring a compact instruction set, Harvard architecture with separate instruction and data paths, and a unique interrupt handling mechanism implemented outside the core. The complete design details, including synthesizable Verilog code and implementation rationale, can be found in the original paper referenced in the bibliography.

The following sections present the key architectural components through a series of progressive design stages, from the basic instruction set to the complete system-on-chip with peripherals.

### 2.1 Instruction Set Architecture

The GR0040 instruction set is organized into five formats (rr, ri, rri, i12, and br), supporting 22 operations plus 16 branch variants. This design minimizes instruction decode complexity while maintaining sufficient expressiveness for integer C code execution.



Figure 2.1: GR0040 instruction formats: rr (register-register), ri (register-immediate), rri (register-register-immediate), i12 (12-bit immediate), and br (branch with condition and displacement)



Figure 2.2: Complete GR0040 instruction set showing opcode assignments, formats, assembly syntax, and semantics. Notable features include interlocked instructions (imm, adc\*, sbc\*, \*cmp\*) and efficient immediate constant handling

## 2.2 Memory Organization

The system employs a Harvard architecture with 1KB of on-chip block RAM shared between program and data storage. The memory map reserves the lower 1KB for on-chip RAM and dedicates addresses 0x8000-0xFFFF to memory-mapped peripheral I/O.



Figure 2.3: Memory address space distribution: 1KB on-chip BRAM (0x0000-0x03FF), unused address space (0x0400-0x7FFF), and memory-mapped peripherals (0x8000-0xFFFF)

## 2.3 Core Processor Module (GR0040)

The GR0040 core implements the basic processor functionality with separate instruction fetch and load/store data ports. The core assumes instruction availability from zero-latency block RAM, enabling single-cycle instruction execution for most operations.



Figure 2.4: GR0040 core interface signals: instruction port (insn\_ce, i.ad, insn, hit), data port (d.ad, rdy, sw, sb, do, lw, lb, data), and interrupt support (int\_en)

The GR0040 core is composed of several key functional modules that work together to execute instructions:

- **Instruction Decoder:** Splits the current instruction into constituent fields (opcode, register addresses, function codes, immediates) and generates control signals for the datapath.
- **Register File:** A  $16 \times 16$ -bit dual-port register file implemented using distributed RAM (LUTs configured as RAM). It provides asynchronous read access to two registers (rd and rs) simultaneously and synchronous write-back of results.
- **Immediate Operand Formation:** Handles sign/zero extension of 4-bit immediate fields and supports the `imm` prefix instruction for constructing 16-bit immediates by concatenating a 12-bit prefix with the 4-bit immediate field.
- **Operand Selection:** Multiplexers that route either register values or immediate constants to the ALU inputs, with special handling for ri-format instructions.
- **Arithmetic Logic Unit (ALU):** Comprises three parallel functional units:
  - *Adder/Subtractor:* A 16-bit ripple-carry adder/subtractor with carry-in/carry-out support for multi-precision arithmetic (`adc`/`sbc` instructions)
  - *Logic Unit:* Performs AND and XOR operations
  - *Shift Right Unit:* Implements logical and arithmetic right shift by one bit position
- **Condition Code Logic:** Generates and maintains hidden condition flags (zero, negative, carry, overflow) for conditional branch evaluation. The overflow flag uses the clever identity:  $v = c[W] \oplus sum[N] \oplus a[N] \oplus b[N]$  to avoid needing both carry-out bits.
- **Result Multiplexer:** Selects the final result from among ALU outputs, PC (for JAL), or loaded data, using tri-state buffers on FPGA long-lines to implement an efficient multi-input mux without consuming LUTs.
- **Branch Evaluation:** Decodes the branch condition field and compares it against stored condition codes to determine if branches are taken.

- **Address/PC Unit:** Computes the next program counter value, handling sequential execution, branch targets ( $PC + 2 \times$ sign-extended displacement), and jump targets (effective address from adder).

This modular organization enables a compact implementation consuming only about 200 logic cells in a Spartan-II FPGA, while achieving clock speeds of 50 MHz with careful optimization.

## 2.4 Interrupt Controller Module (GR0041)

The GR0041 module wraps the GR0040 core with external interrupt handling logic. This modular approach inserts a “call intr” instruction ( $JAL r0,2(r0)$ ) into the instruction stream upon interrupt request, leveraging the block RAM reset functionality to force instruction 0x0000 and then OR it with 0x0002.



Figure 2.5: GR0041 interrupt wrapper interface, adding int\_req input and zero\_insn output to force instruction RAM output during interrupt insertion

The interface between GR0040 and GR0041 consists of two critical signals that enable interrupt handling without modifying the core processor:

- **int\_en** (output from GR0040): Indicates when the processor is in an interruptible state. This signal is deasserted during interlocked instruction sequences (imm, adc\*/sbc\*, \*cmp\*) to prevent interrupts from breaking multi-instruction atomic operations.
- **int\_req** (input to GR0041): The interrupt request signal from peripherals. GR0041 monitors this along with int\_en to determine when to insert the interrupt instruction.
- **zero\_insn** (output from GR0041): When asserted, forces the block RAM to output 0x0000, which is then modified to 0x0002 (the  $JAL r0,2(r0)$  instruction) before being presented to the GR0040 core.



Figure 2.6: Hierarchical relationship between GR0040 core and GR0041 interrupt wrapper, showing how the PIC (Programmable Interrupt Controller) mediates interrupt requests

The interrupt mechanism operates through rising-edge detection on int\_req, maintaining an internal int\_pend flag until the interrupt can be serviced. Figure 2.7 illustrates the complete interrupt flow, showing how the interrupt instruction is inserted and how the interrupt service routine (ISR) preserves the return address temporarily in r0 before restoring it upon return.



Figure 2.7: Interrupt handling mechanism: (1) interrupt detection and instruction insertion flow, showing how zero\_insn forces BRAM output to 0x0000, which is then modified to 0x0002 (JAL r0,2(r0)) to call the interrupt handler; (2) interrupt service routine structure with prologue, handler code, and epilogue that preserves the interrupted instruction using r0 as temporary storage

This clever design exploits the fact that JAL has opcode 0, so forcing the block RAM output to 0x0000 and ORing with 0x0002 produces exactly the needed interrupt call instruction, at minimal hardware cost.

## 2.5 Complete System-on-Chip with Peripherals

The final SoC integrates the GR0041 processor core with dual-ported block RAM (configured as separate high and low byte RAMs for byte-addressable access), an abstract control bus architecture (`ctrl`, `sel`, `per_rdy`), and memory-mapped peripherals including a timer/counter and 8-bit parallel I/O port.

### 2.5.1 Block RAM Implementation

The system uses two Xilinx RAMB4\_S8\_S8 block RAM primitives (`ramh` and `raml`) configured as dual  $512 \times 8$ -bit memories, providing a total of 1KB of storage. Each block RAM has two independent ports:

- **Port A:** Dedicated instruction fetch port (read-only), delivering 16-bit instructions by combining one byte from `ramh[15:8]` and one byte from `raml[7:0]`
- **Port B:** Data load/store port (read-write), supporting both word and byte accesses with appropriate byte-write-enable logic

This dual-port configuration enables simultaneous instruction fetch and data access without contention, essential for maintaining single-cycle throughput. For byte stores, the design routes the data byte to either `ramh` or `raml` based on the address LSB, since the architecture is big-endian.

### 2.5.2 On-Chip Peripheral Bus Architecture

The system employs an abstract control bus architecture designed for simplicity and extensibility. Rather than exposing individual control signals, the design encapsulates all peripheral control information into three buses:

- **`ctrl[31:0]`:** Abstract control bus carrying encoded clock, reset, I/O address, byte output enables (`oe[3:0]`), and byte write enables (`we[3:0]`)
- **`sel[7:0]`:** Peripheral select vector, with each bit selecting one of up to 8 peripherals based on address decoding
- **`per_rdy[7:0]`:** Ready signals from each peripheral, OR-reduced to form the global `io_rdy` signal
- **`data[15:0]`:** Shared tri-state data bus for both processor results and peripheral I/O

The `ctrl_enc` module encodes processor control signals and I/O addresses into the abstract `ctrl` bus, while each peripheral instantiates a `ctrl_dec` module to decode the signals it needs. This abstraction layer provides several benefits:

- Core users only need to connect four buses to add a peripheral
- The bus protocol can evolve without invalidating existing peripheral designs
- Address decoding is implicit through `sel[]` assignment
- Typical peripheral bus interface overhead is only 1-2 CLBs plus a column of tri-state buffers

### 2.5.3 Integrated Peripherals

As shown in Figure 2.8, the SoC includes two memory-mapped peripherals:

- **Timer/Counter:** A 16-bit configurable timer/counter peripheral assigned to sel[0] (addresses 0x8000-0x80FF). It operates in timer mode (counting clock cycles when enabled) or counter mode (counting rising edges on an external input). The timer includes a programmable initial count value, overflow detection, and interrupt generation capability. In the default configuration, it counts from 0xFFC0 to overflow, generating an interrupt every 64 clock cycles.
- **Parallel I/O Port (PARIO):** An 8-bit bidirectional parallel I/O port assigned to sel[1] (addresses 0x8100-0x81FF). It provides simple memory-mapped access to external parallel inputs and outputs, suitable for interfacing with LEDs, switches, or other slow peripherals. The peripheral asserts rd<sub>y</sub> immediately upon selection, requiring no wait states.

Both peripherals connect to the abstract control bus and tri-state data bus, demonstrating the ease of integrating additional functionality into the SoC framework.



Figure 2.8: Complete system-on-chip architecture showing GR0041 processor with interrupt controller, dual 512×8 block RAMs (ramh, raml) for 1KB program/data storage, ctrl\_enc/ctrl\_dec bus interface modules, and two peripherals (PARIO and TIMER/- COUNTER) connected via the abstract control bus and tri-state data bus

# Chapter 3

## Design- Refactoring

### 3.1 Assembler

In order to simplify instruction decoding from Assembly to its equivalent hexadecimal value, an Assembler was developed in C. For a given Assembly code with basic sections, such as `isr:`, `reset:` and `main:`, the Assembler decodes the instructions and their operands and transforms them into their hexadecimal equivalent (Figure 3.2). A `.asm` file is edited and the opcodes are generated in the correct and desired order, through a `makefile` (Figure 3.1), which simply calls the C compiler and the necessary files and verification flags.

Additionally, it separates the instructions for the High and Low memory `.coe` files used.

```
andre@macaco:~/Documents/Assembler$ make
gcc -Wall -o program main.c assembler.c
./program instructions.asm hexa.txt
Program Execution Terminated. Opening Output File
xdg-open hexa.txt
```

Figure 3.1: `make` result

| instructions.asm                    |                                     | hexa.txt    |
|-------------------------------------|-------------------------------------|-------------|
| <input type="button" value="Open"/> | <input type="button" value="Save"/> | ~/Downloads |
| 1 <code>isr:</code>                 |                                     | 1 0000      |
| 2 // Interrupt handler              |                                     | 2 1FFF      |
| 3 <code>JAL R0, R0, #0</code>       |                                     | 3 61F0      |
| 4 <code>ADDI R15, R15, #-1</code>   |                                     | 4 90FD      |
| 5 <code>SW R1, R15, #0</code>       |                                     | 5 FFFF      |
| 6 <code>BR #-3</code>               |                                     | 6 FFFF      |
| 7                                   |                                     | 7 FFFF      |
| 8 <code>reset:</code>               |                                     | 8 FFFF      |
| 9 // Reset vector at 0x0020         |                                     | 9 FFFF      |
| 10 <code>IMM #128</code>            |                                     | 10 FFFF     |
| 11 <code>ADDI R15, R0, #2</code>    |                                     | 11 FFFF     |
| 12                                  |                                     | 12 FFFF     |
| 13 <code>start:</code>              |                                     | 13 FFFF     |
| 14 // Main program                  |                                     | 14 FFFF     |
| 15 <code>ADDI R1, R0, #10</code>    |                                     | 15 FFFF     |
| 16 <code>ADDI R2, R0, #15</code>    |                                     | 16 FFFF     |
| 17 <code>ADD R3, R1</code>          |                                     | 17 8080     |
| 18 <code>CMP R1, R2</code>          |                                     | 18 1F02     |
| 19 <code>BEQ #5</code>              |                                     | 19 110A     |
|                                     |                                     | 20 120F     |
|                                     |                                     | 21 2310     |
|                                     |                                     | 22 2126     |
|                                     |                                     | 23 9205     |

(a) Assembly Instructions

(b) Hexadecimal Equivalent

Figure 3.2: Comparison between Assembly Instructions and their Hexadecimal Equivalent

## 3.2 Signal Naming Standardization

During the design phase of our FPGA-based processor, one of the first tasks was the refactoring of the code, focusing on signal naming standardization. Refactoring restructures the code to improve readability, maintainability, and scalability without changing its functionality. Standardized signal names ensure the design is consistent, intuitive, and easier to debug or extend.

### Initial Signal Naming

The initial code used short or non-descriptive names, which can be ambiguous in larger designs:

```
1 reg ['AN:0] pc;           // program counter
2 output ['AN:0] i_ad;      // next instruction address
3 output ['AN:0] d_ad;      // data address for load/store
4 output ['N:0] do;         // data to store in RAM
5 wire ['N:0] dreg, sreg;  //destination and source register
```

These names do not clearly indicate the signal's role or width. For example, `pc` could refer to the current or next program counter, and `do` is too generic for RAM write data.

### Refactored Signal Naming

After refactoring, the signals follow a consistent convention:

- **Ports:** Prefixed with `i_` for inputs and `o_` for outputs.
- **Internal signals:** Prefixed with `_` to indicate internal use within a module.
- **Descriptive names:** Each signal clearly reflects its function.
- **Parameter-based widths:** Widths defined using global macros (e.g., `ADDR_MSB`, `DATA_MSB`).

```
1 reg ['ADDR_MSB:0] _IR;           // current instruction register
2 wire ['ADDR_MSB:0] _PC;          // next instruction address
3 wire ['ADDR_MSB:0] _data_address; // data address for load/store
4 wire ['DATA_MSB:0] _data_to_store; // data to write to RAM
5 wire ['DATA_MSB:0] _destinationRegister;
6 wire ['DATA_MSB:0] _2ndOp;

7
8 input i_clk;                  // system clock
9 output ['DATA_MSB:0] o_insn;    // current instruction output
```

This refactoring also aligns with microcontroller naming conventions, such as those used in the 8051, particularly in UART modules in section ??.

These conventions improve code clarity, reduce potential errors, and facilitate integration with other modules. Using these conventions consistently across the processor ensures readability and simplifies verification and testing.

## 3.3 Instruction Set Architecture

The modifications to the original Instruction Set Architecture (ISA) were minimal and focused primarily on enhancing the RR (register-register) format instructions and adding new instructions. As shown in Figure ??, the original ISA specification defined five main instruction formats: `rr`, `ri`, `rri`, `il2`, and `br`, each with specific field allocations for opcode, register destinations, register sources, and immediate values.

The primary enhancements involved two key additions: expanding the functionality within the RR format to include additional shift register operations, and introducing the RETI (Return from Interrupt) instruction for interrupt handling support.

### Shift Register Operations

Figure 3.3 illustrates the extended instruction encodings for shift operations, which now support four distinct shift operations: Shift Right Logic (SRL), Shift Left Arithmetic (SLA), Shift Right Arithmetic (SRA), and Shift Left Logic (SLL). These operations maintain the same field structure as the original RR format, utilizing bits 15-12 for the opcode (0010), bits 11-8 for the register destination, bits 7-4 for the register source, and bits 3-0 to encode the specific shift operation type.



Figure 3.3: Shift Operation Refactor

The decoding logic for these shift operations, depicted in Figure 3.4, employs a compact 2-bit selection scheme to differentiate between the four shift types. The color-coded matrix shows how different bit combinations map to logic operations (green), arithmetic operations (purple), shift left operations (blue), and shift right operations (yellow). This encoding allows for efficient hardware implementation while maintaining clarity in the instruction semantics.

|   |   |   |   |                      |
|---|---|---|---|----------------------|
| 0 | 1 | 1 | 1 | Logic Operation      |
| 1 | 0 | 0 | 0 | Arithmetic Operation |
| 1 | 0 | 0 | 1 | Shift Left           |
| 1 | 0 | 1 | 0 | Shift Right          |

Figure 3.4: Logic behind shift operation refactoring

### Return from Interrupt (RETI) Instruction

A significant architectural enhancement was the introduction of the RETI instruction, as illustrated in Figure 3.5. Unlike traditional branch instructions that require explicit specification of jump offsets or target addresses, RETI provides a dedicated mechanism

for returning from interrupt service routines. This instruction eliminates the need for manually calculating and encoding branch displacements when exiting interrupt handlers, simplifying interrupt management and reducing potential errors in interrupt-driven code.



Figure 3.5: RETI instruction

The RETI instruction operates by automatically restoring the program counter to the address that was stored when the interrupt was triggered. This approach streamlines the interrupt handling process and ensures proper return flow control without requiring programmer intervention to manage return addresses explicitly.

This architectural extension demonstrates how the original ISA can be enhanced with minimal modifications to the instruction format while adding significant computational capability through comprehensive shift operations and improved interrupt handling support. The interrupt handling mechanism will be discussed in detail in the following sections.

### 3.4 Register File

The original code was developed for the Spartan-II, a member of the Virtex family of FPGAs, and relied on memory resources and IP cores that are no longer optimal for modern platforms. Since the target platform for this project is the ZYBO development board, which is based on a Xilinx Zynq device, a redesign of the register file implementation was required.

In the refactored design, the register file was implemented using the **Distributed Memory Generator** IP core. This choice was motivated by the relatively small size of the register file and the need for fast, deterministic access. Distributed memory maps directly onto lookup tables (LUTs), allowing for low-latency read and write operations, which is particularly advantageous for register files accessed every clock cycle.

| Signal | Direction | Description                        |
|--------|-----------|------------------------------------|
| a      | Input     | Register write address             |
| d      | Input     | Data to be written to the register |
| dpra   | Input     | Dual-port read address             |
| clk    | Input     | System clock                       |
| we     | Input     | Register file write enable         |
| spo    | Output    | Single-port read output            |
| dpo    | Output    | Dual-port read output              |

Table 3.1: Signals of the `regfile16x16` module



Figure 3.6: Inputs and Outputs of Register file

Additionally, using distributed memory provides greater flexibility in port configuration compared to block RAM, enabling efficient implementation of multiple read and write ports without incurring the complexity or resource overhead associated with BRAM-based solutions. This approach also aligns well with the architectural requirements of the processor, where compact storage and high-speed access are prioritized over large capacity.

### 3.5 Memory Architecture Refinement

In the refactoring process, a significant architectural decision was made to separate the original unified Block RAM (BRAM) structure into distinct Read-Only Memory (ROM) and Random Access Memory (RAM) modules. This separation provides better organization, enhanced security, and improved system performance compared to the monolithic memory approach used in the original implementation.

The original architecture, as illustrated in Figure 2.3, employed a single 1KB on-chip BRAM that stored both program instructions and data in a shared address space. While this approach is simple and resource-efficient, it presents several limitations:

- **Security concerns:** Executable code sharing the same memory space as data creates potential vulnerabilities, as errant writes could corrupt program instructions.
- **Limited scalability:** As the system grows, managing a single unified memory becomes increasingly complex.
- **Inflexibility:** Different access patterns for instructions (read-only) and data (read-write) cannot be optimized independently.

The refactored architecture addresses the limitations of the original design by adopting a Harvard-style memory organization, in which instruction and data memories are physically separated. In this implementation, program instructions are stored in a dedicated ROM, while data is accessed through a separate RAM, as illustrated in Figure 3.7.

In the original unified memory approach, instructions and data shared the same address space, increasing the risk of unintended memory corruption and making it more difficult to reason about memory accesses during execution. By contrast, the Harvard architecture inherently protects program code, since instructions stored in ROM cannot be altered during runtime. This guarantees program integrity and eliminates an entire class of errors related to accidental overwriting of instruction memory.



Figure 3.7: Refactored memory architecture with separate ROM and RA

From a performance and design perspective, separating ROM and RAM allows each memory to be optimized for its specific access pattern. Instruction memory can be tailored for sequential fetch operations, while data memory can be optimized for frequent read and write accesses. This separation simplifies timing analysis and improves determinism, which is particularly important in FPGA-based designs.

Furthermore, the physical distinction between instruction and data memories enforces a clearer conceptual model of the processor. During development and debugging, it becomes easier to identify whether an issue originates from instruction execution or data manipulation, significantly reducing debugging complexity. This clarity also improves code readability and maintainability, especially as the system scales or additional peripherals are integrated.

Both the ROM and RAM modules were implemented using Xilinx Block Memory Generator IP cores, each configured for 1KB capacity with single clock cycle latency. This zero-cycle latency is critical for maintaining high performance, as it allows instruction fetch and data access to complete within a single clock cycle, eliminating the need for wait states in most operations.



Figure 3.8: Specification of on-chip RAM and ROM

Following the design principles established in the original implementation, we maintained the high/low memory organization for both ROM and RAM. This decision was driven by practical considerations: the byte-addressable nature of the memory system requires separate control of individual bytes, and the high/low split simplifies address decoding logic.

This organization maintains compatibility with byte-wide operations (LB, SB) while efficiently supporting word-wide accesses (LW, SW) through simultaneous access to both high and low memory banks.

## Peripheral Address Space Reorganization

The peripheral addressing scheme was also refined in the refactored design. Instead of using the original address range of 0x8000 to 0xFFFF, the peripheral space now occupies addresses 0x1000 to 0x13FF. This decision was motivated by several factors:

- **Reduced address gap:** Positioning peripherals immediately after the BRAM/ROM space minimizes unused address ranges, improving memory map efficiency.
- **Current requirements:** With only four peripherals currently implemented, the 768-byte range (0x1000–0x13FF) provides ample addressing space.
- **Future expansion:** This organization reserves the upper address space (0x13FF–0xFFFF) for future peripheral additions while maintaining a clean, organized memory map.

## 3.6 Datapath

The datapath represents the execution engine of the processor, containing all the functional units and data storage elements necessary to perform instruction operations. As shown in Figure 3.9, the datapath receives control signals from the Control Unit and executes the actual computational work, including arithmetic operations, logical operations, data movement, and memory access.



Figure 3.9: Complete datapath architecture showing all functional units and data flow paths

### 3.6.1 Datapath Overview

The datapath architecture follows a classical RISC design philosophy with distinct functional blocks connected through multiplexers and buses. The main components visible in the architecture include:

- **Register File (regfile 16x16):** A 16-register by 16-bit dual-port register file that provides fast access to operands and storage for computation results
- **Arithmetic Logic Unit (ALU):** Performs arithmetic operations (addition, subtraction), logical operations (AND, XOR) and shift operations (SLL, SRL, SLA, SRA) on operands
- **Immediate Formation Unit:** Constructs immediate operands from instruction fields, supporting sign extension and word offset modes
- **Program Counter (PC) Logic:** Manages instruction sequencing, including sequential execution, branches, and jumps
- **Memory Interface:** Connects to ROM (for instructions) and BRAM (for data), handling both word and byte accesses through separate high and low memory banks
- **Peripheral Interface:** Provides connectivity to on-chip peripherals including UART, Timer, and Parallel I/O (Pario) modules
- **Interrupt Controller (Enc):** Manages interrupt requests from peripherals and coordinates interrupt handling with the Control Unit

The datapath operates synchronously with the system clock, with data flowing from left to right through the various functional units. Control signals from the Control Unit configure multiplexers to route data appropriately and enable specific operations in each functional unit based on the current instruction being executed.

### 3.6.2 Internal Modules

The datapath is composed of several specialized modules, each responsible for specific aspects of instruction execution. This modular design promotes clarity, reusability, and ease of verification.

#### Register File

The register file module (regfile 16x16) implements a dual-port register bank with 16 general-purpose 16-bit registers (r0-r15). It features:

- **Dual-port architecture:** Allows simultaneous read of two operands (destination and source registers) in a single cycle
- **Synchronous write:** Results are written back to the destination register on the rising edge of the clock when the write enable signal is asserted
- **Asynchronous read:** Operands are available combinatorially after address inputs stabilize, minimizing latency
- **r0 convention:** By software convention, r0 is maintained as zero, though it can be temporarily used during interrupt handling

#### Arithmetic Logic Unit (ALU)

The ALU is the computational heart of the datapath, performing all arithmetic and logical operations. It consists of three parallel functional units:

- **Adder/Subtractor:** Implements 16-bit addition and subtraction with carry-in and carry-out support for multi-precision arithmetic. The unit also generates condition codes (Zero, Negative, Carry, Overflow) used by conditional branch instructions.

- **Logic Unit:** Performs bitwise AND and XOR operations, selected by the logic operation select signal from the Control Unit.
- **Shift Unit:** Executes right shift operations including logical right shift (SRL) and arithmetic right shift (SRA), as described in the ISA modifications section.

The ALU outputs are multiplexed based on the operation type, with the selected result driven onto the internal data bus for writeback to the register file or memory.

### **Immediate Formation Unit**

This module constructs immediate operands from instruction fields according to the instruction format:

- **Sign extension:** Extends 4-bit immediate values to 16 bits, preserving the sign for signed operations
- **Word offset scaling:** Multiplies immediate values by 2 for word-aligned address calculations in load/store operations
- **Immediate prefix support:** Concatenates the 12-bit immediate prefix (from IMM instruction) with the 4-bit immediate field to form 16-bit constants

### **Program Counter and Branch Logic**

The PC management logic handles instruction sequencing:

- **PC register:** Holds the address of the current instruction being executed
- **PC increment:** Normally advances by 2 (word size) to fetch the next sequential instruction
- **Branch unit:** Evaluates condition codes against branch conditions and calculates target addresses for taken branches using sign-extended displacement
- **Jump logic:** Computes jump targets for JAL (jump-and-link) instructions using the ALU result
- **Interrupt vector:** Redirects PC to the interrupt vector address when an interrupt is acknowledged

### **Memory Interface Logic**

The memory interface manages data transfers between the processor and memory:

- **ROM interface:** Fetches instructions from the dual-port ROM (high and low bytes) using the PC as the address
- **BRAM interface:** Handles load and store operations to the data RAM, with separate control for high and low byte banks
- **Byte/word control:** Generates appropriate write enables and output enables for byte (LB/SB) versus word (LW/SW) operations
- **Address parity:** Determines which byte lane to access based on the LSB of the effective address

## Peripheral Interface

The peripheral interface provides connectivity to memory-mapped I/O devices:

- **Address decoder:** Detects accesses to peripheral address space (0x1000-0x12FF) and generates appropriate select signals
- **Data routing:** Multiplexes data between the processor data bus and peripheral data buses
- **Ready synchronization:** Coordinates multi-cycle peripheral accesses using ready signals from each peripheral

## Interrupt Controller (Enc)

The interrupt encoder prioritizes and manages interrupt requests:

- **Interrupt detection:** Monitors interrupt request signals from peripherals (Timer, UART, Pario)
- **Priority encoding:** Determines which interrupt to service when multiple requests are pending
- **Vector generation:** Provides the appropriate interrupt vector table address to redirect the PC
- **Interrupt acknowledgment:** Coordinates with the Control Unit to insert the interrupt call instruction at appropriate instruction boundaries

## Peripheral Modules

Three peripheral modules are integrated into the datapath:

- **UART:** Provides serial communication capability with configurable baud rate, supporting both transmit and receive operations with interrupt generation on receive data available or transmit buffer empty
- **Timer:** A programmable 16-bit counter/timer that can operate in timer mode (counting clock cycles) or counter mode (counting external events), generating periodic interrupts for real-time task scheduling
- **Pario (Parallel I/O):** An 8-bit parallel input/output port for interfacing with external digital devices, with separate input and output registers accessible through memory-mapped control registers
- **I<sup>2</sup>C:** Implements the Inter-Integrated Circuit protocol in master mode only, supporting start/stop condition generation, slave address transmission, and byte-level data transfer over a two-wire serial interface

### 3.6.3 Data Flow Through the Datapath

During typical instruction execution, data flows through the datapath as follows:

1. The PC addresses ROM to fetch the next instruction
2. The Control Unit decodes the instruction and generates control signals
3. The register file outputs operand values based on source register specifiers
4. Immediate values are formed and sign-extended as needed

5. Operand multiplexers select between register and immediate values
6. The ALU performs the specified operation on the operands
7. The result multiplexer selects the appropriate result (ALU output, memory data, or PC value)
8. The result is written back to the destination register or memory
9. The PC is updated for the next instruction (sequential, branch, or jump)

This pipelined-style data flow, combined with the single-cycle block RAM access, enables efficient instruction execution with minimal stall cycles, achieving good performance despite the non-pipelined architecture.

### 3.7 Control Unit

The Control Unit serves as the central orchestrator of the processor, responsible for decoding instructions and generating the appropriate control signals that govern the operation of the datapath. As illustrated in Figure 3.10, the Control Unit acts as the interface between the instruction stream and the execution resources, translating opcodes and instruction fields into the specific control signals required for each operation.



Figure 3.10: Control Unit architecture showing input instruction decoding and output control signals

The Control Unit receives instructions from the on-chip ROM through the instruction fetch interface. The primary input for instruction decoding is:

- **i\_insn[15:0]**: The 16-bit instruction word fetched from ROM, containing the opcode, register specifiers, immediate values, and function codes that determine the operation to be executed

Additionally, the Control Unit interfaces with external systems through:

- **i\_par\_i[7:0]**: Parallel input used for interrupt handling and external communication
- **i\_rx\_line**: UART receive line for serial communication
- **i<sub>i</sub>2c\_scl** : *I2C serial clock line for I2C communication*
- **i<sub>i</sub>2c\_sda** : *I2C serial data line for I2C communication*

Based on the instruction word and system status, the Control Unit performs instruction decoding and generates two categories of outputs: control signals that configure the datapath for execution, and status signals that provide feedback to the system.

### 3.7.1 Control Signals

The control signals generated by the Control Unit configure the datapath for the execution of each instruction. These signals are distributed throughout the datapath to enable specific operations and control data flow. The major control signal groups include:

#### Register File Control

- **o\_rf\_we / i\_rf\_we**: Register file write enable, determines when results are written back to the register file
- **o\_isRI / i\_isRI**: Indicates an RI-format instruction for proper operand selection

#### Operand Selection and Data Path Control

- **o\_op\_a\_sel / i\_op\_a\_sel**: Selects source for operand A (register vs. immediate)
- **o\_word\_off / i\_word\_off**: Indicates word offset addressing mode
- **o\_signal\_extension / i\_signal\_extension**: Controls sign extension of immediate values
- **o\_imm\_pre / i\_imm\_pre**: Immediate prefix flag for extended immediate values
- **o\_which\_dataInsn[3:0] / i\_which\_dataInsn[3:0]**: Selects which data source drives the result bus- load byte/word or store byte/word
- **o\_zero\_insn / i\_zero\_insn**: Forces instruction to zero (used in interrupt handling)

#### ALU Operation Control

- **o\_valid\_insn\_ce / i\_valid\_insn\_ce**: Valid instruction clock enable
- **o\_br / i\_br**: Branch instruction indicator
- **o\_is\_jal / i\_is\_jal**: Jump-and-link instruction flag
- **o\_carry\_in / i\_carry\_in**: Carry input for add-with-carry operations
- **o\_logic\_op\_sel / i\_logic\_op\_sel**: Selects logical operation (AND, XOR)
- **o\_is\_add / i\_is\_add**: Indicates addition operation
- **o\_shift\_op\_sel[1:0] / i\_shift\_op\_sel[1:0]**: Selects shift operation type
- **o\_op\_type\_sel[1:0] / i\_op\_type\_sel[1:0]**: General operation type selector

## Memory Access Control

- ***o\_w\_oe* / *i\_w\_oe***: Word output enable for load operations
- ***o\_l\_oe* / *i\_l\_oe***: Low byte output enable
- ***o\_h\_oe* / *i\_h\_oe***: High byte output enable
- ***o\_l\_we* / *i\_l\_we***: Low byte write enable for store operations
- ***o\_h\_we* / *i\_h\_we***: High byte write enable for store operations
- ***o\_io* / *i\_io***: I/O operation indicator for peripheral access

## Special Instructions

- ***o\_is\_reti* / *i\_is\_reti***: Return from interrupt instruction flag
- ***o\_I\_ad\_IVT[15:0]* / *i\_I\_ad\_IVT[15:0]***: Interrupt vector table address
- ***o.int[4:0]* / *i\_int[4:0]***: Interrupt request signals

### 3.7.2 Status Signals

In addition to control signals, the Control Unit also monitors status feedback from the datapath to make informed decoding decisions for subsequent instructions:

- ***i\_insn[15:0]* / *o\_insn[15:0]***: Current instruction word being decoded
- ***i\_cc[3:0]* / *o\_cc[3:0]***: Condition codes (Zero, Negative, Carry, Overflow) from ALU operations
- ***i\_addr\_parity* / *o\_addr\_parity***: Address parity for byte-addressing alignment
- ***i\_is\_peripheral\_addr* / *o\_is\_peripheral\_addr***: Indicates if current address targets peripheral space
- ***i\_io\_rdy* / *o\_io\_rdy***: I/O ready signal from peripheral devices
- ***i\_int\_req[3:0]* / *o\_int\_req[3:0]***: Interrupt request status from peripherals

### 3.7.3 External Interface

The Control Unit also interfaces with the external system through dedicated clock and reset signals:

- ***i\_clk***: System clock input that synchronizes all control signal generation
- ***i\_RST***: System reset that initializes the Control Unit to a known state

Additionally, output ports are provided for external visibility:

- ***o\_par\_o[7:0]***: Parallel output for external communication
- ***o\_tx\_line***: Transmit line for serial output
- ***o\_i2c\_scl***: I2C serial clock output line
- ***o\_i2c\_sda***: I2C serial data bidirectional line

### 3.7.4 Control Unit Operation

The Control Unit operates in a synchronous manner, receiving new instructions on each clock cycle when enabled. The decoding process involves:

1. **Instruction Reception:** Capturing the instruction word from the instruction inputs
2. **Opcode Extraction:** Parsing the instruction format to identify the operation type
3. **Field Decoding:** Extracting register specifiers, immediate values, and function codes
4. **Control Signal Generation:** Activating the appropriate combination of control signals based on the decoded instruction
5. **Status Monitoring:** Evaluating condition codes and ready signals to handle conditional operations and multi-cycle instructions

The modular design of the Control Unit, with clearly defined control and status signal groups, facilitates easy modification and extension of the instruction set. New instructions can be added by implementing additional decoding logic and generating the corresponding control signal combinations, without requiring extensive redesign of the existing control structure.

This separation of control from execution (datapath) exemplifies the classical RISC design philosophy, where a simple, regular control unit generates straightforward control signals for a streamlined datapath, enabling high clock frequencies and efficient FPGA implementation.

## 3.8 Peripherals

### 3.8.1 Timer

The timer peripheral was refactored to improve configurability, clarity, and alignment with memory-mapped I/O conventions. In the refactored design, all timer functionality is controlled through a set of dedicated configuration registers, allowing software to manage the timer behavior in a structured and predictable manner.

The main configuration register, **CR#0**, is mapped to address 0x1000. This register is responsible for controlling the operational mode of the timer. Through specific bit fields, it allows enabling or disabling interrupts, selecting between **timer mode** and **counter mode**, and starting or stopping the timer. The bit assignments follow the specification implemented in the hardware: bit 0 enables interrupt generation on overflow, bit 1 selects between timer and counter operation, and bit 2 controls the start or stop state of the timer. This centralized configuration simplifies software control and replaces the more fragmented control logic present in the original design. Figure 3.11 illustrates the layout of the refactored control register.



Figure 3.11: Refactored timer control register CR#0 at address 0x1000

The interrupt request register, **CR#1**, is located at address 0x1002. This register maintains compatibility with the original implementation by asserting the interrupt request bit when a counter or timer overflow occurs, provided that interrupts are enabled. The interrupt request remains asserted until it is explicitly cleared by a write operation to this register. This behavior ensures deterministic interrupt handling and prevents spurious retriggering. Figure 3.12 shows the refactored interrupt request register.



Figure 3.12: Interrupt request register CR#1 at address 0x1002

An additional register, **CR#2**, was introduced at address 0x1004 to allow dynamic configuration of the initial counter value. This 16-bit register defines the value loaded into the counter when it is initialized or reset. If the timer is not running, any write to this register immediately updates the counter value on the next clock cycle. However, if the timer is already running, the new initialization value is only applied after the next overflow event, ensuring consistent operation without disrupting an active counting process. The structure of this register is shown in Figure 3.13.



Figure 3.13: Counter initialization register CR#2 at address 0x1004

### 3.8.2 UART Peripheral

This section describes the design and operation of a memory-mapped Universal Asynchronous Receiver/Transmitter (UART) peripheral. The module provides full-duplex asynchronous serial communication with programmable baud rate, transmit and receive buffering, and interrupt generation.



Figure 3.14: UART Frame

The UART peripheral is internally divided into two independent finite-state machines:

- UART Transmitter (TX FSM)
- UART Receiver (RX FSM)

Both FSMs operate concurrently and are synchronized to the system clock.

## Peripheral Overview

The UART peripheral supports:

- Configurable baud rate through a programmable divider
- 8-bit data frames
- One start bit and one stop bit
- Full-duplex operation
- Transmit and receive interrupt signaling

## Register Map

The UART exposes a set of memory-mapped Special Function Registers (SFRs) summarized in Table 3.2.

| Register     | Address | Access     | Description                   |
|--------------|---------|------------|-------------------------------|
| SBUF_RX      | 0x1200  | Read Only  | Receive buffer register       |
| SBUF_TX      | 0x1201  | Write Only | Transmit buffer register      |
| BAUDRATE_DIV | 0x1203  | Write Only | Baud rate divider             |
| SCON         | 0x1205  | Write Only | Serial control flags (TI, RI) |

Table 3.2: UART Register Map

## Register Specification

### SBUF\_RX

The SBUF\_RX register holds the last byte received by the UART receiver. The register is updated by hardware when a valid UART frame is received.

| <b>Bit</b> | <b>Name</b> | <b>Access</b> | <b>Reset</b> | <b>Description</b>                                                                                                     |
|------------|-------------|---------------|--------------|------------------------------------------------------------------------------------------------------------------------|
| 15:8       | –           | –             | 0            | Reserved. Read as zero.                                                                                                |
| 7:0        | RX_DATA     | RO            | 0            | Received data byte. Contains the last valid byte received by the UART receiver. Valid when the RI flag in SCON is set. |

Table 3.3: UART SBUF\_RX Register Specification

### **SBUF\_TX**

The SBUF\_TX register is used to load data for transmission. Writing to this register initiates a UART transmit operation.

| <b>Bit</b> | <b>Name</b> | <b>Access</b> | <b>Reset</b> | <b>Description</b>                                                                                           |
|------------|-------------|---------------|--------------|--------------------------------------------------------------------------------------------------------------|
| 15:8       | –           | –             | 0            | Reserved. Ignored by hardware.                                                                               |
| 7:0        | TX_DATA     | WO            | 0            | Transmit data byte. Writing a value to this field loads the transmit shift register and starts transmission. |

Table 3.4: UART SBUF\_TX Register Specification

### **BAUDRATE\_DIV**

The BAUDRATE\_DIV register defines the UART bit period relative to the system clock. The value is shared by both the transmitter and receiver.

| <b>Bit</b> | <b>Name</b> | <b>Access</b> | <b>Reset</b> | <b>Description</b>                                                                            |
|------------|-------------|---------------|--------------|-----------------------------------------------------------------------------------------------|
| 15:0       | BAUD_DIV    | WO            | Configurable | Clock divider value defining the UART baud rate. Each UART bit spans one full divider period. |

Table 3.5: UART BAUDRATE\_DIV Register Specification

By default, the Baudrate value is set to 115200bps.

## SCON

The Serial Control (SCON) register is used to report the status of the UART transmitter and receiver. It contains interrupt flags that indicate the completion of transmit and receive operations. The register is memory-mapped and written by hardware upon event completion.

| Bit | Name | Access | Reset | Description                                                                                                                                                                                          |
|-----|------|--------|-------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 7:2 | —    | —      | 0     | Reserved. Read as zero.                                                                                                                                                                              |
| 1   | TI   | HW set | 0     | Transmit Interrupt flag. Set by hardware when a transmission completes (after the stop bit). This flag can be used to trigger an interrupt or indicate that the transmitter is ready for a new byte. |
| 0   | RI   | HW set | 0     | Receive Interrupt flag. Set by hardware when a full byte has been successfully received and stored in the receive buffer. This flag indicates that valid data is available for software to read.     |

Table 3.6: UART SCON Register Specification

The RI and TI flags are asserted by hardware and may be cleared by software after servicing the corresponding interrupt. Reserved bits must be written as zero and are ignored by the hardware.

## Baud Rate Generation

The baud rate is derived from the system clock using a programmable divider. Each UART bit occupies one full baud period. The same divider value is shared between the transmitter and receiver to ensure synchronized timing.

## UART Transmitter (TX FSM)

The UART transmitter converts parallel data into a serial bitstream. Transmission is initiated by a write operation to the transmit buffer register.

**TX Frame Format** Each transmitted frame consists of:

- One start bit (logic 0)
- Eight data bits (least significant bit first)
- One stop bit (logic 1)



Figure 3.15: UART TX Finite State machine

**TX State Machine Description** The TX FSM operates according to the flowchart shown in Figure 3.15. The FSM progresses through the following phases:

1. Idle state with TX line held high
2. Start bit transmission
3. Sequential transmission of data bits
4. Stop bit transmission
5. Return to idle and interrupt generation

### TX Operation (Pseudocode)

#### Pseudocode: UART Transmit

```

wait until data is written to TX buffer
load data into shift register
drive TX low for start bit
for each data bit:
    drive TX with current bit
    wait one baud period
drive TX high for stop bit
signal transmit complete (TI)
return to idle

```

### UART Receiver (RX FSM)

The UART receiver samples the incoming serial data and reconstructs the transmitted byte. It uses **mid-bit sampling** to improve robustness against timing skew.



Figure 3.16: UART RX Finite State machine

**RX State Machine Description** The RX FSM follows the flowchart shown in Figure 3.16. The FSM operates as follows:

1. Idle state monitoring for a start bit
2. Start bit validation using **half-baud delay**
3. Sequential sampling of data bits
4. Stop bit verification
5. Data ready signaling and interrupt generation

### RX Operation (Pseudocode)

#### Pseudocode: UART Receive

```

monitor RX line for falling edge
wait half baud period
if start bit is valid:
    for each data bit:
        wait one baud period
        sample RX and store bit
        wait one baud period
        validate stop bit
        store received byte
        signal receive complete (RI)
return to idle

```

### Interrupt Handling

The UART peripheral provides two interrupt sources:

- **RI** (Receive Interrupt): Asserted when a valid byte is received
- **TI** (Transmit Interrupt): Asserted when a transmission completes

These flags allow software to service the UART asynchronously without polling.

### Ready Signal

The **o\_rdy** signal indicates that the UART peripheral is selected and able to respond to control bus transactions.

## Summary

The UART peripheral implements a compact and modular asynchronous serial interface using independent transmit and receive state machines. The use of flowchart-defined FSMs simplifies verification and provides a clear correspondence between the specification and implementation.

### 3.8.3 I<sup>2</sup>C Master Peripheral

This section describes the design and operation of a memory-mapped I<sup>2</sup>C master peripheral. The module implements a multi-speed I<sup>2</sup>C controller supporting Standard, Fast, Fast-Plus, High-Speed, and Ultra-Fast modes through a programmable clock divider. The peripheral is controlled via a generic control bus and operates as the bus master.

#### Peripheral Overview

The I<sup>2</sup>C master provides:

- Programmable serial clock (SCL) frequency
- Start and stop condition generation
- 7-bit addressing with read/write support
- Byte-level transmit and receive
- ACK/NACK handling

The serial data (SDA) and serial clock (SCL) lines are implemented as open-drain bidirectional signals, compliant with the I<sup>2</sup>C specification.

#### Register Map

The peripheral exposes a small set of memory-mapped registers summarized in Table 3.7.

| Register    | Address | Access     | Description                       |
|-------------|---------|------------|-----------------------------------|
| I2C_DIVIDER | 0x1300  | Write Only | Clock divider for SCL generation  |
| I2C_DATA    | 0x1302  | Write Only | Address and data payload register |
| I2C_CONTROL | 0x1304  | Write Only | Control bits (Enable, Start)      |

Table 3.7: I<sup>2</sup>C Register Map

#### Register Specification

##### I2C\_DIVIDER Register Specification

The I2C\_DIVIDER register controls the serial clock (SCL) frequency of the I<sup>2</sup>C master by dividing the system clock.

| Bit  | Name    | Access | Reset  | Description                                                                                               |
|------|---------|--------|--------|-----------------------------------------------------------------------------------------------------------|
| 15:0 | I2C_DIV | WO     | 0x01F4 | Clock divider value used to generate the I <sup>2</sup> C SCL signal. Determines the communication speed. |

Table 3.8: I<sup>2</sup>C I2C\_DIVIDER Register Specification

After reset, this register configures I<sup>2</sup>C communication at 100kbps.

##### I2C\_DATA Register Specification

| Bit  | Name | Access | Reset | Description                                                                         |
|------|------|--------|-------|-------------------------------------------------------------------------------------|
| 15:8 | DATA | WO     | 0     | Data byte to be transmitted or buffer for received data during read operations.     |
| 7:0  | ADDR | WO     | 0     | 7-bit slave address with the least significant bit indicating read/write operation. |

Table 3.9: I<sup>2</sup>C I2C\_DATA Register Specification

The I2C\_DATA register stores the address and data payload used during an I<sup>2</sup>C transaction.

#### I2C\_CONTROL Register Specification

The I2C\_CONTROL register controls the operation of the I<sup>2</sup>C master state machine.

| Bit | Name  | Access | Reset | Description                                                                                             |
|-----|-------|--------|-------|---------------------------------------------------------------------------------------------------------|
| 7:2 | —     | —      | 0     | Reserved. Must be written as zero.                                                                      |
| 1   | START | WO     | 0     | Start command. When set, initiates an I <sup>2</sup> C transaction. Cleared internally after execution. |
| 0   | EN    | WO     | 0     | Enable bit. Enables the I <sup>2</sup> C master state machine when set.                                 |

Table 3.10: I<sup>2</sup>C I2C\_CONTROL Register Specification

#### Clock Generation

The I<sup>2</sup>C clock is derived from the system clock using a programmable divider. The divider value determines the SCL frequency and allows selection of multiple I<sup>2</sup>C speed modes at compile time or runtime.

Internally, the divider generates:

- A toggling internal I<sup>2</sup>C clock
- Rising-edge enable signals for data sampling
- Falling-edge enable signals for data driving

SCL is only actively driven during address and data phases; it remains high during idle, start, and stop conditions.

#### I<sup>2</sup>C State Machine

The protocol is implemented using a finite-state machine synchronized to the internal I<sup>2</sup>C clock. The main states are:

1. **IDLE**: Bus is idle; SCL and SDA are high
2. **START**: Start condition generation
3. **ADDRESS**: Transmission of address and R/W bit
4. **READ\_ACK**: ACK sampling from slave
5. **WRITE\_DATA**: Byte transmission to slave
6. **WRITE\_ACK**: Master ACK after read

7. **READ\_DATA**: Byte reception from slave

8. **STOP**: Stop condition generation

### Protocol Operation (Pseudocode)

The high-level behavior of the I<sup>2</sup>C master can be summarized as follows:

#### Pseudocode: I<sup>2</sup>C Transaction

```
wait until ENABLE is set
if START requested:
    generate START condition
    transmit address + R/W bit
    if slave ACK received:
        if WRITE operation:
            transmit data byte
            check ACK
        else if READ operation:
            receive data byte
            transmit ACK
    generate STOP condition
return to IDLE
```

### Data Direction and SDA Control

The SDA line is controlled dynamically based on the current protocol phase:

- During address and write phases, SDA is actively driven by the master
- During ACK and read phases, SDA is released and sampled

This behavior ensures proper open-drain operation and allows slave devices to acknowledge or supply data as required.

### Transaction Completion

A transaction completes when the STOP condition is generated and the state machine returns to the IDLE state. At this point, the peripheral is ready to accept a new START command.

### Summary

The I<sup>2</sup>C master peripheral provides a compact and configurable implementation of the I<sup>2</sup>C protocol. Its register-based control interface enables easy software integration, while the internal state machine ensures protocol-compliant timing and sequencing across multiple speed modes.

## 3.9 Interrupt Controller Architecture

The Refactored GR0040 implements an interrupt structure inspired by the 8051 microcontroller architecture. This design provides a clear, simple, and standardized organization for handling interrupt triggers without the need for complex software-level lookups or indirections.

The interrupt process is managed through a hardwired sequence consisting of four primary stages:

1. **Interruption Prologue:** A hardwired sequence initiated upon an interrupt trigger.
2. **Vector Jump:** The Program Counter (PC) automatically jumps to a fixed address corresponding to the specific interrupt source in the Interrupt Vector Table (IVT).
3. **ISR Execution:** The processor executes the Interrupt Service Routine (ISR) handler.
4. **Interruption Epilogue:** A hardwired sequence that restores the PC to its pre-interruption state, allowing normal program execution to resume.

### 3.10 Interrupt Vector Table (IVT)

The IVT defines fixed entry points for various interrupt sources. Each maskable interrupt handler is allocated 8 words (16 bytes), allowing for a maximum of 8 instructions within the vector itself. If a handler requires more space, it must include a jump instruction to a different memory location where the remainder of the code resides.

Table 3.11: GR0040 Interrupt Vector Table

| Interrupt Source | ISR Address (Hex) | Description                    |
|------------------|-------------------|--------------------------------|
| Reset            | 0x0002            | Reset Vector (Jumps to 0x0054) |
| Timer            | 0x0004            | Timer/Counter Interrupt        |
| Extern 0         | 0x0014            | External Interrupt Line 0      |
| Extern 1         | 0x0024            | External Interrupt Line 1      |
| UART_Rx          | 0x0034            | Signals Reception on UART      |
| UART_Tx          | 0x0044            | Signals Transmission on UART   |

### 3.11 Implementation and Priority Logic

The interrupt system is implemented using a priority-based “If-Else” logic structure within the Control Unit. During every positive clock edge, the system verifies the interrupt request lines (Figure 3.17).

- **Priority Handling:** Interrupts are processed based on a hardwired priority. If multiple requests occur simultaneously, the “Priority If-Else” logic determines which source is serviced first.
- **Jump-and-Link:** The jump to the ISR and the linking process (saving the return address) are entirely hardwired.
- **Address Space:** Interrupt handling and program data are mapped within the first 1 kB of on-chip RAM (BRAM), specifically in the range 0x0000 to 0x03FF.



Figure 3.17: Interrupt controller flowchart

### 3.11.1 Priority Encoding

The interrupt handling mechanism implemented in this processor draws inspiration from the classical 8051 microcontroller architecture, particularly in its priority-based interrupt servicing approach. Similar to the 8051, which supports multiple interrupt sources with fixed priority levels, this design implements a priority encoder that determines which interrupt request should be serviced when multiple interrupts occur simultaneously.

#### Comparison with 8051 Interrupt Architecture

The 8051 microcontroller features a two-level priority interrupt system with no indirections with five interrupt sources, each with a fixed priority. When multiple interrupts are pending, the 8051's interrupt controller automatically services the highest priority interrupt first. Our implementation extends this concept with the following key similarities and enhancements:

- **Priority-based servicing:** Like the 8051, interrupts are assigned fixed priorities based on their source, with lower-indexed interrupts having higher priority
- **Interrupt queuing:** Pending interrupts are queued while an interrupt service routine is executing, similar to the 8051's interrupt pending flags
- **Non-maskable execution:** Once an interrupt begins servicing, it runs to completion (until RETI) before another interrupt can be serviced, preventing interrupt nesting
- **Atomic instruction protection:** Certain instruction sequences cannot be interrupted, maintaining data integrity during critical operations

## Interrupt Priority Encoding Design

The priority encoding logic manages interrupt requests through several coordinated stages, as illustrated in Figure 3.19. The implementation ensures that interrupts are serviced fairly while respecting priority levels and protecting critical instruction sequences.



Figure 3.18: Priority Encoding flowchart

## Timer/Counter Interrupt Example

The timer peripheral (assigned to address space 0x1000–0x10FF) utilizes Interrupt Line 0. When a timer overflow occurs:

1. PC is saved by hardware.
2. The system jumps to address 0x0004 (Timer ISR).
3. The `int_req` flag is cleared by hardware (on UART interrupts, the bits must be cleared by software).
4. The handler code is executed.
5. The handler executes a return from interrupt routine (RETI) to return to the original program flow.

## Overview

Figure 3.19 showcases the different implemented instructions for each ISR. Firstly, regarding the timer interrupt, on the timer overflow, this interrupt is triggered. In its ISR a virtual timer is created so a board LED can blink.

Secondly, the external interrupts are used to start and stop the timer and, therefore, we can make the LED stop blinking and resume blinking. Code execution enters in each of these interrupts when a falling edge is detected on its respective pin.

Additionally, the UART\_Rx interrupt aims to implements an "echo" algorithm, where date receive (Rx) is echoed to the transmission line (Tx). Program execution enters here after UART reception has been detected.

Lastly, the program enters in the UART\_Tx interrupt after sending a character via UART. The implemented interrupt service routine toggles a board LED and clears the transmission flag (TI).



Figure 3.19: ISRs flowchart for each interrupt

## 3.12 Latch Avoidance

The inference of latches must be avoided and is explicitly warned against by Vivado. Inferred latches are typically the result of improperly specified combinational logic, where not all possible input conditions are handled. When a signal is not assigned a value in every execution path of a combinational process, the synthesis tool preserves the previous value, which leads to latch inference.

Inferred latches are undesirable because they may introduce unintended state retention, unpredictable behavior, and potential glitches. Furthermore, the presence of latches complicates timing analysis and significantly increases the difficulty of debugging and verification, particularly in large or deeply pipelined designs.

### 3.12.1 Understanding Latch Formation

To understand why latches form and how glitches arise, consider the example shown in Figure 3.20. The circuit implements a simple logic function where:

- Input (A) = 0
- Input (B) transitions from 1 to 0
- Input (C) = 1

As illustrated in the timing diagram, when signal B transitions, the intermediate signals AB (output of the top NAND gate) and BC (output of the bottom NAND gate) experience different propagation delays. This creates a brief moment where both AB and BC are temporarily high, causing the output Y to glitch from 1 to 0 and back to 1. This glitch occurs because the combinational logic path has not been fully specified for all input transitions, allowing transient states to propagate through the circuit.



Figure 3.20: Glitch formation in combinational logic due to propagation delay differences. The output Y experiences an unwanted transition ( $1 \rightarrow 0 \rightarrow 1$ ) when input B changes, demonstrating the hazard that can occur in incompletely specified logic.

When such incompletely specified combinational logic is synthesized, the tools may infer a latch to "hold" the previous value during undefined states. Figure 3.21 illustrates this concept using a Karnaugh map representation. The function  $Q^+(C, D, Q)$  represents the next state logic, where:

- $e1 = \bar{C} \cdot Q$  represents one term (circled in blue)
- $e2 = C \cdot D$  represents another term (highlighted in pink)
- The complete expression is  $Q^+(C, D, Q) = e1 + e2 = \bar{C} \cdot Q + C \cdot D$

The critical observation is that the current value of Q appears in the next-state equation. This feedback of the current state into the next state is the defining characteristic of a latch. In the Karnaugh map, the cells where Q appears in the expression (shown with value 1 in the blue circled region when C=0) demonstrate that the circuit must "remember" its previous value, thus requiring storage—a latch.



Figure 3.21: Karnaugh map showing latch inference when output depends on its previous value. The expression  $Q^+(C, D, Q) = \bar{C} \cdot Q + C \cdot D$  contains the current state Q, requiring a storage element (latch) for implementation. The overlapping regions (e1 and e2) show how incomplete specification leads to state retention.

A common but incorrect workaround is the addition of feedback or self-assignments intended to "hold" the previous value of a signal. While such constructs may silence

synthesis warnings in some cases, they do not eliminate the underlying issue. Instead, they explicitly describe storage behavior by relying on the previous value of the signal, which by definition results in latch inference due to the absence of complete assignment specifications.

For example, code like:

```

1  always @(*) begin
2      if (condition)
3          signal = new_value;
4      else
5          signal = signal; // WRONG: Creates latch
6  end

```

This explicitly creates a feedback loop where the signal depends on its own previous value, exactly as shown in the Karnaugh map of Figure 3.21.

### Proper Latch Avoidance Strategy

To properly avoid inferred latches, all combinational processes were refactored to ensure that every output signal is assigned a well-defined value for all possible input conditions. This was achieved by:

1. **Providing default assignments:** At the beginning of each combinational process, all outputs are assigned default values
2. **Exhaustive case coverage:** All branches of conditional statements (if-else, case) are explicitly handled
3. **Complete sensitivity lists:** Using `always @(*)` or equivalent to ensure all read signals are in the sensitivity list
4. **Eliminating feedback:** Ensuring no output signal depends on its own previous value in combinational logic

The correct approach:

```

1  always @(*) begin
2      signal = default_value; // Default assignment
3      if (condition)
4          signal = new_value;
5      // No else needed - default covers all other cases
6  end

```

As a result of these refactoring efforts, the logic is fully combinational, deterministic, and free of unintended storage elements. The design now passes synthesis without latch warnings, and the combinational paths are well-defined for all input combinations, eliminating the potential for glitches and unpredictable behavior as demonstrated in Figures 3.20 and 3.21.

### 3.13 Removal of Internal Tri-State

This refactoring addresses the use of internal tri-state (high-impedance, Z) signals within the design. Recent versions of Verilog and modern synthesis tools do not support the use of high-impedance states for internal variables. According to UG901 (Vivado Design Suite User Guide: Synthesis), tri-state signals are intended to be used exclusively for top-level I/O ports. This behavior has also been confirmed through AMD Community Support.

The presence of internal tri-state assignments may lead to non-synthesizable logic, unexpected synthesis results, or implicit logic inference that diverges from the intended hardware behavior. As a consequence, internal signals driven to a high-impedance state

can introduce ambiguity in simulation and are not reliably mapped to physical resources during synthesis.

To resolve this issue, all internal signal assignments were refactored to ensure that signals are always driven to known and deterministic values. In cases where the original intent was to logically disconnect a signal from the circuit (previously modeled using a high-impedance state), an additional control variable was introduced or an existing one was reused. This control logic explicitly manages signal activation and deactivation, preserving the original functional intent without relying on unsupported tri-state behavior.

# Chapter 4

## Implementation

### 4.1 ISA refactored

#### 4.1.1 Shift operations

The hardware implementation of the shift operations is achieved through a concise multiplexer-based design, as shown in Listing 4.1. The selection signal `i_shift_op_sel[0]` determines the shift direction (right or left), while `i_shift_op_sel[1]` differentiates between logical and arithmetic right shifts. For right shifts, a logical shift inserts a zero in the most significant bit position, whereas an arithmetic shift preserves the sign bit. Left shifts are identical for both logical and arithmetic variants, simply inserting a zero in the least significant bit position.

```
1 assign o_shift_result =
2     i_shift_op_sel[0] ?                                // RIGHT shift
3         (i_shift_op_sel[1] ?
4             {1'b0, i_a['DATA_MSB:1]} :                  // LOGICAL
5                 right
6                 {i_a['DATA_MSB], i_a['DATA_MSB:1]})) : // ARITHMETIC
7                     right
8                     {i_a['DATA_MSB-1:0], 1'b0};           // LEFT shift
```

Listing 4.1: Shift operation implementation

In the following example, it is possible to observe that a **shift left logical (SLL)** operation was performed. The original value 0003 becomes 0006 after the execution of this instruction, demonstrating that the value was effectively multiplied by 2.



Figure 4.1: Example of a Shift Left Logical (SLL) instruction: the value 0003 is shifted left, resulting in 0006.

#### 4.1.2 RETI instruction

The implementation, shown in Listing 4.2, consists of two main components. In the control unit, the RETI instruction is decoded when the opcode equals 10, and a control signal `o_is_reti` is generated to indicate this instruction type. In the datapath, a dedicated register `_return_addr` stores the instruction register value at the moment an interrupt occurs. The program counter multiplexer logic then selects the appropriate next address: the interrupt vector table address when an interrupt is detected, the stored return address when RETI is executed, or the normal sequential or jump target address otherwise.

```

1 // In control unit:
2 `define RETI    (_op==10)
3 assign o_is_reti = o_valid & 'RETI;
4
5 // In datapath:
6 reg [`ADDR_MSB:0] _return_addr;
7 always @(posedge i_clk) begin
8     if (i_rst)
9         _return_addr <= 16'h0000;
10    else if (!i_int)
11        _return_addr <= _IR;
12 end
13
14 assign _PC = (!i_int) ? i_i_ad_IVT :
15             i_is_reti ? _return_addr :
16             (i_is_jal) ? _op_arithmetic_result : _IRincd;

```

Listing 4.2: RETI instruction implementation

Figure 4.2 illustrates the correct operation of the RETI mechanism. Upon entering an interrupt, the interrupt request line (`i_int`, shown in violet) is asserted, triggering the storage of the current instruction register value into `_return_addr`. Later, when the RETI instruction is executed to exit the interrupt service routine, the corresponding instruction signal (`o_insn`, shown in orange) is asserted, and the program counter is updated with the previously saved return address. As a result, execution resumes at the instruction immediately preceding the interrupt, validating the correct functionality of the implementation.

|                       |      |      |      |      |      |      |      |     |
|-----------------------|------|------|------|------|------|------|------|-----|
| >  _IR[15:0]          | 0094 | 007c | 007e | 0014 | 0016 | 007e | 0080 | 008 |
| >  o_insn[15:0]       | ffff | ffff | 0000 | 67c0 | a000 | 007e | 0080 | 008 |
| >  _PC[15:0]          | 0096 | 007e | 0014 | 0016 | 007e | 0080 | 0082 | 008 |
| >  _return_addr[15:0] | 007e | 0000 | 0000 | 0000 | 0000 | 0000 | 0002 | 000 |
| >  i_int[4:0]         | 00   | 00   | 02   | X    | X    | X    | X    | 00  |

Figure 4.2: RETI instruction working

## 4.2 Register file

```

1 regfile16x16 regfile(
2     .a(_rd),                                //write addr
3     .d(_data),                                //write data
4     .dpra(i_isRI ? _rd : _rs),                //dual port read addr
5     .clk(i_clk),
6     .we(i_rf_we),                            //register file write enable
7     .spo(_destinationRegister),                //single port output
8     .dpo(_2ndOp)                             //dual port output
9 );

```

## 4.3 Memory

Listing 4.3 shows the Verilog implementation of the separated ROM and RAM architecture. The code clearly demonstrates the division between instruction ROM (high and low bytes) and data RAM (high and low bytes).

```
1 ROM_8_512_H romhInsn(
2     .rsta(i_zero_insn),
3     .ena(i_insn_ce),
4     .clka(i_clk),
5     .addrA(_PC[9:1]),
6     .douta(o_insn[15:8]));
7
8 ROM_8_512_L romlInsn(
9     .rsta(i_zero_insn),
10    .ena(i_insn_ce),
11    .clka(i_clk),
12    .addrA(_PC[9:1]),
13    .douta(o_insn[7:0]));
14
15 BRAM_8_512_H ramhData(
16     .rsta(i_rst),
17     .wea(i_h_we),
18     .ena(1'b1),
19     .clka(i_clk),
20     .addrA(_data_address[9:1]),
21     .dina(_do_h),
22     .douta(_di[15:8]));
23
24 BRAM_8_512_L ramlData(
25     .rsta(i_rst),
26     .wea(i_l_we),
27     .ena(1'b1),
28     .clka(i_clk),
29     .addrA(_data_address[9:1]),
30     .dina(_data_to_store[7:0]),
31     .douta(_di[7:0]));
32
```

Listing 4.3: Separated ROM and RAM implementation with high/low byte organization

This refactored memory architecture provides a more robust, maintainable, and scalable foundation for the system compared to the original unified BRAM approach, while maintaining the performance characteristics critical for embedded applications.

## 4.4 Datapath and Control Unit

In the following listings, the top-level interconnection between the Control Unit and the Datapath is shown. These two modules form the core of the processor architecture, where the Control Unit is responsible for instruction decoding and control signal generation, while the Datapath executes the operations dictated by those control signals.

The bidirectional exchange of signals between the Control Unit and the Datapath enables a clean separation of concerns and improves modularity. This structure simplifies verification and future extensions of the processor, while ensuring that instruction execution, peripheral access, and interrupt handling are performed in a deterministic and well-defined manner.

```

1 ControlUnit controlunit (
2   .i_clk(_clk50),
3   .i_rst(_rst_sys),
4   .iInsn(_insn),
5   .i_cc(_cc),
6   .i_addr_parity(_addr_parity),
7   .i_is_peripheral_addr(_is_
8   peripheral_addr),
9   .i_io_rdy(_io_rdy),
10  .i_int_req(_int_req),
11  .o_valid_insn_ce(_valid_insn_ce),
12  .o_isRI(_isRI),
13  .o_op_a_sel(_op_a_sel),
14  .o_rf_we(_rf_we),
15  .o_word_off(_word_off),
16  .o_signal_extension(_signal_
17  extension),
18  .o_imm_pre(_imm_pre),
19  .o_br(_br),
20  .o_is_jal(_is_jal),
21  .o_carry_in(_carry_in),
22  .o_logic_op_sel(_logic_op_sel),
23  .o_shift_op_sel(_shift_op_sel),
24  .o_op_type_sel(_op_type_sel),
25  .o_valid(_valid),
26  .o_insn_ce(_insn_ce),
27  .o_is_add_op(_is_add_op),
28  .o_w_oe(_w_oe),
29  .o_l_oe(_l_oe),
30  .o_h_oe(_h_oe),
31  .o_l_we(_l_we),
32  .o_h_we(_h_we),
33  .o_io(_io),
34  .o_which_data_insn(_which_
35  data_insn),
36  .o_zero_insn(_zero_insn),
37  .o_i_ad_IVT(_i_ad_IVT),
38  .o_int(_int),
39  .o_is_reti(_is_reti)
40 );

```

Listing 4.4: Control Unit instantiation

```

1 Datapath datapath (
2   .i_clk(_clk50),
3   .i_rst(_rst_sys),
4   .i_rf_we(_rf_we),
5   .i_isRI(_isRI),
6   .i_op_a_sel(_op_a_sel),
7   .i_word_off(_word_off),
8   .i_signal_extension(_signal_
9   extension),
10  .i_imm_pre(_imm_pre),
11  .i_valid_insn_ce(_valid_insn_ce),
12  .i_br(_br),
13  .i_is_jal(_is_jal),
14  .i_carry_in(_carry_in),
15  .i_logic_op_sel(_logic_op_sel),
16  .i_is_add_op(_is_add_op),
17  .i_shift_op_sel(_shift_op_sel),
18  .i_op_type_sel(_op_type_sel),
19  .i_valid(_valid),
20  .i_insn_ce(_insn_ce),
21  .i_w_oe(_w_oe),
22  .i_l_oe(_l_oe),
23  .i_h_oe(_h_oe),
24  .i_l_we(_l_we),
25  .i_h_we(_h_we),
26  .i_io(_io),
27  .i_which_data_insn(_which_
28  data_insn),
29  .i_par_i(w_par_i),
30  .i_zero_insn(_zero_insn),
31  .i_i_ad_IVT(_i_ad_IVT),
32  .i_int(_int),
33  .i_rx_line(w_rx_line),
34  .i_is_reti(_is_reti),
35  .o_insn(_insn),
36  .o_cc(_cc),
37  .o_addr_parity(_addr_parity),
38  .o_is_peripheral_addr(_is_
39  peripheral_addr),
40  .o_io_rdy(_io_rdy),
41  .o_par_o(w_par_o),
42  .o_int_req(_int_req),
43  .o_tx_line(_tx_line)
44 );

```

Listing 4.5: Datapath instantiation

#### 4.4.1 Datapath implementation and module instantiation

```

1  `timescale 1ns / 1ps
2
3  `include "Constants.vh"
4
5  `define IO
6
7  `*****`*****`*****`*****
8  * DATAPATH MODULE
9  `*****`*****`*****`*/
10

```

```

11  module Datapath(
12    ...
13  );
14
15  ****
16  * INTERNAL VARIABLES
17  ****
18  reg ['ADDR_MSB:0] _IR;
19  wire ['ADDR_MSB:0] _PC;
20  wire ['ADDR_MSB:0] _data_address;
21  wire ['DATA_MSB:0] _data_to_store;
22  wire ['DATA_MSB:0] _data;
23
24  wire ['DATA_MSB:0] _io_data;
25
26
27 /* INSTRUCTION ARGUMENTS DECODE - DATAPATH */
28 wire [3:0] _rd = o_insn[11:8];
29 wire [3:0] _rs = o_insn[7:4];
30 wire [3:0] _imm = o_insn[3:0];
31 wire [11:0] _i12 = o_insn[11:0];
32 wire [7:0] _disp = o_insn[7:0];
33
34
35 /* Select Operation */
36 wire ['DATA_MSB:0] _op_arithmetic_result;
37 wire ['DATA_MSB:0] _op_logic_result;
38 wire ['DATA_MSB:0] _op_shift_result;
39
40 /* ALU operands */
41 wire ['DATA_MSB:0] _destinationRegister; // Value just Stored on
42           the RF
43 wire ['DATA_MSB:0] _2ndOp;      // b selection in RF
44
45 ****
46 * IMMEDIATE OPERATION FORMATION
47 ****
48
49 reg [11:0] _i12_pre;
50 always @(posedge i_clk)
51   if (i_valid_insn_ce)
52     _i12_pre <= _i12;
53
54 wire _sxi = i_signal_extension & _imm[3];
55 wire [10:0] _sxi11 = {11{_sxi}};
56 wire _i_4 = _sxi | (i_word_off & _imm[0]);
57 wire _i_0 = ~i_word_off & _imm[0];
58 wire ['DATA_MSB:0] _imm16 = i_immm_pre ? {_i12_pre, _imm} :
59           {_sxi11, _i_4, _imm[3:1], _i_0};
60
61 ****
62 * Instruction Register (IR) UPDATE
63 ****
64
65 reg ['ADDR_MSB:0] _return_addr;
66
67 always @(posedge i_clk) begin
68   if (i_RST)
69     _return_addr <= 16'h0000;
70   else if (!i_int)
71     _return_addr <= _IR;

```

```

70     end
71
72     wire [6:0] _sxd7    = {7{_disp[7]} }; //signal extension disp8
73     for jump
74     wire ['DATA_MSB:0] _sxd16 = {_sxd7,_disp,1'b0}; // -> result
75         of disp8*2 multiply by 2 means shift right and LSb = 0
76     wire ['DATA_MSB:0] _IRinc = i_br ? _sxd16 : {i_valid,1'b0};
77         //if hit instruction is the next sequential
78     wire ['DATA_MSB:0] _IRincd = _IR + _IRinc;
79
80
81     assign _PC = (|i_int) ? i_i_ad_IVT :
82             i_is_reti ? _return_addr :
83             (i_is_jal) ? _op_arithmetic_result : _IRincd;
84
85
86     always @(posedge i_clk)
87         if (i_rst)
88             _IR <= i_i_ad_IVT;
89         else if(i_valid_insn_ce)
90             _IR <= _PC;
91
92
93 /**
94 * ALU
95 *****/
96
97 /* ALU Operand Selection */
98 wire ['DATA_MSB:0] _a = i_op_a_sel ? _destinationRegister :
99             _imm16;
100 wire ['DATA_MSB:0] _b = _2ndOp;
101
102 assign _data_address = _op_arithmetic_result; // SUM
103     RESLUT - efective address
104 assign o_addr_parity = _data_address[0];
105 assign o_is_peripheral_addr = _data_address['IO_IDENTIFIER];
106 assign _data_to_store = _destinationRegister; //d_ad
107     (destination address) is stored here
108
109 /**
110 * STORE/LOAD
111 *****/
112
113 wire _sw, _sb, _lw, _lb;
114 assign {_lw, _lb, _sw, _sb} = i_which_dataInsn;
115
116 wire ['DATA_MSB:0] _di;
117 wire [7:0] _do_h = _sw ? _data_to_store[15:8] :
118             _data_to_store[7:0];
119
120 assign _data =
121     o_io_rdy ? _io_data :
122     i_w_oe ? {_di[15:8], _di[7:0]} :
123     i_l_oe ? {8'h00, _di[7:0]} :
124     i_h_oe ? {8'h00, _di[15:8]} :
125
126     (i_op_type_sel == 2'b00) ? _op_arithmetic_result :
127     (i_op_type_sel == 2'b01) ? _op_logic_result :
128     (i_op_type_sel == 2'b10) ? _op_shift_result :
129     /*(i_op_type_sel == 2'b11) */ _IR;
130     // Default value is _IR, given the default value of
131     o_op_type_sel in C.U.
132
133 /**
134 * PERIPHERAL CONTROL

```

```

123 ****
124 `ifdef IO
125
126     reg [`IO_REG_MSB:0] _io_ad;
127
128     always @(posedge i_clk) _io_ad <= _data_address;
129
130     wire [`CONTROL_MSB:0] _ctrl;
131     wire [`SELECT_MSB:0] _sel;
132
133     wire [`SELECT_MSB:0] _per_rdy; //peripheral ready
134     assign _per_rdy[`SELECT_MSB:3] = {(`SELECT_MSB-3+1){1'b0}};
135     assign o_io_rdy = | (_sel & _per_rdy);
136
137     ctrl_enc enc(
138         .i_clk(i_clk), .i_rst(i_rst), .i_io(i_io),
139         .i_io_ad(_io_ad), .i_lw(_lw), .i_lb(_lb), .i_sw(_sw),
140         .i_sb(_sb), .o_ctrl(_ctrl), .o_sel(_sel));
141
142
143     wire [`DATA_MSB:0] _timer_data;
144     wire [`DATA_MSB:0] _pario_data;
145     wire [`DATA_MSB:0] _uart_data;
146
147     assign _io_data = _sel[0] ? {8'h00, _timer_data[`PARIO_MSB:0]}
148         :
149             _sel[1] ? {8'h00, _pario_data[`PARIO_MSB:0]}
150                 :
151                     _sel[2] ? {8'h00, _uart_data[`PARIO_MSB:0]}:
152                         16'h0000;
153
154     timer timer(
155         .i_ctrl(_ctrl), .i_sel(_sel[0]),
156         .i_i(1'b1), .i_data_to_store(_data_to_store),
157         .i_cnt_init(16'h0000), .o_rdy(_per_rdy[0]),
158             //0x1000-0x10FF --> 255 endere os
159         .o_data(_timer_data), .o_int_req(o_int_req[0]));
160
161     pario par(
162         .i_rst(i_rst), .i_ctrl(_ctrl), .i_sel(_sel[1]),
163         .i_i_8bits(i_par_i),
164             .i_data_to_store(_data_to_store[`PARIO_MSB:0]),
165         .o_data(_pario_data), .o_rdy(_per_rdy[1]), //0x1100-0x11FF
166             --> 255 endere os
167         .o_o_8bits(o_par_o), .o_int_req(o_int_req[2:1]);
168             //o_int_req[2:1]
169
170     uart uart(
171         .i_ctrl(_ctrl), .i_sel(_sel[2]), .i_rx_line(i_rx_line),
172         .i_data_to_store(_data_to_store), .o_rdy(_per_rdy[2]),
173             .o_tx_line(o_tx_line),
174         .o_data(_uart_data), .o_int_req(o_int_req[4:3]));
175
176 `else
177     assign o_int_req = 4'b00000;
178 `endif
179
180 ****
181 * INTERNAL MODULES
182 ****

```

```

177      regfile16x16 regfile(
178        .a(_rd),                                //write addr
179        .d(_data),                                //write data
180        .dpra(i_isRI ? _rd : _rs),                //dual port read addr
181        .clk(i_clk),
182        .we(i_rf_we),                            //register file write
183          enable
184        .spo(_destinationRegister),              //single port output
185        .dpo(_2ndOp)                            //dual port output
186      );
187
188      ALU alu(
189        .i_is_add_op(i_is_add_op),
190        .i_a(_a),
191        .i_b(_b),
192        .i_carry_in(i_carry_in),
193        .i_logic_op_sel(i_logic_op_sel),
194        .i_shift_op_sel(i_shift_op_sel),
195        .o_arithmetic_result (_op_arithmetic_result),
196        .o_logic_result(_op_logic_result),
197        .o_shift_result(_op_shift_result),
198        .o_cc (o_cc)
199      );
200
201      /*Memory instantiation showed previously*/
202
203
204 endmodule

```

Listing 4.6: Control Unit instantiation

## 4.5 Timer and Parallel I/O

The code shown in Listing 4.7 presents the updated implementation of the timer peripheral and its associated control and status registers, as defined during the design phase.

The timer logic is implemented as a synchronous process driven by the rising edge of the clock. Upon reset, all control signals are cleared, the interrupt request output `o_int_req` is deasserted, and the counter initialization register `_cnt_init_reg` is loaded with its default value. The control register fields `_start`, `_timer`, and `_int_en` respectively enable the timer, select the timer mode, and control interrupt generation.

When the timer is active and a valid tick event occurs, an interrupt request is generated if interrupts are enabled. This behavior corresponds to the interrupt request register (CR#1), which asserts `o_int_req` and remains active until explicitly cleared by a write operation to the same register. This mechanism ensures a level-sensitive interrupt request that can be reliably acknowledged by the processor.

Register writes are decoded based on the address bus. Writing to CR#0 updates the timer control fields, writing to CR#1 clears the interrupt request, and writing to CR#2 updates the counter initialization value. This structure closely follows the memory-mapped peripheral model defined in the architectural specification.

```

1 always @(posedge _clk) begin
2   if (_rst) begin
3     {_start, _timer, _int_en} <= 3'b000;
4     o_int_req <= 1'b0;
5     _cnt_init_reg <= i_cnt_init;
6   end
7   else begin

```

```

8     //CR#1 interrupt request
9     if (_tick && _v && _int_en)
10    o_int_req <= 1'b1;
11
12    // Handle writes based on address
13    if (_we[0]) begin
14      case (_ad)
15        // 0x1000 - Counter Control Register
16        CR_0[7:0]:{_start, _timer, _int_en} <=
17          i_data_to_store[2:0];
18        // 0x1002 - Interrupt Request Register
19        CR_1[7:0]: o_int_req <= 1'b0; // Clear on write
20        // 0x1004 - Counter Init Register
21        CR_2[7:0]: _cnt_init_reg <= i_data_to_store;
22        default: begin
23          end
24        endcase
25      end
26    end

```

Listing 4.7: timer peripheral implemenatation

To validate the correct operation of the timer and its registers, a test program was executed. Initially, the counter was observed to start from 0x0000. A write operation to CR#2 updated the counter initialization register to 0xFFE5, as shown in Figure 4.3, confirming the correct behavior of the register write path.



Figure 4.3: Reconfiguration of initial counter value

Further validation was performed by enabling the timer interrupt and executing an interrupt service routine designed during the design phase. After four timer overflows, bit 0 of the peripheral output register (**par\_o**) toggles, demonstrating correct interrupt generation, handling, and interaction between the timer and the peripheral logic. This behavior is illustrated in Figure 4.4, where the LED output visibly toggles as expected.

These results confirm that the timer peripheral, its register interface, and the interrupt handling mechanism operate correctly and in accordance with the original design specifications.



Figure 4.4: Test case using timer and parallel i/o

## 4.6 UART

As observed in the figure, the transmitted data on the TX line matches exactly the value received on the RX line. This behavior confirms the correct operation of both the UART receiver and transmitter, as well as the proper internal data handling and timing between the two modules. The successful echo of the received byte demonstrates that the UART subsystem functions as intended.



Figure 4.5: UART echo test case

Figure 4.5 shows a validation example of the UART peripheral configured in echo mode. In this test, a byte with value 0x0085 is transmitted on the UART receive (RX) line. After the reception is completed and the data is correctly decoded by the UART receiver, the same value is immediately forwarded to the UART transmit (TX) line.

```

1  always @(posedge _clk) begin
2      if (~_rst) begin
3          case (_state_tx)
4              0: begin // Idle
5                  o_tx_out <= 1; // High when idle
6                  if (_tx_start) begin
7                      _shift_reg <= _sbuf_tx;
8                      _state_tx <= 1;
9                      _counter <= 0;
10                 end
11            end
12            1: begin // Start bit (0)
13                o_tx_out <= 0;
14                if (_counter == i_baudrate_div - 1) begin
15                    _counter <= 0;
16                    _state_tx <= 2;
17                    end else _counter <= _counter + 1;
18                end
19                2, 3, 4, 5, 6, 7, 8, 9: begin // Data bits (LSB first)
20                    o_tx_out <= _shift_reg[_state_tx - 2];
21                    if (_counter == i_baudrate_div - 1) begin
22                        _counter <= 0;
23                        _state_tx <= _state_tx + 1;
24                        end else _counter <= _counter + 1;
25                    end
26                    10: begin // Stop bit (1)
27                        o_tx_out <= 1;
28                        if (_counter == i_baudrate_div - 1) begin
29                            _counter <= 0;
30                            _state_tx <= 0;
31                            end else _counter <= _counter + 1;
32                        end
33                    endcase

```

```

34     end
35   else
36     o_tx_out <= 1;
37 end

```

Listing 4.8: uart tx implemenatation

```

1  always @(posedge _clk) begin
2    case (_state_rx )
3      0: begin // Wait for start bit (falling edge)
4        if (!i_rx_in) begin
5          _counter <= 0;
6          _state_rx <= 1;
7        end
8      end
9      1: begin // Sample mid-start bit
10        if (_counter == _half_baudrate_div - 1) begin
11          if (!i_rx_in) begin // Confirm start bit
12            _counter <= 0;
13            _state_rx <= 2;
14          end else _state_rx <= 0; // False start
15        end else _counter <= _counter + 1;
16      end
17      2, 3, 4, 5, 6, 7, 8, 9: begin // Sample data bits
18        if (_counter == i_baudrate_div - 1) begin
19          _shift_reg[_state_rx - 2] <= i_rx_in; // Store
20          bit
21          _counter <= 0;
22          _state_rx <= _state_rx + 1;
23        end else _counter <= _counter + 1;
24      end
25      10: begin // Stop bit
26        if (_counter == i_baudrate_div - 1) begin
27          _counter <= 0;
28          _state_rx <= 0;
29        end else _counter <= _counter + 1;
30      end
31    endcase
end

```

Listing 4.9: uart rx implemenatation

## 4.7 I2C

The I2C (Inter-Integrated Circuit) module implements a standard I2C protocol for communication with external peripherals. As shown in Figure 4.6, the waveform demonstrates successful I2C operation with proper START and STOP conditions, address transmission, and data transfer.

To initiate an I2C transaction, the processor must configure the I2C data register with the peripheral address in the least significant byte (LSB) and the data to be transmitted in the most significant byte (MSB). The I2C control register must have both the enable (EN) and start (START) flags set to 1 to begin the communication sequence.



Figure 4.6: I2C Communication Waveform showing successful data transmission

The I2C implementation uses a dual-edge triggered state machine approach, where rising edges of the I2C clock handle state transitions and data sampling, while falling edges manage data output and the SDA line direction control. This ensures proper setup and hold times as required by the I2C specification.

```

1 // I2C posedge Validation
2 always @(posedge _clk) begin
3     if(_rst == 1) begin
4         state <= IDLE;
5         counter <= 0;
6         o_data <= 16'h0000;
7     end
8     else if (_i2c_control[bit_EN])begin
9         if (i2c_clk_rising_en && clk_enable) begin
10             case(state)
11
12                 IDLE: begin
13                     if (_i2c_control[bit_START]) begin
14                         state <= START;
15                         saved_addr <= _i2c_data[7:0];
16                         saved_data <= _i2c_data[15:8];
17                     end
18                     else state <= IDLE;
19                 end
20
21                 START: begin
22                     counter <= 'DATA_I2C-1; // Set bit number for
23                         transmission
24                     state <= ADDRESS;
25                 end
26
27                 ADDRESS: begin
28                     if (counter == 0) begin
29                         state <= READ_ACK;
30                     end else counter <= counter - 1;
31                 end
32
33                 READ_ACK: begin
34                     if (o_i2c_sda == 0) begin
35                         counter <= 'DATA_I2C-1; // Set bit number
36                             for transmission
37                         if(saved_addr[0] == 0) state <= WRITE_DATA;
38                         else state <= READ_DATA;
39                     end else state <= STOP;
40                 end

```

```

40         WRITE_DATA: begin
41             if(counter == 0) begin
42                 state <= READ_ACK2;
43             end else counter <= counter - 1;
44         end
45
46         READ_ACK2: begin
47             if ((o_i2c_sda == 0) && (bit_START == 1)) state
48                 <= IDLE;
49             else state <= STOP;
50             //state <= IDLE;
51         end
52
53         READ_DATA: begin
54             o_data[counter] <= o_i2c_sda;
55             if (counter == 0) state <= WRITE_ACK;
56             else counter <= counter - 1;
57         end
58
59         WRITE_ACK: begin
60             state <= STOP;
61         end
62
63         STOP: begin
64             state <= IDLE;
65         end
66     endcase
67 end
68 end

```

Listing 4.10: I2C Posedge State Machine - Handles state transitions and data reception

```

1 // I2C negedge Validation
2 always @(posedge _clk) begin
3     if(_rst == 1) begin
4         write_enable <= 1;
5         sda_out <= 1;
6     end else if (i2c_clk_falling_en && clk_enable) begin
7         case(state)
8
9             START: begin
10                 write_enable <= 1;
11                 sda_out <= 0;
12             end
13
14             ADDRESS: begin
15                 sda_out <= saved_addr[counter];
16             end
17
18             READ_ACK: begin
19                 write_enable <= 0;
20             end
21
22             WRITE_DATA: begin
23                 write_enable <= 1;
24                 sda_out <= saved_data[counter];
25             end
26
27             WRITE_ACK: begin
28                 write_enable <= 1;

```

```

29                     sda_out <= 0;
30     end
31
32     READ_DATA: begin
33         write_enable <= 0;
34     end
35
36     STOP: begin
37         write_enable <= 1;
38         sda_out <= 1;
39         end
40     endcase
41 end
42

```

Listing 4.11: I2C Negedge Logic - Controls SDA output and direction

The posedge state machine (Listing 4.10) manages the protocol sequencing, beginning in the IDLE state where it waits for the START condition to be asserted. Upon receiving the start signal, it captures the address and data from the I2C data register. The state machine then progresses through ADDRESS transmission, ACK reception, and either WRITE\_DATA or READ\_DATA states depending on the read/write bit (bit 0) of the address byte. Each state transition occurs on the rising edge of the I2C clock to properly sample incoming data.

The negedge logic (Listing 4.11) operates on falling edges of the I2C clock to set up data on the SDA line before the next rising edge. The `write_enable` signal controls the bidirectional nature of the SDA line, setting it to output mode during transmission and input mode during ACK and data reception phases. This dual-edge approach ensures compliance with I2C timing specifications, where data must be stable during the high period of SCL and can only change during the low period.

## 4.8 Interrupt Controller

Listing 4.12 shows the complete priority encoding implementation. The critical components of this implementation are explained below.

```

1 assign _int_served_mask = (1 << _int_index_next);
2
3 /* Assure instruction is interruptable */
4 wire _int_can_occur = o_valid & // Valid to insert
5   (~rst and on a valid clk cycle (due to mem) - o_valid_insn_ce)
6   ~('IMM|'ALU&('ADC|'SBC|'CMP)) & // Assure int
7   req only on NON interruptable instructions
8   ~_int_in_progress; //Assure the
9   int being served is not interrupted
10
11 /* Logic to make sure the same interrupt isn't served twice,
12   consecutively */
13 always @(posedge i_clk)
14 if (i_RST)
15   _int_req_last <= 0;
16 else
17   _int_req_last <= i_int_req;
18
19
20 assign _int_pend = (i_RST || (~o_int)) ? 0 : (i_int_req &
21   ~_int_req_last);
22 wire _int_accept = (~_int_next) & o_insn_ce;
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
259
260
261
262
263
264
265
266
267
268
269
269
270
271
272
273
274
275
276
277
278
279
279
280
281
282
283
284
285
286
287
287
288
289
289
290
291
292
293
294
295
296
297
297
298
299
299
300
301
302
303
304
305
306
307
308
309
309
310
311
312
313
314
315
316
317
318
319
319
320
321
322
323
324
325
326
327
328
329
329
330
331
332
333
334
335
336
337
338
339
339
340
341
342
343
344
345
346
347
348
349
349
350
351
352
353
354
355
356
357
358
359
359
360
361
362
363
364
365
366
367
368
369
369
370
371
372
373
374
375
376
377
378
379
379
380
381
382
383
384
385
386
387
388
389
389
390
391
392
393
394
395
396
397
398
399
399
400
401
402
403
404
405
406
407
408
409
409
410
411
412
413
414
415
416
417
418
419
419
420
421
422
423
424
425
426
427
428
429
429
430
431
432
433
434
435
436
437
438
439
439
440
441
442
443
444
445
446
447
448
449
449
450
451
452
453
454
455
456
457
458
459
459
460
461
462
463
464
465
466
467
468
469
469
470
471
472
473
474
475
476
477
478
478
479
480
481
482
483
484
485
486
487
488
489
489
490
491
492
493
494
495
496
497
498
499
499
500
501
502
503
504
505
506
507
508
509
509
510
511
512
513
514
515
516
517
518
519
519
520
521
522
523
524
525
526
527
528
529
529
530
531
532
533
534
535
536
537
538
539
539
540
541
542
543
544
545
546
547
548
549
549
550
551
552
553
554
555
556
557
558
559
559
560
561
562
563
564
565
566
567
568
569
569
570
571
572
573
574
575
576
577
578
579
579
580
581
582
583
584
585
586
587
588
589
589
590
591
592
593
594
595
596
597
597
598
599
599
600
601
602
603
604
605
606
607
608
609
609
610
611
612
613
614
615
616
617
618
619
619
620
621
622
623
624
625
626
627
628
629
629
630
631
632
633
634
635
636
637
638
639
639
640
641
642
643
644
645
646
647
648
649
649
650
651
652
653
654
655
656
657
658
659
659
660
661
662
663
664
665
666
667
668
669
669
670
671
672
673
674
675
676
677
678
679
679
680
681
682
683
684
685
686
687
688
689
689
690
691
692
693
694
695
696
697
697
698
699
699
700
701
702
703
704
705
706
707
708
709
709
710
711
712
713
714
715
716
717
718
719
719
720
721
722
723
724
725
726
727
728
729
729
730
731
732
733
734
735
736
737
738
739
739
740
741
742
743
744
745
746
747
748
749
749
750
751
752
753
754
755
756
757
758
759
759
760
761
762
763
764
765
766
767
768
769
769
770
771
772
773
774
775
776
777
778
778
779
779
780
781
782
783
784
785
786
787
788
789
789
790
791
792
793
794
795
796
797
797
798
799
799
800
801
802
803
804
805
806
807
808
809
809
810
811
812
813
814
815
816
817
818
819
819
820
821
822
823
824
825
826
827
828
829
829
830
831
832
833
834
835
836
837
838
839
839
840
841
842
843
844
845
846
847
848
849
849
850
851
852
853
854
855
856
857
858
859
859
860
861
862
863
864
865
866
867
868
869
869
870
871
872
873
874
875
876
877
878
878
879
879
880
881
882
883
884
885
886
887
888
888
889
889
890
891
892
893
894
895
896
897
897
898
899
899
900
901
902
903
904
905
906
907
908
909
909
910
911
912
913
914
915
916
917
917
918
919
920
921
922
923
924
925
926
927
928
929
929
930
931
932
933
934
935
936
937
938
939
939
940
941
942
943
944
945
946
947
948
948
949
949
950
951
952
953
954
955
956
957
958
959
959
960
961
962
963
964
965
966
967
968
969
969
970
971
972
973
974
975
976
977
978
978
979
979
980
981
982
983
984
985
986
987
987
988
989
989
990
991
992
993
994
995
996
997
998
999
999
1000
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1009
1010
1011
1012
1013
1014
1015
1016
1017
1017
1018
1019
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1048
1049
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1078
1079
1079
1080
1081
1082
1083
1084
1085
1086
1087
1087
1088
1088
1089
1089
1090
1091
1092
1093
1094
1095
1096
1096
1097
1097
1098
1098
1099
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1109
1110
1111
1112
1113
1114
1115
1116
1117
1117
1118
1119
1119
1120
1121
1122
1123
1124
1125
1126
1127
1127
1128
1128
1129
1129
1130
1131
1132
1133
1134
1135
1136
1137
1137
1138
1138
1139
1139
1140
1141
1142
1143
1144
1145
1146
1146
1147
1147
1148
1148
1149
1149
1150
1151
1152
1153
1154
1155
1156
1157
1157
1158
1158
1159
1159
1160
1161
1162
1163
1164
1165
1166
1166
1167
1167
1168
1168
1169
1169
1170
1171
1172
1173
1174
1175
1175
1176
1176
1177
1177
1178
1178
1179
1179
1180
1181
1182
1183
1184
1185
1186
1186
1187
1187
1188
1188
1189
1189
1190
1191
1192
1193
1194
1195
1195
1196
1196
1197
1197
1198
1198
1199
1199
1200
1201
1202
1203
1204
1205
1206
1207
1207
1208
1208
1209
1209
1210
1211
1212
1213
1214
1215
1216
1216
1217
1217
1218
1218
1219
1219
1220
1221
1222
1223
1224
1225
1226
1226
1227
1227
1228
1228
1229
1229
1230
1231
1232
1233
1234
1235
1236
1236
1237
1237
1238
1238
1239
1239
1240
1241
1242
1243
1244
1245
1246
1246
1247
1247
1248
1248
1249
1249
1250
1251
1252
1253
1254
1255
1256
1256
1257
1257
1258
1258
1259
1259
1260
1261
1262
1263
1264
1265
1266
1266
1267
1267
1268
1268
1269
1269
1270
1271
1272
1273
1274
1275
1275
1276
1276
1277
1277
1278
1278
1279
1279
1280
1281
1282
1283
1284
1285
1285
1286
1286
1287
1287
1288
1288
1289
1289
1290
1291
1292
1293
1294
1295
1295
1296
1296
1297
1297
1298
1298
1299
1299
1300
1301
1302
1303
1304
1305
1306
1306
1307
1307
1308
1308
1309
1309
1310
1311
1312
1313
1314
1315
1315
1316
1316
1317
1317
1318
1318
1319
1319
1320
1321
1322
1323
1324
1325
1325
1326
1326
1327
1327
1328
1328
1329
1329
1330
1331
1332
1333
1334
1335
1335
1336
1336
1337
1337
1338
1338
1339
1339
1340
1341
1342
1343
1344
1345
1345
1346
1346
1347
1347
1348
1348
1349
1349
1350
1351
1352
1353
1354
1355
1355
1356
1356
1357
1357
1358
1358
1359
1359
1360
1361
1362
1363
1364
1365
1365
1366
1366
1367
1367
1368
1368
1369
1369
1370
1371
1372
1373
1374
1375
1375
1376
1376
1377
1377
1378
1378
1379
1379
1380
1381
1382
1383
1384
1385
1385
1386
1386
1387
1387
1388
1388
1389
1389
1390
1391
1392
1393
1394
1394
1395
1395
1396
1396
1397
1397
1398
1398
1399
1399
1400
1401
1402
1403
1404
1404
1405
1405
1406
1406
1407
1407
1408
1408
1409
1409
1410
1411
1412
1413
1414
1414
1415
1415
1416
1416
1417
1417
1418
1418
1419
1419
1420
1421
1422
1423
1424
1424
1425
1425
1426
1426
1427
1427
1428
1428
1429
1429
1430
1431
1432
1433
1434
1434
1435
1435
1436
1436
1437
1437
1438
1438
1439
1439
1440
1441
1442
1443
1444
1444
1445
1445
1446
1446
1447
1447
1448
1448
1449
1449
1450
1451
1452
1453
1454
1454
1455
1455
1456
1456
1457
1457
1458
1458
1459
1459
1460
1461
1462
1463
1464
1464
1465
1465
1466
1466
1467
1467
1468
1468
1469
1469
1470
1471
1472
1473
1474
1474
1475
1475
1476
1476
1477
1477
1478
1478
1479
1479
1480
1481
1482
1483
1484
1484
1485
1485
1486
1486
1487
1487
1488
1488
1489
1489
1490
1491
1492
1493
1494
1494
1495
1495
1496
1496
1497
1497
1498
1498
1499
1499
1500
1501
1502
1503
1504
1504
1505
1505
1506
1506
1507
1507
1508
1508
1509
1509
1510
1511
1512
1513
1514
1514
1515
1515
1516
1516
1517
1517
1518
1518
1519
1519
1520
1521
1522
1523
1524
1524
1525
1525
1526
1526
1527
1527
1528
1528
1529
1529
1530
1531
1532
1533
1534
1534
1535
1535
1536
1536
1537
1537
1538
1538
1539
1539
1540
1541
1542
1543
1544
1544
1545
1545
1546
1546
1547
1547
1548
1548
1549
1549
1550
1551
1552
1553
1554
1554
1555
1555
1556
1556
1557
1557
1558
1558
1559
1559
1560
1561
1562
1563
1564
1564
1565
1565
1566
1566
1567
1567
1568
1568
1569
1569
1570
1571
1572
1573
1574
1574
1575
1575
1576
1576
1577
1577
1578
1578
1579
1579
1580
1581
1582
1583
1584
1584
1585
1585
1586
1586
1587
1587
1588
1588
1589
1589
1590
1591
1592
1593
1594
1594
1595
1595
1596
1596
1597
1597
1598
1598
1599
1599
1600
1601
1602
1603
1604
1604
1605
1605
1606
1606
1607
1607
1608
1608
1609
1609
1610
1611
1612
1613
1614
1614
1615
1615
1616
1616
1617
1617
1618
1618
1619
1619
1620
1621
1622
1623
1624
1624
1625
1625
1626
1626
1627
1627
1628
1628
1629
1629
1630
1631
1632
1633
1634
1634
1635
1635
1636
1636
1637
1637
1638
1638
1639
1639
1640
1641
1642
1643
1644
1644
1645
1645
1646
1646
1647
1647
1648
1648
1649
1649
1650
1651
1652
1653
1654
1654
1655
1655
1656
1656
1657
1657
1658
1658
1659
1659
1660
1661
1662
1663
1664
1664
1665
1665
1666
1666
1667
1667
1668
1668
1669
1669
1670
1671
1672
1673
1674
1674
1675
1675
1676
1676
1677
1677
1678
1678
1679
1679
1680
1681
1682
1683
1684
1684
1685
1685
1686
1686
1687
1687
1688
1688
1689
1689
1690
1691
1692
1693
1694
1694
1695
1695
1696
1696
1697
1697
1698
1698
1699
1699
1700
1701
1702
1703
1704
1704
1705
1705
1706
1706
1707
1707
1708
1708
1709
1709
1710
1711
1712
1713
1714
1714
1715
1715
1716
1716
1717
1717
1718
1718
1719
1719
1720
1721
1722
1723
1724
1724
1725
1725
1726
1726
1727
1727
1728
1728
1729
1729
1730
1731
1732
1733
1734
1734
1735
1735
1736
1736
1737
1737
1738
1738
1739
1739
1740
1741
1742
1743
1744
1744
1745
1745
1746
1746
1747
1747
1748
1748
1749
1749
1750
1751
1752
1753
1754
1754
1755
1755
1756
1756
1757
1757
1758
1758
1759
1759
1760
1761
1762
1763
1764
1764
1765
1765
1766
1766
1767
1767
1768
1768
1769
1769
1770
1771
1772
1773
1774
1774
1775
1775
1776
1776
1777
1777
1778
1778
1779
1779
1780
1781
1782
1783
1784
1784
1785
1785
1786
1786
1787
1787
1788
1788
1789
1789
1790
1791
1792
1793
1794
1794
1795
1795
1796
1796
1797
1797
1798
1798
1799
1799
1800
1801
1802
1803
1804
1804
1805
1805
1806
1806
1807
1807
1808
1808
1809
1809
1810
1811
1812
1813
1814
1814
1815
1815
1816
1816
1817
1817
1818
1818
1819
1819
1820
1821
1822
1823
1824
1824
1825
1825
1826
1826
1827
1827
1828
1828
1829
1829
1830
1831
1832
1833
1834
1834
1835
1835
1836
1836
1837
1837
1838
1838
1839
1839
1840
1841
1842
1843
1844
1844
1845
1845
1846
1846
1847
1847
1848
1848
1849
1849
1850
1851
1852
1853
1854
1854
1855
1855
1856
1856
1857
1857
1858
1858
1859
1859
1860
1861
1862
1863
1864
1864
1865
1865
1866
1866
1867
1867
1868
1868
1869
1869
1870
1871
1872
1873
1874
1874
1875
1875
1876
1876
1877
1877
1878
1878
1879
1879
1880
1881
1882
1883
1884
1884
1885
1885
1886
1886
1887
1887
1888
1888
1889
1889
1890
1891
1892
1893
1894
1894
1895
1895
1896
1896
1897
1897
1898
1898
1899
1899
1900
1901
1902
1903
1904
1904
1905
1905
1906
1906
1907
1907
1908
1908
1909
1909
1910
1911
1912
1913
1914
1914
1915
1915
1916
1916
1917
1917
1918
1918
1919
1919
1920
1921
1922
1923
1924
1924
1925
1925
1926
1926
1927
1927
1928
1928
1929
1929
1930
1931
1932
1933
1934
1934
1935
1935
1936
1936
1937
1937
1938
1938
1939
1939
1940
1941
1942
1943
1944
1944
1945
1945
1946
1946
1947
1947
1948
1948
1949
1949
1950
1951
1952
1953
1954
1954
1955
1955
1956
1956
1957
1957
1958
1958
1959
1959
1960
1961
1962
1963
1964
1964
1965
1965
1966
1966
1967
1967
1968
1968
1969
1969
1970
1971
1972
1973
1974
1974
1975
1975
1976
1976
1977
1977
1978
1978
1979
1979
1980
1981
1982
1983
1984
1984
1985
1985
1986
1986
1987
1987
1988
1988
1989
1989
1990
1991
1992
1993
1994
1994
1995
1995
1996
1996
1997
1997
1998
1998
1999
1999
2000
2001
2002
2003
2004
2004
2005
2005
2006
2006
2007
2007
2008
2008
2009
2009
2010
2011
2012
2013
2014
2014
2015
2015
2016
2016
2017
2017
2018
2018
2019
2019
2020
2021
2022
2023
2024
2024
2025
2025
2026
2026
2027
2027
2028
2028
2029
2029
2030
2031
2032
2033
2034
2034
2035
2035
2036
2036
2037
2037
2038
2038
2039
2039
2040
2041
2042
2043
2044
2044
2045
2045
2046
2046
2047
2047
2048

```

```

19  /* Interrupt in progress flag */
20  always @(posedge i_clk) begin
21      if (i_rst)
22          _int_in_progress <= 1'b0;
23      else if (o_is_reti && o_valid_insn_ce)
24          _int_in_progress <= 1'b0;
25      else if (_int_accept)
26          _int_in_progress <= 1'b1;
27  end
28
29  /* Pending interrupt queue (bitmask)*/
30  always @(posedge i_clk) begin
31      if (i_rst)
32          _int_waiting <= 0;
33      else begin
34          /* Add pending interrupt to queue */
35          if (_int_in_progress && (!_int_pend))
36              _int_waiting <= _int_waiting | _int_pend;
37
38          /* Remove interrupt when it is accepted */
39          if (_int_accept)
40              _int_waiting <= _int_waiting & ~_int_served_mask;
41      end
42  end
43
44  /* Next interrupt selection (priority encoder) */
45  always @(posedge i_clk) begin
46      if (i_rst)
47          _int_next <= 0;
48      else if (_int_accept)
49          _int_next <= 0;
50      else if (~_int_in_progress) begin
51          if (!_int_waiting)
52              _int_next <= _int_waiting;
53          else if (!_int_pend)
54              _int_next <= _int_pend;
55      end
56  end
57
58
59  /* Select Interrupt Request Line */
60  assign _int_index_next = _int_next[0] ? 4'b0000:
61                                         (_int_next[1] ? 4'b0001 :
62                                         (_int_next[2] ? 4'b0010 :
63                                         (_int_next[3] ? 4'b0011 :
64                                         (_int_next[4] ? 4'b0100 : 4'b0000)))); 
65
66  /* Register the interrupt index when interrupt is accepted */
67  always @(posedge i_clk) begin
68      if (i_rst)
69          _int_index <= 4'b0000;
70      else if (_int_accept) // Capture when interrupt triggers
71          _int_index <= _int_index_next;
72  end
73
74  /* Signal Interrupt to the System */
75  always @(posedge i_clk) begin
76      if (i_rst)
77          o_int <= 'INT_REQ_LINES' b00000;
78      else if (_int_accept)
79          o_int <= _int_next;

```

```

80     else
81         o_int <= 'INT_REQ_LINES' b00000;
82     end
83
84     assign o_zero_insn = _int_accept;
85
86     /***** HARDCODED INTERRUPT VECTOR TABLE *****/
87     assign o_i_ad_IVT = i_rst ? 16'h0002 :
88             ((|o_int) ? {8'b0, _int_index, 1'b0, (|o_int),
89                         2'b0} : 16'h0002);

```

Listing 4.12: Priority encoding and interrupt management logic

## Critical Implementation Components

**Interrupt Served Mask:** The `_int_served_mask` signal creates a bitmask with a single bit set corresponding to the interrupt being serviced. This mask is crucial for removing the specific interrupt from the waiting queue using bitwise AND operation with the complement of the mask.

**Interrupt Condition Check:** The `_int_can_occur` signal enforces three critical conditions before allowing an interrupt:

1. The current instruction cycle must be valid
2. The current instruction must not be part of an interlocked sequence (IMM, ADC, SBC, CMP)
3. No interrupt service routine is currently executing (preventing nested interrupts)

**Edge Detection Logic:** The edge detection mechanism using `_int_req_last` prevents the same interrupt from being serviced multiple times. By comparing the current interrupt request with the previous cycle's value, only rising edges (new interrupt requests) are recognized as pending interrupts.

**Interrupt In-Progress Flag:** The `_int_in_progress` flag tracks the execution state of interrupt service routines. It is set when an interrupt is accepted and cleared only when a RETI instruction completes, ensuring that no new interrupts can interrupt an ongoing ISR.

**Pending Interrupt Queue:** The `_int_waiting` register maintains a bitmask of interrupts that arrived while an ISR was executing. When a new interrupt is detected during ISR execution, it is added to the queue using bitwise OR. When an interrupt is accepted for service, it is removed using bitwise AND with the complement of the served mask.

**Priority Encoder:** The priority selection logic in `_int_next` implements a simple but effective priority scheme: queued interrupts are serviced before newly pending ones, and within each category, lower-indexed interrupts have higher priority. This ensures fair servicing while maintaining deterministic priority levels.

**Index Encoding:** The `_int_index_next` combinational logic converts the one-hot encoded interrupt selection into a binary index. This cascaded ternary operator implements a priority encoder where interrupt 0 has the highest priority and interrupt 4 has the lowest.

**Interrupt Vector Table Address Generation:** The IVT address calculation follows a simple formula: each interrupt vector is located at  $0x0002 + (\text{interrupt\_index} \times 4)$ , providing 4 bytes of space per interrupt vector. This addressing scheme allows each interrupt to have a dedicated entry point in the vector table, with enough space for a short handler or a jump to the full ISR.

## Validation

For testing this section, two interrupts were triggered while another interrupt was already in progress. As expected, the currently executing interrupt completes first, and then the higher-priority interrupt is serviced, even though it was activated after the lower-priority one.

```

1  `timescale 1ns / 1ps
2
3  module tb_soc();
4
5    reg clk, rst;
6    reg [7:0] par_i;
7    wire [7:0] par_o;
8
9    SoC uut(.i_clk(clk), .i_RST(rst), .i_PAR_I(par_i),
10      .o_PAR_O(par_o));
11
12  initial clk = 0;
13  always #10 clk = ~clk;
14
15  initial begin
16    rst = 1; par_i = 0;
17
18  repeat (10) @(posedge clk); rst = 0;
19
20  repeat (14) @(posedge clk); par_i = 8'b01; // par_i[0] start timer
21  repeat (1) @(posedge clk); par_i = 8'b00;
22  repeat (38) @(posedge clk); par_i = 8'b10; // par_i[1] stop timer
23  repeat (2) @(posedge clk); par_i = 8'b0;
24  repeat (2) @(posedge clk); par_i = 8'b01; // par_i[0] start timer
25  repeat (1) @(posedge clk); par_i = 8'b00;
26
27  repeat (40) @(posedge clk);
28  $finish;
29
30
31 endmodule

```



Figure 4.7: Priority Encoding test case

## 4.9 Latch Avoidance

### 4.9.1 Original Code

```
1  reg br, t;
2  always @(hit or cond or op or ccz or ccn or ccc or ccv) begin
3    case (cond&4'b1110)
4      'BR:   t = 1;
5      'BEQ:  t = ccz;
6      'BC:   t = ccc;
7      'BV:   t = ccv;
8      'BLT:  t = ccn^ccv;
9      'BLE:  t = (ccn^ccv)|ccz;
10     'BLTU: t = ~ccz&~ccc;
11     'BLEU: t = ccz|~ccc;
12     endcase
13     br = hit & 'Bx & (cond[0] ? ~t : t);
14   end
```

### 4.9.2 Refactored Code

```
1  reg _t;
2  always @(_valid or _cond or _op or
3           _ccz or _ccn or _ccc or _ccv) begin
4    case (_cond&4'b1110)
5      'BR:   _t = 1;
6      'BEQ:  _t = _ccz;
7      'BC:   _t = _ccc;
8      'BV:   _t = _ccv;
9      'BLT:  _t = _ccn^_ccv;
10     'BLE:  _t = (_ccn^_ccv)|_ccz;
11     'BLTU: _t = ~_ccz&~_ccc;
12     'BLEU: _t = _ccz|~_ccc;
13     default: _t = 0;
14   endcase
15   o_br = _valid & 'Bx & (_cond[0] ? ~_t : _t);
16 end
```

## 4.10 Removal of Internal Tri-State

### 4.10.1 Original Code

```
1  assign data = sum_en      ? sum : 16'bz;
2  assign data = ('ALU&'LOG) ? log : 16'bz;
3  assign data = ('ALU&'SR)  ? sr  : 16'bz;
4  assign data = 'JAL       ? pc  : 16'bz;
5
6  assign data[15:8] = w_oe ? di[15:8] : 8'bz;
7  assign data[7:0]  = l_oe ? di[7:0]  : 8'bz;
8  assign data[7:0]  = h_oe ? di[15:8] : 8'bz;
9  assign data[15:8] = lb   ? 8'b0    : 8'bz;
10
11 assign data[7:0]  = swsb ? do[7:0]  : 8'bz;
12 assign data[15:8] = sb   ? do[7:0]  : 8'bz;
13 assign data[15:8] = sw   ? do[15:8] : 8'bz;
```

#### 4.10.2 Refactored Code and analysis

```
1 assign _data =
2   o_io_rdy ? _io_data :
3   i_w_oe ? {_di[15:8], _di[7:0]} :
4   i_l_oe ? {8'h00, _di[7:0]} :
5   i_h_oe ? {8'h00, _di[15:8]} :
6
7   (i_op_type_sel == 2'b00) ? _op_arithmetic_result :
8   (i_op_type_sel == 2'b01) ? _op_logic_result :
9   (i_op_type_sel == 2'b10) ? _op_shift_result :
10  /*(i_op_type_sel == 2'b11) */ _pc;
11  // Default value is _pc, given the default value of
    o_op_type_sel in C.U.
```

## 4.11 System Validation and Testing

In this section, the final test case will be validated. We will present a step-by-step explanation, starting from the planning of the test case, through the testbench setup, and finally the different simulations and procedures required for complete verification.

### 4.11.1 Test Program

#### COE File and Instruction Memory

Coefficient (COE) files are text-based initialization files used to preload memory contents in FPGA designs. In Xilinx Vivado, COE files are commonly employed to initialize Block RAM (BRAM) or ROM modules. This ensures that memory elements contain predefined values immediately after FPGA configuration, eliminating the need for additional runtime loading logic.

In this project, COE files are used to initialize the instruction memory (ROM) of the processor. The ROM stores the complete program to be executed, with instructions fetched sequentially by the instruction fetch stage of the datapath. By embedding the instructions directly into the FPGA configuration, deterministic program execution from reset is guaranteed, simplifying both simulation and hardware validation.

The COE file defines the instruction words and their order in memory, starting from address zero. During operation, the program counter indexes this ROM, and the fetched instructions are decoded by the Control Unit and executed by the Datapath. This mechanism closely resembles the behavior of a fixed program memory in a microcontroller or embedded processor.

## Instruction Program Description

```
1  1 JAL R0, R0, #0
2  2 BR #41
3  3 //ISR TIMER
4  4 SW r6, r12, #0
5  5 SW r10, r12, #2
6  6 CMP r15, r14
7  7 BEQ #124
8  8 ADD r14, r13
9  9 SW r7, r12, #0
10 10 RETI
11 11 FFFF
12 12 // ISR EXTERN 0
13 13 SW r7, r12, #0
14 14 IMM #084
15 15 ADDI r2, r0, #E
16 16 SW r2, r4, #2
17 17 ADDI r2, r0, #3
18 18 SW r2, r4, #4
19 19 RETI
20 20 FFFF
21 21 // ISR EXTERN 1
22 22 SW r6, r12, #0
23 23 IMM #004
24 24 ADDI r2, r0, #E
25 25 SW r2, r4, #2
26 26 ADDI r2, r0, #3
27 27 SW r2, r4, #4
28 28 RETI
29 29 FFFF
30 30 //ISR UART RX
31 31 LB r3, r5, #0
32 32 SB r3, r5, #1
33 33 LB r8, r5, #5
34 34 ADDI r3, r10, #2
35 35 AND r8, r3
36 36 SB r8, r5, #5
37 37 RETI
38 38 FFFF
39 39 //ISR UART TX
40 40 ADDI r8, r10, #2
41 41 XOR r9, r8
42 42 SB r9, r11, #2
43 43 LB r3, r5, #5
44 44 AND r3, r13
45 45 SB r3, r5, #5
46 46 RETI
47 47 FFFF
48 48 //MAIN PROGRAM
49 49 IMM #076
50 50 ADDI r15, r0, #9
51 51 ADDI r14, r0, #0
52 52 ADDI r13, r0, #1
53 53 IMM #100
54 54 ADDI r12, r0, #0
55 55 IMM #110
56 56 ADDI r11, r0, #0
57 57 ADDI r10, r0, #0
58 58 ADDI r7, r0, #7
```

```

59 ADDI r6, r0, #3
60 IMM #120
61 ADDI r5, r0, #0
62 IMM #130
63 ADDI r4, r0, #0
64 IMM #004
65 ADDI r2, r0, #E
66 SW r2, r4, #2
67 ADDI r2, r0, #3
68 SW r2, r4, #4
69 FFFF
70 FFFF
71 FFFF
72 FFFF
73 //...
74 //ISR TIMER CONTINUATION at 0x0100
75 BR #5
76 AND r14, r10
77 XOR r9, r13
78 SB r9, r11, #2
79 BR #-125
80 //FFFF
81 //...

```

The instruction program is organized around an interrupt-driven architecture, in accordance with the flowcharts presented in the system design section. Most of the system functionality is executed inside interrupt service routines (ISRs), while the main program is kept intentionally minimal.

Each interrupt routine corresponds directly to a specific event defined in the design flowcharts. External interrupts are triggered by user interaction through physical buttons, which initiate or stop system operations. The timer, UART reception, and UART transmission completion are all handled exclusively within their respective ISRs, ensuring fast response times and deterministic behavior. This structure allows the processor to remain idle or in a waiting state until an event occurs, closely matching real embedded system operation.

User interaction plays a key role in triggering program execution. The only functionality that remains in the main program is the initialization and execution of the I<sup>2</sup>C communication.

This organization results in a clean separation between initialization code and event-driven execution, improves system responsiveness, and ensures that the implemented behavior follows precisely the logic defined in the design flowcharts.

**Preloaded Register Usage** To simplify program development and interrupt handling, a set of general-purpose registers is preloaded with constant values at initialization. This reduces the number of instructions required during normal execution, particularly inside interrupt service routines, where the instruction budget is limited.

By assigning fixed roles and predefined values to specific registers, frequently used constants such as peripheral base addresses, control values, and counters are always readily available. This approach eliminates repeated immediate loads, minimizes register pressure during interrupt execution, and results in clearer, more efficient, and deterministic code.

Table 4.1 summarizes the registers that are initialized with predefined values and their intended usage throughout the program.

| Register   | Initial Value | Purpose                                                          |
|------------|---------------|------------------------------------------------------------------|
| <i>r15</i> | 769           | Auto-reload value for the timer (2-second period)                |
| <i>r14</i> | 0             | Counter used to track timer overflows                            |
| <i>r13</i> | 1             | Constant value used for increments                               |
| <i>r12</i> | 1000          | Base address of the timer peripheral                             |
| <i>r11</i> | 1100          | Base address of the parallel I/O peripheral                      |
| <i>r10</i> | 0             | —                                                                |
| <i>r9</i>  | —             | Value written to <code>par_o</code> to toggle or control outputs |
| <i>r7</i>  | 7             | Control value used to start the timer                            |
| <i>r6</i>  | 3             | Control value used to stop the timer                             |
| <i>r5</i>  | 1200          | Base address of the UART peripheral                              |
| <i>r4</i>  | 1300          | Base address of the I <sup>2</sup> C peripheral                  |

Table 4.1: Preloaded registers and their usage

### Test Procedure and Objectives

Once the program is loaded, we aim to test the system with all peripherals and interrupts working simultaneously: the timer, UART, and I<sup>2</sup>C, triggered by external interrupts.

The COE files include both the initial instructions and the interrupt service routines (ISRs) for handling external events. The test objectives are as follows:

- **Timer:** The timer should start and stop correctly based on its respective external interrupts, toggling a dedicated LED every 2 seconds.
- **UART:** Sending a character from the terminal triggers an interrupt. The ISR echoes the received character back to the terminal. Once transmission completes, a second interrupt toggles an LED to indicate the end of transmission.
- **I<sup>2</sup>C:** At system startup, I<sup>2</sup>C is used to initialize a LCD display. This occurs only during the initial setup phase.

This approach ensures that the processor, memory, and peripherals operate correctly in a fully integrated scenario, validating interrupt handling, concurrent peripheral operation, and system timing behavior.

#### 4.11.2 Testbench Design

Before programming the FPGA, the design must be thoroughly verified through simulation to ensure that the processor, peripherals, and interrupt mechanisms behave correctly. Simulation allows potential design errors to be detected and corrected in a controlled environment, reducing the risk of issues when the system is implemented in hardware.

To perform this verification, a **testbench** is developed. A testbench is a separate module that emulates the external environment of the system under test. It provides inputs to the design, monitors outputs, and observes internal signals, enabling validation of system behavior under realistic conditions without requiring physical hardware. Through simulation, the testbench can model complex scenarios, including peripheral responses, data transmission, and external interrupts, allowing designers to verify that the design meets the intended functionality.

In our testbench in Listing 4.11.2, several key scenarios are simulated. First, the I<sup>2</sup>C interface is tested by modeling a slave device that responds with an acknowledgment (ACK), verifying correct communication during the initial setup. Next, an external interrupt from `par_i` is simulated to start the timer, allowing observation of correct timer initialization and counting. The UART interface is also tested by simulating characters arriving on the RX line. This ensures that the UART interrupt is triggered correctly and that the received

data is properly echoed back. Finally, events are included to stop the timer, confirming that the timer can be halted and its associated interrupts are handled as expected.

```

1  `timescale 1ns / 1ps
2
3  module tb_soc();
4
5    reg clk;
6    reg rst;
7    reg [7:0] par_i;
8    wire [7:0] par_o;
9    reg i_rx_line;
10   reg [7:0] rx_message;
11   wire o_tx_line;
12   wire o_i2c_scl;
13   wire o_i2c_sda;
14
15  SoC uut(
16    .i_clk(clk), .i_RST(rst), .i_PAR_I(par_i),
17    .i_RX_LINE(i_rx_line), .o_PAR_O(par_o),
18    .o_TX_LINE(o_tx_line), .o_I2C_SCL(o_i2c_scl),
19    .o_I2C_SDA(o_i2c_sda));
20
21  initial clk = 0;
22  always #4 clk = ~clk;
23
24  initial begin
25    rst = 1;
26    par_i = 0;
27
28    rx_message = 8'h85;
29    i_rx_line = 1'b1;
30
31    repeat (10) @(posedge clk);
32    rst = 0;
33    //start timer
34    repeat (100) @(posedge clk);
35    par_i = 8'b01; // par_i[0]
36    repeat (2.5) @(posedge clk);
37    par_i = 8'b00;
38
39    repeat(100) @(posedge clk);
40    i_rx_line = 0;
41    repeat (25) @(posedge clk);
42    i_rx_line = rx_message[0];
43    repeat (25) @(posedge clk);
44    i_rx_line = rx_message[1];
45    repeat (25) @(posedge clk);
46    i_rx_line = rx_message[2];
47    repeat (25) @(posedge clk);
48    i_rx_line = rx_message[3];
49    repeat (25) @(posedge clk);
50    i_rx_line = rx_message[4];
51    repeat (25) @(posedge clk);
52    i_rx_line = rx_message[5];
53    repeat (25) @(posedge clk);
54    i_rx_line = rx_message[6];
55    repeat (25) @(posedge clk);
56    i_rx_line = rx_message[7];
57    repeat (25) @(posedge clk);
58    i_rx_line = 1;

```

```

59      //stop timer
60      repeat (95) @(posedge clk);
61      par_i = 8'b10; // par_i[1]
62      repeat (5) @(posedge clk);
63      par_i = 8'b0;
64
65      repeat (40) @(posedge clk);
66      $finish;
67
68      end
69
70  endmodule

```

#### 4.11.3 Behavioral Simulation

The first stage of simulation performed is the behavioral simulation. Behavioral simulation verifies the logical functionality of the design without considering hardware-specific implementation details such as gate delays, routing, or timing constraints. It focuses solely on the correctness of the RTL (Register Transfer Level) code, ensuring that the design produces the expected outputs for given inputs. Behavioral simulation does not account for real propagation delays, clock skew, or setup and hold violations, but it is extremely useful for early validation of control logic, data paths, and peripheral interactions.

In our behavioral simulation, all major system functions were tested simultaneously. The figure shows the following key events:

- The timer starts and stops correctly (orange wave) in response to the `par_i` external interrupt. This confirms that interrupt handling and timer control logic operate as expected.
- The UART receives characters on the RX line (light blue wave). The data is immediately echoed back to the terminal (pink wave), and upon completion of the transmission, an external signal on `par_o` is toggled to indicate the end of the UART activity.
- The I<sup>2</sup>C interface correctly acknowledges the slave device and successfully transmits the intended data, demonstrating that initialization and data transfer occur as intended(purple line).

Importantly, in behavioral simulation, no timing delays are present. Signals change instantaneously according to the RTL behavior, allowing us to verify the correctness of logical sequences and functional interactions without being affected by physical implementation constraints. This stage confirms that the processor, peripherals, and interrupts interact correctly under ideal conditions, forming a solid foundation for subsequent post-synthesis and timing simulations.



Figure 4.8: Behavioural Simulation of final test case

#### 4.11.4 Post-Synthesis Timing Simulation and Synthesized Design

Before moving to post-synthesis simulation, certain RTL constructs were included to aid debugging and verification. For example:

```
1 (* keep = "true" *) reg [15:0] _cnt;
```

This synthesis attribute instructs the tool to preserve the register `_cnt` and prevents it from being optimized away, even if it appears unused in the logic. Using `(* keep = "true" *)` is particularly useful for waveform inspection, as it allows internal signals to remain visible in simulations and makes it possible to track values for timers, counters, or control signals during interrupt handling and peripheral operations.

#### Clock Wizard and Synchronous Reset

To ensure correct operation on the FPGA, the design includes a **clock wizard** and a synchronous reset mechanism. The clock wizard generates a stable 50 MHz clock from the input clock (`i_clk`) and provides a `locked` signal indicating when the output clock is stable. A synchronous reset, implemented with a 3-bit shift register, ensures that all registers and modules are properly initialized after the clock stabilizes, preventing metastability and ensuring deterministic reset release. The relevant code is shown below:

```
1 wire _clk50;
2 wire _locked;
3
4 // Clock wizard instantiation
5 clk_wiz_0 u_clk_wiz (
6   .clk_in1 (i_clk),
7   .clk_out1(_clk50),
8   .reset    (i_rst),
9   .locked   (_locked)
10 );
11
12 reg [2:0] _rst_sync;
13
14 always @ (posedge _clk50) begin
15   if (i_rst || ~_locked)
16     _rst_sync <= 3'b111;
17   else
18     _rst_sync <= {_rst_sync[1:0], 1'b0};
19 end
20
21 wire _rst_sys = _rst_sync[2];
```

Listing 4.13: Clock Wizard instantiation and synchronous reset

The shift register ensures that the system reset (`_rst_sys`) is deasserted only after three consecutive cycles of a stable clock. This guarantees that all synchronous modules—including the processor’s Datapath, Control Unit, and peripheral buffers—are properly initialized.

#### Behavioral vs Post-Synthesis Simulation

Unlike Behavioural simulation, Post-synthesis timing simulation, includes actual gate-level delays introduced during synthesis, mapping, and placement on the FPGA. It takes into account routing delays, logic propagation, and timing constraints, providing a realistic view of how the design will behave on hardware. In our design, post-synthesis simulation confirmed that all interactions remain correct, with the only observable difference from behavioral simulation being a small propagation delay of approximately 18 ns. This minor delay does not affect the functional behavior of the timer, UART, I<sup>2</sup>C, or interrupt handling.



Figure 4.9: Post-Synthesis Timing Simulation

### Gate-Level Design Outputs

After synthesis, the tool produces a gate-level netlist representing the actual logic that will be implemented on the FPGA. This synthesized design (Figure 4.10) clearly shows the processor's core components, including:

- The **Datapath**, responsible for arithmetic, logic, and data movement operations.
- The **Control Unit**, which generates the appropriate control signals to drive the Datapath and manage instruction execution.
- **Buffers and Registers for Parallel I/O**, which interface the processor with external peripherals while ensuring correct timing and synchronization.



Figure 4.10: Synthesized Design

These outputs provide a detailed, hardware-accurate representation of the system, allowing for post-synthesis timing verification and ensuring that all components are correctly connected and optimized for FPGA implementation.

#### 4.11.5 Post-Implementation Timing Simulation

##### Constraints

In FPGA design, constraints are essential directives that guide the synthesis and implementation tools to meet design requirements. Constraints are divided into two main categories: timing constraints and physical constraints. Understanding when and how to apply each type is crucial for achieving a functional and reliable design.

Timing constraints define the temporal requirements that the design must satisfy. They ensure that signals propagate through the logic within acceptable time windows, preventing setup and hold time violations. The primary timing constraints include:

- **Clock definitions:** Specify the clock frequency, duty cycle, and source using `create_clock`.
- **Input/Output delays:** Define the time relationship between external signals and the internal clock domain using `set_input_delay` and `set_output_delay`.
- **False paths:** Indicate signal paths that do not require timing analysis using `set_false_path`.
- **Multi-cycle paths:** Specify paths that intentionally take multiple clock cycles using `set_multicycle_path`.

Physical constraints control the placement and routing of design elements on the FPGA fabric. These constraints include:

- **Pin assignments:** Map design signals to specific FPGA package pins (LOC constraints).
- **I/O standards:** Define voltage levels and signaling standards (IOSTANDARD).
- **Area constraints:** Restrict logic placement to specific regions (AREA\_GROUP, PBLOCK).
- **Placement constraints:** Fix the location of specific logic elements (BEL, RLOC).

Figure 4.11 shows the pin assignment constraints for the design, mapping signals to specific FPGA package pins with appropriate I/O standards (LVCMOS33 at 3.3V). The timing constraints applied are shown in Listing 4.14.

| Name       | Direction | Board Part Pin | Board Part Interface | Interface | Neg Diff Pair | Package Pin | Fixed | Bank | I/O Std   | Vcco  | Vref | Drive Strength |
|------------|-----------|----------------|----------------------|-----------|---------------|-------------|-------|------|-----------|-------|------|----------------|
| i_clk      | IN        |                |                      |           |               | K17         | ✓     | 35   | LVCMOS33* | 3.300 |      |                |
| i_rst      | IN        |                |                      |           |               | T16         | ✓     | 34   | LVCMOS33* | 3.300 |      |                |
| i_rx_line  | IN        |                |                      |           |               | Y17         | ✓     | 34   | LVCMOS33* | 3.300 |      |                |
| o_tx_line  | OUT       |                |                      |           |               | T17         | ✓     | 34   | LVCMOS33* | 3.300 | 12   | ▼              |
| i_par_i[7] | IN        |                |                      |           |               | U12         | ✓     | 34   | LVCMOS33* | 3.300 |      |                |
| i_par_i[6] | IN        |                |                      |           |               | T12         | ✓     | 34   | LVCMOS33* | 3.300 |      |                |
| i_par_i[5] | IN        |                |                      |           |               | Y14         | ✓     | 34   | LVCMOS33* | 3.300 |      |                |
| i_par_i[4] | IN        |                |                      |           |               | W14         | ✓     | 34   | LVCMOS33* | 3.300 |      |                |
| i_par_i[3] | IN        |                |                      |           |               | T10         | ✓     | 34   | LVCMOS33* | 3.300 |      |                |
| i_par_i[2] | IN        |                |                      |           |               | T11         | ✓     | 34   | LVCMOS33* | 3.300 |      |                |
| i_par_i[1] | IN        |                |                      |           |               | K19         | ✓     | 35   | LVCMOS33* | 3.300 |      |                |
| i_par_i[0] | IN        |                |                      |           |               | Y16         | ✓     | 34   | LVCMOS33* | 3.300 |      |                |
| o_par_o[7] | OUT       |                |                      |           |               | V18         | ✓     | 34   | LVCMOS33* | 3.300 | 12   | ▼              |
| o_par_o[6] | OUT       |                |                      |           |               | V17         | ✓     | 34   | LVCMOS33* | 3.300 | 12   | ▼              |
| o_par_o[5] | OUT       |                |                      |           |               | U15         | ✓     | 34   | LVCMOS33* | 3.300 | 12   | ▼              |
| o_par_o[4] | OUT       |                |                      |           |               | U14         | ✓     | 34   | LVCMOS33* | 3.300 | 12   | ▼              |
| o_par_o[3] | OUT       |                |                      |           |               | R14         | ✓     | 34   | LVCMOS33* | 3.300 | 12   | ▼              |
| o_par_o[2] | OUT       |                |                      |           |               | P14         | ✓     | 34   | LVCMOS33* | 3.300 | 12   | ▼              |
| o_par_o[1] | OUT       |                |                      |           |               | M15         | ✓     | 35   | LVCMOS33* | 3.300 | 12   | ▼              |
| o_par_o[0] | OUT       |                |                      |           |               | A11A        | ..    | 35   | LVCMOS33* | 3.300 | 12   | ..             |

Figure 4.11: Pin assignment constraints showing the mapping of design signals to FPGA package pins. Input signals (i\_clk, i\_rst, i\_rx\_line, i\_par\_i) and output signals (o\_tx\_line, o\_par\_o) are assigned to specific pins with LVCMOS33 I/O standard at 3.3V supply voltage.

```

1 # False path constraints for asynchronous inputs
2 set_false_path -from [get_ports {i_par_i[*]}]
3 set_false_path -from [get_ports i_rst]
4
5 # Input delay constraints (asynchronous, no relationship to clock)
6 set_input_delay -clock i_clk 0.000 [get_ports {i_par_i[*]}]
7 set_input_delay -clock i_clk 0.000 [get_ports i_rst]
8
9 # False path constraints for outputs (no external timing
10 requirements)
11 set_false_path -to [get_ports {o_par_o[*]}]
12 # Output delay constraints

```

```
13 | set_output_delay -clock i_clk 0.000 [get_ports {o_par_o[*]}]
```

Listing 4.14: Timing constraints file (XDC) for the design

The timing constraints define false paths for the parallel input bus (`i_par_i`) and reset signal (`i_rst`), as these are asynchronous signals not synchronized to the system clock. This prevents the timing analyzer from reporting false violations on these paths. Similarly, the parallel output bus (`o_par_o`) is marked as a false path since there are no external timing requirements for these signals—they are simply displayed on LEDs.

The zero input and output delays indicate that these signals do not have setup or hold time requirements relative to the clock edge, which is appropriate for asynchronous I/O and simple parallel interfaces. For high-speed synchronous interfaces, these values would need to reflect the actual timing requirements of the external devices.

## Utilization Report

The utilization report, generated after place-and-route, summarizes how the design consumes FPGA resources. In this short example, shows the number and percentage of logic elements (LUTs, flip-flops), memory blocks (Block RAM), DSP slices, and I/O pins used by the implementation.

| 1. Slice Logic         |      |       |            |           |       |  |
|------------------------|------|-------|------------|-----------|-------|--|
| Site Type              | Used | Fixed | Prohibited | Available | Util% |  |
| Slice LUTs             | 469  | 0     | 0          | 17600     | 2.66  |  |
| LUT as Logic           | 453  | 0     | 0          | 17600     | 2.57  |  |
| LUT as Memory          | 16   | 0     | 0          | 6000      | 0.27  |  |
| LUT as Distributed RAM | 16   | 0     |            |           |       |  |
| LUT as Shift Register  | 0    | 0     |            |           |       |  |
| Slice Registers        | 331  | 0     | 0          | 35200     | 0.94  |  |
| Register as Flip Flop  | 331  | 0     | 0          | 35200     | 0.94  |  |
| Register as Latch      | 0    | 0     | 0          | 35200     | 0.00  |  |
| F7 Muxes               | 2    | 0     | 0          | 8800      | 0.02  |  |
| F8 Muxes               | 1    | 0     | 0          | 4400      | 0.02  |  |

  

| 3. Memory      |      |       |            |           |       |  |
|----------------|------|-------|------------|-----------|-------|--|
| Site Type      | Used | Fixed | Prohibited | Available | Util% |  |
| Block RAM Tile | 2    | 0     | 0          | 60        | 3.33  |  |
| RAMB36/FIFO*   | 0    | 0     | 0          | 60        | 0.00  |  |
| RAMB18         | 4    | 0     | 0          | 120       | 3.33  |  |
| RAMB18E1 only  | 4    |       |            |           |       |  |

\* Note: Each Block RAM Tile only has one FIFO logic available and therefore can accommodate only one FIFO36E1 or one FIFO18E1. Tile, that tile can still accommodate a RAMB18E1.

  

| 4. DSP    |      |       |            |           |       |  |
|-----------|------|-------|------------|-----------|-------|--|
| Site Type | Used | Fixed | Prohibited | Available | Util% |  |
| DSPs      | 0    | 0     | 0          | 80        | 0.00  |  |

  

| 5. IO and GT Specific |      |       |            |           |       |  |
|-----------------------|------|-------|------------|-----------|-------|--|
| Site Type             | Used | Fixed | Prohibited | Available | Util% |  |
| Bonded IOB            | 22   | 22    | 0          | 100       | 22.00 |  |
| IOB Master Pads       | 12   |       |            |           |       |  |
| IOB Slave Pads        | 10   |       |            |           |       |  |
| Bonded IPADs          | 0    | 0     | 0          | 2         | 0.00  |  |
| Bonded IPADs          | 0    | 0     | 0          | 130       | 0.00  |  |
| PHY_CONTROL           | 0    | 0     | 0          | 2         | 0.00  |  |
| PHASER_REF            | 0    | 0     | 0          | 2         | 0.00  |  |

Figure 4.12: Utilization Report

This report is essential for verifying that the design fits within the target device and for assessing whether there is adequate resource headroom for future design modifications. It also helps determine if a smaller, more cost-effective FPGA could be used, or conversely, if a larger device is needed to accommodate additional functionality.

## Timing Report

The timing report is a critical output generated during the implementation phase of the FPGA design flow, specifically after the place-and-route process has completed. Vivado's timing analysis engine performs static timing analysis (STA) to verify that all timing constraints specified in the design are met and that the circuit can operate reliably at the target clock frequency.

Static timing analysis examines all possible paths in the design to ensure that data can propagate from one register to another within the specified clock period. The analysis considers three main categories of timing checks:

- **Setup Time:** The minimum time before the clock edge that data must be stable at the input of a flip-flop
- **Hold Time:** The minimum time after the clock edge that data must remain stable at the input of a flip-flop
- **Pulse Width:** The minimum duration that a clock signal must remain high or low

Figure 4.13 presents the timing summary for the final implementation. This summary provides a high-level overview of the design's timing performance across all three categories of timing checks.

Design Timing Summary

| Setup                                       | Hold                                    | Pulse Width                                              |
|---------------------------------------------|-----------------------------------------|----------------------------------------------------------|
| Worst Negative Slack (WNS): <b>6.412 ns</b> | Worst Hold Slack (WHS): <b>0.154 ns</b> | Worst Pulse Width Slack (WPWS): <b>8.750 ns</b>          |
| Total Negative Slack (TNS): <b>0.000 ns</b> | Total Hold Slack (THS): <b>0.000 ns</b> | Total Pulse Width Negative Slack (TPWS): <b>0.000 ns</b> |
| Number of Failing Endpoints: <b>0</b>       | Number of Failing Endpoints: <b>0</b>   | Number of Failing Endpoints: <b>0</b>                    |
| Total Number of Endpoints: <b>418</b>       | Total Number of Endpoints: <b>418</b>   | Total Number of Endpoints: <b>130</b>                    |

All user specified timing constraints are met.

Figure 4.13: Timing summary showing setup, hold, and pulse width analysis results. All timing constraints are met with positive slack values.

**Setup Timing Analysis:** The setup timing analysis verifies that data arrives at destination registers with sufficient time before the clock edge:

- **Worst Negative Slack (WNS):** 6.412 ns — This positive value indicates that the slowest path in the design has 6.412 ns of timing margin. This means the design can operate at a frequency higher than the specified 50 MHz (20 ns period). If this value were negative, it would indicate that at least one path is too slow and would fail to meet setup requirements by that amount.
- **Total Negative Slack (TNS):** 0.000 ns — The sum of all setup violations across all endpoints. A value of zero confirms that no setup violations exist anywhere in the design.
- **Number of Failing Endpoints:** 0 — No paths fail to meet setup timing requirements.
- **Total Number of Endpoints:** 418 — The design contains 418 timing paths that were analyzed for setup timing.

**Hold Timing Analysis:** The hold timing analysis ensures that data remains stable for a sufficient duration after the clock edge:

- **Worst Hold Slack (WHS):** 0.154 ns — The smallest hold margin in the design. This positive value confirms that all paths meet hold time requirements with at least 0.154 ns of margin.
- **Total Hold Slack (THS):** 0.000 ns — The sum of all hold violations. Zero indicates no hold violations exist.
- **Number of Failing Endpoints:** 0 — All 418 endpoints meet hold timing requirements.

**Pulse Width Analysis:** The pulse width analysis verifies that clock signals maintain adequate high and low durations:

- **Worst Pulse Width Slack (WPWS):** 8.750 ns — Significant margin exists for clock pulse width requirements.
- **Total Pulse Width Negative Slack (TPWS):** 0.000 ns — No pulse width violations.
- **Number of Failing Endpoints:** 0 out of 130 clock pins — All clock signals meet pulse width requirements.

The summary conclusively states: **”All user specified timing constraints are met.”** This confirms that the design is properly constrained and meets all timing requirements for reliable operation at the target frequency.

Figure 4.14 provides detailed information about the critical path—the slowest path in the design that determines the maximum operating frequency.

| Max Delay Paths                |                                                                                                   |          |                                |                     |
|--------------------------------|---------------------------------------------------------------------------------------------------|----------|--------------------------------|---------------------|
| <hr/>                          |                                                                                                   |          |                                |                     |
| Slack (MET) :                  | 6.412ns (required time - arrival time)                                                            |          |                                |                     |
| Source:                        | <hidden>                                                                                          |          |                                |                     |
| Destination:                   | (rising edge-triggered cell RAMB18E1 clocked by clk {rise@0.000ns fall@10.000ns period=20.000ns}) |          |                                |                     |
| Path Group:                    | clk                                                                                               |          |                                |                     |
| Path Type:                     | Setup (Max at Slow Process Corner)                                                                |          |                                |                     |
| Requirement:                   | 20.000ns (clk rise@20.000ns - clk rise@0.000ns)                                                   |          |                                |                     |
| Data Path Delay:               | 13.048ns (logic 5.161ns (39.553%) route 7.887ns (60.447%))                                        |          |                                |                     |
| Logic Levels:                  | 10 (CARRY4=3 LUT2=1 LUT3=1 LUT4=1 LUT6=3 RAMD32=1)                                                |          |                                |                     |
| Clock Path Skew:               | -0.097ns (DCD - SCD + CPR)                                                                        |          |                                |                     |
| Destination Clock Delay (DCD): | 4.891ns = ( 24.892 - 20.000 )                                                                     |          |                                |                     |
| Source Clock Delay (SCD):      | 5.380ns                                                                                           |          |                                |                     |
| Clock Pessimism Removal (CPR): | 0.391ns                                                                                           |          |                                |                     |
| Clock Uncertainty:             | 0.035ns ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE                                                       |          |                                |                     |
| Total System Jitter (TSJ):     | 0.071ns                                                                                           |          |                                |                     |
| Total Input Jitter (TIJ):      | 0.000ns                                                                                           |          |                                |                     |
| Discrete Jitter (DJ):          | 0.000ns                                                                                           |          |                                |                     |
| Phase Error (PE):              | 0.000ns                                                                                           |          |                                |                     |
| Location                       | Delay type                                                                                        | Incr(ns) | Path(ns)                       | Netlist Resource(s) |
| <hr/>                          |                                                                                                   |          |                                |                     |
| K17                            | (clock clk rise edge)                                                                             | 0.000    | 0.000 r                        |                     |
|                                |                                                                                                   | 0.000    | 0.000 r i_clk (IN)             |                     |
| K17                            | net (fo=0)                                                                                        | 0.000    | 0.000 r i_clk                  |                     |
| K17                            | IBUF (Prop_ibuf_I_0)                                                                              | 1.492    | 1.492 r i_clk_IBUF_inst/I      |                     |
|                                | net (fo=1, routed)                                                                                | 2.076    | 3.568 r i_clk_IBUF             |                     |
| BUFGCTRL_X0Y16                 | BUFG (Prop_bufg_I_0)                                                                              | 0.101    | 3.669 r i_clk_IBUF_BUFG_inst/I |                     |
| BUFGCTRL_X0Y16                 | net (fo=129, routed)                                                                              | 1.711    | 5.380 <hidden>                 |                     |
| RAMB18_X2Y12                   | RAMB18E1                                                                                          |          | r <hidden>                     |                     |

Figure 4.14: Detailed timing report showing the critical path breakdown, including source and destination registers, path delays, and contributing delay components.

The report also provides the physical location and netlist resources for each stage of the critical path, showing how the signal propagates from the Block RAM at location RAMB18\_X2Y12 through various logic elements (LUTs, carry chains, buffers) until reaching the destination register. This level of detail is invaluable for understanding bottlenecks and optimizing critical paths if needed.

**Pin Constraint Verification:** Figure 4.15 shows the results of pin constraint verification, confirming that all I/O pins are properly constrained.

```
6. checking no_output_delay (0)
-----
There are 0 ports with no output delay specified.

There are 0 ports with no output delay but user has a false path constraint

There are 0 ports with no output delay but with a timing clock defined on it or propagating through it
```

Figure 4.15: Pin constraint verification showing no unconstrained output pins or timing violations.

The verification report confirms:

- **No output delay (0):** 0 ports with no output delay specified — all output pins have proper timing constraints
- **No false path constraints:** 0 ports with missing false path constraints — all paths that should be excluded from timing analysis are properly identified
- **No timing clock:** 0 ports with output delay but no timing clock defined — all constrained outputs have associated clock domains

This verification ensures that the pin constraints specified in the XDC (Xilinx Design Constraints) file are complete and correctly applied to all I/O ports. Proper I/O timing constraints are essential for ensuring that the FPGA can reliably interface with external devices.

The timing analysis results demonstrate that the design has been successfully implemented with significant timing margin.

The post-implementation timing report is the most accurate and is used to verify that the physical design meets all timing requirements. It accounts for actual wire delays, cell delays, and clock distribution delays that are only known after place-and-route completes. This report serves as the final sign-off that the design is ready for programming onto the FPGA and will operate correctly at the specified clock frequency.

## Final Simulation

Figure 4.16 shows the post-implementation timing simulation of the complete system. This simulation incorporates all delays introduced by synthesis, placement, and routing on the target FPGA device, providing the most accurate representation of the design's timing behavior prior to actual hardware deployment.

Compared with post-synthesis timing simulation, the post-implementation simulation shows a slight recovery of approximately 11 ns due to optimization during placement and routing. Despite these minor timing adjustments, all functional objectives of the test case are fully met. The timer correctly starts and stops in response to external interrupts, UART data is received and echoed accurately, the UART transmission completion triggers the expected `par_o` toggle, and I<sup>2</sup>C communication is correctly acknowledged and completed.

These results confirm that the design operates correctly under real hardware timing constraints and that the test case is ready for implementation on the FPGA.



Figure 4.16: Post-Implementation Timing Simulation

#### 4.11.6 Hardware Validation

After completing simulation and post-implementation verification, the design was programmed onto the target FPGA for hardware validation. The test case was implemented using the following external connections:

- **External Interrupts:** Button1 (pin Y13) is connected to external interrupt 0 and is used to start the timer. Button2 (pin K19) is connected to external interrupt 1 and is used to stop the timer.
- **LED Indicators:** LED0 (pin M14) is configured as the blinking LED, driven by the timer with a 2-second period. LED1 (pin M15) toggles whenever a UART transmission completes, providing a visual indication of UART activity.
- **UART and I<sup>2</sup>C Interfaces:** The FPGA UART is connected to an FTDI module. The FTDI RX pin is connected to the FPGA TX, and the FTDI TX is connected to the FPGA RX. Terminal communication is established via a serial terminal (e.g., using the command `sudo screen /dev/ttyUSB0 115200`) to send characters and observe the echo response. Additionally, the I<sup>2</sup>C lines (SDA and SCL) are connected to the corresponding pins on the peripheral board for initialization and data transfer.

During validation, the following behavior was observed: the timer started and stopped correctly in response to the external interrupts, LED0 blinked with the expected 2-second period, and LED1 toggled after each UART transmission completion. Characters sent through the terminal were echoed back accurately via the FPGA UART, demonstrating proper interrupt handling and peripheral interaction. The I<sup>2</sup>C initialization also completed successfully, confirming correct communication with the connected devices.

The following photographs illustrate the FPGA setup, button connections, LEDs, and terminal interaction during hardware testing:



Figure 4.17: FPGA before configuration



Figure 4.18: Program loaded in the FPGA and main finished: display turned on and no interrupts happened



Figure 4.19: Button 1 pressed, timer started running and led 0 toggled



Figure 4.20: Button 2 pressed, timer stop running and led 0 stayed in the same status; Character was received and echoed so led 1 toggled



Figure 4.21: Echoed character from FPGA

# Conclusion

This project successfully refactored and enhanced the GR0040 System-on-Chip while preserving full functional equivalence with the original design. The redesigned architecture improves modularity, maintainability, and extensibility through a clear separation of datapath and control logic, a cleaner memory organization, and standardized HDL practices.

Key enhancements include a slightly expanded instruction set, a robust priority-based interrupt controller, and fully functional peripherals such as a timer, UART, and I2C interface, with SFRs specification. Comprehensive simulation, timing analysis, and hardware validation on the Zybo Z7-10 FPGA confirmed correct operation.

Beyond meeting all technical objectives, the project provided valuable hands-on experience in processor design, Verilog HDL, FPGA implementation, and system-level debugging. Undoubtedly, the resulting SoC forms a solid foundation for future extensions.

# Bibliography

1. <http://www.fpgacpu.org/papers/soc-gr0040-paper.pdf>
2. [https://digilent.com/reference/programmable-logic/zybo-z7/reference-manual?srsltid=AfmB0oqTFpjJn9WPr3BxTcB\\_c0GEDRAdSx-N\\_vSIQe6iw\\_-96VB-ttbH](https://digilent.com/reference/programmable-logic/zybo-z7/reference-manual?srsltid=AfmB0oqTFpjJn9WPr3BxTcB_c0GEDRAdSx-N_vSIQe6iw_-96VB-ttbH)
3. <https://docs.amd.com/>