



# Documentation

Version 1.0

## Table of Contents

|                                                        |           |
|--------------------------------------------------------|-----------|
| <b>1 Overview</b>                                      | <b>3</b>  |
| 1.1 About Steel Core . . . . .                         | 3         |
| 1.2 Licensing . . . . .                                | 3         |
| 1.3 Specifications version . . . . .                   | 3         |
| 1.4 Online repository . . . . .                        | 3         |
| 1.5 Hardware description language . . . . .            | 3         |
| 1.6 Supported privilege levels . . . . .               | 3         |
| 1.7 Implemented extensions . . . . .                   | 4         |
| 1.8 Implemented control and status registers . . . . . | 4         |
| 1.9 Integration with other devices . . . . .           | 4         |
| 1.10 Memory alignment rules . . . . .                  | 4         |
| 1.11 Microarchitecture . . . . .                       | 5         |
| <b>2 Configuration</b>                                 | <b>7</b>  |
| 2.1 Boot address . . . . .                             | 7         |
| 2.2 CSRs reset values . . . . .                        | 7         |
| <b>3 Input and output signals</b>                      | <b>8</b>  |
| 3.1 Interrupt controller interface . . . . .           | 8         |
| 3.2 Instruction fetch interface . . . . .              | 8         |
| 3.3 Data read/write interface . . . . .                | 9         |
| 3.4 Real time counter interface . . . . .              | 9         |
| 3.5 CLK and RESET signals . . . . .                    | 9         |
| <b>4 Timing diagrams</b>                               | <b>10</b> |
| 4.1 Instruction fetch . . . . .                        | 10        |
| 4.2 Data fetch . . . . .                               | 10        |
| 4.3 Data writing . . . . .                             | 10        |
| 4.4 Interrupt request . . . . .                        | 11        |
| 4.5 time CSR update . . . . .                          | 11        |
| <b>5 Exceptions and Interrupts</b>                     | <b>12</b> |
| 5.1 Supported exceptions and interrupts . . . . .      | 12        |
| 5.2 Trap handling in Steel . . . . .                   | 12        |
| 5.3 Nested interrupts capability . . . . .             | 12        |

|          |                                   |           |
|----------|-----------------------------------|-----------|
| <b>6</b> | <b>Steel-based system example</b> | <b>13</b> |
| <b>7</b> | <b>Implementation details</b>     | <b>14</b> |
| 7.1      | Control Unit . . . . .            | 14        |
| 7.2      | ALU . . . . .                     | 15        |
| 7.3      | Integer Register File . . . . .   | 16        |
| 7.4      | Branch Unit . . . . .             | 17        |
| 7.5      | Load Unit . . . . .               | 18        |
| 7.6      | Store Unit . . . . .              | 19        |
| 7.7      | Immediate Generator . . . . .     | 20        |
| 7.8      | CSR Register File . . . . .       | 21        |
| 7.9      | Machine Control . . . . .         | 23        |

# 1 Overview

## 1.1 About Steel Core

Steel is a 3-stage single-issue in-order RISC-V microprocessor core designed to be simple and easy to use. It is intended for use in FPGAs as the processing unit in embedded systems projects.

## 1.2 Licensing

Steel is distributed under the MIT License. The license text is reproduced below. Read it carefully and make sure you understand its terms before using Steel in your own projects.

---

### MIT License

Copyright (c) 2020 Rafael de Oliveira Calçada

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

---

## 1.3 Specifications version

Steel aims to be compliant with the following RISC-V specifications:

- Version 20191213 of RISC-V Unprivileged ISA
- Version 20190608-Priv-MSU-Ratified of RISC-V Privileged Architecture

## 1.4 Online repository

Steel files and documentation are available at GitHub ([github.com/rafaelcalcada/steel-core](https://github.com/rafaelcalcada/steel-core)).

## 1.5 Hardware description language

The hardware of Steel is described in Verilog (IEEE Std. 1364-2005).

## 1.6 Supported privilege levels

Steel supports only M-mode.

## 1.7 Implemented extensions

Steel implements the base instruction set RV32I and the Zicsr extension.

## 1.8 Implemented control and status registers

The control and status registers implemented in Steel are shown in table 1 (below). The required M-mode registers not shown in the table return the hardwired value predicted by the specifications when read.

Table 1 – Steel Core implemented CSRs

| CSR                  | Name                                     | Address |
|----------------------|------------------------------------------|---------|
| <b>cycle</b>         | <i>Cycle Counter</i>                     | 0xC00   |
| <b>time</b>          | <i>System Timer</i>                      | 0xC01   |
| <b>instret</b>       | <i>Instructions Retired</i>              | 0xC02   |
| <b>mstatus</b>       | <i>Machine Status</i>                    | 0x300   |
| <b>misa</b>          | <i>Machine ISA</i>                       | 0x301   |
| <b>mie</b>           | <i>Machine Interrupt Enable</i>          | 0x304   |
| <b>mtvec</b>         | <i>Machine Trap Vector</i>               | 0x305   |
| <b>mscratch</b>      | <i>Machine Scratch</i>                   | 0x340   |
| <b>mepc</b>          | <i>Machine Exception Program Counter</i> | 0x341   |
| <b>mcause</b>        | <i>Machine Cause</i>                     | 0x342   |
| <b>mtval</b>         | <i>Machine Trap Value</i>                | 0x343   |
| <b>mip</b>           | <i>Machine Interrupt Pending</i>         | 0x344   |
| <b>mcycle</b>        | <i>Machine Cycle Counter</i>             | 0xB00   |
| <b>minstret</b>      | <i>Machine Instructions Retired</i>      | 0xB01   |
| <b>mcountinhibit</b> | <i>Machine Counter Inhibit</i>           | 0x320   |

## 1.9 Integration with other devices

Steel must be connected to a word addressed memory with read/write latency of 1 clock cycle. These characteristics facilitate integration with Block RAMs in FPGAs.

Steel can optionally be connected to an interrupt controller and a real-time counter.

## 1.10 Memory alignment rules

An address-misaligned exception occurs when the running software attempts to access an address in disagreement with the following alignment rules:

- the address in jump, branch and load/store word instructions must be aligned on a 4-byte boundary (the last two bits of the address must be zero);
- the address in load/store halfword instructions (lh, lhu and sh) must be aligned on a 2-byte boundary (the last bit of the address must be zero);
- the address in load/store byte instructions (lb, lbu and sb) does not need to be aligned.

Address-misaligned exceptions must be resolved by the trap handler.

## 1.11 Microarchitecture

Steel has 3 pipeline stages, a single execution thread and issues only one instruction per clock cycle. Therefore, all instructions are executed in program order.

Fig. 1 (below) shows Steel microarchitecture in register-transfer level (RTL). Circles and trapezoids represent combinational logic units. Rectangles represent sequential logic units. Implementation details can be found in section 7.

Fig. 2 (next page) shows the tasks performed by each pipeline stage. In the first stage, the core generates the program counter and fetches the instruction from memory. In the second, the instruction is decoded and the control signals for all units are generated. Branches, jumps and stores are executed in advance in this stage, which also generates the immediates and fetches the data from memory for load instructions. The last stage executes all other instructions and writes back the results in the register file.

**Figure 1 – Steel Core microarchitecture in detail**



**Figure 2 – Steel Core pipeline overview**



## 2 Configuration

Steel configuration parameters can be modified by editing the `globals.vh` file (located inside the `rtl` directory). The following sections describe the parameters that can be changed.

### 2.1 Boot address

The `BOOT_ADDRESS` parameter sets the memory position of the first instruction the core will fetch after reset. It can be changed to any 32-bit value.

### 2.2 CSRs reset values

The reset values of several control and status registers (CSRs) can be modified. Table 2 shows these CSRs and the accepted values.

**Table 2 – Steel Core configuration parameters**

| Parameter                           | Description                                                                                                                                                                                            |
|-------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| <code>MCYCLE_RESET</code>           | Value of the <code>mcycle</code> register after reset. It can be changed to any 32-bit value.                                                                                                          |
| <code>MCYCLES_RESET</code>          | Value of the <code>mcycles</code> register after reset. It can be changed to any 32-bit value.                                                                                                         |
| <code>TIME_RESET</code>             | Value of the <code>time</code> register after reset. It can be changed to any 32-bit value.                                                                                                            |
| <code>TIMEH_RESET</code>            | Value of the <code>timeh</code> register after reset. It can be changed to any 32-bit value.                                                                                                           |
| <code>MINSTRET_RESET</code>         | Value of the <code>minstret</code> register after reset. It can be changed to any 32-bit value.                                                                                                        |
| <code>MINSTRETH_RESET</code>        | Value of the <code>minstreth</code> register after reset. It can be changed to any 32-bit value.                                                                                                       |
| <code>MTVEC_BASE_RESET</code>       | Value of the <code>base</code> field of <code>mtvec</code> register after reset. The value is used in the trap handler address calculation (see section 5). It can be changed to any 30-bit value.     |
| <code>MTVEC_MODE_RESET</code>       | Value of the <code>mode</code> field of <code>mtvec</code> register after reset. It defines the interrupt mode (see section 5) and can be changed to the value 00 (direct mode) or 01 (vectored mode). |
| <code>MSCRATCH_RESET</code>         | Value of the <code>mscratch</code> register after reset. It can be changed to any 32-bit value.                                                                                                        |
| <code>MEPC_RESET</code>             | Value of the <code>mepc</code> register after reset. It can be changed to any 32-bit address aligned on a four byte boundary (the last two bits must be set to zero).                                  |
| <code>MCOUNTINHIBIT_CY_RESET</code> | Enables or inhibits cycle counting (1 inhibits, 0 enables).                                                                                                                                            |
| <code>MCOUNTINHIBIT_IR_RESET</code> | Enables or inhibits instructions retired counting (1 inhibits, 0 enables).                                                                                                                             |

### 3 Input and output signals

Fig. 3 (below) shows Steel input and output signals (except CLK and RESET). The following sections explain their functions in detail.

The interfaces to fetch instructions and to read/write data can be connected to any type of memory, but were designed to facilitate integration with Block RAMs in FPGAs.

**Figure 3 – Steel Core input and output signals**



#### 3.1 Interrupt controller interface

The interrupt controller interface has three signals used to request external, timer and software interrupts, shown in table 3. The interrupt request process is explained in sections 4.4 and 5.

**Table 3 – Interrupt controller interface signals**

| Signal | Width | Direction | Description                                        |
|--------|-------|-----------|----------------------------------------------------|
| E_IRQ  | 1 bit | Input     | When high indicates an external interrupt request. |
| T_IRQ  | 1 bit | Input     | When high indicates a timer interrupt request.     |
| S_IRQ  | 1 bit | Input     | When high indicates a software interrupt request.  |

#### 3.2 Instruction fetch interface

The instruction fetch interface has two signals used in the instruction fetch process, shown in table 4 (below). The process of fetching instructions is explained in section 4.1.

**Table 4 – Instruction fetch interface signals**

| Signal | Width   | Direction | Description                                                                  |
|--------|---------|-----------|------------------------------------------------------------------------------|
| INSTR  | 32 bits | Input     | Contains the instruction fetched from memory.                                |
| I_ADDR | 32 bits | Output    | Contains the address of the instruction the core wants to fetch from memory. |

### 3.3 Data read/write interface

The data read/write interface has five signals used in the process of reading/writing data from/to memory. The signals are shown in table 5 (below). The process of fetching data from memory is explained in section 4.2. The process of writing data is explained in section 4.3.

Table 5 – Data read/write interface signals

| Signal   | Width   | Direction | Description                                                                                                                                                                                                                                                                                   |
|----------|---------|-----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| DATA_IN  | 32 bits | Input     | Contains the data fetched from memory.                                                                                                                                                                                                                                                        |
| D_ADDR   | 32 bits | Output    | In a write operation, contains the address of the memory position where the data will be stored. In a read operation, contains the address of the memory position where the data to be fetched is. The address is always aligned on a four byte boundary (the last two bits are always zero). |
| DATA_OUT | 32 bits | Output    | Contains the data to be stored in memory. Used only with write operations.                                                                                                                                                                                                                    |
| WR_REQ   | 1 bit   | Output    | When high, indicates a request to write data. Used only with write operations.                                                                                                                                                                                                                |
| WR_MASK  | 4 bits  | Output    | Contains a mask of four <i>byte-write enable</i> bits. A bit high indicates that the corresponding byte must be written. See section 4.3 for details. Used only with write operations.                                                                                                        |

### 3.4 Real time counter interface

The real time counter interface has just one signal used to update the `time` CSR, shown in table 6. The process of updating the `time` register is explained in section 4.5.

Table 6 – Real time counter interface

| Signal    | Width   | Direction | Description                                               |
|-----------|---------|-----------|-----------------------------------------------------------|
| REAL_TIME | 64 bits | Input     | Contains the current value read from a real time counter. |

### 3.5 CLK and RESET signals

The core also has the CLK and RESET input signals, which are not shown in figure 3 (above). The CLK signal must be connected to a clock source. The RESET signal is active high and resets the core asynchronously.

## 4 Timing diagrams

### 4.1 Instruction fetch

To fetch an instruction, the core places the instruction address on the I\_ADDR bus. The memory must place the instruction on the INSTR bus at the next clock rising edge. Fig. 4 (below) shows the timing diagram of this process. In the figure,  $\text{mem}[\text{addrX}]$  denotes the instruction stored at the memory position  $\text{addrX}$ .

Figure 4 – Instruction fetch timing diagram



### 4.2 Data fetch

To fetch data from memory, the core puts the data address on the D\_ADDR bus. The memory must place the data on the DATA\_IN bus at the next clock rising edge. Fig. 5 (below) shows the timing diagram of this process. In the figure,  $\text{mem}[\text{addrX}]$  denotes the data stored at the memory position  $\text{addrX}$ .

Figure 5 – Data fetch timing diagram



### 4.3 Data writing

To write data to memory, the core drives D\_ADDR, DATA\_OUT, WR\_REQ and WR\_MASK signals as follows:

- D\_ADDR receives the address of the memory position where the data must be written;
- DATA\_OUT receives the data to be written;
- WR\_REQ is set high;
- WR\_MASK receives a byte-write enable mask that indicates which bytes of DATA\_OUT must be written.

The memory must perform the write operation at the next clock rising edge. The core can request to write bytes, halfwords and words.

Fig. 6 (next page) shows the process of writing data to memory. DATA\_IN is not used in the process and appears only to show the memory contents after writing. The figure shows five clock cycles, in which the core requests to write in the second, third and fourth cycles. In the second clock cycle, the core requests to write the word 0x12345678 at the address addr2. In the third, requests to write the halfword 0xABCD at the upper half of addr2, and in the fourth requests to write the byte 0xEF at the second least significant byte of addr2. The content of addr2 after each of these operations appears on the DATA\_IN bus and are highlighted in blue.

**Figure 6 – Data writing timing diagram**



#### 4.4 Interrupt request

An external device (or an interrupt controller managing several devices) can request interrupts by setting high the appropriate IRQ signal, which is E\_IRQ for external interrupts, T\_IRQ for timer interrupts and S\_IRQ for software interrupts. The IRQ signal of the requested interrupt must be set high for one clock cycle and set low for the next.

Fig. 7 (below) shows the timing diagram of the interrupt request process. Since the process is the same for all types of interrupt, X\_IRQ is used to denote E\_IRQ, T\_IRQ or S\_IRQ. TRAP\_ADDR denotes the address of the trap handler first instruction.

**Figure 7 – Interrupt request timing diagram**



#### 4.5 time CSR update

When connected to a real-time counter, the core updates the time CSR with the value placed on REAL\_TIME at each clock rising edge, as shown in Fig. 8 (below). In the figure, timeX denotes arbitrary time values.

**Figure 8 – time CSR update timing diagram**



## 5 Exceptions and Interrupts

### 5.1 Supported exceptions and interrupts

Steel supports the exceptions and interrupts shown in table 7 (below). They are listed in descending priority order (the highest priority is at the top of the table). If two or more exceptions/interrupts occur at the same time, the one with the highest priority is taken.

Exceptions always cause a trap to be taken. An interrupt will cause a trap only if enabled. Each type of interrupt has an interrupt-enable bit in the `mie` register. Interrupts are globally enable/disabled by setting the MIE bit of `mstatus` register.

Table 7 – Steel supported exceptions and interrupts

| Exception / Interrupt                    | Interrupt | mcause value | Exception code |
|------------------------------------------|-----------|--------------|----------------|
| Machine external interrupt               | 1         | 11           |                |
| Machine software interrupt               | 1         | 3            |                |
| Machine timer interrupt                  | 1         | 7            |                |
| Illegal instruction exception            | 0         | 2            |                |
| Instruction address-misaligned exception | 0         | 0            |                |
| Environment call from M-mode exception   | 0         | 11           |                |
| Environment break exception              | 0         | 3            |                |
| Store address-misaligned exception       | 0         | 6            |                |
| Load address-misaligned exception        | 0         | 4            |                |

### 5.2 Trap handling in Steel

Exceptions and interrupts are handled by a trap handler routine stored in memory. The address of the trap handler first instruction is configured using the `mtvec` register. Steel supports both direct and vectorized interrupt modes. More information on interrupt modes and configuration of the `mtvec` register can be found in RISC-V specifications.

When a trap is taken, the core proceeds as follows:

- the address of the interrupted instruction (or the instruction that encountered the exception) is saved in the `mepc` register;
- the value of the `mtval` register is set to zero;
- the value of the `mstatus` MIE bit is saved in the MPIE field and then set to zero;
- the program counter is set to the trap handler first instruction.

The `mret` instruction is used to return from traps. When executed, the core proceeds as follows:

- the value of the `mstatus` MPIE bit is saved in the MIE field and then set to one;
- the program counter is set to the value of `mepc` register.

### 5.3 Nested interrupts capability

The core globally disables interrupts when takes into a trap. The trap handler can re-enable interrupts by setting the `mstatus` MIE bit to one, enabling nested interrupts. To return from nested traps, the trap handler must stack and manage the values of the `mepc` register in memory.

## 6 Steel-based system example

Fig. 9 (below) shows an example of how Steel can be used as processing unit in embedded systems projects. The same architecture can be used to build other systems. The example is composed of a dual-port Block RAM, a memory mapped GPIO unit, a bus arbiter, an interrupt controller and a real-time counter. In this system, the address space is divided into two ranges, one assigned to memory and the other to the GPIO unit. The bus arbiter is used to multiplex the data interface signals according to the address the core wants to access.

Software for this system can be compiled and assembled with RISC-V GNU Toolchain. The generated ELF files can be converted to HEX, MIF or COE formats and then used to initialize the Block RAM.

Figure 9 – Steel-based system example



## 7 Implementation details

This section contains information on implementation details. It is intended for those who want to know more about how Steel works.

### 7.1 Control Unit

The Control Unit (`control_unit.v`) decodes the instruction and generates the signals that control the memory, the Load Unit, the Store Unit, the ALU, the two register files (Integer and CSR), the Immediate Generator and the Writeback Multiplexer. The description of its input and output signals are shown in table 8 (below).

Table 8 – Control Unit signals

| Signal name              | Width  | Direction | Description                                                                                                               |
|--------------------------|--------|-----------|---------------------------------------------------------------------------------------------------------------------------|
| <b>OPCODE_6_TO_2</b>     | 5 bits | Input     | Connected to the instruction <i>opcode</i> field.                                                                         |
| <b>FUNCT7_5</b>          | 1 bit  | Input     | Connected to the instruction <i>funct7</i> field.                                                                         |
| <b>FUNCT3</b>            | 3 bits | Input     | Connected to the instruction <i>funct3</i> field.                                                                         |
| <b>IADDER_OUT_1_TO_0</b> | 2 bits | Input     | Used to verify the alignment of loads and stores.                                                                         |
| <b>TRAP_TAKEN</b>        | 1 bit  | Input     | When high indicates that a trap will be taken in the next clock cycle. Connected to the Machine Control module.           |
| <b>ALU_OPCODE</b>        | 4 bits | Output    | Selects the operation to be performed by the ALU. See table 10 for possible values.                                       |
| <b>MEM_WR_REQ</b>        | 1 bit  | Output    | When high indicates a request to write to memory.                                                                         |
| <b>LOAD_SIZE</b>         | 2 bits | Output    | Indicates the word size of load instruction. See table 15.                                                                |
| <b>LOAD_UNSIGNED</b>     | 1 bit  | Output    | Indicates the type of load instruction (signed or unsigned). See table 15.                                                |
| <b>ALU_SRC</b>           | 1 bit  | Output    | Selects the ALU 2nd operand.                                                                                              |
| <b>IADDER_SRC</b>        | 1 bit  | Output    | Selects the Immediate Adder 2nd operand.                                                                                  |
| <b>CSR_WR_EN</b>         | 1 bit  | Output    | Controls the WR_EN input of CSR Register File.                                                                            |
| <b>RF_WR_EN</b>          | 1      | Output    | Controls the WR_EN input of Integer Register File. See table 12.                                                          |
| <b>WB_MUX_SEL</b>        | 3      | Output    | Selects the data to be written in the Integer Register File.                                                              |
| <b>IMM_TYPE</b>          | 3      | Output    | Selects the immediate based on the type of the instruction.                                                               |
| <b>CSR_OP</b>            | 3      | Output    | Selects the operation to be performed by the CSR Register File (read/write, set or clear).                                |
| <b>ILLEGAL_INSTR</b>     | 1 bit  | Output    | When high indicates that an invalid or not implemented instruction was fetched from memory.                               |
| <b>MISALIGNED_LOAD</b>   | 1 bit  | Output    | When high indicates an attempt to read data in disagreement with the memory alignment rules. See section 1.10.            |
| <b>MISALIGNED_STORE</b>  | 1 bit  | Output    | When high indicates an attempt to write data to memory in disagreement with the memory alignment rules. See section 1.10. |

## 7.2 ALU

The ALU (alu.v) applies ten distinct logical and arithmetic operations in parallel to two 32-bit operands, outputting the result selected by OPCODE. ALU input and output signals are shown in table 9, and opcodes are shown in table 10.

The opcode values were assigned to facilitate instruction translation. The most significant bit of OPCODE matches with the second most significant bit in the instruction *funct7* field. The remaining three bits match with the instruction *funct3* field.

**Table 9 – ALU signals**

| Signal name | Width   | Direction | Description                                                                                  |
|-------------|---------|-----------|----------------------------------------------------------------------------------------------|
| OP_1        | 32 bits | Input     | Operation first operand.                                                                     |
| OP_2        | 32 bits | Input     | Operation second operand.                                                                    |
| OPCODE      | 4 bits  | Input     | Operation code. This signal is driven by <i>funct7</i> and <i>funct3</i> instruction fields. |
| RESULT      | 32 bits | Output    | Result of the requested operation.                                                           |

**Table 10 – ALU opcodes**

| Opcode   | Operation                 | Binary value |
|----------|---------------------------|--------------|
| ALU_ADD  | Addition                  | 4'b0000      |
| ALU_SUB  | Subtraction               | 4'b1000      |
| ALU_SLT  | Set on less than          | 4'b0010      |
| ALU_SLTU | Set on less than unsigned | 4'b0011      |
| ALU_AND  | Bitwise logical AND       | 4'b0111      |
| ALU_OR   | Bitwise logical OR        | 4'b0110      |
| ALU_XOR  | Bitwise logical XOR       | 4'b0100      |
| ALU_SLL  | Logical left shift        | 4'b0001      |
| ALU_SRL  | Logical right shift       | 4'b0101      |
| ALU_SRA  | Arithmetic right shift    | 4'b1101      |

### 7.3 Integer Register File

The Integer Register File (`integer_file.v`) has 32 general-purpose registers and supports read and write operations. Reads are requested in the pipeline stage 2 and provide data from one or two registers. Writes are requested in the pipeline stage 3 and put the data coming from the Writeback Multiplexer into the selected register. If stage 3 requests to write to a register being read by stage 2, the data to be written is immediately forwarded to stage 2. Each operation is driven by a distinct set of signals, shown in tables 11 and 12.

**Table 11 – Integer Register File signals for read operation**

| Signal name      | Width   | Direction | Description                                                                                       |
|------------------|---------|-----------|---------------------------------------------------------------------------------------------------|
| <b>RS_1_ADDR</b> | 5 bits  | Input     | <i>Register source 1 address.</i> The data is placed at RS_1 immediately after an address change. |
| <b>RS_2_ADDR</b> | 5 bits  | Input     | <i>Register source 2 address.</i> The data is placed at RS_2 immediately after an address change. |
| <b>RS_1</b>      | 32 bits | Output    | Data read (source 1).                                                                             |
| <b>RS_2</b>      | 32 bits | Output    | Data read (source 2).                                                                             |

**Table 12 – Integer Register File signals for write operation**

| Signal name    | Width   | Direction | Description                                                                                                                 |
|----------------|---------|-----------|-----------------------------------------------------------------------------------------------------------------------------|
| <b>RD_ADDR</b> | 5 bits  | Input     | <i>Destination register address.</i>                                                                                        |
| <b>RD</b>      | 32 bits | Input     | Data to be written in the destination register.                                                                             |
| <b>WR_EN</b>   | 1 bit   | Input     | <i>Write enable.</i> When high, the data placed on RD is written in the destination register at the next clock rising edge. |

## 7.4 Branch Unit

The Branch Unit (`branch_unit.v`) decides if a branch instruction must be taken or not. It receives two operands from the Integer Register File and, based on the value of `opcode` and `funct3` instruction fields, decides the branch. Jump instructions are interpreted as branches that must always be taken. Internally, the unit realizes just two comparisons, deriving other four from them. Table 13 shows the module input and output signals.

Table 13 – Branch Unit signals

| Signal name          | Width   | Direction | Description                                             |
|----------------------|---------|-----------|---------------------------------------------------------|
| <b>OPCODE_6_TO_2</b> | 5 bits  | Input     | Connected to the <code>opcode</code> instruction field. |
| <b>FUNCT3</b>        | 3 bits  | Input     | Connected to the <code>funct3</code> instruction field. |
| <b>RS1</b>           | 32 bits | Input     | Connected to the register file 1st operand source.      |
| <b>RS2</b>           | 32 bits | Input     | Connected to the register file 2nd operand source.      |
| <b>BRANCH_TAKEN</b>  | 1 bit   | Output    | High if the branch must be taken, low otherwise.        |

## 7.5 Load Unit

The Load Unit (`load_unit.v`) reads `DATA_IN` input signal and forms a 32-bit value based on the load instruction type (encoded in the `funct3` field). The formed value (placed on `OUTPUT`) can then be written in the Integer Register File. The module input and output signals are shown in table 14. The value of `OUTPUT` is formed as shown in table 15.

**Table 14 – Load Unit signals**

| Signal name                    | Width   | Direction | Description                                                                                                    |
|--------------------------------|---------|-----------|----------------------------------------------------------------------------------------------------------------|
| <code>LOAD_SIZE</code>         | 2 bits  | Input     | Connected to the two least significant bits of the <code>funct3</code> instruction field.                      |
| <code>LOAD_UNSIGNED</code>     | 1 bit   | Input     | Connected to the most significant bit of the <code>funct3</code> instruction field.                            |
| <code>DATA_IN</code>           | 32 bits | Input     | 32-bit word read from memory.                                                                                  |
| <code>IADDER_OUT_1_TO_0</code> | 2 bits  | Input     | Indicates the byte/halfword position in <code>DATA_IN</code> . Used only with load byte/halfword instructions. |
| <code>OUTPUT</code>            | 32 bits | Output    | 32-bit value to be written in the Integer Register File.                                                       |

**Table 15 – Load Unit output generation**

| <code>LOAD_SIZE</code>     | <b>Effect on <code>OUTPUT</code></b>                                                                                                                                                                                           |
|----------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| <code>2'b00</code>         | The byte in the position indicated by <code>IADDER_OUT_1_TO_0</code> is placed on the least significant byte of <code>OUTPUT</code> . The upper 24 bits are filled according to the <code>LOAD_UNSIGNED</code> signal.         |
| <code>2'b01</code>         | The halfword in the position indicated by <code>IADDER_OUT_1_TO_0</code> is placed on the least significant halfword of <code>OUTPUT</code> . The upper 16 bits are filled according to the <code>LOAD_UNSIGNED</code> signal. |
| <code>2'b10</code>         | All bits of <code>DATA_IN</code> are placed on <code>OUTPUT</code> .                                                                                                                                                           |
| <code>2'b11</code>         | All bits of <code>DATA_IN</code> are placed on <code>OUTPUT</code> .                                                                                                                                                           |
| <code>LOAD_UNSIGNED</code> | <b>Effect on <code>OUTPUT</code></b>                                                                                                                                                                                           |
| <code>1'b0</code>          | The remaining bits of <code>OUTPUT</code> are filled with the sign bit.                                                                                                                                                        |
| <code>1'b1</code>          | The remaining bits of <code>OUTPUT</code> are filled with zeros.                                                                                                                                                               |

## 7.6 Store Unit

The Store Unit (`store_unit.v`) drives the signals that interface with memory. It places the data to be written (which can be a byte, halfword or word) in the right position in `DATA_OUT` and sets the value of `WR_MASK` in an appropriate way. Table 16 (below) shows the unit input and output signals.

**Table 16 – Store Unit signals**

| Signal name       | Width   | Direction | Description                                                                                                           |
|-------------------|---------|-----------|-----------------------------------------------------------------------------------------------------------------------|
| <b>FUNCT3</b>     | 3 bits  | Input     | Connected to the <i>funct3</i> instruction field. Indicates the data size (byte, halfword or word).                   |
| <b>IADDER_OUT</b> | 32 bits | Input     | Contains the address (possibly unaligned) where the data must be written.                                             |
| <b>RS2</b>        | 32 bits | Input     | Connected to Integer Register File source 2. Contains the data to be written (possibly in the wrong position).        |
| <b>MEM_WR_REQ</b> | 1 bit   | Input     | Control signal generated by the Control Unit. When high indicates a request to write to memory.                       |
| <b>DATA_OUT</b>   | 32 bits | Output    | Contains the data to be written in the right position.                                                                |
| <b>D_ADDR</b>     | 32 bits | Output    | Contains the address (aligned) where the data must be written.                                                        |
| <b>WR_MASK</b>    | 4 bits  | Output    | A bitmask that indicates which bytes of <code>DATA_OUT</code> must be written. For more information, see section 4.3. |
| <b>WR_REQ</b>     | 1 bit   | Output    | When high indicates a request to write to memory.                                                                     |

## 7.7 Immediate Generator

The Immediate Generator (`imm-generator.v`) rearranges the immediate bits contained in the instruction and, if necessary, sign-extends it to form a 32-bit value. The unit is controlled by the `IMM_TYPE` signal, generated by the Control Unit. Table 17 shows the unit input and output signals.

**Table 17 – Immediate Generator signals**

| Signal name           | Width   | Direction | Description                                                                                               |
|-----------------------|---------|-----------|-----------------------------------------------------------------------------------------------------------|
| <code>INSTR</code>    | 25 bits | Input     | Connected to the instruction bits (32 to 7).                                                              |
| <code>IMM_TYPE</code> | 2 bits  | Input     | Control signal generated by the Control Unit that indicated the type of immediate that must be generated. |
| <code>IMM</code>      | 32 bits | Output    | 32-bit generated immediate.                                                                               |

## 7.8 CSR Register File

The CSR Register File (`csr_file.v`) has the control and status registers required for the implementation of M-mode. Read/write, set and clear operations can be applied to the registers. Table 18 shows the unit input and output signals, except those used for communication with the Machine Control, which are shown in table 19.

**Table 18 – CSR Register File signals**

| Signal name         | Width   | Direction | Description                                                                                                                                        |
|---------------------|---------|-----------|----------------------------------------------------------------------------------------------------------------------------------------------------|
| <b>WR_EN</b>        | 1 bit   | Input     | <i>Write enable.</i> When high, updates the CSR addressed by CSR_ADDR at the next clock rising edge according to the operation selected by CSR_OP. |
| <b>CSR_ADDR</b>     | 12 bits | Input     | Address of the CSR to read/write/modify.                                                                                                           |
| <b>CSR_OP</b>       | 3 bits  | Input     | Control signal generated by the Control Unit. Selects the operation to be performed (read/write, set, clear or no operation).                      |
| <b>CSR_UIMM</b>     | 5 bits  | Input     | <i>Unsigned immediate.</i> Connected to the five least significant bits from the Immediate Generator output.                                       |
| <b>CSR_DATA_IN</b>  | 32 bits | Input     | In write operations, contains the data to be written. In set or clear operations, contains a bit mask.                                             |
| <b>PC</b>           | 32 bits | Input     | <i>Program counter value.</i> Used to update the mepc CSR.                                                                                         |
| <b>E_IRQ</b>        | 1 bit   | Input     | <i>External interrupt request.</i> Used to update the MEIP bit of mip CSR.                                                                         |
| <b>T_IRQ</b>        | 1 bit   | Input     | <i>Timer interrupt request.</i> Used to update the MTIP bit of mip CSR.                                                                            |
| <b>S_IRQ</b>        | 1 bit   | Input     | <i>Software interrupt request.</i> Used to update the MSIP bit of mip CSR.                                                                         |
| <b>REAL_TIME</b>    | 64 bits | Input     | Current value of the real time counter. Used to update the time and timeh CSRs.                                                                    |
| <b>CSR_DATA_OUT</b> | 32 bits | Output    | Contains the data read from the CSR addressed by CSR_ADDR.                                                                                         |
| <b>EPC_OUT</b>      | 32 bits | Output    | Current value of the mepc CSR.                                                                                                                     |
| <b>TRAP_ADDRESS</b> | 32 bits | Output    | Address of the trap handler first instruction.                                                                                                     |

**Table 19 – CSR Register File and Machine Control interface signals**

| Signal name | Width  | Direction <sup>1</sup> | Description                                                                                                                                                    |
|-------------|--------|------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------|
| I_OR_E      | 1 bit  | Input                  | <i>Interrupt or exception.</i> When high indicates an interrupt, otherwise indicates an exception. Used to update the most significant bit of mcause register. |
| CAUSE_IN    | 4 bits | Input                  | Contains the exception code. Used to update the mcause register. See table 7.                                                                                  |
| SET_CAUSE   | 1 bit  | Input                  | When high updates the mcause register with the values of I_OR_E and CAUSE_IN.                                                                                  |
| SET_EPC     | 1 bit  | Input                  | When high, updates the mepc register with the value of PC.                                                                                                     |
| INSTRET_INC | 1 bit  | Input                  | When high enables the instructions retired counting.                                                                                                           |
| MIE_CLEAR   | 1 bit  | Input                  | When high sets the MIE bit of mstatus to zero (which globally disables interrupts). The old value of MIE is saved in the mstatus MPIE field.                   |
| MIE_SET     | 1 bit  | Input                  | When high sets the MPIE bit of mstatus to one. The old value of MPIE is saved in the mstatus MIE field.                                                        |
| MIE         | 1 bit  | Output                 | Current value of MIE bit of mstatus CSR.                                                                                                                       |
| MEIE_OUT    | 1 bit  | Output                 | Current value of MEIE bit of mie CSR.                                                                                                                          |
| MTIE_OUT    | 1 bit  | Output                 | Current value of MTIE bit of mie CSR.                                                                                                                          |
| MSIE_OUT    | 1 bit  | Output                 | Current value of MSIE bit of mie CSR.                                                                                                                          |
| MEIP_OUT    | 1 bit  | Output                 | Current value of MEIP bit of mip CSR.                                                                                                                          |
| MTIP_OUT    | 1 bit  | Output                 | Current value of MTIP bit of mip CSR.                                                                                                                          |
| MSIP_OUT    | 1 bit  | Output                 | Current value of MSIP bit of mip CSR.                                                                                                                          |

<sup>1</sup> Direction regarding to the CSR Register File. An input of CSR Register File is an output of Machine Control and vice-versa.

## 7.9 Machine Control

The Machine Control module (`machine_control.v`) implements the M-mode, controlling the program counter generation and updating several CSRs. It has a special communication interface with the CSR Register File, already shown in table 19 (above). Its input and output signals are shown in table 20 (below).

Internally, the module implements the finite state machine shown in figure 10 (next page).

**Table 20 – Machine Control module signals**

| Signal name             | Width  | Direction | Description                                                                                                                                                               |
|-------------------------|--------|-----------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| <b>ILLEGAL_INSTR</b>    | 1 bit  | Input     | <i>Illegal instruction.</i> When high indicates that an invalid or not implemented instruction was fetched from memory.                                                   |
| <b>MISALIGNED_INSTR</b> | 1 bit  | Input     | <i>Misaligned instruction.</i> When high indicates an attempt to fetch an instruction which address is in disagreement with the memory alignment rules. See section 1.10. |
| <b>MISALIGNED_LOAD</b>  | 1 bit  | Input     | <i>Misaligned load.</i> When high indicates an attempt to read data in disagreement with the memory alignment rules. See section 1.10.                                    |
| <b>MISALIGNED_STORE</b> | 1 bit  | Input     | <i>Misaligned store.</i> When high indicates an attempt to write data to memory in disagreement with the memory alignment rules. See section 1.10.                        |
| <b>OPCODE_6_TO_2</b>    | 5 bits | Input     | Value of the <i>opcode</i> instruction field.                                                                                                                             |
| <b>FUNCT3</b>           | 3 bits | Input     | Value of the <i>funct3</i> instruction field.                                                                                                                             |
| <b>FUNCT7</b>           | 7 bits | Input     | Value of the <i>funct7</i> instruction field.                                                                                                                             |
| <b>RS1_ADDR</b>         | 5 bits | Input     | Value of the <i>rs1</i> instruction field.                                                                                                                                |
| <b>RS2_ADDR</b>         | 5 bits | Input     | Value of the <i>rs2</i> instruction field.                                                                                                                                |
| <b>RD_ADDR</b>          | 5 bits | Input     | Value of the <i>rd</i> instruction field.                                                                                                                                 |
| <b>E_IRQ</b>            | 1 bit  | Input     | <i>External interrupt request.</i>                                                                                                                                        |
| <b>T_IRQ</b>            | 1 bit  | Input     | <i>Timer interrupt request.</i>                                                                                                                                           |
| <b>S_IRQ</b>            | 1 bit  | Input     | <i>Software interrupt request.</i>                                                                                                                                        |
| <b>PC_SRC</b>           | 2 bit  | Output    | Selects the program counter source.                                                                                                                                       |
| <b>FLUSH</b>            | 1 bit  | Output    | Flushes the pipeline when set.                                                                                                                                            |
| <b>TRAP_TAKEN</b>       | 1 bit  | Output    | When high indicates that a trap will be taken in the next clock cycle.                                                                                                    |

**Figure 10 – M-mode finite state machine**

