



# DESIGN AND SIMULATION OF A SINGLE-CYCLE / PIPELINED RISC-V PROCESSOR

RANIA OSSAMA HASSAN 231001334

ADHAM AHMED MOHAMED 221001783

YOMNA MEDHAT SAAD 231000762

MOHAMMED HESHAM 231000359

JOMANA AMIN 231001427

# AGENDA

- Overview of the RISC-V processor project
- ALU & Arithmetic Logic Design
- Control Unit (Main + ALU Control)
- Register File & Immediate Generator
- Program Counter, Branch Logic & PC Update
- Instruction Memory, Data Memory & Top-Level Integration
- Simulation Results – Testbenches, PC updates, instruction execution, waveform verification

# OVERVIEW

- Design and implement a RISC-V processor in Verilog HDL.
- Supports R-Type, I-Type, Load/Store, and Branch instructions.
- Simulate both single-cycle and pipelined versions.
- Understand the internal CPU structure: instruction fetch, decode, execute, memory, and write-back.
- Verify module functionality using simulation waveforms.
- Gain hands-on experience in CPU design, control signals, and datapath integration.

# ALU & ARITHMETIC LOGIC DESIGN

- Located in the Execute stage of the 5-stage pipe
- Performs all RISC-V operations
- Uses signed comparison for slt ( $-5 < 10$  result = 1)
- Outputs a Zero flag using beq to detect equality
- Purely combinational (no clock or reset needed)
- Controlled by 4-bit ALUCtl from the ALU Control Unit
- Verified with full testbench (all operations tested)

|            |      |
|------------|------|
| add / addi | 0010 |
| sub        | 0110 |
| slt / slti | 0100 |
| and / andi | 0000 |
| or / ori   | 0001 |
| xor / xori | 0111 |
| sll / slli | 0011 |
| srl / srli | 1000 |
| sra / srai | 1010 |

# ALU & ARITHMETIC LOGIC DESIGN

## ALU Control Codes (ALUControl)

|            |      |
|------------|------|
| add / addi | 0010 |
| sub        | 0110 |
| slt / slti | 0100 |
| and / andi | 0000 |
| or / ori   | 0001 |
| xor / xori | 0111 |
| sll / slli | 0011 |
| srl / srli | 1000 |
| sra / srai | 1010 |

# CONTROL UNIT

- Located in the Decode stage of the 5-stage pipeline
- Generates control signals based on opcode and finds what type of instruction the ALU will do (R-type, I-type ... etc)

| Instruction Type | [0] | [1] | [2] | [3] | [4] | [5] | [6] |
|------------------|-----|-----|-----|-----|-----|-----|-----|
| R-Type           | 1   | 1   | 0   | 0   | 1   | 1   | 0   |
| I-type           | 1   | 1   | 0   | 0   | 1   | 0   | 0   |
| Load             | 1   | 1   | 0   | 0   | 0   | 0   | 0   |
| Store            | 1   | 1   | 0   | 0   | 0   | 1   | 0   |
| Branch           | 1   | 1   | 0   | 0   | 0   | 1   | 1   |

- Purely combinational (no clock or reset needed)
- Verified with full testbench

| Instruction Type | RegWrite | MemRead | MemWrite | Branch | ALUSrc | MemtoReg | ALUOp |
|------------------|----------|---------|----------|--------|--------|----------|-------|
| R-Type           | 1        | 0       | 0        | 0      | 0      | 0        | 10    |
| I-Type           | 1        | 0       | 0        | 0      | 1      | 0        | 11    |
| Load             | 1        | 1       | 0        | 0      | 1      | 1        | 00    |
| Store            | 0        | 0       | 1        | 0      | 1      | X        | 00    |
| Branch           | 0        | 0       | 0        | 1      | 0      | X        | 01    |

# IMMEDIATE GENERATOR



- Extracts and sign-extends immediates from instructions
- I-type, S-type and B-type instructions
- Handles shift-immediates
- Combinational logic

| TYPE     | INSTRUCTION    |
|----------|----------------|
| I - type | addi, andi, ld |
| S - type | sd             |
| B - type | beq            |
| Shift    | slli, srli     |

# REGISTER FILE & IMMEDIATE GENERATOR

- 32 Registers
- $x_0$  always 0
- Two read ports
- One write port
- Synchronous write
- Asynchronous read



# PROGRAM COUNTER & BRANCH LOGIC

- Designed the Program Counter (PC) unit for instruction sequencing.
- PC handles two cases:
- Normal instruction: PC increments by 4.
- Branch taken: PC updated to branch target address.
- Implemented asynchronous reset to initialize PC to zero.
- Added control signals:
- PCWrite → allows stalling of PC.
- BranchTaken → indicates if a branch is taken.

```
1 module PC_Unit (
2     input  wire      clk,
3     input  wire      reset,
4
5     input  wire      PCWrite,
6     input  wire      BranchTaken,
7     input  wire [63:0] BranchTarget,
8
9     output reg [63:0] PC
10 );
11
12 wire [63:0] PC_plus4;
13 wire [63:0] PC_next;
14
15 assign PC_plus4 = PC + 64'd4;
16
17 assign PC_next = (BranchTaken) ? BranchTarget : PC_plus4;
18
19 always @ (posedge clk or posedge reset) begin
20     if (reset)
21         PC <= 64'd0;
22     else if (PCWrite)
23         PC <= PC_next;
24 end
25
26 endmodule
```

# PROGRAM COUNTER & BRANCH LOGIC

```
1 `timescale 1ns / 1ps
2
3 module PC_Unit_tb;
4
5
6   reg clk;
7   reg reset;
8   reg PCWrite;
9   reg BranchTaken;
10  reg [63:0] BranchTarget;
11
12
13  wire [63:0] PC;
14
15
16  PC_Unit dut (
17    .clk(clk),
18    .reset(reset),
19    .PCWrite(PCWrite),
20    .BranchTaken(BranchTaken),
21    .BranchTarget(BranchTarget),
22    .PC(PC)
23  );
24
25  always #5 clk = ~clk;
26
27  initial begin
28    clk = 0;
29    reset = 1;
30    PCWrite = 1;
31    BranchTaken = 0;
32    BranchTarget = 64'd0;
33
34
35    #10;
36    reset = 0;
37
38
```

- Tested functionality using simulation:
- Normal PC increment
- Branch taken
- Branch not taken

```
38
39  #10;
40  #10;
41  #10;
42
43  PCWrite = 0;
44  #20;
45
46
47  PCWrite = 1;
48  #10;
49
50  BranchTaken = 1;
51  BranchTarget = 64'd100;
52  #10;
53  BranchTaken = 0;
54  #10; // PC = 104
55
56
57  #10;
58  $stop;
59  end
60
61 endmodule
```

# RISC-V PIPELINED PROCESSOR

## Pipeline Stage Operations & Decisions

- IF → ID: Instruction is fetched using the PC; PC is updated based on sequential execution or a branch decision.
- ID → EX: Instruction is decoded, control signals are generated, source registers are read, and immediates are prepared.
- EX → MEM: ALU performs arithmetic/logic operations, computes memory addresses, and evaluates branch conditions.
- MEM → WB: Data memory is accessed for load/store instructions, and the correct result is selected.
- WB: Final result is written back to the register file if required by the instruction.

## Hazard Handling

**Data Hazards:** Forwarding unit enables ALU-to-ALU and MEM-to-ALU bypassing, ensuring zero stalls for register dependencies.

### Control Hazards:

- Simple prediction: branch not taken

```
1 module ForwardingUnit(
2   input wire [4:0] EX_rs1, EX_rs2,      // source registers of
3   input wire [4:0] MEM_rd, WB_rd,        // destination register
4   input wire MEM_RegWrite, WB_RegWrite,
5   output reg [1:0] forwardA, forwardB
6 );
7
8   always @(*) begin
9     forwardA = 2'b00;
10    forwardB = 2'b00;
11
12    // EX hazard (forward from EX/MEM)
13    if (MEM_RegWrite && MEM_rd != 0 && MEM_rd == EX_rs1)
14      forwardA = 2'b10;
15    if (MEM_RegWrite && MEM_rd != 0 && MEM_rd == EX_rs2)
16      forwardB = 2'b10;
17
18    // MEM/WB hazard (forward from WB)
19    if (WB_RegWrite && WB_rd != 0 && WB_rd == EX_rs1)
20      forwardA = 2'b01;
21    if (WB_RegWrite && WB_rd != 0 && WB_rd == EX_rs2)
22      forwardB = 2'b01;
23
24  end
endmodule
```

# IMPLEMENTATION, VERIFICATION & RESULTS

# Verification

- Implemented and simulated in Vivado, verifying:
    - Correct instruction execution
    - Proper register write-back
    - Correct memory load/store behavior
    - Accurate branch handling and pipeline flushing

# Results

- Expected outputs observed in simulation
  - Processor executes RISC-V programs correctly under pipelining
  - No functional stalls for ALU-to-ALU dependencies





# DISCUSSION OF VERIFICATION

The most significant proof of correct operation is the value of x17. For the processor to reach the value 23, the following must have occurred correctly:

1. The ALU correctly compared x24 and x7 in the beq instruction.
2. The Branch Logic detected the equality and calculated the target PC.
3. The IF/ID Flush successfully cleared the instructions that entered the pipe before the branch was resolved.
4. The Write-Back stage successfully committed the value 23 to the register file on the negative edge of the clock.

| Signal / Register Logical Purpose          | Expected Value | Simulated Result | Status |
|--------------------------------------------|----------------|------------------|--------|
| x17<br>Final result after Branch (L1)      | 23             | 23               | Pass   |
| x7<br>Result of logical shift (SRL)        | 0              | 0                | Pass   |
| x24<br>Result of Load from Memory          | 0              | 0                | Pass   |
| Memory [48]<br>Target of Store instruction | 0              | 0                | Pass   |
| Forwarding<br>ALU-to-ALU Dependency        | No Stalls      | Active (FwdA/B)  | Pass   |

# THANK YOU

GitHub Repository: <https://github.com/JomanaAmin/Computer-Architecture-Project>