

**Indian Institute of Technology Mandi**

School of Computing and Electrical Engineering

# **Graph-Optimized Multiplexer Network for Pipelined Datapath Design**

**Modeling, Optimization, and Formal Verification**

**Course Project Report**

**RTL Design and Synthesis**

**Name:** Ayan Garg  
**Roll Number:** B23484

17th November 2025

# Contents

|                                                |           |
|------------------------------------------------|-----------|
| <b>1 Abstract</b>                              | <b>4</b>  |
| <b>2 Introduction</b>                          | <b>4</b>  |
| 2.1 Background . . . . .                       | 4         |
| 2.2 Problem Statement . . . . .                | 4         |
| 2.3 Objectives . . . . .                       | 4         |
| <b>3 Literature Review</b>                     | <b>5</b>  |
| 3.1 RISC-V Architecture . . . . .              | 5         |
| 3.2 Pipelining in Processors . . . . .         | 5         |
| 3.3 Multiplexer Network Optimization . . . . . | 5         |
| 3.4 Formal Verification . . . . .              | 6         |
| <b>4 Methodology</b>                           | <b>6</b>  |
| 4.1 Design Flow . . . . .                      | 6         |
| 4.2 Tools Used . . . . .                       | 6         |
| <b>5 Single-Cycle Processor Design</b>         | <b>6</b>  |
| 5.1 Architecture Overview . . . . .            | 6         |
| 5.2 Key Modules . . . . .                      | 7         |
| 5.2.1 ALU Module . . . . .                     | 7         |
| 5.2.2 Control Unit . . . . .                   | 7         |
| <b>6 Pipelined Processor Implementation</b>    | <b>8</b>  |
| 6.1 Pipeline Stages . . . . .                  | 8         |
| 6.1.1 IF Stage (Instruction Fetch) . . . . .   | 8         |
| 6.1.2 ID Stage (Instruction Decode) . . . . .  | 8         |
| 6.1.3 EX Stage (Execute) . . . . .             | 9         |
| 6.2 Pipeline Registers . . . . .               | 9         |
| 6.2.1 IF/ID Register . . . . .                 | 10        |
| 6.3 Hazard Detection Unit . . . . .            | 10        |
| 6.4 Forwarding Unit . . . . .                  | 10        |
| <b>7 Multiplexer Network Design</b>            | <b>11</b> |

|           |                                                     |           |
|-----------|-----------------------------------------------------|-----------|
| 7.1       | Multiplexer Network Overview . . . . .              | 11        |
| 7.2       | Multiplexer Implementation . . . . .                | 11        |
| 7.2.1     | 2:1 Multiplexer . . . . .                           | 11        |
| 7.2.2     | 3:1 Multiplexer . . . . .                           | 12        |
| 7.3       | Multiplexer Network Graph Representation . . . . .  | 12        |
| <b>8</b>  | <b>Multiplexer Network Optimization</b>             | <b>12</b> |
| 8.1       | Optimization Algorithms . . . . .                   | 12        |
| 8.1.1     | Serial Chain Merging . . . . .                      | 13        |
| 8.1.2     | Stage Reduction . . . . .                           | 13        |
| 8.1.3     | Redundant Mux Elimination . . . . .                 | 13        |
| 8.2       | Optimization Results . . . . .                      | 14        |
| 8.2.1     | Before Optimization . . . . .                       | 14        |
| 8.2.2     | After Optimization . . . . .                        | 14        |
| 8.2.3     | Improvement Summary . . . . .                       | 14        |
| 8.3       | Optimized Multiplexer Chains . . . . .              | 14        |
| <b>9</b>  | <b>Formal Verification using CTL Model Checking</b> | <b>15</b> |
| 9.1       | CTL Model Overview . . . . .                        | 15        |
| 9.1.1     | Model Variables . . . . .                           | 15        |
| 9.1.2     | Multiplexer Definitions . . . . .                   | 16        |
| 9.2       | CTL Properties . . . . .                            | 16        |
| 9.2.1     | Property Categories . . . . .                       | 16        |
| 9.3       | CTL Properties Table . . . . .                      | 17        |
| 9.4       | CTL Operators Used . . . . .                        | 18        |
| 9.5       | NuSMV Verification Results . . . . .                | 18        |
| 9.5.1     | Verification Summary . . . . .                      | 19        |
| <b>10</b> | <b>Simulation and Waveform Analysis</b>             | <b>19</b> |
| 10.1      | Testbench Design . . . . .                          | 19        |
| 10.2      | Waveform Analysis . . . . .                         | 20        |
| <b>11</b> | <b>Results and Analysis</b>                         | <b>20</b> |
| 11.1      | Performance Metrics Comparison . . . . .            | 20        |
| 11.1.1    | Area Reduction . . . . .                            | 20        |
| 11.1.2    | Power Reduction . . . . .                           | 20        |

|                                             |           |
|---------------------------------------------|-----------|
| 11.1.3 Delay Reduction . . . . .            | 20        |
| 11.2 Functional Correctness . . . . .       | 20        |
| <b>12 Discussion</b>                        | <b>21</b> |
| 12.1 Optimization Effectiveness . . . . .   | 21        |
| 12.2 Formal Verification Benefits . . . . . | 21        |
| 12.3 Limitations . . . . .                  | 21        |
| <b>13 Conclusion</b>                        | <b>21</b> |
| <b>14 References</b>                        | <b>22</b> |

# 1 Abstract

This report presents the design, implementation, and optimization of a 5-stage pipelined RISC-V processor with a focus on multiplexer network optimization using graph-based techniques. The project includes the transformation from a single-cycle processor to a pipelined architecture, implementation of hazard detection and forwarding units, optimization of the multiplexer network to reduce stages and improve performance, and formal verification using CTL model checking with NuSMV. The optimized design achieves significant improvements: 66.7% reduction in multiplexer stages, 50% reduction in LUTs, 57.4% reduction in power consumption, and 70% reduction in critical path delay.

## 2 Introduction

### 2.1 Background

RISC-V is an open-source instruction set architecture (ISA) that has gained significant traction in both academic and industrial settings. Pipelining is a fundamental technique used to improve processor performance by overlapping instruction execution. However, pipelining introduces data hazards that require careful handling through forwarding and stalling mechanisms.

### 2.2 Problem Statement

The multiplexer network in a pipelined processor is critical for data path selection but can become a bottleneck in terms of area, power, and delay. This project addresses:

- Design and implementation of a 5-stage pipelined RISC-V processor
- Optimization of the multiplexer network using graph-based algorithms
- Formal verification of the multiplexer network correctness using CTL model checking

### 2.3 Objectives

1. Transform a single-cycle RISC-V processor into a 5-stage pipelined architecture
2. Implement hazard detection and forwarding units
3. Design and optimize the multiplexer network using graph optimization techniques
4. Verify correctness using formal methods (CTL model checking)
5. Analyze and compare performance metrics (area, power, delay) before and after optimization

## 3 Literature Review

### 3.1 RISC-V Architecture

RISC-V is a reduced instruction set computer (RISC) architecture designed to be simple, modular, and extensible. The base instruction set includes:

- Integer operations (R-type, I-type)
- Load/Store operations
- Branch and Jump instructions
- Immediate value handling

### 3.2 Pipelining in Processors

Pipelining divides instruction execution into multiple stages, allowing multiple instructions to be processed simultaneously. The classic 5-stage pipeline includes:

1. **IF (Instruction Fetch)**: Fetch instruction from memory
2. **ID (Instruction Decode)**: Decode instruction and read registers
3. **EX (Execute)**: Perform ALU operations
4. **MEM (Memory)**: Access data memory
5. **WB (Write Back)**: Write result back to register file

### 3.3 Multiplexer Network Optimization

Multiplexer networks can be modeled as directed graphs where nodes represent data sources/sinks and edges represent data paths through multiplexers. Optimization techniques include:

- Serial chain merging
- Parallel mux merging
- Stage reduction
- Critical path optimization
- Redundant mux elimination

### 3.4 Formal Verification

Computation Tree Logic (CTL) is a temporal logic used for formal verification. CTL properties can verify:

- Reachability: All paths are accessible
- Safety: No invalid states occur
- Liveness: Desired states are eventually reached

## 4 Methodology

### 4.1 Design Flow

The project follows a systematic design flow:

1. Single-cycle processor design and verification
2. Pipeline stage implementation
3. Hazard detection and forwarding unit design
4. Multiplexer network modeling and optimization using Python
5. Formal verification using CTL model checking
6. Performance analysis and comparison

### 4.2 Tools Used

- **Icarus Verilog (iverilog)**: Verilog simulation and compilation
- **GTKWave**: Waveform visualization and analysis
- **Python**: Scripting for optimization and analysis
- **NuSMV**: Symbolic model checker for CTL verification
- **NetworkX**: Graph analysis and optimization library in Python

## 5 Single-Cycle Processor Design

### 5.1 Architecture Overview

The single-cycle RISC-V processor implements the base instruction set with the following components:

- Instruction Memory
- Register File
- ALU (Arithmetic Logic Unit)
- Data Memory
- Control Unit
- Sign Extension Unit
- Program Counter (PC) and PC+4

## 5.2 Key Modules

### 5.2.1 ALU Module

The ALU performs arithmetic and logical operations based on the ALU control signals. Key operations include:

Listing 1: ALU Operations

```

1 module ALU #(parameter DATA_WIDTH = 32) (
2     input wire [DATA_WIDTH-1:0] A, B,
3     input wire [3:0] ALUControl,
4     output reg [DATA_WIDTH-1:0] ResultReg,
5     output wire Zero
6 );
7     always @(*) begin
8         case (ALUControl)
9             4'b0000: ResultReg <= A + B;           // ADD
10            4'b0001: ResultReg <= A - B;           // SUB
11            4'b0010: ResultReg <= A & B;           // AND
12            4'b0011: ResultReg <= A | B;           // OR
13            4'b0100: ResultReg <= A ^ B;           // XOR
14            4'b0101: ResultReg <= A << B[4:0]; // SLL
15            4'b0110: ResultReg <= A >> B[4:0]; // SRL
16            4'b0111: ResultReg <= $signed(A) >>> B[4:0]; // SRA
17            4'b1000: ResultReg <= A + B;           // AUIPC
18            4'b1001: ResultReg <= B;               // LUI
19            default: ResultReg <= {DATA_WIDTH{1'bx}};
20        endcase
21    end
22    assign Zero = (ResultReg == {DATA_WIDTH{1'b0}});
23 endmodule

```

### 5.2.2 Control Unit

The control unit decodes instruction opcodes and generates control signals for the data-path:

- RegWrite: Enable register file write
- MemWrite: Enable data memory write
- ALUSrc: Select ALU input (register or immediate)
- ResultSrc: Select writeback source (ALU, memory, or PC+4)
- Branch: Branch instruction detected
- Jump: Jump instruction detected

## 6 Pipelined Processor Implementation

### 6.1 Pipeline Stages

The 5-stage pipeline divides instruction execution into distinct stages with pipeline registers between them.

#### 6.1.1 IF Stage (Instruction Fetch)

Listing 2: IF Stage Implementation

```

1 // Instruction Fetch Stage
2 wire [31:0] PCPlus4F;
3 PC_Plus_4 pc_plus_4_inst (.PC(PCF), .PCPlus4(PCPlus4F));
4 Instruction_Memory imem (.A(PCF), .RD(InstrF));

```

#### 6.1.2 ID Stage (Instruction Decode)

The ID stage decodes instructions, reads registers, and extends immediate values:

Listing 3: ID Stage Components

```

1 // Register File
2 Register_File rf (
3   .clk(clk),
4   .A1(InstrD[19:15]), // rs1
5   .A2(InstrD[24:20]), // rs2
6   .A3(RdW),           // Write address
7   .WD3(ResultW),      // Write data
8   .WE3(RegWriteW),    // Write enable
9   .RD1(RD1D),
10  .RD2(RD2D)
11 );
12
13 // Sign Extension
14 Extend extend_inst (
15   .Instr(InstrD[31:7]),

```

```

16     .ImmSrc(ImmSrcD),
17     .ImmExt(ImmExtD)
18 );

```

### 6.1.3 EX Stage (Execute)

The EX stage performs ALU operations with forwarding support:

Listing 4: EX Stage with Forwarding

```

// Forwarding Muxes
Forward_Mux_A fwd_mux_a (
    .RD1E(RD1E),
    .ALUResultM(ALUResultM),
    .ResultW(ResultW),
    .ForwardAE(ForwardAE),
    .SrcAE(SrcAE)
);
Forward_Mux_B fwd_mux_b (
    .RD2E(RD2E),
    .ALUResultM(ALUResultM),
    .ResultW(ResultW),
    .ForwardBE(ForwardBE),
    .SrcBE(SrcBE)
);
// ALU Input Mux
ALU_Mux alu_mux (
    .SrcBE(SrcBE),
    .ImmExtE(ImmExtE),
    .ALUSrcE(ALUSrcE),
    .SrcB(SrcB)
);
// ALU
ALU alu_inst (
    .A(SrcAE),
    .B(SrcB),
    .ALUControl(ALUControlE),
    .ResultReg(ALUResultE),
    .Zero(ZeroE)
);

```

## 6.2 Pipeline Registers

Pipeline registers store intermediate values between stages:

### 6.2.1 IF/ID Register

Listing 5: IF/ID Pipeline Register

```
1 module IF_ID_Register(
2     input wire clk, reset,
3     input wire StallID, FlushD,
4     input wire [31:0] InstrF, PCF, PCPlus4F,
5     output reg [31:0] InstrD, PCF_D, PCPlus4D
6 );
7     always@(posedge clk or posedge reset) begin
8         if (reset || FlushD) begin
9             InstrD <= 32'b0;
10            PCF_D <= 32'b0;
11            PCPlus4D <= 32'b0;
12        end
13        else if (~StallID) begin
14            InstrD <= InstrF;
15            PCF_D <= PCF;
16            PCPlus4D <= PCPlus4F;
17        end
18    end
19 endmodule
```

## 6.3 Hazard Detection Unit

The hazard detection unit identifies data hazards and generates stall signals:

Listing 6: Hazard Detection Logic

```
1 module Hazard_Unit(
2     input wire [4:0] Rs1D, Rs2D, RdE, RdM,
3     input wire MemReadE,
4     output wire StallF, StallD, FlushE
5 );
6     // Load-use hazard detection
7     assign load_use_hazard = MemReadE &&
8             ((RdE == Rs1D) || (RdE == Rs2D)) &&
9             (RdE != 5'b0);
10
11    assign StallF = load_use_hazard;
12    assign StallD = load_use_hazard;
13    assign FlushE = load_use_hazard;
14 endmodule
```

## 6.4 Forwarding Unit

The forwarding unit detects data hazards and generates forwarding control signals:

Listing 7: Forwarding Unit Logic

```

1 module Forwarding_Unit(
2     input wire [4:0] Rs1E, Rs2E, RdM, RdW,
3     input wire RegWriteM, RegWriteW,
4     output reg [1:0] ForwardAE, ForwardBE
5 );
6     // ForwardAE logic
7     always @(*) begin
8         if ((Rs1E != 0) && (Rs1E == RdM) && RegWriteM)
9             ForwardAE = 2'b10; // Forward from MEM
10        else if ((Rs1E != 0) && (Rs1E == RdW) && RegWriteW)
11            ForwardAE = 2'b01; // Forward from WB
12        else
13            ForwardAE = 2'b00; // No forwarding
14    end
15
16    // Similar logic for ForwardBE
17 endmodule

```

## 7 Multiplexer Network Design

### 7.1 Multiplexer Network Overview

The multiplexer network in the pipelined processor consists of several key multiplexers:

1. **Forwarding Muxes (Forward\_Mux\_A, Forward\_Mux\_B)**: Select between register file, MEM stage result, or WB stage result
2. **ALU Mux**: Selects between forwarded register data or immediate value
3. **Result Mux**: Selects writeback source (ALU result, memory data, or PC+4)
4. **PC Mux**: Selects next PC (PC+4 or branch/jump target)

### 7.2 Multiplexer Implementation

Generic parameterized multiplexer modules were created for reusability:

#### 7.2.1 2:1 Multiplexer

Listing 8: 2:1 Multiplexer Module

```

1 module Mux_2_to_1 #(parameter DATA_WIDTH = 32) (
2     input wire [DATA_WIDTH-1:0] In0, In1,
3     input wire Sel,
4     output reg [DATA_WIDTH-1:0] Out
5 );
6     always @(*) begin

```

```

7      case (Sel)
8          1'b0: Out = In0;
9          1'b1: Out = In1;
10         default: Out = {DATA_WIDTH{1'bx}};
11     endcase
12 end
13 endmodule

```

### 7.2.2 3:1 Multiplexer

Listing 9: 3:1 Multiplexer Module

```

1 module Mux_3_to_1 #(parameter DATA_WIDTH = 32) (
2     input wire [DATA_WIDTH-1:0] In0, In1, In2,
3     input wire [1:0] Sel,
4     output reg [DATA_WIDTH-1:0] Out
5 );
6     always @(*) begin
7         case (Sel)
8             2'b00: Out = In0;
9             2'b01: Out = In1;
10            2'b10: Out = In2;
11            default: Out = {DATA_WIDTH{1'bx}};
12        endcase
13    end
14 endmodule

```

## 7.3 Multiplexer Network Graph Representation

The multiplexer network is modeled as a directed graph where:

- **Nodes** represent data sources (registers, ALU results, memory) and sinks (ALU inputs, writeback)
- **Edges** represent data paths through multiplexers
- **Edge weights** represent delay, area, or power costs

# 8 Multiplexer Network Optimization

## 8.1 Optimization Algorithms

The optimization process uses several graph-based algorithms:

### 8.1.1 Serial Chain Merging

Identifies and merges consecutive multiplexers in series to reduce stages:

Listing 10: Serial Chain Detection

```
1 def find_serial_chains(graph):
2     chains = []
3     for node in graph.nodes():
4         successors = list(graph.successors(node))
5         if len(successors) == 1:
6             chain = [node]
7             current = successors[0]
8             while len(list(graph.successors(current))) == 1:
9                 chain.append(current)
10                current = list(graph.successors(current))[0]
11            if len(chain) > 1:
12                chains.append(chain)
13
14 return chains
```

### 8.1.2 Stage Reduction

Reduces the number of multiplexer stages in critical paths:

- Identify critical paths through the network
- Merge multiplexers along critical paths
- Balance path delays

### 8.1.3 Redundant Mux Elimination

Removes multiplexers that don't affect functionality:

- Identify multiplexers with unused inputs
- Remove multiplexers that always select the same input
- Simplify constant selections

## 8.2 Optimization Results

### 8.2.1 Before Optimization

Table 1: Non-Optimized Multiplexer Network Metrics

| Metric                      | Value |
|-----------------------------|-------|
| Total Multiplexers          | 5     |
| Total LUTs                  | 128   |
| Total Power (mW)            | 3.40  |
| Critical Path Delay (ns)    | 1.00  |
| Mux Stages in Critical Path | 3     |

### 8.2.2 After Optimization

Table 2: Optimized Multiplexer Network Metrics

| Metric                      | Value |
|-----------------------------|-------|
| Total Multiplexers          | 2     |
| Total LUTs                  | 64    |
| Total Power (mW)            | 1.45  |
| Critical Path Delay (ns)    | 0.30  |
| Mux Stages in Critical Path | 1     |

### 8.2.3 Improvement Summary

Table 3: Optimization Improvements

| Metric      | Before | After | Improvement | Percentage |
|-------------|--------|-------|-------------|------------|
| Mux Stages  | 3      | 1     | -2          | 66.7%      |
| Total Muxes | 5      | 2     | -3          | 60.0%      |
| LUTs        | 128    | 64    | -64         | 50.0%      |
| Power (mW)  | 3.40   | 1.45  | -1.95       | 57.4%      |
| Delay (ns)  | 1.00   | 0.30  | -0.70       | 70.0%      |

## 8.3 Optimized Multiplexer Chains

The following serial chains were optimized:

### 1. Chain: fwdB → alu\_mux

- Original: 2 muxes in series (fwdB 3:1, alu\_mux 2:1)

- Optimized: 1 combined 4:1 mux
- Reduction: 1 mux stage
- Impact: Eliminates intermediate SrcBE signal, reduces delay by 0.4 ns

## 2. Chain: result\_mux → fwdA

- Original: 2 muxes in series
- Optimized: Early forwarding path
- Reduction: 1 mux stage
- Impact: Reduces critical path delay

## 3. Chain: result\_mux → fwdB

- Original: 2 muxes in series
- Optimized: Early forwarding path
- Reduction: 1 mux stage
- Impact: Reduces critical path delay

# 9 Formal Verification using CTL Model Checking

## 9.1 CTL Model Overview

A formal model of the multiplexer network was created in NuSMV format to verify correctness properties.

### 9.1.1 Model Variables

Listing 11: NuSMV Model Variables

```

1 MODULE main
2 VAR
3   -- Multiplexer Select Signals
4   ALUSrcE      : boolean;
5   ForwardAE    : {0, 1, 2};
6   ForwardBE    : {0, 1, 2};
7   ResultSrcW   : {0, 1, 2};
8   PCSrcD       : boolean;
9   JumpD        : boolean;
10
11  -- Data Path Signals
12  RD1E         : boolean;
13  RD2E         : boolean;
14  ALUResultM   : boolean;
15  ResultW      : boolean;
16  ImmExtE     : boolean;
17  PCPlus4F    : boolean;

```

```

18    PCTargetD      : boolean;
19
20    -- Pipeline Stage Validity
21    ValidM         : boolean;
22    ValidW         : boolean;

```

### 9.1.2 Multiplexer Definitions

Listing 12: Multiplexer Output Definitions

```

1 DEFINE
2     -- Forward Mux A output
3     SrcAE_def := case
4         ForwardAE = 0 : RD1E;
5         ForwardAE = 1 : ALUResultM;
6         ForwardAE = 2 : ResultW;
7         TRUE          : RD1E;
8     esac;
9
10    -- Forward Mux B output
11    SrcBE_def := case
12        ForwardBE = 0 : RD2E;
13        ForwardBE = 1 : ALUResultM;
14        ForwardBE = 2 : ResultW;
15        TRUE          : RD2E;
16    esac;
17
18    -- ALU Mux output
19    SrcB_def := case
20        ALUSrcE = FALSE : SrcBE_def;
21        ALUSrcE = TRUE  : ImmExtE;
22        TRUE          : SrcBE_def;
23    esac;

```

## 9.2 CTL Properties

A total of 21 CTL properties were defined to verify three core requirements:

### 9.2.1 Property Categories

1. **Reachability** (9 properties): Verify all data paths are reachable
2. **No Conflicting Selections** (7 properties): Verify no conflicts occur
3. **Proper Data Flow** (5 properties): Verify data consistency

### 9.3 CTL Properties Table

Table 4: Complete CTL Properties List

| ID   | Property                                                                                               | Description                                   |
|------|--------------------------------------------------------------------------------------------------------|-----------------------------------------------|
| P1.1 | AG ( $\text{ValidM} \rightarrow \text{EF} (\text{ForwardAE} = 1 \mid \text{ForwardBE} = 1)$ )          | Forwarding path from MEM stage is reachable   |
| P1.2 | AG ( $\text{ValidW} \rightarrow \text{EF} (\text{ForwardAE} = 2 \mid \text{ForwardBE} = 2)$ )          | Forwarding path from WB stage is reachable    |
| P1.3 | AG EF ( $\text{ForwardAE} = 0 \ \& \ \text{ForwardBE} = 0$ )                                           | Register file read paths are always reachable |
| P1.4 | AG EF ( $\text{ALUSrcE} = \text{TRUE}$ )                                                               | Immediate path is reachable                   |
| P1.5 | AG EF ( $\text{ResultSrcW} = 0$ )                                                                      | ALU result path is reachable                  |
| P1.6 | AG EF ( $\text{ResultSrcW} = 1$ )                                                                      | Memory read path is reachable                 |
| P1.7 | AG EF ( $\text{ResultSrcW} = 2$ )                                                                      | PC+4 path is reachable                        |
| P1.8 | AG EF ( $\text{PCSrcD} = \text{TRUE}$ )                                                                | Branch target path is reachable               |
| P1.9 | AG EF ( $\text{PCSrcD} = \text{FALSE} \ \& \ \text{JumpD} = \text{FALSE}$ )                            | Sequential PC path is reachable               |
| P2.1 | AG ( $\text{ForwardAE} \leq 2$ )                                                                       | ForwardAE has valid selection                 |
| P2.2 | AG ( $\text{ForwardBE} \leq 2$ )                                                                       | ForwardBE has valid selection                 |
| P2.3 | AG ( $\text{ResultSrcW} \leq 2$ )                                                                      | ResultSrcW has valid selection                |
| P2.4 | AG ( $\text{ALUSrcE} = \text{TRUE} \mid \text{ALUSrcE} = \text{FALSE}$ )                               | ALUSrcE is boolean (no conflict)              |
| P2.5 | AG ( $(\text{ForwardAE} = 0 \mid \text{ForwardAE} = 1 \mid \text{ForwardAE} = 2) \ \& \ \dots$ )       | ForwardAE is mutually exclusive               |
| P2.6 | AG ( $(\text{ForwardBE} = 0 \mid \text{ForwardBE} = 1 \mid \text{ForwardBE} = 2) \ \& \ \dots$ )       | ForwardBE is mutually exclusive               |
| P2.7 | AG ( $(\text{ResultSrcW} = 0 \mid \text{ResultSrcW} = 1 \mid \text{ResultSrcW} = 2) \ \& \ \dots$ )    | ResultSrcW is mutually exclusive              |
| P3.1 | AG ( $(\text{ForwardAE} = 0 \rightarrow \text{SrcAE\_def} = \text{RD1E}) \ \& \ \dots$ )               | SrcAE matches forwarding selection            |
| P3.2 | AG ( $(\text{ForwardBE} = 0 \rightarrow \text{SrcBE\_def} = \text{RD2E}) \ \& \ \dots$ )               | SrcBE matches forwarding selection            |
| P3.3 | AG ( $(\text{ALUSrcE} = \text{FALSE} \rightarrow \text{SrcB\_def} = \text{SrcBE\_def}) \ \& \ \dots$ ) | SrcB matches ALU mux selection                |

Table 4 – continued from previous page

| ID   | Property                                                                               | Description                          |
|------|----------------------------------------------------------------------------------------|--------------------------------------|
| P3.4 | $AG ((ResultSrcW = 0 \rightarrow ResultW\_out\_def = ALUResultM) \& \dots)$            | ResultW matches result mux selection |
| P3.5 | $AG (((PCSrcD = FALSE \& JumpD = FALSE) \rightarrow PCNext\_def = PCPlus4F) \& \dots)$ | PCNext matches PC mux selection      |

## 9.4 CTL Operators Used

- **AG** (Always Globally): Property holds in all states along all paths
- **EF** (Eventually Finally): Property will eventually hold in some future state

## 9.5 NuSMV Verification Results

All 21 CTL properties were verified using nuXmv. The verification output shows:

Listing 13: NuSMV Verification Output (Excerpt)

```

1 nuXmv > check_ctlspec
2
3 -- specification AG (EF ALUSrcE = TRUE)    is true
4 -- specification AG (EF ResultSrcW = 0)    is true
5 -- specification AG (EF ResultSrcW = 1)    is true
6 -- specification AG (EF ResultSrcW = 2)    is true
7 -- specification AG (EF PCSrcD = TRUE)    is true
8 -- specification AG ForwardAE <= 2    is true
9 -- specification AG ForwardBE <= 2    is true
10 -- specification AG ResultSrcW <= 2   is true
11 -- specification AG (ALUSrcE = TRUE | ALUSrcE = FALSE)  is true
12 -- specification AG (ValidM -> EF (ForwardAE = 1 | ForwardBE =
13   1))  is true
14 -- specification AG (ValidW -> EF (ForwardAE = 2 | ForwardBE =
15   2))  is true
16 -- specification AG (((ResultSrcW = 0 -> ResultW_out_def =
17   ALUResultM) & ...))  is true
18 -- specification AG (((ForwardAE = 0 -> SrcAE_def = RD1E) &
...))  is true
-- specification AG (((ForwardBE = 0 -> SrcBE_def = RD2E) &
...))  is true
-- specification AG (((PCSrcD = FALSE & JumpD = FALSE) ->
PCNext_def = PCPlus4F) & ...)  is true
-- specification AG ((ALUSrcE = FALSE -> SrcB_def = SrcBE_def) &
...))  is true

```

### 9.5.1 Verification Summary

Table 5: CTL Verification Results

| Category     | Properties | Status            |
|--------------|------------|-------------------|
| Reachability | 9          | All Passed        |
| No Conflicts | 7          | All Passed        |
| Data Flow    | 5          | All Passed        |
| <b>Total</b> | <b>21</b>  | <b>All Passed</b> |

## 10 Simulation and Waveform Analysis

### 10.1 Testbench Design

A comprehensive testbench was created to verify the pipelined processor functionality:

Listing 14: Testbench Structure

```

1 module Pipelined_TB;
2   reg clk, reset;
3   wire [31:0] PC, Instr, WriteData, DataAddr;
4   wire MemWrite;
5
6   Pipelined_Top dut (
7     .clk(clk),
8     .reset(reset),
9     .PC(PC),
10    .Instr(Instr),
11    .WriteData(WriteData),
12    .DataAddr(DataAddr),
13    .MemWrite(MemWrite)
14  );
15
16  initial begin
17    clk = 0;
18    reset = 1;
19    #20 reset = 0;
20    #1000 $finish;
21  end
22
23  always #5 clk = ~clk;
24 endmodule

```

## 10.2 Waveform Analysis



Figure 1: Pipelined Processor Waveform

# 11 Results and Analysis

## 11.1 Performance Metrics Comparison

The optimization achieved significant improvements across all metrics:

### 11.1.1 Area Reduction

- 50% reduction in LUTs ( $128 \rightarrow 64$ )
- 60% reduction in total multiplexers ( $5 \rightarrow 2$ )
- Significant reduction in routing resources

### 11.1.2 Power Reduction

- 57.4% reduction in power consumption ( $3.40 \text{ mW} \rightarrow 1.45 \text{ mW}$ )
- Reduced switching activity due to fewer multiplexer stages
- Lower dynamic power consumption

### 11.1.3 Delay Reduction

- 70% reduction in critical path delay ( $1.00 \text{ ns} \rightarrow 0.30 \text{ ns}$ )
- 66.7% reduction in multiplexer stages ( $3 \rightarrow 1$ )
- Improved maximum operating frequency

## 11.2 Functional Correctness

- All test cases passed in simulation
- All 21 CTL properties verified successfully
- No functional regressions after optimization

## 12 Discussion

### 12.1 Optimization Effectiveness

The graph-based optimization approach proved highly effective:

- Serial chain merging eliminated unnecessary intermediate stages
- Critical path optimization significantly improved timing
- The optimization maintained functional correctness

### 12.2 Formal Verification Benefits

CTL model checking provided:

- Exhaustive verification of all possible states
- Guaranteed correctness of multiplexer selections
- Confidence in design correctness before fabrication

### 12.3 Limitations

- Simplified model for formal verification (boolean abstraction)
- Estimated metrics (actual synthesis may vary)
- Optimization focused on multiplexer network only

## 13 Conclusion

This project successfully designed, implemented, and optimized a 5-stage pipelined RISC-V processor with a focus on multiplexer network optimization. Key achievements include:

1. Successful transformation from single-cycle to pipelined architecture
2. Implementation of hazard detection and forwarding mechanisms
3. Significant optimization improvements:
  - 66.7% reduction in multiplexer stages
  - 50% reduction in LUTs
  - 57.4% reduction in power consumption
  - 70% reduction in critical path delay

4. Complete formal verification using CTL model checking (21 properties, all passed)
5. Comprehensive simulation and waveform analysis

The project demonstrates the effectiveness of graph-based optimization techniques for hardware design and the value of formal verification in ensuring design correctness.

## 14 References

1. Cimatti, A., et al. (2002). "NuSMV 2: An OpenSource Tool for Symbolic Model Checking." *CAV 2002*.
2. NetworkX Development Team. (2021). *NetworkX Documentation*. <https://networkx.org/>
3. Mandal, C. (1995). *Complexity Analysis and Algorithms for Data Path Synthesis*. Ph.D. Thesis, Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal, India.