

# CSE 112: Computer Organization

---

Instructor: Sujay Deb

## Lecture 19



INDRAPRASTHA INSTITUTE of  
INFORMATION TECHNOLOGY  
**DELHI**



# Pipelined Processor Control Hazards

# Control Hazards

- **beq:**
  - Branch **not determined until the Execute stage** of pipeline
  - **Instructions** after branch **fetched before branch occurs**
  - These **2 instructions must be flushed** if branch happens

# Control Hazards



## Branch misprediction penalty:

The number of instructions flushed when a branch is taken (in this case, 2 instructions)

# Control Hazards: Flushing Logic

- If branch is taken in execute stage, need to flush the instructions in the Fetch and Decode stages
  - Do this by clearing Decode and Execute Pipeline registers using *FlushD* and *FlushE*
- **Equations:**

$$\textit{FlushD} = \textit{PCSrcE}$$

$$\textit{FlushE} = \textit{lwStall} \text{ OR } \textit{PCSrcE}$$

# Control Hazards: Flushing Hardware



# RISC-V Pipelined Processor with Hazard Unit



# Summary of Hazard Logic

## Data hazard logic (shown for SrcA of ALU):

```
if      ((Rs1E == RdM) AND RegWriteM) AND (Rs1E != 0)    // Case 1  
        ForwardAE = 10  
else if ((Rs1E == RdW) AND RegWriteW) AND (Rs1E != 0)    // Case 2  
        ForwardAE = 01  
else            ForwardAE = 00                                // Case 3
```

## Load word stall logic:

$lwStall = ((Rs1D == RdE) \text{ OR } (Rs2D == RdE)) \text{ AND } ResultSrcE_0$   
 $StallF = StallD = lwStall$

## Control hazard flush:

$FlushD = PCSrcE$   
 $FlushE = lwStall \text{ OR } PCSrcE$

# Class Interaction # 21

---



# Pipelined Performance

# Pipelined Processor Performance Example

- SPECINT2000 benchmark:
  - 25% loads
  - 10% stores
  - 13% branches
  - 52% R-type
- Suppose:
  - 40% of loads used by next instruction
  - 50% of branches mispredicted
- **What is the average CPI?** (Ideally it's 1, but...)

# Pipelined Processor Performance Example

Pipelined processor critical path:

$$T_{c\_pipelined} = \text{max of}$$

$$t_{pcq} + t_{mem} + t_{setup} \quad \text{Fetch}$$

$$2(t_{RFread} + t_{setup}) \quad \text{Decode}$$

$$t_{pcq} + 4t_{mux} + t_{ALU} + t_{AND-OR} + t_{setup} \quad \text{Execute}$$

$$t_{pcq} + t_{mem} + t_{setup} \quad \text{Memory}$$

$$2(t_{pcq} + t_{mux} + t_{RFwrite}) \quad \text{Writeback}$$

- Decode and Writeback stages **both use the register file** in each cycle
- So each stage gets half of the cycle time ( $T_c/2$ ) to do their work
- Or, stated a different way, **2x of their work** must fit in a cycle ( $T_c$ )

# Pipelined Critical Path: Execute Stage



# Pipelined Performance Example

| Element                | Parameter            | Delay (ps) |
|------------------------|----------------------|------------|
| Register clock-to-Q    | $t_{pcq\_PC}$        | 40         |
| Register setup         | $t_{\text{setup}}$   | 50         |
| Multiplexer            | $t_{\text{mux}}$     | 30         |
| AND-OR gate            | $t_{\text{AND-OR}}$  | 20         |
| ALU                    | $t_{\text{ALU}}$     | 120        |
| Decoder (Control Unit) | $t_{\text{dec}}$     | 25         |
| Extend unit            | $t_{\text{dec}}$     | 35         |
| Memory read            | $t_{\text{mem}}$     | 200        |
| Register file read     | $t_{RF\text{read}}$  | 100        |
| Register file setup    | $t_{RF\text{setup}}$ | 60         |

$$T_{c\_pipelined} = t_{pcq} + 4t_{\text{mux}} + t_{\text{ALU}} + t_{\text{AND-OR}} + t_{\text{setup}}$$

=

# Pipelined Performance Example

Program with 100 billion instructions

$$\begin{aligned}\text{Execution Time} &= (\# \text{ instructions}) \times \text{CPI} \times T_c \\ &= (100 \times 10^9)(1.23)(350 \times 10^{-12}) \\ &= \text{43 seconds}\end{aligned}$$

# Processor Performance Comparison

| Processor    | Execution Time (seconds) | Speedup (single-cycle as baseline) |
|--------------|--------------------------|------------------------------------|
| Single-cycle | 75                       | 1                                  |
| Pipelined    | 43                       | 1.7                                |

# Class Interaction # 22

---

