

# ECE/CSE 511: Computer Architecture

---

## Lecture 10: Superscalar Processor



INDRAPRASTHA INSTITUTE of  
INFORMATION TECHNOLOGY  
**DELHI**



# Baseline 2-Way In-Order Superscalar Processor





# Issue Logic Pipeline Diagrams

|     |   |   |    |    |    |    |   |
|-----|---|---|----|----|----|----|---|
| OpA | F | D | A0 | A1 | W  |    |   |
| OpB | F | D | B0 | B1 | W  |    |   |
| OpC |   | F | D  | A0 | A1 | W  |   |
| OpD |   | F | D  | B0 | B1 | W  |   |
| OpE |   |   | F  | D  | A0 | A1 | W |
| OpF |   |   | F  | D  | B0 | B1 | W |

CPI = 0.5 (IPC = 2)

Double-issue pipeline: two instructions can be in the same stage at the same time.
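The CPI of 0.5 is a steady-state figure. As a rough sketch, assuming an ideal pipeline with no stalls, the cycle count for a short instruction stream can be worked out as below. The function names and the `width` and `depth` parameters are illustrative and do not come from the lecture.

```python
import math

def total_cycles(n_instrs, width=2, depth=5):
    """Cycles to run n_instrs on an ideal width-wide, depth-stage pipeline (no stalls)."""
    if n_instrs == 0:
        return 0
    groups = math.ceil(n_instrs / width)  # one issue group enters per cycle
    return depth + groups - 1             # the first group takes `depth` cycles to drain

def cpi(n_instrs, width=2, depth=5):
    """Average cycles per instruction, including pipeline fill time."""
    return total_cycles(n_instrs, width, depth) / n_instrs

print(total_cycles(6))   # six instructions, as in the OpA..OpF diagram
print(cpi(1_000_000))    # approaches 1/width for long instruction streams
```

For the six instructions OpA to OpF this gives 7 cycles, matching the seven cycle columns of the diagram. The CPI tends to 0.5 only once the fill time is amortized over many instructions.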

|       |   |   |    |    |    |    |    |   |
|-------|---|---|----|----|----|----|----|---|
| ADDIU | F | D | A0 | A1 | W  |    |    |   |
| LW    | F | D | B0 | B1 | W  |    |    |   |
| LW    |   | F | D  | B0 | B1 | W  |    |   |
| ADDIU |   | F | D  | A0 | A1 | W  |    |   |
| LW    |   |   | F  | D  | B0 | B1 | W  |   |
| LW    |   |   | F  | D  | D  | B0 | B1 | W |

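Instruction issue logic swaps instructions from their natural position: in the LW, ADDIU pair, the LW is steered to pipe B and the ADDIU to pipe A.

Structural hazard: the last two LWs both need pipe B, so the second one waits in D for one extra cycle.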
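The steering behaviour in the diagram can be sketched in a few lines. This is a minimal model, not the lecture's issue logic. It assumes that only pipe B can execute memory operations, that both pipes can execute ALU operations, and that there are no data dependencies. `MEM_OPS`, `steer_pair`, and `issue_schedule` are hypothetical names.

```python
MEM_OPS = {"LW", "SW"}  # assumption: memory ops can only use pipe B

def steer_pair(first, second):
    """Return the issue groups (one dict per cycle) for one fetched pair."""
    if first in MEM_OPS and second in MEM_OPS:
        # Structural hazard: both want pipe B, so the second waits a cycle.
        return [{"B": first}, {"B": second}]
    if first in MEM_OPS:
        # Swap from natural position: the older memory op takes pipe B.
        return [{"A": second, "B": first}]
    return [{"A": first, "B": second}]

def issue_schedule(program):
    """Issue an even-length program pair by pair, in program order."""
    schedule = []
    for i in range(0, len(program), 2):
        schedule.extend(steer_pair(program[i], program[i + 1]))
    return schedule

schedule = issue_schedule(["ADDIU", "LW", "LW", "ADDIU", "LW", "LW"])
for cycle, group in enumerate(schedule):
    print(cycle, group)
```

The six-instruction sequence needs four issue cycles instead of three, because the final LW pair cannot share pipe B.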



# With Alignment Constraints

| Cyc | Addr  | Instr   |
|-----|-------|---------|
| ?   | 0x000 | OpA     |
| ?   | 0x004 | OpB     |
| ?   | 0x008 | OpC     |
| ?   | 0x00C | J 0x100 |
| ... |       |         |
| ?   | 0x100 | OpD     |
| ?   | 0x104 | J 0x204 |
| ... |       |         |
| ?   | 0x204 | OpE     |
| ?   | 0x208 | J 0x30C |
| ... |       |         |
| ?   | 0x30C | OpF     |
| ?   | 0x310 | OpG     |
| ?   | 0x314 | OpH     |

(Figure: instruction cache lines, with each slot marked by the cycle in which it is fetched; X marks a slot that is fetched but not used.)



# With Alignment Constraints

| Cyc | Addr  | Instr   | 1 | 2 | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10 | 11 |
|-----|-------|---------|---|---|----|----|----|----|----|----|----|----|----|
| 1   | 0x000 | OpA     | F | D | A0 | A1 | W  |    |    |    |    |    |    |
| 1   | 0x004 | OpB     | F | D | B0 | B1 | W  |    |    |    |    |    |    |
| 2   | 0x008 | OpC     |   | F | D  | B0 | B1 | W  |    |    |    |    |    |
| 2   | 0x00C | J 0x100 |   | F | D  | A0 | A1 | W  |    |    |    |    |    |
| 3   | 0x100 | OpD     |   |   | F  | D  | B0 | B1 | W  |    |    |    |    |
| 3   | 0x104 | J 0x204 |   |   | F  | D  | A0 | A1 | W  |    |    |    |    |
| 4   | 0x200 | ?       |   |   |    | F  | -  | -  | -  | -  |    |    |    |
| 4   | 0x204 | OpE     |   |   |    | F  | D  | A0 | A1 | W  |    |    |    |
| 5   | 0x208 | J 0x30C |   |   |    |    | F  | D  | A0 | A1 | W  |    |    |
| 5   | 0x20C | ?       |   |   |    |    | F  | -  | -  | -  | -  |    |    |
| 6   | 0x308 | ?       |   |   |    |    |    | F  | -  | -  | -  | -  |    |
| 6   | 0x30C | OpF     |   |   |    |    |    | F  | D  | A0 | A1 | W  |    |
| 7   | 0x310 | OpG     |   |   |    |    |    |    | F  | D  | A0 | A1 | W  |
| 7   | 0x314 | OpH     |   |   |    |    |    |    | F  | D  | B0 | B1 | W  |

Cyc is the cycle in which the instruction is fetched; the numbered columns are cycles.
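The fetch cycles in this table can be reproduced with a small model of aligned fetch. This is a sketch under stated assumptions: the fetch unit reads one aligned 8-byte block (two 4-byte instructions) per cycle, and a taken jump redirects fetch in the very next cycle, as the table idealizes. The `PROGRAM` encoding and `fetch_trace` are made up for illustration.

```python
# addr -> (name, jump_target or None); the program from the slide
PROGRAM = {
    0x000: ("OpA", None), 0x004: ("OpB", None),
    0x008: ("OpC", None), 0x00C: ("J 0x100", 0x100),
    0x100: ("OpD", None), 0x104: ("J 0x204", 0x204),
    0x204: ("OpE", None), 0x208: ("J 0x30C", 0x30C),
    0x30C: ("OpF", None), 0x310: ("OpG", None), 0x314: ("OpH", None),
}

def fetch_trace(program, start, fetch_bytes=8, instr_bytes=4):
    """Return (cycle, addr, name) per fetched slot; name is None for a wasted slot."""
    trace = []
    pc, cycle = start, 1
    while pc in program:
        block = pc - (pc % fetch_bytes)   # fetch is aligned to the block size
        next_pc = block + fetch_bytes     # default: fall through to the next block
        redirected = False
        for addr in range(block, block + fetch_bytes, instr_bytes):
            useful = addr >= pc and not redirected and addr in program
            if useful:
                name, target = program[addr]
                trace.append((cycle, addr, name))
                if target is not None:    # taken jump: the rest of the block is wasted
                    next_pc, redirected = target, True
            else:
                trace.append((cycle, addr, None))
        pc, cycle = next_pc, cycle + 1
    return trace

trace = fetch_trace(PROGRAM, 0x000)
wasted = [hex(addr) for _, addr, name in trace if name is None]
print(trace[-1][0], wasted)
```

Under these assumptions the eleven useful instructions take seven fetch cycles, and the slots at 0x200, 0x20C, and 0x308 are fetched but discarded, matching the `?` rows.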

# Bypassing in Superscalar Pipelines



# Breaking Decode and Issue Stage



- Bypass Network can become very complex
- Can motivate breaking Decode and Issue Stage

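D = Decode, possibly resolve structural hazards

I = Register file read, bypassing, issue/steer instructions to proper unit

|     |   |   |   |    |    |    |   |
|-----|---|---|---|----|----|----|---|
| OpA | F | D | I | A0 | A1 | W  |   |
| OpB | F | D | I | B0 | B1 | W  |   |
| OpC |   | F | D | I  | A0 | A1 | W |
| OpD |   | F | D | I  | B0 | B1 | W |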

---

# Exceptions






# Interrupts:

## altering the normal flow of control



An external or internal event that needs to be processed by another (system) program. The event is usually unexpected or rare from the program's point of view.



# Causes of Exceptions

Interrupt: an *event* that requests the attention of the processor

- Asynchronous: an *external event*
  - input/output device service request
  - timer expiration
  - power disruptions, hardware failure
- Synchronous: an *internal event* *(a.k.a. exception or trap)*
  - undefined opcode, privileged instruction
  - arithmetic overflow, FPU exception
  - misaligned memory access
  - *virtual memory exceptions*: page faults, TLB misses, protection violations
  - *software exceptions*: system calls, e.g., jumps into kernel

# Asynchronous Interrupts:

## invoking the interrupt handler



- An I/O device requests attention by asserting one of the *prioritized interrupt request lines*
- When the processor decides to process the interrupt
  - It stops the current program at instruction $I_i$, completing all the instructions up to $I_{i-1}$ (a *precise interrupt*)
  - It saves the PC of instruction $I_i$ in a special register (EPC)
  - It disables interrupts and transfers control to a designated interrupt handler running in kernel mode



# Interrupt Handler

---

- Saves EPC before re-enabling interrupts to allow nested interrupts ⇒
  - need an instruction to move EPC into GPRs
  - need a way to mask further interrupts at least until EPC can be saved
- Needs to read a *status register* that indicates the cause of the interrupt
- Uses a special indirect jump instruction RFE (*return-from-exception*) to resume user code. RFE:
  - enables interrupts
  - restores the processor to the user mode
  - restores hardware status and control state
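The hardware and handler steps above can be sketched as a toy state machine. This is only an illustration: `ToyCPU`, its fields, and the handler addresses are hypothetical, and a Python list stands in for the GPRs or kernel stack where a real handler would save EPC.

```python
from dataclasses import dataclass, field

@dataclass
class ToyCPU:
    pc: int = 0
    epc: int = 0
    cause: str = ""
    interrupts_enabled: bool = True
    kernel_mode: bool = False
    prev_kernel_mode: bool = False                   # mode to restore on RFE
    saved: list = field(default_factory=list)        # stands in for GPRs / kernel stack

    def take_interrupt(self, cause, handler_pc):
        """Hardware side: save EPC, mask interrupts, enter the handler in kernel mode."""
        if not self.interrupts_enabled:
            return False                             # masked: the request stays pending
        self.epc = self.pc                           # PC of the interrupted instruction I_i
        self.cause = cause                           # status register read by the handler
        self.prev_kernel_mode = self.kernel_mode
        self.interrupts_enabled = False
        self.kernel_mode = True
        self.pc = handler_pc
        return True

    def save_epc_and_reenable(self):
        """Handler side: copy EPC out before allowing nested interrupts."""
        self.saved.append((self.epc, self.prev_kernel_mode))
        self.interrupts_enabled = True

    def restore_epc(self):
        """Handler side: mask interrupts again and restore the saved EPC."""
        self.interrupts_enabled = False
        self.epc, self.prev_kernel_mode = self.saved.pop()

    def rfe(self):
        """Return from exception: jump to EPC, restore mode, enable interrupts."""
        self.pc = self.epc
        self.kernel_mode = self.prev_kernel_mode
        self.interrupts_enabled = True

cpu = ToyCPU(pc=0x100)
cpu.take_interrupt("timer", handler_pc=0x8000)
masked = cpu.take_interrupt("io", handler_pc=0x9000)   # rejected: EPC not yet saved
cpu.save_epc_and_reenable()
cpu.pc = 0x8010                                        # handler makes some progress
nested = cpu.take_interrupt("io", handler_pc=0x9000)   # nested interrupt is now safe
cpu.rfe()                                              # back in the timer handler
inner_return = (cpu.pc, cpu.kernel_mode)
cpu.restore_epc()
cpu.rfe()                                              # back in user code
```

If the handler re-enabled interrupts without first saving EPC, the nested interrupt would overwrite the only copy of the user program's return address, which is why the mask must hold until the save completes.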



# Synchronous Interrupts

---

- A synchronous interrupt (exception) is caused by a *particular instruction*
- In general, the instruction cannot be completed and needs to be *restarted* after the exception has been handled
  - requires undoing the effect of one or more partially executed instructions
- In the case of a system call trap, the instruction is considered to have been completed
  - syscall is a special jump instruction involving a change to privileged kernel mode
  - Handler resumes at instruction after system call

# Exception Handling 5-Stage Pipeline



- How to handle multiple simultaneous exceptions in different pipeline stages?
- How and where to handle external asynchronous interrupts?

# Class Interaction # 10

---



# Exception Handling 5-Stage Pipeline





## Exception Handling 5-Stage Pipeline

---

- Hold exception flags in pipeline until commit point (M stage)
- Exceptions in earlier pipe stages override later exceptions *for a given instruction*
- Inject external interrupts at commit point (override others)
- If exception at commit: update Cause and EPC registers, kill all stages, inject handler PC into fetch stage
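The priority rules in this list can be written down as a short sketch. The stage names in `STAGE_ORDER`, the `regs` dictionary, and both function names are assumptions made for illustration; the slide does not specify an implementation.

```python
STAGE_ORDER = ["F", "D", "X", "M"]  # assumed stage names, earliest first

def select_cause(stage_flags, external_interrupt=None):
    """Pick the cause to report for the instruction at the commit point."""
    if external_interrupt is not None:
        return external_interrupt          # external interrupts override others
    for stage in STAGE_ORDER:              # earlier stages override later ones
        if stage in stage_flags:
            return stage_flags[stage]
    return None

def commit(instr_pc, stage_flags, regs, handler_pc, external_interrupt=None):
    """Return (kill_pipeline, redirect_pc) for the instruction at the commit point."""
    cause = select_cause(stage_flags, external_interrupt)
    if cause is None:
        return False, None                 # no exception: commit normally
    regs["Cause"] = cause
    regs["EPC"] = instr_pc
    return True, handler_pc                # kill all stages, fetch the handler next

regs = {}
flags = {"M": "misaligned access", "D": "undefined opcode"}
result = commit(0x104, flags, regs, handler_pc=0x8000)
print(result, regs)
```

Here the instruction raised flags in both D and M, and the D-stage exception is the one recorded in `Cause`, since it occurred earlier in the pipeline for that instruction.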



# Speculating on Exceptions

---

- Prediction mechanism
  - Exceptions are rare, so simply predicting no exceptions is very accurate!
- Check prediction mechanism
  - Exceptions detected at end of instruction execution pipeline, special hardware for various exception types
- Recovery mechanism
  - Only write architectural state at commit point, so can throw away partially executed instructions after exception
  - Launch exception handler after flushing pipeline
- Bypassing allows use of uncommitted instruction results by following instructions

# Exception Pipeline Diagram

| Instruction                        | t0             | t1             | t2             | t3             | t4             | t5             | t6             | t7             | ...            | Note      |
|------------------------------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|-----------|
| (I<sub>1</sub>) 096: ADD           | IF<sub>1</sub> | ID<sub>1</sub> | EX<sub>1</sub> | MA<sub>1</sub> | nop            |                |                |                |                | overflow! |
| (I<sub>2</sub>) 100: XOR           |                | IF<sub>2</sub> | ID<sub>2</sub> | EX<sub>2</sub> | nop            | nop            |                |                |                |           |
| (I<sub>3</sub>) 104: SUB           |                |                | IF<sub>3</sub> | ID<sub>3</sub> | nop            | nop            | nop            |                |                |           |
| (I<sub>4</sub>) 108: ADD           |                |                |                | IF<sub>4</sub> | nop            | nop            | nop            | nop            |                |           |
| (I<sub>5</sub>) Exc. Handler code  |                |                |                |                | IF<sub>5</sub> | ID<sub>5</sub> | EX<sub>5</sub> | MA<sub>5</sub> | WB<sub>5</sub> |           |

| Resource usage | t0            | t1            | t2            | t3            | t4            | t5            | t6            | t7            | ...           |
|----------------|---------------|---------------|---------------|---------------|---------------|---------------|---------------|---------------|---------------|
| IF             | I<sub>1</sub> | I<sub>2</sub> | I<sub>3</sub> | I<sub>4</sub> | I<sub>5</sub> |               |               |               |               |
| ID             |               | I<sub>1</sub> | I<sub>2</sub> | I<sub>3</sub> | nop           | I<sub>5</sub> |               |               |               |
| EX             |               |               | I<sub>1</sub> | I<sub>2</sub> | nop           | nop           | I<sub>5</sub> |               |               |
| MA             |               |               |               | I<sub>1</sub> | nop           | nop           | nop           | I<sub>5</sub> |               |
| WB             |               |               |               |               | nop           | nop           | nop           | nop           | I<sub>5</sub> |
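The resource-usage view can be regenerated with a small simulation. This is a sketch that assumes the exception is detected when the faulting instruction is in MA, that every instruction then in IF, ID, EX, or MA is turned into a nop, and that the handler is fetched in the following cycle. `simulate` and its arguments are hypothetical names.

```python
STAGES = ["IF", "ID", "EX", "MA", "WB"]

def simulate(program, handler, faulting, n_cycles):
    """Return per-cycle stage occupancy for a 5-stage pipeline with one exception."""
    pipe = [None] * len(STAGES)
    stream = iter(program)
    usage = []
    for _ in range(n_cycles):
        # Advance every instruction one stage and fetch the next one.
        pipe = [next(stream, None)] + pipe[:-1]
        usage.append(list(pipe))
        if pipe[3] is not None and pipe[3] == faulting:
            # Exception at the commit point: kill IF..MA, redirect fetch to the handler.
            pipe = ["nop" if s is not None else None for s in pipe[:4]] + pipe[4:]
            stream = iter(handler)
    return usage

usage = simulate(["I1", "I2", "I3", "I4"], ["I5"], faulting="I1", n_cycles=9)
for i, stage in enumerate(STAGES):
    print(stage, [cycle[i] or "" for cycle in usage])
```

Each printed row corresponds to one pipeline resource over cycles t0 to t8. The killed instructions show up as nops draining through the later stages while the handler instruction I5 follows behind them.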

# ECE/CSE 511: Computer Architecture

---

## Out-of-Order Processors








# Out-Of-Order (OOO) Introduction

| Name | Frontend | Issue | Writeback | Commit | Structures                                                   |
|------|----------|-------|-----------|--------|--------------------------------------------------------------|
| I4   | IO       | IO    | IO        | IO     | Fixed Length Pipelines<br>Scoreboard                         |
| I2O2 | IO       | IO    | OOO       | OOO    | Scoreboard                                                   |
| I2OI | IO       | IO    | OOO       | IO     | Scoreboard,<br>Reorder Buffer, and Store Buffer              |
| IO3  | IO       | OOO   | OOO       | OOO    | Scoreboard and Issue Queue                                   |
| IO2I | IO       | OOO   | OOO       | IO     | Scoreboard, Issue Queue, Reorder<br>Buffer, and Store Buffer |

IO = in-order, OOO = out-of-order.