

**CPE 431/531**

**Chapter 4 – The Processor**

**Dr. Rhonda Kay Gaede**



## 4.1 Implementation Basics

---

- Performance Factors
  - Instruction Count
  - Cycle Time
  - CPI
- A Basic MIPS Implementation
  - Simple subset: **lw**, **sw**, **add**, **sub**, **and**, **or**, **slt**, **beq**, **j**

## 4.1 Implementation Overview

---

- All instructions begin the same way

- ---
- ---
- ---

- Then, it depends on the instruction

- **lw**
- **sw**
- **add** et al.
- **beq**

# 4.1 Datapath



# 4.1 Datapath + Control



## 4.2 Classes and Values

---

- Two classes of logic

- -



- Two logic values

- -



## 4.2 Sequential Elements

- Register with write control
  - Only updates on clock edge when write control input is 1
  - Used when stored value is required later.



## 4.2 Clocking Methodology

- A clocking methodology defines when signals can be \_\_\_\_\_ and when they can be \_\_\_\_\_.
- We assume an \_\_\_\_\_ clocking methodology.
- Longest delay determines clock period



# 4.3 Instruction Fetch and Default Sequencing



a. Instruction memory

b. Program counter

c. Adder



## 4.3 R-type Instruction Requirements

- Read \_\_\_\_ register operands
- Perform \_\_\_\_\_ operation
- \_\_\_\_\_ register results



a. Registers



b. ALU

## 4.3 lw / sw Instruction Requirements

- Read register operands
- Calculate \_\_\_\_\_ using 16-bit offset
- Load: \_\_\_\_\_ memory and \_\_\_\_\_ register
- Store: \_\_\_\_\_ register value to \_\_\_\_\_



a. Data memory unit

b. Sign extension unit

## 4.3 beq Instruction Requirements

- Read register operands
- \_\_\_\_\_ operands
- Calculate \_\_\_\_\_



## 4.3 Creating a Single Datapath: R-type + lw / sw

- We want a \_\_\_\_\_ implementation.
- Separate \_\_\_\_\_ and \_\_\_\_\_ memories are required.
- When I say \_\_\_\_\_, you say \_\_\_\_\_.



## 4.3 Adding beq to the R-type + lw/sw Datapath



# 4.4 Defining Necessary Control



# 4.4 Adding a Control Unit

| Instruction | RegDst | ALUSrc | MemtoReg | RegWrite | MemRead | MemWrite | Branch | ALUOp1 | ALUOp0 |
|-------------|--------|--------|-----------|-----------|----------|-----------|--------|--------|--------|
| R-format    | 1      | 0      | 0         | 1         | 0        | 0         | 0      | 1      | 0      |
| lw          | 0      | 1      | 1         | 1         | 1        | 0         | 0      | 0      | 0      |
| sw          | X      | 1      | X         | 0         | 0        | 1         | 0      | 0      | 0      |
| beq         | X      | 0      | X         | 0         | 0        | 0         | 1      | 0      | 1      |
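The control table can be read as a lookup from opcode to control-signal values. The following is a minimal sketch, assuming the standard MIPS opcodes (0 for R-format, 35 for `lw`, 43 for `sw`, 4 for `beq`) and using `None` for the don't-care entries marked X. The names `SIGNALS`, `CONTROL_TABLE`, and `decode` are illustrative and do not come from the slides.

```python
# Control signals in the same column order as the table above.
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite",
           "MemRead", "MemWrite", "Branch", "ALUOp1", "ALUOp0")

X = None  # don't-care entry

# Keyed by 6-bit opcode (assumed standard MIPS encodings).
CONTROL_TABLE = {
    0:  (1, 0, 0, 1, 0, 0, 0, 1, 0),   # R-format
    35: (0, 1, 1, 1, 1, 0, 0, 0, 0),   # lw
    43: (X, 1, X, 0, 0, 1, 0, 0, 0),   # sw
    4:  (X, 0, X, 0, 0, 0, 1, 0, 1),   # beq
}

def decode(opcode):
    """Return a dict of control-signal values for one opcode."""
    if opcode not in CONTROL_TABLE:
        raise ValueError(f"opcode {opcode} is not in the simple subset")
    return dict(zip(SIGNALS, CONTROL_TABLE[opcode]))

lw_signals = decode(35)
print(lw_signals["MemRead"], lw_signals["RegWrite"], lw_signals["MemWrite"])
```

A real control unit is combinational logic rather than a table lookup, but the mapping it implements is the same.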



# 4.4 R-type Instruction Execution



# 4.4 lw Instruction Execution



# 4.4 beq Instruction Execution



# 4.4 Implementing Jumps



## 4.4 R-type Instruction Timing



# 4.4 lw Instruction Timing



## 4.4 sw Instruction Timing



# 4.4 beq Instruction Timing



## 4.4 j Instruction Timing

---



## 4.5 An Overview of Pipelining

---

- Pipelining is an implementation technique in which \_\_\_\_\_ are \_\_\_\_\_ in execution.
- Pipelining helps \_\_\_\_\_, not individual \_\_\_\_\_.
- You can't \_\_\_\_\_ a stage.

## 4.5 The Laundry Analogy



### Speedup

- One Load



- Four Loads

- N Loads as  $N \rightarrow \infty$

# 4.5 MIPS Processor Stages

---

- Instruction fetch (IF)
- Instruction decode and register read (ID)
- Execution (calculate address) (EX)
- Memory access (MEM)
- Register write (WB)

# 4.5 Single Cycle vs. Pipelined Performance

- Consider: **lw, sw, add, sub, and, or, slt, beq**
- Operation times: memory access and ALU operation 200 ps each; register file read or write 100 ps

| Instr    | Instruction fetch | Register read | ALU op | Memory access | Register write | Total time |
|----------|-------------------|---------------|--------|---------------|----------------|------------|
| lw       | 200 ps            | 100 ps        | 200 ps | 200 ps        | 100 ps         | 800 ps     |
| sw       | 200 ps            | 100 ps        | 200 ps | 200 ps        |                | 700 ps     |
| R-format | 200 ps            | 100 ps        | 200 ps |               | 100 ps         | 600 ps     |
| beq      | 200 ps            | 100 ps        | 200 ps |               |                | 500 ps     |
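The numbers in this table fix both clock periods: the single-cycle clock must fit the slowest instruction, while the pipelined clock must fit the slowest stage. The sketch below works this out under two assumptions that are not stated on the slide: the pipeline has five stages that all take one clock, and there are no hazards or stalls. The function names are illustrative.

```python
# Stage latencies in picoseconds, taken from the table above.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Stages each instruction class actually uses.
STAGES_USED = {
    "lw":       ("IF", "ID", "EX", "MEM", "WB"),
    "sw":       ("IF", "ID", "EX", "MEM"),
    "R-format": ("IF", "ID", "EX", "WB"),
    "beq":      ("IF", "ID", "EX"),
}

def instruction_time(kind):
    """Total latency of one instruction class in ps."""
    return sum(STAGE_PS[stage] for stage in STAGES_USED[kind])

# Single cycle: the clock must accommodate the slowest instruction (lw).
single_cycle_period = max(instruction_time(kind) for kind in STAGES_USED)
# Pipelined: the clock must accommodate the slowest stage.
pipelined_period = max(STAGE_PS.values())

def single_cycle_total(n):
    return n * single_cycle_period

def pipelined_total(n):
    # Fill the pipeline once, then finish one instruction per cycle.
    return (len(STAGE_PS) + n - 1) * pipelined_period

for n in (3, 1_000_003):
    s, p = single_cycle_total(n), pipelined_total(n)
    print(f"n={n}: single-cycle {s} ps, pipelined {p} ps, speedup {s / p:.2f}")
```

For three instructions the pipeline-fill time dominates, so the speedup is well below the ratio of the clock periods; for long instruction streams it approaches 800/200 = 4.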

# 4.5 Single Cycle vs. Pipelined Timeline



## 4.5 Designing Instruction Sets for Pipelining

---

- All MIPS instructions are the same \_\_\_\_\_.
- MIPS has only a few \_\_\_\_\_ \_\_\_\_\_.
- Memory operands appear only in \_\_\_\_\_.
- Operands must be \_\_\_\_\_ in memory.

## 4.5 Pipeline Hazards

---

- Structural Hazard - not enough \_\_\_\_\_.
- Data Hazards – one \_\_\_\_\_ needs the \_\_\_\_\_ of another
- Control Hazard – \_\_\_\_\_ aren't made
  - Conservative Approach – \_\_\_\_\_
  - Alternative Approach – \_\_\_\_\_

## 4.5 Data Hazards

---

- A \_\_\_\_\_ hazard occurs when a \_\_\_\_\_ has not yet been \_\_\_\_\_ to the \_\_\_\_\_.
- Consider
  - add \$s0, \$t0, \$t1
  - sub \$t2, \$s0, \$t3
- Though the result is not \_\_\_\_\_ until \_\_\_, it is available after the add has finished the EX stage, \_\_\_\_\_ it to the right place.

## 4.5 Two Instruction Forwarding

- Forwarding paths are \_\_\_\_\_ only if the destination stage is \_\_\_\_\_ than the source stage.



## 4.5 More on Forwarding

- Forwarding can't fix everything.
- Consider

**lw** \$s0, 20(\$t1)

**sub** \$t2, \$s0, \$t3



## 4.5 The Compiler Can Help

---

- Consider the following:

**a = b + e;**

**c = b + f;**

```
lw   $t1, 0($t0)
lw   $t2, 4($t0)
add  $t3, $t1, $t2
sw   $t3, 12($t0)
lw   $t4, 8($t0)
add  $t5, $t1, $t4
sw   $t5, 16($t0)
```
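To see what the compiler can gain, the sketch below counts load-use stalls in the sequence above and in one possible reordering that moves the third `lw` up. It assumes a five-stage pipeline with forwarding, where the only stall is one cycle when an instruction reads a register loaded by the `lw` immediately before it. The helpers `parse`, `load_use_stalls`, and `total_cycles` are hypothetical names, and the reordered sequence is an illustration rather than something taken from the slide.

```python
def parse(line):
    """Split one instruction into (op, destination, source registers)."""
    op, rest = line.split(None, 1)
    args = [a.strip() for a in rest.split(",")]
    if op in ("lw", "sw"):
        base = args[1][args[1].index("(") + 1:-1]   # register inside off(reg)
        if op == "lw":
            return op, args[0], [base]              # lw writes rt, reads base
        return op, None, [args[0], base]            # sw reads rt and base
    return op, args[0], args[1:]                    # R-format: rd, rs, rt

def load_use_stalls(program):
    """Count instructions that read the result of the lw just before them."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = parse(prev)
        _, _, cur_sources = parse(cur)
        if prev_op == "lw" and prev_dest in cur_sources:
            stalls += 1
    return stalls

def total_cycles(program, stages=5):
    """Cycles to run the program: pipeline fill, one per instruction, stalls."""
    return stages + len(program) - 1 + load_use_stalls(program)

ORIGINAL = [
    "lw $t1, 0($t0)", "lw $t2, 4($t0)", "add $t3, $t1, $t2",
    "sw $t3, 12($t0)", "lw $t4, 8($t0)", "add $t5, $t1, $t4",
    "sw $t5, 16($t0)",
]
# Assumed reordering: hoist the load of f above the first add.
REORDERED = [
    "lw $t1, 0($t0)", "lw $t2, 4($t0)", "lw $t4, 8($t0)",
    "add $t3, $t1, $t2", "sw $t3, 12($t0)", "add $t5, $t1, $t4",
    "sw $t5, 16($t0)",
]

for name, prog in (("original", ORIGINAL), ("reordered", REORDERED)):
    print(name, load_use_stalls(prog), "stalls,", total_cycles(prog), "cycles")
```

The check treats a store of a just-loaded value as a stall too, which is a simplification that does not affect this sequence.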

## 4.5 Stalling on Control Hazards



Performance of “Stall on Branch”

## 4.5 Prediction for Control Hazards



## 4.5 More Realistic Branch Prediction

---

- \_\_\_\_\_ branch prediction
  - Based on \_\_\_\_\_ branch behavior
  - Example: \_\_\_\_\_ branches
    - Predict \_\_\_\_\_ branches \_\_\_\_\_
    - Predict \_\_\_\_\_ branches \_\_\_\_\_
- \_\_\_\_\_ branch prediction
  - \_\_\_\_\_ measures \_\_\_\_\_ branch behavior
    - Record \_\_\_\_\_ of each branch
  - Assume \_\_\_\_\_ behavior will continue the trend
    - When \_\_\_\_\_, stall while re-fetching, and update \_\_\_\_\_

# 4.6 Identifying the Pipeline Stages



# 4.6 Adding Pipeline Registers



## 4.6 `lw` Instruction Execution: IF Stage



# 4.6 `lw` Instruction Execution: ID Stage



# 4.6 `lw` Instruction Execution: EX Stage



# 4.6 `lw` Instruction Execution: MEM Stage



# 4.6 `lw` Instruction Execution: WB Stage



# 4.6 Corrected Datapath for `lw`



# 4.6 `sw` Instruction Execution: EX Stage



# 4.6 `sw` Instruction Execution: MEM Stage



# 4.6 `sw` Instruction Execution: WB Stage



# 4.6 Stylized Multiple Clock Cycle Diagrams



# 4.6 Traditional Multiple Clock Cycle Diagrams



# 4.6 Alternate Multiple Clock Cycle Diagrams

| Cycle | IF       | ID       | EX       | MEM      | WB       |
|-------|----------|----------|----------|----------|----------|
| 1     | lw \$10  | ahead1   | ahead2   | ahead3   | ahead4   |
| 2     | sub \$11 | lw \$10  | ahead1   | ahead2   | ahead3   |
| 3     | add \$12 | sub \$11 | lw \$10  | ahead1   | ahead2   |
| 4     | lw \$13  | add \$12 | sub \$11 | lw \$10  | ahead1   |
| 5     | add \$14 | lw \$13  | add \$12 | sub \$11 | lw \$10  |
| 6     | after1   | add \$14 | lw \$13  | add \$12 | sub \$11 |
| 7     | after2   | after1   | add \$14 | lw \$13  | add \$12 |
| 8     | after3   | after2   | after1   | add \$14 | lw \$13  |
| 9     | after4   | after3   | after2   | after1   | add \$14 |
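Every row of the table above follows one rule: each cycle, every instruction moves one stage to the right. A small sketch of that rule, assuming no stalls; the function `stage_contents` and its `first_fetched` parameter are illustrative names.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")

# Instruction stream in program order, using the labels from the table.
STREAM = ["ahead4", "ahead3", "ahead2", "ahead1",
          "lw $10", "sub $11", "add $12", "lw $13", "add $14",
          "after1", "after2", "after3", "after4"]

def stage_contents(stream, cycle, first_fetched=4):
    """Return {stage: instruction} for a 1-based cycle number.

    first_fetched is the index in stream of the instruction that is
    in IF during cycle 1 (here "lw $10").
    """
    row = {}
    for depth, stage in enumerate(STAGES):
        index = first_fetched + (cycle - 1) - depth
        row[stage] = stream[index] if 0 <= index < len(stream) else None
    return row

for cycle in range(1, 10):
    row = stage_contents(STREAM, cycle)
    print(cycle, " | ".join(str(row[s]) for s in STAGES))
```

Reading one row of the output gives the same information as a single-cycle slice of the pipeline diagram.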

# 4.6 Cycle 5 Slice



# 4.6 Identifying Pipelined Control Lines



# 4.6 Generating and Saving Control Lines



- EX
- MEM
- WB

# 4.6 Putting it all Together



## 4.7 Data Dependencies

---

In the previous example, there were no data dependencies.  
Now, the rest of the story.

```
sub  $2,  $1, $3
and  $12, $2, $5
or   $13, $6, $2
add  $14, $2, $2
sw   $15, 100($2)
```
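Every later instruction in this sequence depends on `$2`, but not every dependence is a hazard. The sketch below classifies each one by its distance from the `sub`. It assumes a five-stage pipeline in which the register file is written in the first half of a cycle and read in the second half, so a consumer three or more instructions later reads the correct value. The names `PROGRAM`, `classify`, and `dependencies_on` are illustrative.

```python
# (text, destination register, source registers) for the sequence above.
PROGRAM = [
    ("sub $2, $1, $3",   2,    (1, 3)),
    ("and $12, $2, $5",  12,   (2, 5)),
    ("or $13, $6, $2",   13,   (6, 2)),
    ("add $14, $2, $2",  14,   (2, 2)),
    ("sw $15, 100($2)",  None, (15, 2)),
]

def classify(distance):
    """Classify a dependence by how many instructions separate the pair."""
    if distance == 1:
        return "hazard, result is in EX/MEM"
    if distance == 2:
        return "hazard, result is in MEM/WB"
    return "no hazard"   # assumes write-then-read within one cycle

def dependencies_on(producer_index, program):
    """List later instructions that read the producer's destination."""
    _, dest, _ = program[producer_index]
    found = []
    for i in range(producer_index + 1, len(program)):
        text, _, sources = program[i]
        if dest in sources:
            found.append((text, classify(i - producer_index)))
    return found

for text, verdict in dependencies_on(0, PROGRAM):
    print(f"{text:18} {verdict}")
```

The sketch does not check whether an intervening instruction overwrites the register, which is fine for this sequence because only `sub` writes `$2`.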

# 4.7 Which Data Dependencies are Hazards



## 4.7 Forwarding in Action



# 4.7 Forwarding Paths



b. With forwarding

## 4.7 Forwarding Unit: Classifying Hazards

---

- Type 1 – The information needed in the \_\_\_\_\_ stage by an instruction is the result of the instruction \_\_\_\_\_ stage ahead (found in the \_\_\_\_\_ pipeline register)
  - a. The information is needed in R[rs]
  - b. The information is needed as R[rt]
  
- Type 2 – The information needed in the \_\_\_\_\_ stage by an instruction is the result of the instruction \_\_\_\_\_ stages ahead (found in the \_\_\_\_\_ pipeline register)
  - a. The information is needed in R[rs]
  - b. The information is needed as R[rt]

## 4.7 Forwarding Unit: Type 1 Hazards

---

### Type 1 Hazard

If (ahead1 writes to a register) and (ahead1 doesn't have \$zero as the destination register) and (ahead1 is writing to a register read by the current instruction), forward from EX/MEM: a) forward to RegisterRs, b) forward to RegisterRt

ahead1 writes to register

ahead1 doesn't have \$zero as the destination register

ahead1 is writing to a register read by Current Instruction

- a)
- b)

## 4.7 Forwarding Unit: Type 2 Hazards

---

### Type 2 Hazard

If (ahead2 writes to a register) and (ahead2 doesn't have \$zero as the destination register) and (ahead2 is writing to a register read by the current instruction), forward from MEM/WB: a) forward to RegisterRs, b) forward to RegisterRt

ahead2 writes to register

ahead2 doesn't have \$zero as the destination register

ahead2 is writing to a register read by Current Instruction

- a)
- b)

# 4.7 Double Data Hazard Jeopardy

---

Consider

```
add    $1,  $1,  $2
add    $1,  $1,  $3
add    $1,  $1,  $4
```

## Type 2 Hazard

If (ahead2 writes to a register) and (ahead2 doesn't have \$zero as the destination register) and (ahead2 is writing to a register read by the current instruction) and (no Type 1 hazard), forward from MEM/WB: a) forward to RegisterRs, b) forward to RegisterRt
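The two hazard conditions, including the "no Type 1 hazard" qualifier, can be sketched as a priority check: look at EX/MEM first and fall back to MEM/WB only if EX/MEM does not match. This is an illustrative model, not the slides' circuit. The `PipeReg` class, the function names, and the string labels for the forwarding source are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class PipeReg:
    """The two fields of a pipeline register the forwarding unit inspects."""
    reg_write: bool   # the instruction held here writes a register
    rd: int           # its destination register number

def forward_source(operand_reg, ex_mem, mem_wb):
    """Choose where one ALU operand should come from."""
    # Type 1: the instruction one ahead (in EX/MEM) produces the operand.
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == operand_reg:
        return "EX/MEM"
    # Type 2: the instruction two ahead (in MEM/WB) produces it.
    # Reached only when there is no Type 1 hazard, so the newer value wins.
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == operand_reg:
        return "MEM/WB"
    return "register file"

def forwarding_unit(rs, rt, ex_mem, mem_wb):
    """Return the forwarding choice for the rs and rt operands."""
    return (forward_source(rs, ex_mem, mem_wb),
            forward_source(rt, ex_mem, mem_wb))

# Third add of the sequence above: add $1, $1, $4.
# Both earlier adds write $1, so both pipeline registers match rs.
print(forwarding_unit(rs=1, rt=4,
                      ex_mem=PipeReg(reg_write=True, rd=1),
                      mem_wb=PipeReg(reg_write=True, rd=1)))
```

Without the priority ordering, the third `add` could pick up the stale value of `$1` from MEM/WB instead of the most recent one from EX/MEM.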

# 4.7 Forwarding Datapath with Control



# 4.7 EX Forwarding Completed



# 4.7 Forwarding Can't Always Save the Day



# 4.7 Stalling in Action



# 4.7 Hazard Detection Unit



# 4.8 Control Hazard Example



## 4.8 Approaches to Control Hazards

---

- Assume Branch Not Taken
  - \_\_\_\_\_
  - \_\_\_\_\_
- Reducing the Delay of Branches – Two Items Needed
  - \_\_\_\_\_
  - \_\_\_\_\_
  - \_\_\_\_\_
  - \_\_\_\_\_

# 4.8 Changes to Reduce Branch Delay



# 4.8 Branch Taken Example



## 4.8 Data Hazards for Branch (1)

If a \_\_\_\_\_ register is a \_\_\_\_\_ of the 2<sup>nd</sup> or 3<sup>rd</sup> preceding ALU instruction

add \$1, \$2, \$3



add \$4, \$5, \$6



...



beq \$1, \$4, target



Can resolve using \_\_\_\_\_

## 4.8 Data Hazards for Branches (2)

- If a comparison register is a destination of \_\_\_\_\_ ALU instruction or \_\_\_\_\_ instruction
  - Need \_\_\_\_\_

lw \$1, addr



add \$4, \$5, \$6



beq stalled



beq \$1, \$4, target



## 4.8 Data Hazards for Branches (3)

- If a comparison register is a destination of \_\_\_\_\_ instruction
  - Need \_\_\_\_\_



## 4.8 Dynamic Branch Prediction

---

- In \_\_\_\_\_ and \_\_\_\_\_ pipelines, branch \_\_\_\_\_ is more significant
- Use \_\_\_\_\_ prediction
  - Branch \_\_\_\_\_ table
  - \_\_\_\_\_ by \_\_\_\_\_ branch instruction addresses
  - Stores \_\_\_\_\_
  - To execute a branch
    - Check \_\_\_\_\_, expect the \_\_\_\_\_ outcome
    - Fetch from \_\_\_\_\_
    - Correct if necessary and \_\_\_\_\_ table

## 4.8 Shortcoming of 1-Bit History

- Mispredict as taken on \_\_\_\_\_ iteration of \_\_\_\_\_ loop
- Then mispredict as not taken on \_\_\_\_\_ iteration of \_\_\_\_\_ loop \_\_\_\_\_

Outer   Inner   Predict   Actual



- Inner loop branches mispredicted twice!
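The double misprediction can be reproduced with a short simulation of a 1-bit predictor on the branch that closes an inner loop. This is a sketch under stated assumptions: the branch is taken on every inner iteration except the last, the predictor starts out predicting taken, and the function names are illustrative.

```python
def one_bit_mispredictions(outcomes, initial_prediction=True):
    """Count mispredictions of a 1-bit predictor over branch outcomes.

    The predictor simply predicts whatever the branch did last time.
    """
    prediction = initial_prediction
    misses = 0
    for taken in outcomes:
        if prediction != taken:
            misses += 1
        prediction = taken        # remember the most recent outcome
    return misses

def nested_loop_outcomes(outer_iterations, inner_iterations):
    """Outcomes of the inner-loop branch: taken until the last iteration."""
    one_pass = [True] * (inner_iterations - 1) + [False]
    return one_pass * outer_iterations

outcomes = nested_loop_outcomes(outer_iterations=3, inner_iterations=4)
print(one_bit_mispredictions(outcomes))
```

After the first pass through the inner loop, each pass costs two mispredictions: one on the final iteration (predicted taken, actually not taken) and one on the first iteration of the next pass (predicted not taken, actually taken).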

## 4.9 Exceptions and Interrupts

---

- \_\_\_\_\_ events requiring change in \_\_\_\_\_ of control
  - Different ISAs use the terms differently
- Exception
  - Arises \_\_\_\_\_, e.g., undefined opcode, overflow, syscall, ...
- Interrupt
  - From an \_\_\_\_\_ I/O controller
- Dealing with them without sacrificing performance is hard

## 4.9 Handling Exceptions

---

- In MIPS, exceptions managed by a System Control Coprocessor (CP0)
- Save PC of \_\_\_\_\_ (or \_\_\_\_\_) instruction
  - In MIPS: Exception Program Counter (EPC)
- Save indication of the problem
  - In MIPS: \_\_\_\_\_ register
  - We'll assume 1-bit, 0 for undefined opcode, 1 for overflow
- Jump to handler at \_\_\_\_\_

## 4.9 An Alternate Mechanism

---

- \_\_\_\_\_ Interrupts
  - Handler \_\_\_\_\_ determined by the \_\_\_\_\_
- Example:
  - Undefined opcode: C000 0000
  - Overflow: C000 0020
  - ...: C000 0040
- Instructions \_\_\_\_\_ either
  - \_\_\_\_\_ the interrupt, or
  - \_\_\_\_\_ to \_\_\_\_\_ handler

## 4.9 Handler Actions

---

- \_\_\_\_\_, and transfer to \_\_\_\_\_ handler
- If \_\_\_\_\_
  - Take corrective action
  - use \_\_\_\_\_ to return to \_\_\_\_\_
- Otherwise
  - \_\_\_\_\_ program
  - Report \_\_\_\_\_ using EPC, cause, ...

# 4.9 Pipeline with Exceptions



## 4.9 Exception Properties

---

- \_\_\_\_\_ exceptions
  - Pipeline can \_\_\_\_\_ the instruction
  - Handler executes, then returns to the instruction
    - \_\_\_\_\_
- PC saved in \_\_\_\_\_ register
  - Identifies \_\_\_\_\_ instruction
  - Actually \_\_\_\_\_ is saved
    - \_\_\_\_\_ must adjust

## 4.9 Exception Example (1)

---

- Exception on add in

```
40      sub    $11,   $2,   $4
44      and    $12,   $2,   $5
48      or     $13,   $2,   $6
4C      add    $1,    $2,   $1
50      slt    $15,   $6,   $7
54      lw     $16,   50($7)
```

...

- Handler

```
80000180      sw     $25,  1000($0)
80000184      sw     $26,  1004($0)
```
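The bookkeeping for this example can be sketched as follows. The sketch assumes the pipelined behavior in which the saved value is the address of the faulting instruction plus 4, so the handler has to subtract 4 to find the offending instruction. It uses the 1-bit cause encoding assumed earlier (0 for undefined opcode, 1 for overflow) and the handler address shown in the listing above. The function names and the dictionary layout are illustrative only.

```python
HANDLER_ADDRESS = 0x8000_0180      # address of the handler code listed above
UNDEFINED_OPCODE, OVERFLOW = 0, 1  # 1-bit cause encoding assumed in these notes

def take_exception(faulting_pc, cause):
    """Model what the hardware records when an instruction faults."""
    return {
        "EPC": faulting_pc + 4,    # assumption: PC + 4 is what gets saved
        "Cause": cause,
        "PC": HANDLER_ADDRESS,     # next fetch comes from the handler
    }

def faulting_address(epc):
    """Adjustment the handler makes to recover the offending instruction."""
    return epc - 4

# Overflow on the add at address 0x4C.
state = take_exception(0x4C, OVERFLOW)
print(hex(state["EPC"]), state["Cause"], hex(state["PC"]))
print(hex(faulting_address(state["EPC"])))
```

The instructions ahead of the `add` (`sub`, `and`, `or`) are allowed to complete, while the `add` and the instructions fetched after it are flushed.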

# 4.9 Exception Example (2)



# 4.9 Exception Example (3)



## 4.9 Multiple Exceptions

---

- Pipelining \_\_\_\_\_ multiple instructions
  - Could have \_\_\_\_\_ exceptions \_\_\_\_\_
- Simple approach: deal with exception from \_\_\_\_\_ instruction
  - Flush \_\_\_\_\_ instructions
  - \_\_\_\_\_ exceptions
- In complex pipelines
  - \_\_\_\_\_ instructions issued \_\_\_\_\_
  - \_\_\_\_\_ completion
  - Maintaining \_\_\_\_\_ exceptions is \_\_\_\_\_!

## 4.9 Imprecise Exceptions

---

- Just \_\_\_\_\_ pipeline and \_\_\_\_\_, including exception cause(s)
- Let the handler work out
  - \_\_\_\_\_
  - \_\_\_\_\_
- Simplifies \_\_\_\_\_, but \_\_\_\_\_ \_\_\_\_\_ software
- Not feasible for complex multiple-issue out-of-order pipelines

## 4.10 Parallelism via Instructions

---

- Pipelining: executing multiple instructions in parallel
- To increase ILP ( \_\_\_\_\_ )
  - Deeper pipeline
    - Less work per stage  $\Rightarrow$  shorter clock cycle
  - Multiple \_\_\_\_\_
    - \_\_\_\_\_ pipeline stages  $\Rightarrow$  \_\_\_\_\_ pipelines
    - Start \_\_\_\_\_ instructions per clock cycle
    - CPI < 1, so use \_\_\_\_\_
    - E.g., 4GHz 4-way multiple-issue, 16 BIPS, peak CPI = 0.25, peak \_\_\_\_\_ = 4
    - But \_\_\_\_\_ reduce this in practice
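The figures in the 4 GHz example follow directly from the clock rate and the issue width. A minimal sketch of the arithmetic, assuming every issue slot is filled on every cycle (the peak case); the variable names are illustrative.

```python
clock_hz = 4_000_000_000   # 4 GHz
issue_width = 4            # instructions started per clock cycle

# Peak rate: every issue slot is used on every cycle.
peak_instructions_per_second = clock_hz * issue_width
peak_ipc = issue_width             # instructions per cycle
peak_cpi = 1 / issue_width         # cycles per instruction

print(peak_instructions_per_second // 10**9, "BIPS")
print("peak IPC =", peak_ipc, "peak CPI =", peak_cpi)
```

Sustained performance is lower whenever some issue slots go unused.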

## 4.10 Multiple Issue

---

- \_\_\_\_\_ multiple issue
  - \_\_\_\_\_ groups instructions to be \_\_\_\_\_ together
  - \_\_\_\_\_ them into \_\_\_\_\_
  - \_\_\_\_\_ detects and \_\_\_\_\_ hazards
- \_\_\_\_\_ multiple issue
  - \_\_\_\_\_ examines instruction stream and \_\_\_\_\_ \_\_\_\_\_ to issue \_\_\_\_\_ cycle
  - \_\_\_\_\_ can help by \_\_\_\_\_ instructions
  - CPU \_\_\_\_\_ hazards using advanced techniques at \_\_\_\_\_

## 4.10 Speculation

---

- “Guess” what to do with an instruction
  - Start operation as soon as possible
  - \_\_\_\_\_ whether guess was right
    - If so, \_\_\_\_\_ the operation
    - If not, \_\_\_\_\_ and do the right thing
- \_\_\_\_\_ to static and dynamic multiple issue
- Examples
  - Speculate on branch outcome
    - Roll back if path taken is different
  - Speculate on load
    - Roll back if location is updated

## 4.10 Compiler/Hardware Speculation

---

- Compiler can \_\_\_\_\_ instructions
  - e.g., move \_\_\_\_\_ before \_\_\_\_\_
  - Can include \_\_\_\_\_ instructions to \_\_\_\_\_ from \_\_\_\_\_ guess
- Hardware can \_\_\_\_\_ for instructions to execute
  - \_\_\_\_\_ results until it determines they are \_\_\_\_\_
  - \_\_\_\_\_ buffers on \_\_\_\_\_ speculation

## 4.10 Static Multiple Issue

---

- \_\_\_\_\_ groups instructions into \_\_\_\_\_
  - \_\_\_\_\_ of instructions that can be issued on a \_\_\_\_\_ cycle
  - Determined by \_\_\_\_\_ required
- Think of an \_\_\_\_\_ as a very long instruction
  - Specifies \_\_\_\_\_ operations
  - ⇒ \_\_\_\_\_ (VLIW)

## 4.10 Scheduling Static Multiple Issue

---

- Compiler must remove \_\_\_\_\_ hazards
  - \_\_\_\_\_ instructions into issue packets
  - No dependencies \_\_\_\_\_ a packet
  - Possibly some dependencies \_\_\_\_\_ packets
    - \_\_\_\_\_; compiler must know!
  - Pad with \_\_\_\_\_ if necessary

# 4.10 MIPS Static Dual Issue (1)

---

- Two-issue packets
  - One \_\_\_\_\_ instruction
  - One \_\_\_\_\_ instruction

| Instruction type | Pipeline Stages |    |    |     |     |     |    |
|------------------|-----------------|----|----|-----|-----|-----|----|
| ALU/branch       | IF              | ID | EX | MEM | WB  |     |    |
| Load/store       | IF              | ID | EX | MEM | WB  |     |    |
| ALU/branch       |                 | IF | ID | EX  | MEM | WB  |    |
| Load/store       |                 | IF | ID | EX  | MEM | WB  |    |
| ALU/branch       |                 |    | IF | ID  | EX  | MEM | WB |
| Load/store       |                 |    | IF | ID  | EX  | MEM | WB |

# 4.10 MIPS Static Dual Issue (2)



## 4.10 Scheduling Example

---

```
Loop: lw   $t0, 0($s1)       # $t0 = array element
      addu $t0, $t0, $s2     # add scalar in $s2
      sw   $t0, 0($s1)       # store result
      addi $s1, $s1, -4      # decrement pointer
      bne  $s1, $zero, Loop  # branch if $s1 != 0
```

|       | ALU/branch | Load/store | cycle |
|-------|------------|------------|-------|
| Loop: |            |            | 1     |
|       |            |            | 2     |
|       |            |            | 3     |
|       |            |            | 4     |

- IPC = 5/4 = 1.25 (cf. peak IPC = 2)

## 4.10 Loop Unrolling (1)

---

Loop:

## 4.10 Loop Unrolling (2)

---

Loop:

## 4.10 Loop Unrolling (3)

---

Loop:

## 4.10 Scheduling Unrolled Loop

---

- Replicate loop body to expose more parallelism
  - Reduces loop-control overhead
- Use different registers per replication
  - Called “register renaming”

|       | ALU/branch | Load/store | cycle |
|-------|------------|------------|-------|
| Loop: |            |            | 1     |
|       |            |            | 2     |
|       |            |            | 3     |
|       |            |            | 4     |
|       |            |            | 5     |
|       |            |            | 6     |
|       |            |            | 7     |
|       |            |            | 8     |

# 4.10 Dynamically Scheduled CPU



## 4.10 Speculation

---

- \_\_\_\_\_ branch and continue \_\_\_\_\_
  - Don't \_\_\_\_\_ until branch \_\_\_\_\_ determined
- \_\_\_\_\_ speculation
  - Avoid load and \_\_\_\_\_ delay
    - Predict the \_\_\_\_\_ address
    - Predict \_\_\_\_\_ value
    - Load before completing \_\_\_\_\_ stores
    - \_\_\_\_\_ stored values to load unit
  - Don't \_\_\_\_\_ load until \_\_\_\_\_ cleared

## 4.10 Does Multiple Issue Work?

---

- Yes, but not as much as we'd like
- Programs have real \_\_\_\_\_ that limit ILP
- Some \_\_\_\_\_ are hard to \_\_\_\_\_
  - \_\_\_\_\_
- Some \_\_\_\_\_ is hard to expose
  - Limited \_\_\_\_\_ during instruction issue
- Memory delays and \_\_\_\_\_
  - Hard to keep pipelines \_\_\_\_\_
- \_\_\_\_\_ can help if done well

## 4.10 Power Efficiency

---

- Complexity of dynamic scheduling and speculation requires power
- Multiple simpler cores may be better

| Microprocessor | Year | Clock Rate | Pipeline Stages | Issue width | Out-of-order/Speculation | Cores | Power |
|----------------|------|------------|-----------------|-------------|--------------------------|-------|-------|
| i486           | 1989 | 25MHz      | 5               | 1           | No                       | 1     | 5W    |
| Pentium        | 1993 | 66MHz      | 5               | 2           | No                       | 1     | 10W   |
| Pentium Pro    | 1997 | 200MHz     | 10              | 3           | Yes                      | 1     | 29W   |
| P4 Willamette  | 2001 | 2000MHz    | 22              | 3           | Yes                      | 1     | 75W   |
| P4 Prescott    | 2004 | 3600MHz    | 31              | 3           | Yes                      | 1     | 103W  |
| Core           | 2006 | 2930MHz    | 14              | 4           | Yes                      | 2     | 75W   |
| UltraSparc III | 2003 | 1950MHz    | 14              | 4           | No                       | 1     | 90W   |
| UltraSparc T1  | 2005 | 1200MHz    | 6               | 1           | No                       | 8     | 70W   |

# 4.11 ARM Cortex A8 Pipeline

Pipeline stages (from the figure): F0 F1 F2 D0 D1 D2 D3 D4 E0 E1 E2 E3 E4 E5

Branch mispredict penalty = 13 cycles



# 4.11 Core i7 Pipeline



## 4.15 Concluding Remarks

---

- \_\_\_\_\_ influences design of \_\_\_\_\_ and \_\_\_\_\_
- \_\_\_\_\_ and \_\_\_\_\_ influence design of \_\_\_\_\_
- Pipelining improves \_\_\_\_\_ using parallelism
  - More \_\_\_\_\_ completed \_\_\_\_\_
  - \_\_\_\_\_ for each instruction \_\_\_\_\_
- Hazards: \_\_\_\_\_, \_\_\_\_\_, \_\_\_\_\_
- Multiple issue and dynamic scheduling (ILP)
  - \_\_\_\_\_ limit achievable parallelism
  - \_\_\_\_\_ leads to the power wall