

# A Typical Datapath



# Example: MIPS R3000



Programmable storage

$2^{32} \times \text{bytes}$

31 x 32-bit GPRs (R0=0)

32 x 32-bit FP regs (paired DP)

HI, LO, PC

Data types ?

Format ?

Addressing Modes?

Arithmetic logical

Add, AddU, Sub, SubU, And, Or, Xor, Nor, SLT, SLTU,

AddI, AddIU, SLTI, SLTIU, AndI, OrI, XorI, LUI

SLL, SRL, SRÁ, SLLV, SRLV, SRAV

Memory Access

LB, LBU, LH, LHU, LW, LWL, LWR

SB, SH, SW, SWL, SWR

Control

J, JAL, JR, JALR

BEq, BNE, BLEZ, BGTZ, BLTZ, BGEZ, BLTZAL, BGEZAL

Where is mul / div

Why so many variants

32-bit instructions on word boundary



# The Memory Abstraction

- Association of <name, value> pairs
  - typically named as byte addresses
  - often values aligned on multiples of size

Size of ( )

Char a[100]  
\_\_\_\_\_

\_\_\_\_\_

Struct S

int a;

char b;

} myData;

my Data xyz [100]

, , ,

command (R/W)

address (name)

data (W)

data (R)



# Levels of the Memory Hierarchy

Capacity  
Access Time  
Cost

CPU Registers  
100s Bytes  
 $\ll 1\text{ ns}$

Cache  
10s-100s K Bytes  
 $\sim 1\text{ ns}$   
 $\$1/\text{MByte}$

Main Memory  
M Bytes  
100ns- 300ns  
 $\$ < 1/\text{MByte}$

Disk  
10s G Bytes, 10 ms  
(10,000,000 ns)  
 $\$0.001/\text{MByte}$

Tape  
infinite  
sec-min  
 $\$0.0014/\text{MByte}$



- Principle of Locality:
  - Program access a relatively small portion of the address space at any instant of time.
- Two Different Types of Locality:
  - Temporal Locality (Locality in Time): If an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse)
  - Spatial Locality (Locality in Space): If an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straightline code, array access)
- Last 30 years, HW relied on locality for speed



# Processor performance equation

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$



|              | Inst Count | CPI | Clock Rate                                      |
|--------------|------------|-----|-------------------------------------------------|
| Program      | X          |     | mem [x]                                         |
| Compiler     | X          | (X) | R1 op [+/-]                                     |
| Inst. Set.   | X          | X   | mem [y]                                         |
| Organization |            | X   | Ld R2 mem(x)<br>add R3, R2, R1<br>St R3, mem(y) |
| Technology   |            | X   |                                                 |



# Pipelined Instruction Execution



# Introduce Pipeline Registers





## *Pipelining is not quite that easy!*

- Limits to pipelining: Hazards prevent next instruction from executing during its designated clock cycle
  - Structural hazards: HW cannot support this combination of instructions (single person to fold and put clothes away)
  - Data hazards: Instruction depends on result of prior instruction still in the pipeline (missing sock)
  - Control hazards: Caused by delay between the fetching of instructions and decisions about changes in control flow (branches and jumps).



# One Memory Port/Structural Hazards





# Speed Up Equation for Pipelining

$$CPI_{\text{pipelined}} = \text{Ideal CPI} + \text{Average Stall cycles per Inst}$$

$$\text{Speedup} = \frac{\text{Ideal CPI} \times \text{Pipeline depth}}{\text{Ideal CPI} + \text{Pipeline stall CPI}} \times \frac{\text{Cycle Time}_{\text{unpipelined}}}{\text{Cycle Time}_{\text{pipelined}}}$$

For simple RISC pipeline, CPI = 1:

$$\text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall CPI}} \times \frac{\text{Cycle Time}_{\text{unpipelined}}}{\text{Cycle Time}_{\text{pipelined}}}$$



# Data Hazard on R1

Time (clock cycles)

I  
n  
s  
t  
r.  
o  
r  
d  
e  
r

add r1,r2,r3

sub r4,r1,r3

and r6,r1,r7

or r8,r1,r9

xor r10,r1,r11

IF ID/RF

EX

MEM

WB





## Three Generic Data Hazards

- Read After Write (RAW)

Instr<sub>J</sub> tries to read operand before Instr<sub>I</sub> writes it

 I: add r1,r2,r3  
J: sub r4,r1,r3

- Caused by a “Dependence” (in compiler nomenclature)  
This hazard results from an actual need for communication.



# Three Generic Data Hazards

- Write After Read (WAR)

Instr<sub>J</sub> writes operand before Instr<sub>I</sub> reads it

 I: sub r4, r1, r3  
J: add r1, r2, r3  
K: mul r6, r1, r7

- Called an “anti-dependence” by compiler writers.  
This results from reuse of the name “r1”.
- Can’t happen in simple RISC 5 stage pipeline because:
  - All instructions take 5 stages, and
  - Reads are always in stage 2, and
  - Writes are always in stage 5

# Three Generic Data Hazards

2

- Write After Write (WAW)  
Instr, writes operand before Instr, writes it.

I: sub r1,r4,r3  
J: add r1,r2,r3  
K: mul r6,r1,r7

- Called an “output dependence” by compiler writers  
This also results from the reuse of name “r1”.
- Can’t happen in Simple 5 stage pipeline because:
  - All instructions take 5 stages, and
  - Writes are always in stage 5
- Will see WAR and WAW in more complicated pipes



# Forwarding to Avoid Data Hazard

