

# COMPUTER ORGANIZATION AND ASSEMBLY LANGUAGE

21/10/22 ..

Components →



Memory cells  
track sectors



Assembly language is a command based language which works on hexadecimal number system.

Mov [command] dl [address] al [value]

cut  
ADD  
SUB  
MUL  
DIV

OB E  
60 passing

Why Assembly language?

- ① Assembly language is closer to machine language.
- ② To communicate b/w high & machine language.
- ③ To increase speed/optimize results.

A. S → CS  
S.S (Embedded Programming).

Memory Management



closer to processor,  
smaller in size,  
further to processor,  
larger in size



Caches are in Kbs  
Registers are in bytes  
Storage is in GB/TB

25/10/22



Computer architecture: interconnecting hardware components in a such a logic that it'll be cost effective. • network moving technology.

### Buses

- i) Data Bus → carries data
- ii) Control Bus → data reaches its destination
- iii) Address Bus → identifies the data's address for its destination

28/10/22

### Architecture & Organisation

- └ Instruction set

Functional View.

- ① Data movement  $\rightarrow$  input & output of data.
- ② Storage  $\rightarrow$  data input first stored, retrieving old data from storage (Hard disk)
- ③ Processing from /to storage: get data from storage, process it (perform certain functions), then store it back

- ④ Processing from storage to I/O.



$$8 \text{ bits} = 1 \text{ byte}$$

4 bits      4 bits

Quad      4 bits.  
  Nibble

Dual Core Processor  
Dual Processor

core  $\rightarrow$  layer  
 $\searrow$  caches levels

① S Multitasking  
( Multiprogramming  
  IDA )

Time Division Accessibil  
/ Time Slicing  
/ Time Oriented Job.

② Multiprocessing  
( at same time )

## Moscoo's Law

$$P_n = P_0 * 2^n$$



$P_n \Rightarrow$  Power of processor in future

$P_0 \Rightarrow$  Power of processor at current time transistors act as switch

$n \Rightarrow$  number of years.

$$n = N/2$$

control temperature  
increase speed.

## Parallel Computing:

Parallel processing ] interchangeable  
parallel programming terms.

parallel tasking

Distributed computing

Cloud computing

File Distributive System

Buffering  
(temporary memory computer)

## Big Data: Hadoop

multiple types of data.

Hadoop  $\rightarrow$  Map Reduce

shortest, fastest, secure.

4/11/2022.

## Classes of Parallelism:

① BLP (Bit-Level Parallelism).

adjust acc. to processor's size

② DLP (Data Level Parallelism).

D. Packet 2KB



③ TLP (Task Level Parallelism).



## FLYNN'S TAXONOMY.

Michael Flynn divide into 4 categories (acc. to data flow / instructions)

- ① SISD (Single Instruction Stream, Single Data Stream)  
ex:-  $a = 2$  just instruction to store data.
- ② SIMD (Single Instruction Stream, Multiple Data Stream).  
ex:- result =  $2 + 4$  in assembly (add: 2, 4)
- ③ MISD (Multiple Instruction Stream, Single Data Stream).  
ex:-  $\text{int } \& x;$  pipelined (data linked)
- ④ MIMD (Multiple Instruction Stream, Multiple Data Stream).  
ex:-  $3 * (2 + 2)$   
Shared Memory  
Distributed Memory.



Mirroring  $\rightarrow$  Real time processing.

# INSTRUCTION SET ARCHITECTURE (ISA):



- Registers hold data for a time being & it also performs calculation.
- Stack is a register accumulator.

- Pointers find the address in memory.
- Stack, Accumulator.

Reg - Memory.

11/11/22.

Memory - Memory Architecture.  
Load Store Architecture.

Assignment no. 1. Solve equations for all models  
of ISA :-  $A - B * (C + D * E)$

## MEMORY STRUCTURE:

- Memory is divided into chunks.
- Every chunk / cell contains an address.
- First instruction will be on lower address in memory.
- Every cell divided into four parts.





Fetch → Decode → Execute.  
cycle

17/NOV/22

## INTEGRATED CIRCUITS (IC)

Discrete      Continuous.      Hybrid (mixed).

wafers sheet.

- modem  $\rightarrow$  modulation  
conversion of digital into analog is
- modem  $\rightarrow$  demodulation  
conversion of analog into digital is.

Mixed ICs:

analog  $\rightarrow$  alternate current  
digital  $\rightarrow$  direct current.

Cost of IC = Cost of die + Cost of testing die +  $\frac{\text{Cost of packaging}}{\text{Final test yield}}$

- Germanium / Silicon is used.

Cost of IC = Cost of die + Cost of testing die + Cost of packaging  
Final test yield.

Cost of die =  $\frac{\text{Cost of wafer}}{\text{Dies per wafer} \times \text{Die yield}}$

$$\text{Dies per wafer} = \frac{\pi \times (\text{wafer-diameter}/2)^2}{\text{Die area}} - \frac{\pi \times \text{wafer-diameter}}{\sqrt{2 \times \text{Die area}}}$$

### Example

Find the no. of dies per 300 mm (30 cm) wafer for a die that is 1.5 cm on a side & for a die that is 1.0 cm on a side.

∴ Dies per wafer = ?

$$① \text{ area of Die} = 1.5 \times 1.5 = 2.25 \text{ cm}^2$$

$$② \text{ Dies per wafer} = \frac{\pi * (W.D/2)^2}{D.A} - \frac{\pi * W.D}{\sqrt{2 * D.A}}$$

$$D/W = \frac{3.14 * (30/2)^2}{2.25} - \frac{3.14 * 30}{\sqrt{2 * 2.25}}$$

$$D/W = \frac{3.14 * 225}{2.25} - \frac{94.26}{2.12}$$

$$D/W = 314.2 - 44.46$$

$$D/W = 269.73 \text{ die}$$

$$③ \text{ area of die} = 1 \times 1 = 1 \text{ cm}^2.$$

$$D/W = \frac{\pi * (30/2)^2}{1} - \frac{\pi * 30}{\sqrt{2 * 1}}$$

$$D/W = \pi * 225 - \pi * 30/\sqrt{2}$$

$$D/W = 640.21$$

Install DOSBOX in your PC.

How die made?

Die yield = Wafer yield  $\times \frac{1}{(1 + \text{Defects per unit area} / \text{Die area})}$

Example

Find the die yield for dies that are 1.5cm on a side & 1.0cm on a side, assuming a defect density of 0.031 per  $\text{cm}^2$  and  $N$  is 13.5.

$N \rightarrow (11.5 - 18.5)$  range for  $N$ .

$$\begin{aligned} a) \quad 1.5 \text{ cm} &\rightarrow A = 1.5 \times 1.5 \\ &A = 2.25 \text{ cm}^2 \\ b) \quad 1 \text{ cm} &\rightarrow A = 1 \times 1 \\ &A = 1 \text{ cm}^2. \end{aligned}$$

$$(a) D.Y = \frac{1 \times 1}{(1 + 0.031 \times 2.25)^{13.5}}$$

$$D.Y = 0.402$$

TP.

• Die area decreases  
die yield increases

$$(b) D.Y = \frac{1 \times 1}{(1 + 0.031 \times 1)^{13.5}}$$

$$D.Y = 0.66.$$

## DEPENDABILITY:

- Reliability
- Availability

## Module Reliability:

FIT: Failure In Time

MTTF: Mean Time To Failure

MTTR: Mean Time To Repair

MTBF: Mean Time B/w Failures

Q. A system having failure in time 1000. Calculate its mean time to failure, if the mean time to recover is 72 hours then what would be the mean availability of the system?

$$FIT = 1000 = 10^3$$

$$MTTF = ?$$

$$MTTR = 72 \text{ hrs}$$

$$M.A. = \text{mean availability} = ?$$

$$\text{MTTF} = \frac{10^9}{\text{FIT}} \xrightarrow{\text{Mean time between failures}}$$

$$MTTF = \frac{10^9}{10^3}$$

$$\boxed{MTTF = 10^6}$$

$$M.A = \frac{MTTF}{MTTF + MTTR}$$

$$M.A = \frac{10^6}{10^6 + 72}$$

$$\boxed{M.A = 0.99}$$

Q. In an organization, the system has failure in time 10000 hrs. Find its mean time to failure & if system takes 60 hours to run again, what would be the availability of the system?

$$FIT = 10000 \rightarrow 10^4$$

$$MTTR = 60 \text{ hrs}$$

$$MTTF = ? \quad M.A = ?$$

$$MTTF = \frac{FIT}{10^4} = \frac{10^9}{10^4} = \boxed{10^5}$$

$$M.A = \frac{MTTF}{MTTF + MTTR} = \frac{10^5}{10^5 + 60} = \boxed{0.99}$$

## REGISTERS:

fastest storage locations.

14 Registers.

1<sup>st</sup> register  $\rightarrow$  8 bits  $\rightarrow$  a-accumulator.



$\times 2$  Ea  $\rightarrow$  Enhanced accumulator  $\rightarrow$  16 bits

$\times 2$  Xa  $\rightarrow$  Extra Enhanced accumulator  $\rightarrow$  32 bits

$\times 2$  Ra  $\rightarrow$  Reg a  $\rightarrow$  64 bits

a  $\rightarrow$  accumulator

1. ax  $\rightarrow$  accumulator (enhanced)

2. bx  $\rightarrow$  base register

3. cx  $\rightarrow$  counter register

4. dx  $\rightarrow$  data register

G.P.R

General Purpose  
Registers.

5. CS → code segment.  
6. DS → data segment  
7. SS → stack segment  
8. ES → extra segment.

} → Segment Registers.

9. SI → index register. } source index  
10. DI → data index } destination index  
} Index registers.

11. IP → index pointer }  
12. SP → segment/stack pointer } Pointer registers  
Special.

13. F → Flag register.

14. B → base register.

G.P.R

- Accumulator: Input / Output, Operations.
- Base: Holds address of data.
- Counter: Used in loop. (e.g. PC counter)
- Data: Holds data for output.

Special

- Instruction Pointer: next instruction.
- Stack pointer: top instruction of stack.

- Flag register: Flag register holds if carry or is borrow.

- Base register: top of the stack will be considered as base.

Commands for DOS Box:

path → mount C C:\MP.

dir

→ C:

→ edit class\_1.asm (Creating file) (and to open the file)  
Save the file.

for comments:- ; ---

int → intercept.

nasm is assembler

⇒ nasm File.asm → (to assemble the file)

↳ nasm File.asm -o File.com

com - command.

executable file

⇒ afd File.com → afd is a debugger

entered in CPU's brain

0100 B8 0500 MOV AX, 0005

↓ ↓ ↓  
address Machine assembly F1  
of register Code code.

instruction one to one correspondence.

DOS's executable file is .com.

25/Nov/22

[089 0X0100]

DOS will start instructions  
from 0100.

B8 0500

B8 → is opcode for mov <sup>ax</sup> command

B8 → is opcode for mov bx

B8 0500

mov ax, 5

B8 → mov ax.

5 → 0500 — ??

why 0500, why not 0005  
 ISSUE 0500?

0005

0500

Intel ⇒ Lower → Higher

Other ⇒ Higher → Lower

LByte MByte  
↓ ↓  
Lower Higher → so, it is 5  
odd odd

Mov ax, [0x4C00] → exit command

int 0x21 → program will interrupt itself  
then OS will come & move  
to the command for exit.

### Interrupts

Hardware base &

Software base

Maskable

& Non-maskable

ignores until work  
is done.

can't ignore  
high priority → low priority

Multiple interrupts.

Saad

1st / Dec / 2022

Label / Title / Tag =

Label: amov ax, 5

dw → define word.

Address label.

Register size ⇒ word (16 bit register).

4 bytes / nibble.

any declaration after instruction has global scope.

global variables are loaded first from hard disk to RAM. It is done by Loader. after global declaration, it will move to first Program Counter's instruction & then so on.

mov ax, 5 ; hard coded → immediate operands  
mov bx, bx ; register to register  
\* mov 5, ax ; not possible

- [ ] → calling data from memory  
e.g. mov ax, [num1] ; reading data from memory (register to memory).  
• mov [num1], ax ; writing data from register into memory (register to memory)  
• mov [num1], [num2] ; reading & writing at same time (nope!) (memory to memory)

- First Code:

2-Dec-2022

[org 0x0100]

MOV ax, [num1]

MOV bx, [num2]

ADD ax, bx

MOV bx, [num1+2].

MOV bx, [num3]

ADD ax, bx

MOV bx, [num1+4]

MOV [num4], ax.

MOV [num1+6], ax

MOV ax, 0x4C00

int 0X21

Num1: dw 5

Num2: dw 8

Num3: dw 12

Num4: dw 0

{ m1 0109  
↓ RAM's Memory Address of instruction

# MEMORY HIERARCHY DESIGN



Temporal → Virtual time  
Space ] both are inversely proportional  
to each other

Speed in memory is vice versa.

→ mov [ax, 2] → operations  
OP/instruction



I → Instruction  
D → Data.

L<sub>2</sub> : U-cache

L<sub>3</sub> : U-cache

U → unified,  
defined / reserved.

1. Direct Mapping: \*\*\* 1 to 1 Mapping

bytes / block

2. Set Associative Mapping

3. Full Associative Mapping.

M.M

Direct Mapping

every block of cache will  
be mapped to M.M

Formula for calc. of mapping  
M.M Block Address

MOD (%)

No. of Blocks in Cache

(decides which cache's block will  
be mapped to which M.M's block)



② Set Associative Mapping:  
n-way set association.

2-way



4-way

S<sub>1</sub>

S<sub>2</sub>

③ Full Associative Mapping

Full cache will be mapped to a block of  
M.M.

Miss / Hit

more miss, more penalty.

$$\text{access time} = \text{miss time} + \text{penalty time}$$



RAM.



LRU  $\Rightarrow$  Least Recently Used

$$12 \text{ bits} / 8 = 4$$



8/Dec/2022



Address of M.M.B MOD No of B of cache.  
17 % 8  
= 1

## Practice Problems on Direct Mapping

### Problem #1

Consider a direct mapped cache of size 16 KB with a block size of 256 bytes. If the size of the main memory is 128 KB - then determine the bits in the tag?

Solution

$$\text{Cache size} = 16 \text{ KB}$$

$$\begin{matrix} \text{B.S/F.S/Line size} = 256 \text{ bytes} \\ \downarrow \\ \text{Block} \quad \text{Frame} \end{matrix}$$

| Tag                       | B.S | Flag. |
|---------------------------|-----|-------|
| Address of Physical Block |     |       |

$$M \cdot M \cdot S = 128 \text{ KB} \times 2^{\underline{10}} = 2^{17} \text{ bytes.}$$

$$\text{No. of Bits in tag} = ? \quad (1024)$$

No. of Bits in Physical Address:

$$\begin{aligned} M \cdot M \cdot \text{Size.} &= 128 \text{ Kbs} \times 1024 = 131072 \text{ bytes.} \\ &\Rightarrow 2^{17} \text{ Bytes} \\ &\Rightarrow 17 \text{ bits} \end{aligned}$$

131072 can be represented in 17 bits.

# of bits in block offset:

$$\begin{aligned} \text{Block size} &= 256 \text{ bytes} \\ &= 2^8 \text{ bytes} \\ &= 8 \text{ bits.} \end{aligned}$$

# of bits in a line size.

$\Rightarrow$  Cache size / line size.

$$\Rightarrow \frac{16 \text{ KB}}{256 \text{ bytes}}$$

$$\Rightarrow \frac{2^4 \text{ KB}}{2^8 \text{ B}}$$

$$\Rightarrow \frac{2^4 \times 2^{10} \text{ B}}{2^8 \text{ B}}$$

$$\Rightarrow \frac{2^{14} \text{ B}}{2^8 \text{ B}}$$

$$\Rightarrow 2^{14-8} \text{ Bytes}$$

$$\Rightarrow 2^6 \text{ Bytes} \Rightarrow 6 \text{ bits}$$

$$\begin{aligned}
 & \# \text{ of Bits in tag} \\
 &= \# \text{ of Physical address bits} - \text{line size} - \text{Block offset} \\
 &= 17 - 6 - 8 \\
 &= 3 \text{ Bits}
 \end{aligned}$$

9/Dec/22.

Problem no. 02.

Solution:

Cache size = 512 KB

Block size = 1 KB

Tag bits = 7 Bits

S.M.M. = ?

size of main memory.



# of Bits of Block offset.

$$\begin{aligned}
 \Rightarrow \text{Cache Block size} &= 1 \text{ KB} \Rightarrow 1024 \text{ bytes} \\
 &\Rightarrow 2^{10} \text{ bytes} \\
 &\Rightarrow 10 \text{ bits}
 \end{aligned}$$

# of Bits in line size.

$$\begin{aligned}
 \# \text{ of line size} &= \text{Cache size / line size.} \\
 &\Rightarrow 512 \text{ KB} / 1 \text{ KB.}
 \end{aligned}$$

$$\begin{aligned}
 &\Rightarrow 512 \times 2^9 \text{ lines} \\
 &\Rightarrow 9 \text{ Bits.}
 \end{aligned}$$

# of Bits in Physical address.

$$\begin{aligned}
 \Rightarrow \text{Tag bits} + \text{L-size Bits} + \text{B.offset bits} \\
 \Rightarrow 7 + 9 + 10. \\
 \Rightarrow 26 \text{ bits.}
 \end{aligned}$$

SMM

= # of Bits in Physical address.  
26 bits.

$$2^{26} \Rightarrow 2^{10} \text{ kb} \Rightarrow 2^6 \text{ mb} = \boxed{64 \text{ mb}}$$

Problem # 03.

A direct mapped cache with block size of 4KB.

The size of main memory is 16 GB & there are 10 bits in the tag. Find.

- ① Size of cache memory.

Solution

Cache Block size = 4kb

Size of main memory = 16GB.

Tag's bits = 10 bits

size of cache memory =

| Tag | Index / Line size | Block offset |
|-----|-------------------|--------------|
| 10  | ↓ 12 bits         | ↓ 12         |

# of bits in Block offset

$$\Rightarrow \text{cache block size} = 4\text{KB} = 4 \times 1024 \text{ bytes}$$

$$\Rightarrow 2^2 \times 2^{10}$$

$$\Rightarrow 2^{12}$$

$$\Rightarrow 12 \text{ bits}$$

# of bits in physical address.

Size of M.M. = 16 GB.

$$\Rightarrow 2^{10} \times 2^{10} \times 2^{10} \times 2^4$$

$$\Rightarrow 2^{34} \text{ bytes}$$

$$\Rightarrow 34 \text{ bits}$$

Cache memory size

$$= \# \text{ of Bits of line size} = 12 \text{ bits.}$$

$$\Rightarrow \text{Cache Block size} = 2^{12} \text{ bytes. times}$$

$$= 4096 \text{ bytes.}$$

$$= 4 \text{ KB}$$

Cache memory size = Total no. of lines in cache

$\times$  Line/Block size/Frame

$$= 2^{12} \times 4 \text{ KB}$$

$$= 2^{12} \times 2^2 \times 2^{10}$$

$$= 2^{24} \text{ Bytes}$$

$$= 2^{14} \text{ Kb}$$

$$= 2^4 \text{ Mb}$$

$$= \boxed{16 \text{ Mb}}$$

Consider a two way set associative mapped cache of size 16 KB with the block size of 256 bytes. If the size of the main memory is 128 KB then determine the bits in tag?

Solution

Set size = 2.

Cache size = 16 KB

Line / Block size = 256 bytes

Main memory size = 128 KB

# of bits in tag = ?

|                      |     |           |
|----------------------|-----|-----------|
| 17 bits in Phy. Add. |     |           |
| Tag                  | L/S | B. offset |

4. 5 8

1. # of bits in physical address:

$$\text{M.M size} = 128 \text{ KB}$$

$$= 2^7 \times 2^{10} \text{ bytes}$$

$$= 2^{17} \text{ bytes}$$

$$= 17 \text{ bits}$$

2. No. of bits in block offset:

$$\text{Line size} = 256 \text{ bytes} = 2^8 \text{ bytes}$$

$$= 8 \text{ bits}$$

3. No. of lines in cache:

$$= \text{Cache size} / \text{Block size}$$

$$= \frac{16 \text{ KB}}{256 \text{ bytes}} = \frac{2^4 \times 2^{10} \text{ bytes}}{2^8 \text{ bytes}} = \frac{2^{14}}{2^8} = \frac{2^6}{2^8} \text{ lines}$$

$$= 64 \text{ lines}$$

4. No. of sets in cache:

$$\text{Set size} = 2$$

$$\frac{\text{No. of lines}}{\text{Set size}} = \frac{64}{2} = 32 \text{ sets}$$

$$= 2^5 \text{ sets}$$

$$= 5 \text{ bits}$$

5. No. of Tag bits:

$$= \text{P.A bits} - \text{L.S bits}$$

$$- \text{B.O bits}$$

$$= 17 - 5 - 8$$

$$= 4 \text{ bits}$$

Let a 8-way set-associative mapped cache of size 512 KB with a block size of 1 KB. If 7 bits are used in tag - then determine the size of main memory?

Solution:

Set = 8

Cache size = 512 KB

Block size = 1 KB | Tag bits = 7.

Size of MM = ?

|                 |     |     |
|-----------------|-----|-----|
| 23 bits in P.A. |     |     |
| Tag             | L.S | B.O |

7. 6 10.

1. No of bits in block offset:

$$\text{Block / line size} = 1 \text{ KB} = 2^{10} \text{ bytes}$$

10 bits

2. No. of bits in lines

$$\frac{\text{Cache size}}{\text{line size}} = \frac{512 \text{ KB}}{1 \text{ KB}} = \frac{2^9 \times 2^10 \text{ bytes}}{2^{10} \text{ bytes}} = 2^9 \text{ lines}$$

3. No. of sets.

$$\frac{512}{8} = 64 = 2^6 \text{ sets.}$$

6 bits

4. Size of No. of bits in P.A.

$$= \text{Tag bits} + \text{L.S bits} + \text{B.O bits}$$

$$= 7 + 6 + 10.$$

$$= 23 \text{ bits.}$$

5. Size of main memory.

23 bits.

$$2^{23} \text{ bytes} \Rightarrow 2^{13} \text{ KB} \Rightarrow 2^3 \text{ MB}$$

$$\Rightarrow \boxed{8 \text{ MB}}$$

Consider a 4-way set associative mapped cache with a block size of 4KB. If the size of main memory is 16 GB and 10 bits are used in tag then determine the size of the caches?

Solution

$$\text{Set} = 4$$

$$\text{Block size} = 4 \text{ KB}$$

$$\text{Size of M.M} = 16 \text{ GB}$$

$$\text{Tag bits} = 10 \text{ bits}$$

$$\text{Size of Cache} = ?$$

34 bits in P.A.

| Tag | L.S | B.O |
|-----|-----|-----|
| 10  | 12  | 12  |

# of bits in block offset:

$$\text{Block size} = 4 \text{ KB} = 2^2 \times 2^{10} = 2^{12} \text{ bytes}$$

$$\Rightarrow 12 \text{ bits}$$

# of bits in physical address:

$$\text{Size of M.M} = 16 \text{ GB} = 2^4 \times 2^{10} \times 2^{10} \times 2^{10} = 2^{34} \text{ bytes}$$

$$\Rightarrow 34 \text{ bits}$$

$$\# 2^{12} \text{ bytes} \times 2^2 \text{ lines.} = 2^{14} \text{ lines.}$$

$$\begin{aligned} 2^{14} \text{ bytes} \times 2^2 \text{ KB} &\Rightarrow 2^4 \times 2^2 \times 2^{10} \times 2^{10} \\ \text{sets} &\Rightarrow 2^6 \text{ MB} \\ &\Rightarrow 64 \text{ MB} \end{aligned}$$

12 bits in line size mean ~~sets~~ sets. = 12 bits

$$\text{No. of sets in cache} = 2^{12} \Rightarrow 12 \text{ bits}$$

Since cache is 4-way mapped each set contain 4 lines.  
Hence, no. of lines in cache  $\Rightarrow 2^{12} \times 4$  lines

$$2^{12} \times 2^2$$

$$2^{14} \text{ lines.}$$

Size of cache memory  $\Rightarrow$  Total lines  $\times$  Block size.

$$2^{14} \times 4 \text{ KB}$$

$$2^{14} \times 2^2 \text{ KB Bytes}$$

$$2^{16} \text{ KB}$$

$$2^{16} \text{ MB} = 64 \text{ MB}$$

PAPERWORK

Problem - 07.

Consider a direct mapped cache with 8 cache block (0-7). If the memory block requests are in the following order. 3, 5, 2, 8, 0, 6, 3, 9, 16, 20, 17, 25, 18, 30, 24, 12, 63, 5, 82, 17, 24. Which of the following memory blocks will not be in the cache at the end of the sequence?

1) 3

2) 18 ✓

3) 20

4) 30.

Also, calculate the hit and miss ratio.

| C.M |            |   |             |            |  |  |  |  |
|-----|------------|---|-------------|------------|--|--|--|--|
| L-0 | 8 0 16 24  | * | 3 % 8 = 3-M | 17 % 8 = 1 |  |  |  |  |
| L-1 | 2 27 28 17 |   | 5 % 8 = 5-M | 25 % 8 = 1 |  |  |  |  |
| L-2 | 22 48 28 2 |   | 2 % 8 = 2-M | 18 % 8 = 2 |  |  |  |  |
| L-3 | 3          | * | 0 % 8 = 0-M | 30 % 8 = 6 |  |  |  |  |
| L-4 | 20         |   | 0 % 8 = 0-M | 24 % 8 = 0 |  |  |  |  |
| L-5 | 5          | * | 6 % 8 = 6   | 2 % 8 = 2  |  |  |  |  |
| L-6 | 18 30      |   | 3 % 8 = 3-H | 3 % 8 = 3  |  |  |  |  |
| L-7 | 63         |   | 9 % 8 = 1   | 5 % 8 = 5  |  |  |  |  |
|     |            |   | 16 % 8 = 0  | 82 % 8 = 2 |  |  |  |  |
|     |            |   | 20 % 8 = 4  | 17 % 8 = 1 |  |  |  |  |
|     |            |   |             | 24 % 8 = 0 |  |  |  |  |

$$\text{Hit Rate} = \frac{3}{21}$$

Hit : Miss

$$\text{Miss Rate} = \frac{17}{21}$$

$$\frac{3}{21} : \frac{17}{21}$$

### Problem: 8

Consider a fully-associative mapped cache with 8 cache blocks (0-7). The memory block requests are in order.

4, 3, 1, 2, 5, 8, 19, 6, 25, 8, 16, 35, 45, 22, 8, 3, 16, 25, 7

If LRU replacement policy is used, which cache block will have memory block 7?

Also calculate hit ratio and miss ratio. (17) blocks

Least Recently Used - LRU.

C.M

$\rightarrow *$  → HIT

|     |      |   |
|-----|------|---|
| L-0 | 4 45 |   |
| L-1 | 3 22 |   |
| L-2 | 25   | * |
| L-3 | 8    | * |
| L-4 | 19 3 |   |
| L-5 | 6 7  |   |
| L-6 | 16   | * |
| L-7 | 35   |   |

Line 5 will have memory block 7.

$$\text{Hit rate} = 5/17$$

hit : miss

$$\text{Miss rate} = 12/17$$

$5/17 : 12/17$

### Problem: 9

Consider a 4-way set associative mapping with 16 cache blocks

Problem: 9

Consider a 4-way set associative mapping with 16 cache blocks. The memory block requests are in order-

0, 255, 1, 4, 3, 8, 133, 159, 216, 129, 63, 8, 48, 32, 73, 92, 185.

If LRU replacement policy is used, which cache block will not be present in the cache?

1. 3
2. 8
3. 129
4. 216. ✓

17 blocks.

\* → HIT

|      | CM      |
|------|---------|
| L-0  | 0 48    |
| L-1  | 4 32    |
| L-2  | 8       |
| L-3  | 216 92  |
| L-4  | 1       |
| L-5  | 133     |
| L-6  | 129     |
| L-7  | 73      |
| L-8  |         |
| L-9  |         |
| L-10 |         |
| L-11 |         |
| L-12 | 255 155 |
| L-13 | 3       |
| L-14 | 159     |
| L-15 | 63      |

## Assignment.

### Problem-16

Q1. Consider a cache has 4 blocks for the following memory references

5, 12, 13, 17, 4, 12, 13, 17, 2, 13, 19, 13, 43, 1, 19

What is the hit ratio for the following cache replacement algorithms, and miss ratio.

1. FIFO

2. LRU

3. Direct mapping

4. 2-way set associative mapping with LRU

Q2. A hierarchical memory system has the following specification, 20MB main storage with access time of 300ns, 256 bytes cache with access time of 50ns, word size of 4 bytes, page size 8 words. What will be the hit ratio if the page address trace of a program has the pattern 0, 1, 2, 3, 0, 1, 2, 4 following LRU page replacement technique?

Q3. Consider an array A [100] and each element occupies 4 words. A 32 word cache is used and divided into 8 word blocks. What is the hit ratio after the following code.

for (i=0; i < 100; i++)

A[i] = A[i] + 10;

DONE

Q<sup>2</sup> A main processor, processing 20 instructions of a processor. Each instruction requires 2.5 ns to get executed. Calculate the size of the main process, time taken to complete the process in time required for each instruction.

Solution

$$\textcircled{1} \text{ Size} = \boxed{20 \text{ Instructions}}$$

$$\textcircled{2} \text{ Time} = \boxed{2.5 \times 20 = 50 \text{ ns}} \text{ for whole process}$$

$$\textcircled{3} \text{ Time} = \boxed{2.5 \text{ ns / each instruction}}$$

$$\textcircled{4} \text{ Waiting time for each instruction no. 3.} = 2.5 \times 2 = \boxed{5 \text{ ns}}$$

Q<sup>2</sup> Using above numerical and interrupt occurs at 4th instruction of the main process, the time slot for this interrupt is 10 ns. If the size of the interrupt is 8 instructions then determine the time required for each instruction of interrupt, waiting time of the main process instruction after the interrupt, waiting time for the main process, total time required to complete the whole process.

Solution

$$\text{M.P. size} = 20 \text{ inst.}$$

$$\text{Time for each inst.} = 2.5 \text{ ns.}$$

$$\text{Time for entire process} = 50 \text{ ns.}$$

$$\text{size of interrupt} = 8 \text{ inst.}$$

$$\text{Time required for entire interrupt} = 10 \text{ ns.}$$

$$\text{Time required for each instruction of interrupt} = \frac{10}{8} = \boxed{1.25 \text{ ns}}$$

$$\text{Waiting time of main processor} = \text{Time for entire interrupt} = \boxed{10 \text{ ns}}$$

$$\text{Waiting time of 5th instruction of main process} = (4 \times 2.5) + 10 = \boxed{20 \text{ ns}}$$

$$\text{Total time of main process to get completed} = 10 + 50 = \boxed{60 \text{ ns}}$$



Q.3. Using the same numerical in the main process, two interrupts occur at a time at instruction no. 6. Interrupt 1 can be completed in 10 ns with 1.25 ns for each instruction, interrupt 2 consists of 5 instructions, each instruction requires 1.5 ns. Interrupt 2 is non-maskable.

Solution

$$M \cdot P \text{ size} = 20 \text{ inst.}$$

$$T \text{ of } M \cdot P = 2.5 \times 20 = 50 \text{ ns}$$

$$T \text{ of each inst.} = 2.5 \text{ ns}$$

Interrupt-2 ( $I_2$ )

$$\text{Size of } I_2 = 5 \text{ inst.}$$

$$T. \text{ of } I_2 = 5 \times 1.5 = 7.5 \text{ ns}$$

$$T. \text{ of each inst. of } I_2 = 1.5 \text{ ns}$$



$$\text{Waiting time of Inst. 7} = (6 \times 2.5) + 7.5 = 22.5 \text{ ns}$$

Interrupt-1 ( $I_1$ )

$$\text{Size of } I_1 = 10 / 1.25 = 8 \text{ inst.}$$

$$T. \text{ of } I_1 = 10 \text{ ns.}$$

$$T. \text{ of each inst. of } I_1 = 1.25 \text{ ns}$$

$$\text{Waiting time for } I_1 = 14 \text{ inst.} \times 2.5 = 35 \text{ ns}$$

$$\text{Waiting time of } M \cdot P = 10 + 7.5 = 17.5 \text{ ns}$$

$$\text{Total time to complete the } M \cdot P = 50 + 17.5 = 67.5 \text{ ns.}$$

Q.1: A processor processes different jobs in which it execute a process of 20 instructions.

Solution



Size of M.P = 20 inst.

Time of M.P = 50 ns.

Time for each inst. of M.P =  $20/50 = 2.5 \text{ ns}$

Size of I<sub>1</sub> = 7 inst.

Time for each inst. of I<sub>1</sub> =  $1.5 \text{ ns}$

Time for I<sub>1</sub> =  $7 \times 1.5 = 10.5 \text{ ns}$

Size of I<sub>3</sub> = 5 inst.

Time for each inst. of I<sub>3</sub> =  $1 \text{ ns}$

Time for I<sub>3</sub> =  $5 \times 1 = 5 \text{ ns}$

Time for I<sub>2</sub> = 8 ns.

Time for each inst. of I<sub>2</sub> =  $1 \text{ ns}$ .

Size of I<sub>2</sub> =  $8 \text{ inst.}$

Waiting time of M.P =  $10.5 + 5 = 15.5 \text{ ns}$ .  
wst. quesn.

Total completing time =  $50 + 15.5 + 8 = 73.5 \text{ ns}$

Waiting time for I<sub>2</sub> =  $(15 \times 2.5) + 5 = 42.5 \text{ ns}$ .

# Pipelining:

- Series / sequential processing.
- Pipelining increases throughput.
- Dependable on tasks.
- Throughput  
number of tasks  
executed in a unit  
time.

|                | IF | ID | EX | MEM | WB | XX |
|----------------|----|----|----|-----|----|----|
| I <sub>1</sub> |    |    |    |     |    |    |
| I <sub>2</sub> |    |    |    |     |    |    |
| I <sub>3</sub> |    |    |    |     |    |    |
| I <sub>4</sub> |    |    |    |     |    |    |
| ⋮              |    |    |    |     |    |    |
| I <sub>n</sub> |    |    |    |     |    |    |

Every instruction has to pass through all 5 stages, but MEM is optional.

- Instruction Cache  
- Data Cache.



$$\text{Speed up (S)} = \frac{\text{Non-pipelined execution time}}{\text{Pipelined execution time}}$$

$$\text{Efficiency (\eta)} = \frac{\text{Speed up}}{\text{No. of stages in pipelined Architecture}}$$

$$\text{Efficiency (\eta)} = \frac{\text{No. of boxes utilized in phase-time diagram}}{\text{Total no. of boxes in phase-time diagram}}$$

$$\text{Throughput} = \frac{\text{No. of instructions executed}}{\text{Total time taken.}}$$

### Problem.01.

Consider a pipelined system having four phases/stage with a duration 60, 50, 90 and 80 ns. If the latch delay is 10ns then calculate pipeline cycle time, non-pipeline execution time, speed up ration, pipeline time for 1000 tasks, sequential time for 1000 tasks and throughput.

Solution



#### ① Pipeline cycle time:

$$P.C.T = \text{Max}(\Phi_1, \Phi_2, \Phi_3, \Phi_4) + \text{Delay time.}$$

$$\text{''} = \text{Max}(60, 50, 90, 80) + 10.$$

$$\text{''} = 90 + 10$$

$$P.C.T = 100 \text{ ns}$$

$\because$  Delay time is constant due to pipelining

#### ② Non-pipeline execution time..

$$n\bar{P}_T = \sum_{i=0}^n \Phi_{N_i}$$

$$= \Phi_1 + \Phi_2 + \Phi_3 + \Phi_4$$

$$= 60 + 50 + 90 + 80$$

$$n\bar{P}_T = 280 \text{ ns}$$

$$③ \text{ Speed up} = \frac{n\bar{P}_T}{P_T} = \frac{280}{100} = 2.8$$

#### ④ Pipeline Time for 1000 tasks..

= Time taken for 1<sup>st</sup> task + Time taken for remaining 999 tasks.

$$= 1 \times 4 \times 100 + 999 \times 100$$

$$= 100300 \text{ ns}$$

#### ⑤ Sequential time for 1000 tasks.

= Total tasks  $\times$  Time taken for 1 task ( $n\bar{P}_T$ )

$$= 1000 \times 280$$

$$= 280000 \text{ ns}$$

- Q 4 instructions are executed in a 4 stages pipelining.  
 Assume that each instruction requires different stage delay (in terms of number of clock cycles) as mentioned in the given table.

|                | IF | ID | EX | WB |     |
|----------------|----|----|----|----|-----|
| I <sub>1</sub> | 1  | 3  | 2  | 1  | → 7 |
| I <sub>2</sub> | 2  | 2  | 3  | 1  | → 8 |
| I <sub>3</sub> | 1  | 1  | 1  | 1  | → 4 |
| I <sub>4</sub> | 2  | 1  | 1  | 2  | → 6 |

How many clock cycles will be required to complete these 4 instructions in the given pipelining system.

|    | I <sub>1</sub> | I <sub>2</sub>                                                    | I <sub>3</sub>                                                    | I <sub>4</sub>                  |                |
|----|----------------|-------------------------------------------------------------------|-------------------------------------------------------------------|---------------------------------|----------------|
| IF | I <sub>1</sub> | I <sub>2</sub>                                                    | I <sub>3</sub>                                                    | I <sub>4</sub>                  |                |
| ID |                | I <sub>1</sub> , I <sub>2</sub> , I <sub>3</sub> , I <sub>4</sub> | I <sub>2</sub> , I <sub>3</sub> , I <sub>4</sub>                  | I <sub>3</sub> , I <sub>4</sub> |                |
| EX |                |                                                                   | I <sub>1</sub> , I <sub>2</sub> , I <sub>3</sub> , I <sub>4</sub> | I <sub>2</sub> , I <sub>3</sub> | I <sub>4</sub> |

$$P_T = \underline{13 \text{ cycles}}$$

$$\eta P_T = 7 + 8 + 4 + 6 = \underline{\underline{25 \text{ cycles}}}$$

$$\text{Speed up} = \frac{\eta P_T}{P_T} = \frac{25}{13} = \underline{\underline{1.92}}$$

⑥ Throughput: No. of instructions executed  
 Total time taken

$$\approx \sqrt{1000 / 100.300}$$

$$\approx 0.00997$$

5/01/23

Q) A 4 stage pipeline system has the stage delays as, 250, 120, 660, 340 ns respectively. Registers are used between the stages and have a delay of 5ns each.

Assuming constant clock rate the total time taken to process 1000 data items on the pipeline will be



Solution

① Pipeline cycle time

$$P.C.T = \text{Max}(250, 120, 660, 340) + 5$$

$$P.C.T = 660 + 5$$

$$P.C.T = 665 \text{ ns.}$$

② Non-pipeline execution time.

$$n P_T = \sum_{i=0}^n P_{T,i}$$

$$n P_T = 250 + 120 + 660 + 340$$

$$n P_T = 1370 \text{ ns.}$$

Time taken for 1st data item + Time taken for 999

$$1 \times 4 \times 665 + 999 \times 665$$

Data Items,

$$2660 + 664335$$

Time taken to process 666995 ns.

1000 data items.

$$\text{Speed up} = \frac{n P_T}{P_P} = \frac{1370}{665} \checkmark$$

## Assignment no. 2.

5/01/23

- Q1: The stage delays in a 4-stage pipeline are 800, 500, 400, 300 ps (pico seconds). The first stage is replaced with a functionality equivalent design involving two stages with respective delays 600 and 350 ps. The throughput increase of the pipeline is 33.33 %.

Solution :

- Q2: A non-pipelined single cycle processor operating at 100 MHz. is converted into synchronous pipelined processor with 5 stages requiring 2.5 ns, 1.5 ns, 2 ns, 1.5 ns and 2.5 ns respectively. The delay of the latches is 0.5 seconds. The speed up of the pipeline processor for the large number of instruction is 3.33

- Q Consider a 4-stage pipeline processor. The number of cycles needed by the 4 instructions  $I_1, I_2, I_3, I_4$  in stages  $S_1, S_2, S_3, S_4$  is shown below.

|       | $S_1$ | $S_2$ | $S_3$ | $S_4$ |
|-------|-------|-------|-------|-------|
| $I_1$ | 2     | 1     | 1     | 1     |
| $I_2$ | 1     | 3     | 2     | 2     |
| $I_3$ | 2     | 1     | 1     | 3     |
| $I_4$ | 1     | 2     | 2     | 3     |

What is the number of cycles needed to execute the following loop?

for ( $i=1$  to 2) { $I_1, I_2, I_3, I_4$ }

Solution

|       | $t_1$ | $t_2$ | $t_3$ | $t_4$ | $t_5$ | $t_6$ | $t_7$ | $t_8$ | $t_9$ | $t_{10}$ | $t_{11}$ | $t_{12}$ | $t_{13}$ | $t_{14}$ | $t_{15}$ |
|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|----------|----------|----------|----------|----------|----------|
| $S_1$ |       |       |       |       |       |       |       |       |       |          |          |          |          |          |          |
| $S_2$ |       |       |       |       |       |       |       |       |       |          |          |          |          |          |          |
| $S_3$ |       |       |       |       |       |       |       |       |       |          |          |          |          |          |          |
| $S_4$ |       |       |       |       |       |       |       |       |       |          |          |          |          |          |          |

|       | $t_1$          | $t_2$          | $t_3$          | $t_4$          | $t_5$          | $t_6$          | $t_7$          | $t_8$          | $t_9$          | $t_{10}$       | $t_{11}$       | $t_{12}$       | $t_{13}$       | $t_{14}$       | $t_{15}$       | $t_{16}$       | $t_{17}$       | $t_{18}$       | $t_{19}$ |
|-------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------|
| $S_1$ | I <sub>1</sub> | I <sub>1</sub> | I <sub>2</sub> | I <sub>3</sub> | I <sub>3</sub> | I <sub>4</sub> | I <sub>1</sub> | I <sub>1</sub> | I <sub>2</sub> | I <sub>3</sub> | I <sub>2</sub> | I <sub>4</sub> |                |                |                |                |                |                |          |
| $S_2$ |                | I <sub>1</sub> | I <sub>2</sub> | I <sub>2</sub> | I <sub>2</sub> | I <sub>3</sub> | I <sub>4</sub> | I <sub>4</sub> | I <sub>1</sub> | I <sub>2</sub> | I <sub>2</sub> | I <sub>2</sub> | I <sub>3</sub> | I <sub>4</sub> | I <sub>4</sub> |                |                |                |          |
| $S_3$ |                |                | I <sub>1</sub> |                |                | I <sub>2</sub> | I <sub>2</sub> | I <sub>3</sub> | I <sub>4</sub> | I <sub>5</sub> | I <sub>1</sub> |                | I <sub>2</sub> | I <sub>2</sub> | I <sub>3</sub> | I <sub>4</sub> | I <sub>4</sub> |                |          |
| $S_4$ |                |                |                | I <sub>1</sub> |                |                | I <sub>2</sub> | I <sub>2</sub> | I <sub>3</sub> | I <sub>3</sub> | I <sub>3</sub> | I <sub>4</sub> | I <sub>4</sub> | I <sub>4</sub> | I <sub>1</sub> | I <sub>2</sub> | I <sub>2</sub> | I <sub>3</sub> |          |

|       | $t_{20}$       | $t_{21}$       | $t_{22}$       | $t_{23}$       |
|-------|----------------|----------------|----------------|----------------|
| $S_4$ | I <sub>3</sub> | I <sub>3</sub> | I <sub>4</sub> | I <sub>4</sub> |

$$P_T = 23 \text{ cycles.}$$

MP



6/01/23

size of MP = 10 inst.

Time / inst = 2.5 ns

Time for M.P =  $10 \times 2.5 = 25$  ns

at instruction 4 we have 2 interrupts - Interrupt 2 is non-maskable. based on 6 instructions - The total time required to process this interrupt is 8 ns. Interrupt 1 requires 4 ns to process. Each instruction may be executed with 1.5 ns. When we were attempting interrupt 2, interrupt 3 occurs with 3 instructions off bus whole.

I<sub>2</sub>

Size of I<sub>2</sub> = 6 inst.

Total time for I<sub>2</sub> = 8 ns.

Time for each inst =  $\frac{8}{6} = \frac{4}{3}$  ns = 1.3 ns  
Waiting time for I<sub>2</sub> = 0 ns

I<sub>3</sub> size of I<sub>3</sub> = 3 inst

Total time for I<sub>3</sub> = 6 ns

Time for each inst =  $6/3 = 2$  ns

Waiting time for I<sub>3</sub> = 0 ns since we don't know  
when it happened

Total time to complete I<sub>2</sub> = 8 + 6 = 14 ns

Wait Time for inst 7 = (6 × 2.5) + 8 + 6 = 29 ns

Total time for I<sub>1</sub> = 4 ns

Time for each inst = 1.5 ns

Size of I<sub>1</sub> = 4 / 1.5 = 2.6 ns,  $\approx$  3 ns

Waiting time for I<sub>1</sub> = (6 × 2.5) + 8 + 6 = 29 ns

Waiting time of M·P = 14 ns

Total time of completion of M·P = 25 + 8 + 6 + 4 = 43 ns

### Pipelining Problem:

- Q. Consider a pipeline processor with 4 stages - IF, ID, EX and WB. If the IF and ID and WB stages take 1 clock cycle each to complete the operation. The no. of clock cycle for the execution stage EX depends on the instruction. The ADD ADD and SUB instruction needs 1 clock cycle and the MUL instruction needs 3 clock cycles in the execution stage, operand forwarding is used in the pipelined processor. What is the number of clock cycle is taken to complete the following sequence of instructions?

### Solution

|    | t <sub>1</sub> | t <sub>2</sub> | t <sub>3</sub> | t <sub>4</sub> | t <sub>5</sub> | t <sub>6</sub> | t <sub>7</sub> | t <sub>8</sub> | t <sub>9</sub> | t <sub>10</sub> |     |
|----|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|-----------------|-----|
| IF | ADD            | MUL            | SUB            |                |                |                |                |                |                |                 | ADD |
| ID |                | ADD            | MUL            | SUB            |                |                |                |                |                |                 | MUL |
| EX |                |                | ADD            | MUL            | MUL            | MUL            | SUB            |                |                |                 | SUB |
| WB |                |                |                | ADD            | -              | -              | MUL            | SUB            |                |                 |     |

Total clock cycles = P<sub>T</sub> = 8

Non-pipeline = nP<sub>T</sub> = 14

$$\text{Speed up} = \frac{P_T}{nP_T} = \frac{8}{14} = 1.75$$

## PIPELINE HAZARDS:

- Structural Hazards  $\rightarrow$  Architectural
- Data Hazards  $\rightarrow$  Cannot save multiple instructions to single data
- Control Hazards  $\rightarrow$  empty space between instructions.

### STRUCTURAL:-

- If MIPS had only one memory was used to.
  - Fetch instruction
  - Load / Store data

## DATA DEPENDENCE & HAZARDS.

$$a = 2 + 1$$

$$b \neq a + 1$$

### Data Dependence

Add S<sub>0</sub>, T<sub>1</sub> (write in S<sub>0</sub> in cycle 5)

data dependency

Sub T<sub>2</sub>, S<sub>0</sub>, T<sub>3</sub> (use S<sub>0</sub> as input in cycle 5)

$\Rightarrow$  data is dependent; solution is to insert nops/bubble so on; that cycle processor is unused for that instruction, however it is time taking but it is a solution.

13/01/2023

$$T_n = n * t_n$$

$$T_k = (k * n - 1) t_p$$

T<sub>n</sub> = Total time / clock cycle  
for non-pipelined

T<sub>p</sub> = Total time for pipeline

k = no. of stages

n = no. of tasks

t<sub>p</sub> = time / clock cycle

for pipeline.

the time required  
to complete  
tasks.

- Q. Determine the no. of clock cycles that it takes to process 200 stages tasks in a six segment pipeline.

Solve

$$k = 6$$

$$n = 200$$

no. of clock cycles = ?

$$\text{no. of clock cycles} \Rightarrow T_p = (6 + 200 - 1) \text{ clock cycles}$$

$$T_p = 205 \text{ clock cycles.}$$

- Q<sub>2</sub>: A non-pipelined system takes 50ns to process a task. The same task can be processed in a six segment pipeline with a clock cycle of 10ns. Determine the speed up ratio of a pipeline for 100 tasks. What is the maximum speed up that can be achieved?

Solve

$$t_n = 50 \text{ ns}$$

$$K = 6$$

$$t_p = 10 \text{ ns}$$

$$n = 100$$

$$\text{Speed up} = \frac{\text{Non-pipelined exec. time}}{\text{Pipelined exec. time}}$$

$$T_n = 100 \times 5 = 500 \text{ ns}$$

$$T_p = (6 + 100 - 1) + 0 = 1050.$$

$$\text{Speed up} = \frac{5000}{1050} = 4.76$$

20/01/2023

ILP  $\rightarrow$  Instruction Level Parallelism  $\therefore$  Khud paheo.  
 Multiple operation / instructions can be executed in parallel.

2 approaches to exploit ILP.

$\rightarrow$  Hardware approach  $\rightarrow$  dynamically.

$\rightarrow$  Software approach  $\rightarrow$  statically at compile time.

① Hardware approach: An approach that relies on hardware to help discover & exploit parallelism dynamically.

② Software approach: An approach that relies on software technology to find parallelism, statically at compiletime.  
 • software is hardware based

① Data Dependency

Data Dependence.

② Name Dependency

③ Control Dependency

How to measure dependency? in instructions?

$\rightarrow$  Memory locations

$\rightarrow$  Registers