

Datapath:

Computer basic components

processor



memory



Inputs



Outputs

Flow of data is datapath.

→ flow from one register to another, or memory, or which step should come after which step is datapath

→ transmission of data is datapath.

Instruction have 2 parts:

1) Fetch cycle      2) Execution Cycle.

→ In an instruction set arch, ~~all~~ every ins has same FC.

→ The steps for FC is same for all instructions.

→ For eg:- In MIPS, FC for all ins is same.

→ EC will be different

→ EC depends on types of instructions.

→ EC depends on types of instructions:

In MIPS, 3 types of instructions:

1) I-type      2) R-type

(register +  
memory)

(all register  
involved)

3) J-type.

(Register or  
immediate  
constant value)

→ Due to different categories or type of ins.

Processor:

Datapath contains functional units of processor.

SAP arch, unit called Control Word present  
12-bit CW. (Only command to control it)

Datapath : 1) <sup>12 bits</sup> Holds data.

2) Operate on those data.

To hold and perform ops - we need hardware elements / data store elements.

For FC, what elements do we need?

1) Instruction memory.

Ins mem structure given.

Input - <sup>Ins</sup> address Output → address content

→ address will come from Program counter.

→ Instruction memory is a memory element.

→ Ins mem is a memory that holds ins address.

2) Program Counter: When you write a program, program lines are instructions.

They are stored in PC. PC points the ins being executed currently. PC sends current ins add to IM.

~~ress~~

18101089

PC does 2 things : 1) Holds current ins  
current ins, in that 2) After execution of  
clock cycle, ins is incremented  
to send next ins using Adder.

→ J-type ins breaks the sequence of PC. It  
jumps to another line instead of running  
program ins sequentially.

⇒ For FC, common elements are : Ins mem -

PC and adder.

4 how much  
u will increment

because in MIPS, for every  
data length you need 4 bytes.



→ Instruction fetching



\* Same for all instruc ↑

It is the hardware  
element / datapath for  
FC. And it is same  
for all ins.

\* Instruction decoding : determining fields within

the instruction.

\* Ins. execution : Perform op indicated by ins.

After decoding, ALU gets command  
from control unit / func selection unit



which func to be executed  
now, it'll be implemented thru  
CU.

There are steps of Datapath, explain by yourself  
using ins given.

Datapath has 2 elements : data hold - ①  
operand on that data - ②

EU is ALU.

how will the register/fields in ~~other~~ ins format be involved with add operation?

ALU required for all kinds of ~~operation~~ ins:  
address calc, data calc. In func selection of ALU, add & func command will be given. Then ALU performs calculations.

Ins execution : Register R/W / Mem R/W,  
func execution in ALU.

FC (Datapath) is complete now.

EO depends on type of ins.

We have seen 3 elements : PC, IM, adder.

New element : Register file for R-format ins.

2 source registers read and values sent to

ALU, then result in register write.

from register file element.

→ Write data is a command.

Reg write is a part of CU.

Reg write is a part of CU.

control bit set to 1, output data should be write

to write register

RD 1 - Content of S<sub>0</sub>

RD 2 - Content of S<sub>1</sub>

Write data - content of dest. register.

when write, moved to write register.

$$B = 7$$

$$C = 5$$

$$A = B + C$$

↓  
to

S<sub>1</sub> S<sub>2</sub>

content of to



Content of S<sub>1</sub>  
Content of S<sub>2</sub>

After calcu, data kept in Write Data. When C bit = 1  
output goes to WR (write register) otherwise not.

C bit → control bit

i Add \$t<sub>1</sub>; \$S<sub>3</sub>; \$S<sub>2</sub>;



ALU:

4 bit wide operation selector value.

$2^4 = 16$  operations in ALU.

Zero° is zero flag.

Status flag → zero flag.

ZF = 1 result is zero.

from ALU

R-type 5 elements: PC, IM, RF, ALU, Adder

Read and Decode Instruc~~n~~ IM.

which register no. is paired to RR<sub>1</sub>, RR<sub>2</sub>, WR.

## I-format Ins: Lecture 05

→ base register, source/destination.  
 lw → dest, base (read)  
 sw → source, base (write)

Datapath for memory read/write:

- Datapath consists of processor's functional units.

rs - source register (base register)

rt - source/destination

lw      \$to      |      32 (\$s<sub>0</sub>) → rs

1) for sign extending, sign extension element used.  
 → convert 16 bit → 32 bits

|    |    |    |         |
|----|----|----|---------|
| op | rs | rt | address |
| 6  | 5  | 5  | 16 bits |

→ 32 bits (convert)

16 bit value : (2)

sign extension, msb

if 0, 16 0s will be copied on left

if 1, 16 1s will be copied on left

6<sup>th</sup> element - sign extension element.

6<sup>th</sup> element - sign extension element (lw):

Datapath for load word (lw):

7<sup>th</sup> element - Data memory  
 on which address to perform  
 memory read/write.



memwrite 1 for write.  
 0 for read.

base register + immediate value  $\rightarrow$  mem address  
In ALU, rs + imm  $\rightarrow$  physical address, using PA  
we get address, from there we read data and  
~~is~~ is sent to writedata.

# II. lw \$t0, 32(\$s1);

### Datapath:

As in R-type, 8 steps were described. Here,  
the steps are not described. We have to describe  
ourselves.

- ① PC provides ins address, ins  $\uparrow$  sent to IM
- ② From IM, ins decoded and sent to Register file.

Instruction is lw \$t0, 32(\$s1);

- ③ In RF, rd, rs. ~~Send to~~ S<sub>1</sub> to rs, 32 is 16 bit value, 32 sent to SE (SignExt) converted from 16  $\rightarrow$  32 bits. RD1 and 32 bit value to ALU. (base reg value and offset value). ALU gives PA, ~~send~~ address sent DM (Data Mem).
- ④ In DM, the data or content in that address is read. The content is sent to Write data of RF (Reg File) when RegWrite is 1. In rd, to is passed after calculation.
- ⑤ Last clk cycle, Adder adds 4 bytes to current ins address and it is passed to PC.

Datapath for lw:



Fig: DP for lw

DM has 2 control units : MemW / MemR  
we should set control bit for preferred op.

Store Word:

sw \$rt , immmed ( qrs );

rt's contents will be stored in that mem's specific add.

→ Everything will be same as lw till add calcu.

→ From RF, RD2 to DM , writedata.

Here MemWrite should be active.

R + I type DP:

2 MUXs added - 2x1 MUX

1st mux → I<sub>0</sub>, I<sub>1</sub>

till IM same  
(FC)

I<sub>0</sub> → 0 (R-type) RD2

I<sub>1</sub> → 1 (I-type) SE

Type of ins  
selection:

DP for SW:- SW \$t<sub>2</sub>, 100(\$S0);  
 ↳ 10 ↳ 16 rs



I-type → lw, sw  
 elements - 7 total

R + I DP :-

1st MUX selects type of Ins.  
 1 → base + SE value → PA (memory address)  
 0 → RD2 only → ALU does R-type ins.  
 calcu according to

2nd MUX ALUSrc - 0/1

1 → Mem to Reg (I-type) # Write for I/R type?

0 → R-type

→ last MUX going to R-type result.

Branching instructions:-

DP for BEQ

BEQ falls in I-type

Since in PC sequence is broken; because of breaking the value of sign ext. is used to calcu

- branch target address using add sum.  
→ Another element used here for shift.  
→ Data path = Processor's functional  
→ Branch will not normally be unit  
given in exam.
- Jump target address of instruction.  
→ In Jump, address will be changed, and sequence  
will be broken. Because of that, 2 bit left  
shift is used and address is calculated and  
sent to PC. jump

### Single Cycle Control :-

- CU is present in ALU, Register File, MUX (control  
func is controlled), memory read/write control  
function is considered.

### ALU Control lines:-

- SAP 1 has control wires which defines  
in which block cycle what should happen.  
→ ALU has function table, func table values are  
changed to perform diff operations.  
→ Rest should reviewed by ourselves  
→ CU is added in single control, to decide  
which bit should go where.

Data path ends here.

18/10/1989

①

Quiz - 3

Answer to Question 3 a:

$$R[i] = R[i+55] + y;$$

$$\therefore i = 89$$

$$R[89] = R[144] + y; \quad \begin{matrix} \hookrightarrow S_0 \\ \hookrightarrow S_1 \end{matrix}$$

MIPS instructions

~~Access~~ ① lw \$t0, 576(\$\$S\_0);

② add \$t0, \$t0, \$\$S\_1;

③ sw \$t0, 356(\$\$S\_0);

Let  $\$S_0 = 16, S_1 = 17, t0 = F8.$



Fig: Datapath organisation

Instruction address is placed on the instruction memory from program counter. As each instruction is of 4 words, each time PC passes an address to instruction memory and it increments by 4 bytes on adder.

Instruction memory retrieves and passes the instruction corresponding to the instruction address.

#### (1) lw instruction:

As it is an I type instruction, so the number of \$ So ~~that~~ is 16, it would be passed to read register 1 of Register File. The content of So that is the base address would come out of read data 1. The ALU<sub>src</sub> MUX is set to indicate it's an I type instruction. 16 bit offset would be passed to sign extend and will be converted to 32 bit offset. Now this offset and base address will be passed to ALU and we'll get the physical address from ALU result. The physical address from address of data memory.

Now MemRead and MentoReg are set so data memory will fetch the data.

corresponding to physical address and pass it to write data in register file. RegWrite is SET and the number 'to' is 8 which is passed to write register. This is how data is fetched from memory and loaded in register.

#### (11) Add operation:

This instruction is R-type, so the ALU<sub>src</sub> is CLEAR. Now, we pass the number of the register to, that is 8 to Read register 1 and number of register 5, that is 17, to read register 2.

The content of those two registers are passed to ALU for add operation to occur. The ALU result is passed to MUX. MemtoReg is CLEAR so the result is passed to write data and RegWrite is SET so the number of to is passed to write register. The result is stored in register to.

III SW operation:

This is I-type instruction so ALU<sub>inc</sub> is SET. The number of So that is 16 is passed to Read register 2 and 16 bit offset is converted to 32 bit offset via sign extend element. The content of So that is the base address and offset is passed to ALU, ALU result returns physical address and passes to Data memory.

The number of to (8) is passed to read register 2 and content of to, that is the result comes out of read data 2 and is passed to write data section of data memory.

MemWrite and MemtoReg are set so the result is stored at calculated physical address.

Answer to Question 1b:

There are 5 basic components in MIPS single cycle data path. These are given below:

- 1) Instruction memory
- 2) Program counter
- 3) Adder
- 4) Register file
- 5) ALU

Program Counter: A register file is a collection of readable and writeable registers.

ALU: It performs the operation indicated by instruction. It takes two operands and result is calculated in ALU which is the output value.

Program Counter:

\* focus: R-type and I-type lw, sw, combined DP for I and R-type. Branching less important.

Example of DP;  $s_0 \rightarrow s_1$

$$A[10] = Y + A[12];$$

Draw and explain DP for MIPS Instruction set.

1) Required elements: (R and I)

- i) PC ii) IM iii) Adder iv) Register File
- v) ALU vi) Sign extend vii) Data Memory
- viii) Multiplexer (2:1)

2) Assembly code: lw \$t0, 48(\$s0);

Add \$t0, \$s0, \$t0;

sw \$t0, 40(\$s0);

$$s_0 = 16, t_0 = 8$$

I-R DP:



NUX1

$$s_0 = 0 = R$$

$$s_1 = 1 = I$$

$$s_0 \rightarrow \text{read } (M \text{ to } R) = I$$

$$s_1 \Rightarrow \text{write} = R_{to} = \text{ALU result to } M \text{ register} = R$$

MUX2

Operator - Adder, MUX, ALU

2<sup>nd</sup> MUX no need for sw.



### Chapter - Memory

2 types

- Primary
- Secondary

Mem Read/Write

→ Memory hierarchy

→ Locality

→ Design pattern

→ Memory mapping func

### # Memory : (Design)

Reading table and bookshelf → Cache and memory

Types of memory :

- 1) Primary
- 2) Secondary

Memory is storage area in which programs can find their running data and it contains data used by programs.

### Types of memory



Dynamic changes but static faster (SRAM used in cache)  
DRAM slower than SRAM, used in main memory  
SRAM → F-F constructed so faster.  
DRAM → Capacitor used to ready chip, so slower.

EPROM → can be programmed to a particular range by user  
(only for resource allocation, not for user programming).

EEPROM → has range, how many times you can erase and  
P, depending on version

EEPROM → others by users, this electrically. EEPROM = UV  
rays used for changing; EEPROM → electric field used for  
changing

Primary - related to processor

Memory hierarchy :

for making work easier, cliff way storage levels maintained.

→ for time saving } why? Time cost related  
→ cost } Time, speed related  
→ speed  
→ size



Fig : Level of memory

- Cache faster than main memory. Requests from CPU goes to memory levels, bigger storages will be slower due to access time, since bigger storages away from CPU <sup>any more</sup> distance in memory, adjacent levels info are exchanged. (dis and tin)
- ~~more~~ registers more costly as less quantity more required
- Access time - memory size inc, access time inc.
- Speed inc when size / capacity of memory dec.



array (data) - spatial      sum iteration →  
(instac)                                                                          temporal.  
sequence - spatial      cycle loop-temporal

→ mapping data transfer (42:50) between  
2 adjacent levels.

→ If requested block in cache memory, then data hit, otherwise data miss. If not in cache memory, then bring from main memory.  
hit time and miss time / penalty time.

Data from main mem. copied to cache, then accessed by CPU <sup>delivered to CPU processor</sup>.  
+ CPU request → <sup>6</sup>miss-time

The access time is fast because

from CPU addressed, then processor will access it  
the memory's bit plane.

of that block

→ When cache full, replaced placement  
mapping function → ① Direct map

② Set associative

③ Fully associative

all memory

chip set that error correction, with RAM then  
does byte error correction.

Memory mapping functions :-

If cache has the data requested, the data is sent  
to memory. If not, goes to main memory  
(CPU)

1. Direct mapping → There are only one place (fixed).

CPU → 12 → 12 from main memory  
↑ ↓ should be placed  
where in cache?



main memory: . . .



Eqn:

requested address mod

# of block in

cache

$$\frac{12 \text{ (mod)}}{8} \Rightarrow 4 \text{ (of cache)}$$

$$\frac{29}{8}$$

memory blocks are given  
identifiers like token called  
tag.

3 requested addresses:

10111 → Index \* both must match  
11001  
10110

2. Set associative → Block can be placed anywhere within the set.

Here, set is  $2^n$  (the power),  
2-way set associative

4-way "

8-way "

n-way "

For 2-way,

# of block in cache ÷ frequency  
How many sets (n-way)

$$\therefore \Rightarrow \frac{8}{2} = 4 \text{ sets}$$

0 1 2 3 Set

Cache: 

|   |   |   |   |   |   |   |   |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|

replace 0 1 0 1 0 1

i) To which set?

(Requested Address) mod (# of set)

$$12 \bmod 4 = 0^{\text{th}} \text{ set}$$

→ Now, you can either place in 0/1 of 0<sup>th</sup> set

If 0 used, put in 1. If 1 used, put in 0.  
If both has data, 2 algos :

- \* Least recently used
- \* most recently used.

4 way:

i]  $8 \div 4 = 2 \text{ Sets}$

ii] # of set = 2

$12 \div 2 = 0^{\text{th}}$  set

(mod)



0                      1

If 27 is requested address,  
 $27 \bmod 2 = 1^{\text{st}}$  set

3) Fully associative  $\rightarrow$  Block can  
be placed anywhere.



⑫      Least recently  
        used

Example : Find the number of misses and hits for a cache with four 1 word blocks.

Given the following sequence of memory block access.

0, 8, 0, 0, 6, 8.

for each of the following cache configuration i) Direct mapped

ii) 2 way SA (LRU)

iii) Associative mapping

i) Direct mapping :

→ requested addresses : 0, 6, 8

→ 4 blocks of 1 word.

$$0 \bmod 4 = 0$$

$$6 \bmod 4 = 2$$

Initially cache all slot = 0

| requested address | Hit/miss | Content of cache block after each reference |   |   |   |        |
|-------------------|----------|---------------------------------------------|---|---|---|--------|
|                   |          |                                             | 0 | 1 | 2 | 3      |
| 0                 | miss     | Mem[0]                                      |   |   |   |        |
| 8                 | miss     | Mem[8]                                      |   |   |   |        |
| 0                 | miss     | Mem[0]                                      |   |   |   |        |
| 0                 | hit      | Mem[0]                                      |   |   |   |        |
| 6                 | miss     |                                             |   |   |   | Mem[6] |
| 8                 | miss     | Mem[8]                                      |   |   |   | Mem[6] |

# misses = 5   # hit = 1

## 2) 2-way SA:

Sequence : (0, 8, 0, 6, 0, 8,  
Address mod      6, 12, 5)  
No. of blocks

| requested address | Hit/miss | Content of cache after each ref |         |
|-------------------|----------|---------------------------------|---------|
|                   |          | Set 0                           | Set 1   |
| 0                 | miss     | Mem[0]                          | 0   1   |
| 8                 | miss     |                                 | Mem[8]  |
| 0                 | hit      | Mem[0]                          |         |
| 6                 | miss     | LRU                             | Mem[6]  |
| 0                 | hit      | Mem[0]                          |         |
| 8                 | miss     |                                 | Mem[8]  |
| 6                 | miss     | Mem[6]                          |         |
| 12                | miss     |                                 | Mem[12] |
| 5                 | miss     |                                 | Mem[5]  |

We have in Cache 6, 12, 5.

# miss = 7

# hit = 2

3) Fully associative (0, 8, 0, 6, 0, 8, 6, 12, 5)

| requested address | Hit/miss | ...    |       |        |         |
|-------------------|----------|--------|-------|--------|---------|
|                   |          | Set 0  | Set 1 | Set 0  | Set 1   |
| 0                 | miss     | Mem[0] |       | 0      | 1       |
| 8                 | miss     |        |       | Mem[8] |         |
| 0                 | hit      | Mem[0] |       |        |         |
| 6                 | miss     |        |       |        | Mem[6]  |
| 0                 | hit      | Mem[0] |       |        |         |
| 8                 | hit      |        |       | Mem[8] |         |
| 6                 | hit      |        |       |        | Mem[6]  |
| 12                | miss     |        |       |        | Mem[12] |
| 5                 | miss     | Mem[5] |       |        |         |

5, 8, 6, 12 ✓

# hit = 4

# miss = 5

## Memory Organisation:

Main memory = 128 words

Cache = 16 words

Block size = 4 words



$$\frac{\text{Cache size}}{\text{Block size}} = \text{Cache line}$$

$$\frac{16}{4} = 4$$

$$M, M \Rightarrow \frac{128}{4} = 32$$

$K \bmod n \rightarrow$  No. of lines in cache

↳ requested address

$$0 \bmod 4 = 0$$

$$1 \bmod 4 = 1$$

$$2 \bmod 4 = 2$$

$$3 \bmod 4 = 3$$

$$4 \bmod 4 = 0$$

$$5 \bmod 4 = 1$$



Main memory

|                |
|----------------|
| B <sub>0</sub> |
| B <sub>1</sub> |
| B <sub>2</sub> |

7  
5

|         |        |        |
|---------|--------|--------|
| 35 bits | 2 bits | 2 bits |
|---------|--------|--------|

Tagbit

↳ Block offset  
↳ Index on line cache  
on line 2 cache

When data travels, for matching  
tag bit is required.

|     |    |    |                     |
|-----|----|----|---------------------|
| 001 | 01 | 00 | } offset<br>options |
| 001 | 01 | 01 |                     |
| 001 | 01 | 10 |                     |
| 001 | 01 | 11 |                     |

Tag, index, line no, block  
must match  $\rightarrow$  offset  
data hit

physical address = 7 bits

$$2^7 = 128$$

4 lines  $\rightarrow$  2 bits ( $2^2$ )

8 lines  $\rightarrow$  3 bits ( $2^3$ )

---

$$\text{Cache line} = \frac{32}{4} = 8 \quad \begin{matrix} M.M = 64 \\ C:S = 32 \\ BS = 4 \end{matrix}$$

$$\text{M.M lines} = \frac{64}{4} = 16 \text{ blocks}$$

$$2^6 = 64$$



$$\text{total} = 6 \text{ bits}$$



M.M



Cache



$$\cancel{4 \times 4} = \frac{16 \text{ bits}}{4} \text{ per block} \quad M.M$$



Cache lines =  $2^3$   
= 8

$\neq$  Direct  
mapping



↳ cache line on cache index

If given,

110101  
13 b.n      1 b.o

110101  
1 b.o

Set associative:

51: 34

2-way SA

For last example

Cache

|                |   |       |
|----------------|---|-------|
| L <sub>0</sub> | 0 | Set 0 |
|                | 1 |       |
|                | 2 | Set 1 |
|                | 3 |       |
|                | 4 | Set 2 |
|                | 5 |       |
| L <sub>7</sub> | 6 | Set 3 |
|                | 7 |       |

$$\frac{\text{Cache lines}}{\text{set}} = \frac{8}{2} = 4 \text{ sets}$$

$= 2^2 \text{ sets}$  so,



11 01 01

→ no. of set (set 1)  
any from set 1

4 way

$$= \frac{8}{4} = 2 \text{ sets}$$

$= 2^1$



## Piplining

stage / seg IF ID EX MEM WB

In pipeline, all stage length are same.

So, you take min the stage that takes highest time for all the stages.

→ In Pip, for single ins it'll not give better performance.

The more the ins, pip performance the better.

→ The lesser CPI, the better performance.

→ Max length is the min execution time.

IF - phase Cycle

ID - Decode Ins

Ex - Ins execution

Mem - Memory

WB - Write back

$k$  (stage)     $n$  - ins

$k + (n-1)$

for MIPS, structural hazards  
of pip solved. IM & DM

⇒ Structural hazard: 2 ins →  
same hardware resource.

⇒ Control hazard: Before  
calculating result, jumps to  
another level, shows previous  
result (Conditional calcu  
problems). Level breaking

⇒ Data hazard: After decoding, registers identified  
where calcu being done.  
to being used for 2nd ins  
before updating for 1st (decode)  
Before WB, using that register  
for another ins. Wrong result

→ Before calcu address, going to  
next ins.

for SAP 'STI', for MIPs solved as  
Ins and data stored in diff  
memory.

For 'CH', stalling pipeline

IF - PC and IM part

ID - decoding, identification

Ex - Ins operation, which  
registers used, which  
type ins execution.

MEM - Memory read / write

WB - If not memory write,  
then register write.

9 Ins      MIPs - 5 stage/segment

I<sub>1</sub> to I<sub>9</sub>

| Stage              | I <sub>1</sub> | I <sub>2</sub> | I <sub>3</sub> | I <sub>4</sub> | I <sub>5</sub> | I <sub>6</sub> | I <sub>7</sub> | I <sub>8</sub> | I <sub>9</sub> | I <sub>10</sub> | I <sub>11</sub> | I <sub>12</sub> | I <sub>13</sub> |
|--------------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|-----------------|-----------------|-----------------|-----------------|
| IF-S <sub>1</sub>  | I <sub>1</sub> | I <sub>2</sub> | I <sub>3</sub> | I <sub>4</sub> | I <sub>5</sub> | I <sub>6</sub> | I <sub>7</sub> | I <sub>8</sub> | I <sub>9</sub> |                 |                 |                 |                 |
| ID-S <sub>2</sub>  |                | I <sub>1</sub> | I <sub>2</sub> | I <sub>3</sub> | I <sub>4</sub> | I <sub>5</sub> | I <sub>6</sub> | I <sub>7</sub> | I <sub>8</sub> | I <sub>9</sub>  |                 |                 |                 |
| Ex-S <sub>3</sub>  |                |                | I <sub>1</sub> | I <sub>2</sub> | I <sub>3</sub> | I <sub>4</sub> | I <sub>5</sub> | I <sub>6</sub> | I <sub>7</sub> | I <sub>8</sub>  | I <sub>9</sub>  |                 |                 |
| MEM-S <sub>4</sub> |                |                |                | I <sub>1</sub> | I <sub>2</sub> | I <sub>3</sub> | I <sub>4</sub> | I <sub>5</sub> | I <sub>6</sub> | I <sub>7</sub>  | I <sub>8</sub>  | I <sub>9</sub>  |                 |
| WB-S <sub>5</sub>  |                |                |                |                | I <sub>1</sub> | I <sub>2</sub> | I <sub>3</sub> | I <sub>4</sub> | I <sub>5</sub> | I <sub>6</sub>  | I <sub>7</sub>  | I <sub>8</sub>  | I <sub>9</sub>  |

$n \rightarrow \text{Ins}$

$$\begin{aligned}
 & \nearrow \text{Pip} \\
 1 \times k + (n-1) \\
 (1 \times 5) + (9-1) \\
 5 + 8 = 13 \text{ c.c}
 \end{aligned}$$

NP  
 $(k \times n) = 5 \times 9 = 45 \text{ c.c}$

$$\text{Speedup} = \frac{\text{NP}}{P} = \frac{45}{13} = 3.46$$

times faster will be P  
technique

$$\begin{aligned}
 \text{CPI} &= \frac{\text{Pip}}{\text{No. of C.C}} \\
 &= \frac{13}{9} = 1.44
 \end{aligned}$$

For I = 1000,

$$\begin{aligned}
 \text{Pip} \quad 5 + (1000 - 1) &= \frac{1004}{1000} \\
 &= 1.004
 \end{aligned}$$

$$\text{Speed up} = \frac{5000}{9004} = 4.9$$

The more instructions, the more the speedup.

Efficiency / utilization

$$\underline{5 \times 9 (k \times n)}$$

$13 \times 5$  Total block

$$= 45 / 65$$

$$= 0.692$$