

Total Amount

2,28,000


(Q1) For processor $x_1$, cycles per instruction (CPI):

$$x_1 \text{ CPI} = 1 + \underbrace{0.3}_{\text{branch instr.}} \times \underbrace{2}_{\text{stall cycles}} = \underline{1.6}$$

 $x_2 \Rightarrow$ a correct prediction (90% of branches) removes the stall; the remaining 10% still incur the 2-cycle stall.

$$x_2 \text{ CPI} = 1 + 0.3 \times (0.9 \times 0 + 0.1 \times 2) = \underline{1.06}$$

$$\text{Speed up} = \frac{x_1 \text{ CPI}}{x_2 \text{ CPI}} = \frac{1.6}{1.06} \approx 1.5094 \approx \underline{1.51}$$
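The Q1 arithmetic can be checked with a short sketch. The function name `cpi_with_branch_stalls` and its parameters are illustrative assumptions, not part of the question; the model is simply a base CPI of 1 plus the average branch stall per instruction.

```python
def cpi_with_branch_stalls(branch_fraction, stall_cycles, stalling_share=1.0):
    """Base CPI of 1 plus the average stall cycles contributed by branches.

    stalling_share is the share of branches that actually stall
    (1.0 without prediction, 0.1 with 90% correct prediction).
    """
    return 1.0 + branch_fraction * stalling_share * stall_cycles


cpi_x1 = cpi_with_branch_stalls(0.3, 2)        # every branch stalls 2 cycles
cpi_x2 = cpi_with_branch_stalls(0.3, 2, 0.1)   # only mispredicted branches stall
speedup = cpi_x1 / cpi_x2

print(f"CPI x1 = {cpi_x1:.2f}, CPI x2 = {cpi_x2:.2f}, speedup = {speedup:.2f}")
```

Both processors are assumed to run at the same clock, so the CPI ratio is also the execution-time ratio.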

(Q2) Stage delays = 150, 120, 150, 160 and 140 ns; register delay = 5 ns

$$\text{tck (clock cycle time)} = \max(\text{stage delays}) + t_{reg} = 160 + 5 = \underline{165 \text{ ns}}$$

∴ For 100 independent instructions:

$$T_{\text{total}} = (k-1+n) \text{ tck.} \quad \text{where } k=5 \text{ & } n=100$$

∴ The 1st instruction needs all 5 stages (5 cycles) to complete; while it moves through the pipeline, the next 4 instructions are already in the earlier stages.  
So the 2nd and every later instruction completes one cycle after the one before it.

$$\therefore \text{Total Cycles} = \underbrace{k}_{\text{cycles for 1st}} + (n-1) = (n+k-1).$$

$$\therefore \text{Total time} = (5-1+100) \times 165 = 104 \times 165$$

$$\underline{\text{Total Time}} = \underline{17160 \text{ ns}}$$
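A minimal sketch of the same calculation, assuming an ideal pipeline with no stalls. The helper name `pipeline_total_time` is hypothetical.

```python
def pipeline_total_time(stage_delays_ns, register_delay_ns, n_instructions):
    """Return (total cycles, cycle time in ns, total time in ns) with no stalls."""
    k = len(stage_delays_ns)
    # The slowest stage plus the pipeline register sets the clock period.
    cycle_ns = max(stage_delays_ns) + register_delay_ns
    # k cycles for the first instruction, then one more cycle per instruction.
    total_cycles = k + (n_instructions - 1)
    return total_cycles, cycle_ns, total_cycles * cycle_ns


cycles, tck, total_ns = pipeline_total_time([150, 120, 150, 160, 140], 5, 100)
print(cycles, tck, total_ns)
```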


Q3) For Non-pipelined processor :-

$$\therefore \frac{1}{2.5 \text{ GHz}} = 0.4 \text{ ns time per cycle}$$

$$\therefore T_{\text{non}} \text{ per instruction} = \underbrace{5}_{\text{cycles}} \times 0.4 \text{ ns} = 2 \text{ ns/instruction}$$

For Pipelined Processor :-

$$\frac{1}{2 \text{ GHz}} = \frac{1}{2 \times 10^9 \text{ Hz}} = 0.5 \times 10^{-9} \text{ s} = 0.5 \text{ ns}$$

$$\begin{aligned} & \text{Average cycles per instruction (CPI)} \\ & = 1 + \underbrace{(0.30 \times 0.05 \times 50)}_{\text{memory instr., 5\% stall 50 cycles}} + \underbrace{(0.10 \times 0.50 \times 2)}_{\text{branch instr., 50\% stall 2 cycles}} \\ & = 1 + 0.75 + 0.10 = 1.85 \end{aligned}$$

$$T_{\text{pipelined}} \text{ per instruction} = 1.85 \times 0.5 \text{ ns} = 0.925 \text{ ns/instruction}$$

$$\therefore \underline{\text{Speedup}} = \frac{2.0 \text{ ns}}{0.925 \text{ ns}} \approx \underline{2.16}$$
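The Q3 numbers can be reproduced with the sketch below. It assumes the figures read from the working above: a 2.5 GHz non-pipelined processor at 5 cycles per instruction, a 2 GHz pipelined processor, 30% memory instructions of which 5% stall for 50 cycles, and 10% branch instructions of which 50% stall for 2 cycles. The function names are illustrative.

```python
def stalled_cpi(stall_sources, base_cpi=1.0):
    """Add the average stall cycles per instruction to the base CPI.

    stall_sources: iterable of
    (instruction_fraction, stall_probability, penalty_cycles).
    """
    return base_cpi + sum(f * p * c for f, p, c in stall_sources)


def ns_per_instruction(cpi, clock_ghz):
    """A clock of f GHz has a period of 1/f ns, so time = CPI / f."""
    return cpi / clock_ghz


t_non = ns_per_instruction(cpi=5, clock_ghz=2.5)
cpi_pipe = stalled_cpi([(0.30, 0.05, 50), (0.10, 0.50, 2)])
t_pipe = ns_per_instruction(cpi_pipe, clock_ghz=2.0)

print(f"T_non = {t_non:.3f} ns, CPI = {cpi_pipe:.2f}, "
      f"T_pipe = {t_pipe:.3f} ns, speedup = {t_non / t_pipe:.2f}")
```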

Q4) Total PO stage time for 100 instructions.

$$= 40 \times 3 + 35 \times 2 + 25 \times 1 = 215 \text{ cycles},$$

$\therefore$ Total cycles = cycles for IF, ID, OF before the first PO + all PO cycles + 1 cycle for the last WB after PO

$$\underline{\text{Total Cycles}} = 3 + 215 + 1 = 219 \text{ cycles}$$
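As a sketch, the count above can be written as a small function. It assumes every stage other than PO takes one cycle and that PO is the bottleneck, so PO stays busy from the moment the first instruction reaches it. The name `total_cycles` and its parameters are hypothetical.

```python
def total_cycles(po_cycle_counts, stages_before_po=3, stages_after_po=1):
    """po_cycle_counts maps PO latency (cycles) -> number of such instructions."""
    # PO is never idle once filled, so its busy time is the sum of all latencies.
    po_busy = sum(latency * count for latency, count in po_cycle_counts.items())
    return stages_before_po + po_busy + stages_after_po


print(total_cycles({3: 40, 2: 35, 1: 25}))
```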

Q5) Ans. 3rd Question  $\frac{1}{2.5 \text{ GHz}} = 0.4 \text{ ns per cycle}$

$T_{\text{non}} = 0.4 \text{ ns} \times 4 \text{ cycles} = 1.6 \text{ ns}$  for full instruction on Non-pipelined processor

For pipelined: $\frac{1}{2 \text{ GHz}} = 0.5 \text{ ns per cycle}$, no stalls, 5 stages

$$T_{\text{pipe}} = 1 \times 0.5 = 0.5 \text{ ns}$$

$$\therefore \underline{\text{Speedup}} = \frac{1.6 \text{ ns}}{0.5 \text{ ns}} = \underline{3.2 \text{ times}}$$


(Q6) for old pipeline  $T_{max} = 2.2 \text{ ns} \therefore \text{clock time} = 2.2 \text{ ns}$

The branch is resolved in the EX stage. The 2 stages before it, IF and ID, sit idle until EX produces the branch outcome.

As each stage takes 1 cycle, the stall is 2 cycles per branch.

$$\therefore \text{CPI for old pipeline} = 1 + \underbrace{0.2}_{\text{branch instr.}} \times \underbrace{2}_{\text{stall cycles}} = 1.4$$

$$\therefore T_{old} = 1.4 \times \underbrace{2.2 \text{ ns}}_{\text{max latency}} = 3.08 \text{ ns per instruction}$$

New design $\rightarrow$ 8 stages, with latencies $2.2/3 \text{ ns} \approx 0.733 \text{ ns}$, $1 \text{ ns}$, $1 \text{ ns}$, $1 \text{ ns}$, $0.75 \text{ ns}$

$$\text{Clock time (new)} = \text{max stage latency} = 1 \text{ ns per cycle}$$

Stages IF, ID, RF1, RF2, EX1, EX2, MEM, WB

The 5 stages before EX2 (the 6th stage) sit idle until EX2 produces the address of the next instruction.

As each stage takes 1 cycle, the stall is 5 cycles per branch.

$$\therefore \text{CPI for new pipeline} = 1 + \underbrace{0.2}_{\text{branch instr.}} \times \underbrace{5}_{\text{stall cycles}} = 2.0$$

$$\therefore \frac{T_{old}}{T_{new}} = \frac{1.4 \text{ CPI} \times 2.2 \text{ ns}}{2.0 \text{ CPI} \times 1 \text{ ns}} = \frac{3.08 \text{ ns}}{2.0 \text{ ns}} = 1.54$$
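A short sketch of the Q6 comparison, using the stall counts taken in the working above (2 cycles for the old pipeline, 5 cycles for the new one). The function name is an assumption for illustration.

```python
def avg_time_per_instruction(cycle_ns, branch_fraction, stall_cycles):
    """Average time per instruction = CPI with branch stalls x clock period."""
    cpi = 1 + branch_fraction * stall_cycles
    return cpi * cycle_ns


t_old = avg_time_per_instruction(cycle_ns=2.2, branch_fraction=0.2, stall_cycles=2)
t_new = avg_time_per_instruction(cycle_ns=1.0, branch_fraction=0.2, stall_cycles=5)

print(f"T_old = {t_old:.2f} ns, T_new = {t_new:.2f} ns, ratio = {t_old / t_new:.2f}")
```

The ratio is sensitive to the assumed stall count per branch, so changing `stall_cycles` is a quick way to see how a different reading of the branch-resolution stage would move the answer.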

(Q7) 6-stage pipeline with no cycle-time overhead, that is, an instruction passes through the 6 stages in 6 cycles.  
Note: all 6 stages take the same time to execute.  
 25% of instructions incur 2 pipeline stall cycles each.

We have to find the speedup of this pipeline over the non-pipelined design.

For non-pipelined design $\rightarrow$ 6 stages per instruction $\therefore$ 6 cycles per instruction  
 $\therefore \text{Time/instruction} = 6 \times T$, where $T = \text{time per cycle}$

$$\text{For } N \text{ instructions, } T_{\text{Non-p}} = \underline{\underline{N \cdot 6T}}$$

Now pipelined execution (no stall) $\rightarrow$ ideal case.

We complete 1 instruction per cycle

$$\therefore \text{Time/instruction} = 1 \times T = T$$

But in our case 25% of instructions incur a 2-cycle stall.

$$\therefore \text{CPI} = 1 + 0.25 \times \underbrace{2}_{\text{stall cycles}} = 1.5$$

$$\therefore \text{Time/instruction} = \underbrace{1.5}_{\text{CPI}} \times T = 1.5 T$$

$$\therefore \text{For } N \text{ instructions, } T_{\text{Pipeline}} = N \cdot 1.5 T$$

$$\therefore \text{Speed up} = \frac{T_{\text{Non-p}}}{T_{\text{Pipeline}}} = \frac{N \cdot 6T}{N \cdot 1.5 T} = \underline{\underline{\frac{6}{1.5}}} = \underline{\underline{4}}$$
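Since $N$ and $T$ cancel, the Q7 speedup depends only on the stage count and the stall CPI. The sketch below takes speedup as non-pipelined time divided by pipelined time and uses exact fractions; the function name is hypothetical.

```python
from fractions import Fraction


def speedup_over_nonpipelined(n_stages, stall_fraction, stall_cycles):
    """Non-pipelined needs n_stages cycles per instruction; pipelined needs CPI cycles."""
    cpi = 1 + Fraction(stall_fraction) * stall_cycles
    return Fraction(n_stages) / cpi


print(speedup_over_nonpipelined(6, Fraction(1, 4), 2))
```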

(Q8) Now for Non-pipelined design

$$\begin{aligned} \text{Total Cycle time} &= \text{Sum of stage delay} + \text{one Reg delay} \\ &= (5+6+11+8) + 1 = \underline{\underline{31 \text{ ns}}} \text{ for one instruction} \end{aligned}$$

But for pipelined Design.

$$\text{Total Cycle time} = \text{Max stage delay} + \text{reg delay}$$

$$= 11 + 1 = \underline{\underline{12 \text{ ns}}}$$

$$\therefore \text{Speed up} = \frac{T_{\text{Non-p}}}{T_{\text{Pipeline}}} = \frac{31}{12} = \underline{\underline{2.583}} \approx \underline{\underline{2.6}}$$
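A sketch of the Q8 steady-state comparison follows. It uses the convention from the working above, which charges one register delay to the non-pipelined design. The flag `register_in_nonpipelined` is an assumption added for illustration, since some treatments leave that delay out.

```python
def steady_state_speedup(stage_delays_ns, reg_delay_ns, register_in_nonpipelined=True):
    """Return (non-pipelined time, pipelined cycle time, speedup) per instruction."""
    t_non = sum(stage_delays_ns) + (reg_delay_ns if register_in_nonpipelined else 0)
    # In steady state the pipeline finishes one instruction per clock period.
    t_pipe = max(stage_delays_ns) + reg_delay_ns
    return t_non, t_pipe, t_non / t_pipe


t_non, t_pipe, speedup_q8 = steady_state_speedup([5, 6, 11, 8], 1)
print(t_non, t_pipe, round(speedup_q8, 3))
```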

Q9) IF $\rightarrow$ ID $\rightarrow$ EX $\rightarrow$ WB

IF: 1 cycle, ID: 1 cycle, EX: 1 cycle for ADD & SUB, 3 cycles for MUL.

Thus the MUL instruction causes a stall of 2 cycles.

$\therefore$ Total time:

| Cycle   | I1 (ADD) | I2 (MUL) | I3 (SUB) |
|---------|----------|----------|----------|
| C1      | IF       |          |          |
| C2      | ID       | IF       |          |
| C3      | EX       | ID       | IF       |
| C4      | WB       | EX       | ID       |
| C5 & C6 |          | EX       | stall    |
| C7      |          | WB       | EX       |
| C8      |          |          | WB       |

4 cycles for one instruction (IF, ID, EX, WB)  
+ 2 cycles for the other two instructions (1 each)  
+ 2 stall cycles  
= 8 cycles total for 3 instructions.

Q10) Five stages, each taking 1 cycle per instruction.

Clock rate is 1 GHz, thus cycle time $= \frac{1}{1 \times 10^9 \text{ Hz}} = 1 \text{ ns}$

Conditional branches $\approx 20\%$ & the branch decision is made in stage 3.

Therefore the 2 earlier stages stall for a branch, and as each stage takes 1 cycle, stall = 2 cycles per branch.

$\therefore$ Average CPI $= 1 + 0.2 \times 2 = 1.4$

$\therefore$ Time for $10^9$ instructions $= \underbrace{1.4}_{\text{CPI}} \times 10^9 \times \underbrace{1 \text{ ns}}_{\text{cycle time}} = 1.4 \times 10^9 \text{ ns} = 1.4 \text{ sec}$
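The 8-cycle count for the 4-stage IF, ID, EX, WB timeline with a 3-cycle MUL can be reproduced with a small simulation. This is a sketch under stated assumptions: instructions issue in order, each stage holds one instruction, and there are no data-hazard stalls, so the only delay is waiting for the next stage to become free. The function name and stage tuple are illustrative.

```python
def pipeline_cycles(ex_latencies, stages=("IF", "ID", "EX", "WB")):
    """Cycle in which the last instruction finishes; all non-EX stages take 1 cycle."""
    n = len(stages)
    ex = stages.index("EX")
    prev_enter = None   # cycle at which the previous instruction entered each stage
    prev_exit = 0       # first cycle after the previous instruction left the last stage
    finish = 0
    for ex_lat in ex_latencies:
        lat = [1] * n
        lat[ex] = ex_lat
        enter = [0] * n
        for s in range(n):
            # Earliest cycle this instruction is done with its previous stage.
            ready = 1 if s == 0 else enter[s - 1] + lat[s - 1]
            # Stage s is free once the previous instruction has moved on from it.
            if prev_enter is None:
                free = 1
            elif s + 1 < n:
                free = prev_enter[s + 1]
            else:
                free = prev_exit
            enter[s] = max(ready, free)
        prev_enter = enter
        prev_exit = enter[-1] + lat[-1]
        finish = prev_exit - 1
    return finish


print(pipeline_cycles([1, 3, 1]))   # ADD, MUL, SUB
```

With three single-cycle instructions the same function gives the ideal $k + (n - 1) = 6$ cycles, so the two extra cycles come from the MUL occupying EX.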


(Q10) Time for $10^9$ instructions $= \underbrace{1.4 \times 10^9}_{\text{CPI} \times \text{instructions}} \times \underbrace{1 \text{ ns}}_{\text{cycle time}}$

$$= 1.4 \times 10^9 \text{ ns}$$

$$= 1.4 \text{ sec}$$

(Q11) 5 stages: IF, RD, EX, MA and WB, 1 cycle per stage.  
 Until I1 has passed through its first 4 stages, I2 cannot enter, as I2 needs R0 from I1;  
 R0 is usable in I2 only after the WB stage of I1.  
 ∴ I2 enters IF in cycle 5, the cycle in which I1 completes WB.  
 Similarly, I2 reaches WB in cycle 9, and I3 enters IF in that cycle.  
 R0 updated in the WB stage of I2 is used by I3 in its RD stage in cycle 10,  
 and I3 completes in cycle 13.  
 ∴ No. of cycles required = 13

(Q12) offset = Target - (address of next instruction)  
 Each instruction = 4 bytes; the offset is in bytes.  
 The target of the branch instruction is i.  
 Suppose i has base address A  
 i+1 will be at address A+4  
 i+2 → A+8  
 i+3 → A+12  
 i+4 (next after the branch) → A+16

$$\begin{aligned} \text{Target address} &= \text{Address of next instruction} + \text{offset} \\ \text{Target address} &= \text{Address of instruction } i = A \\ \text{Address of next instruction} &= \text{Address of } i+4 = A+16 \end{aligned}$$

 ∴ offset = A - (A+16) = -16
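A minimal sketch of the offset rule, offset = target address minus address of the instruction after the branch. It assumes 4-byte instructions and a branch located at i+3, as in the address list above. The sign comes out negative because the target lies before the branch. Names such as `branch_offset` are hypothetical.

```python
INSTRUCTION_BYTES = 4  # each instruction is 4 bytes


def branch_offset(branch_index, target_index, base_address=0):
    """Byte offset relative to the instruction that follows the branch."""
    next_address = base_address + (branch_index + 1) * INSTRUCTION_BYTES
    target_address = base_address + target_index * INSTRUCTION_BYTES
    return target_address - next_address


# Branch sits at i+3 and jumps back to i (index 0 relative to i).
print(branch_offset(branch_index=3, target_index=0))
```

The base address A cancels out, so any value of `base_address` gives the same offset.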

(Q12) 4 stages: delays 150, 120, 160 and 140 ns. Register delay 5 ns  
 $n = 1000$ data items  
 Time per cycle = max delay + register delay = 160 + 5 = 165 ns  
 ∴ for 1000 data items  
 $\text{Total cycles} = n + (k-1) = 1000 + (4-1) = 1003 \text{ cycles}$  
 $\text{Total time for 1000 items} = 1003 \times 165 \text{ ns} = 165495 \text{ ns} = 165.495\ \mu\text{s}$

Answer (branch offset): -16

Q13: Consider a pipeline 'x' consisting of 5 stages named IF, ID, OF, EX and WB with respective stage delays of 2 ns, 6 ns, 5 ns, 8 ns and 1 ns. The alternative pipeline 'y' is built from the same stages, but the EX stage is divided into 2 substages (EX1 and EX2) with equal delays of 8 ns/2 each, and the ID stage is divided into 3 substages (ID1, ID2 and ID3) with equal delays of 6 ns/3 each. In pipelines x and y, memory reference instructions are not overlapped, so the penalty of a memory reference instruction is 4 cycles in pipeline 'x' and 8 cycles in pipeline 'y'. If 20% of the instructions in the program are memory-based instructions, what is the ratio of the speed-up of x to the speed-up of y?

→ Pipeline x: cycle time $t_x = \max(2, 6, 5, 8, 1) = 8\text{ ns}$ (stage EX).

→ Pipeline y: cycle time $t_y = \max(2, 2, 2, 2, 5, 4, 4, 1) = 5\text{ ns}$ (stage OF).

Memory reference penalty: x = 4 cycles/instruction, y = 8 cycles/instruction; 20% memory instructions $\rightarrow p = 0.2$

Cycle per Instruction (CPI) with stalls  $CPI_x = 1 + p \times 4 = 1 + 0.2 \times 4 = 1.8$

$$CPI_y = 1 + p \times 8 = 1 + 0.2 \times 8 = 2.6$$

Time per instruction:-  $T_x = CPI_x \times t_x = 1.8 \times 8 = 14.4\text{ ns}$

$$T_y = 2.6 \times 5 = 13.0\text{ ns}$$

Ratio of speed up ( $x$  to  $y$ )  $\frac{S_x}{S_y} = \frac{T_y}{T_x} = \frac{13.0}{14.4} \approx 0.903$

$S \propto \frac{1}{T_{avg}}$   
Speed up is inversely proportional to execution time

$\therefore S_x : S_y \approx 0.903 : 1$  (pipeline y is  $\approx 11\%$  faster).
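The Q13 comparison can be sketched as below. The stage-delay lists follow the question, with ID split into three 2 ns substages and EX into two 4 ns substages for pipeline y, and register delays are ignored as in the working above. The function name is an assumption.

```python
def time_per_instruction(stage_delays_ns, mem_fraction, mem_penalty_cycles):
    """Cycle time is the slowest stage; CPI is 1 plus the average memory stall."""
    cycle_ns = max(stage_delays_ns)
    cpi = 1 + mem_fraction * mem_penalty_cycles
    return cpi * cycle_ns


x_delays = [2, 6, 5, 8, 1]              # IF, ID, OF, EX, WB
y_delays = [2, 2, 2, 2, 5, 4, 4, 1]     # IF, ID1..ID3, OF, EX1, EX2, WB

t_x = time_per_instruction(x_delays, 0.2, 4)
t_y = time_per_instruction(y_delays, 0.2, 8)

# Speedup is inversely proportional to time, so S_x / S_y = T_y / T_x.
print(f"T_x = {t_x:.1f} ns, T_y = {t_y:.1f} ns, S_x/S_y = {t_y / t_x:.3f}")
```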