



# COMPUTER SCIENCE

## Computer Organization and Architecture

### Instruction Pipelining

Lecture\_01



Vijay Agarwal sir





TOPICS  
TO BE  
COVERED

o1

## Pipelining Concepts

## CPU Time Calculation

Cycle

Cycle Time

CPI

MIPS

## CPU Time Calculations

- ❑ CPU Time Calculations Program Execution Time.
- ❑ Program Execution time is calculated based on the clock.
- ❑ Processor contain clock pins & these clock pin is externally connected with the clock generator.
- ❑ (or)
- ❑ So in the computer system all the operation are controlled by the clock, so CPU contain clock pins which is externally connected with clock generator.

- ❑ Clock Generator is operating with a constant frequency to generate the clock pulse [clock signal].
- ❑ These clock signals are carried into the CPU through (with the help of) Clock pin. So CPU operation are controlled by the clock signal.



- Program E.T (Execution Time) is calculated based on 2 factor
  - 1) Cycle
  - 2) Cycle Time

1) Cycle: Cycle is defined as clock pulse transition either from rising edge to rising edge (or) falling edge to falling edge.



2) Cycle Time: The time required to transfer the pulse either from rising edge to rising edge or falling edge to falling edge is called as cycle time.

Cycle time depends on the clock frequency.

$$\text{Cycle time} \propto \frac{1}{\text{Clock frequency}}$$

$$\text{Cycle time} \propto \frac{1}{\text{Clock frequency}}$$

Example: 1 GHz clock is used.

$$\text{Cycle time} = \frac{1}{1\text{GHz}} \text{ sec}$$

$$= \frac{1}{10^9} \text{ sec}$$

$$= 10^{-9} \text{ sec}$$

$$= 1\text{ns} \text{ (nanosecond)}$$

# CPU Time Calculation / Program ET

CPU Time means program Execution Time.

Program Execution time = **# Seconds / Program**

$$= \# \text{ Instruction} / \text{program} \times \# \text{Cycle} / \text{Instruction} \times \# \text{Second} / \text{cycle}$$

↓  
Instruction  
Count

↓  
Cycle per  
Instruction  
(CPI)

↓  
Cycle time

**Prog. ET / CPU time = IC × CPI × cycle time**

- Program is a combination of Data transfer, Data manipulation, & Transfer of control (TOC) Instruction. Different Instruction takes (consume) different cycle to complete the execution so,

i = Instruction type

$$\frac{\text{Prog. E.T}}{\text{CPU Time}} = [\sum (IC_i \times CPI_i)] \text{ Cycle time}$$

i: Type of instruction

$$\left( \sum (IC_i \times CPI_i) \right) \times \text{Cycle time}$$

Let  $CPI_i$  be the number of cycles required for instruction type  $i$ , and  $I_i$  be the number of executed instructions of type  $i$  for a given program. Then we can calculate an overall(Average) CPI as follows:

$$\text{CPI}_{(\text{Avg})} = \frac{\sum_{i=1}^n (CPI_i \times I_i)}{I_c}$$

@  $\frac{950}{950} = 8.36 \text{ cycle}$

The processor time  $T$  needed to execute a given program can be expressed as  $T = I_c \times CPI \times \tau$  (Where,  $\tau = 1/f$ , &  $I_c$  instruction count)

A common measure of performance for a processor is the rate at which instructions are executed, expressed as millions of instructions per second (MIPS) referred to as the MIPS rate. We can express the MIPS rate in terms of the clock rate and CPI as follows:

$$\text{MIPS rate} = \frac{I_c}{T \times 10^6} = \frac{f}{CPI \times 10^6} \text{ MIPS}$$

**Q.**

Common Data for Question 1, 2 & 3

Consider a 1.5 GHZ clock frequency processor used to execute the following program segment.

P  
W

| Instruction type | Instruction count | CPI |
|------------------|-------------------|-----|
| LOAD             | 300               | 11  |
| STORE            | 200               | 9   |
| ARITHMATIC       | 250               | 7   |
| SHIFT            | 150               | 6   |
| BRANCH           | 50                | 4   |

Q. 1

What is Average Instruction Execution of the program?

Q. 2

What is the MIPS Rate of a program?

Q. 3

What is the total Program ET?

**Solution for Question 1**

$$\text{Average Instruction ET} = \frac{\# \text{ of cycle/prog}}{\# \text{ of Instruction /prog}}$$
$$\Rightarrow \frac{(300 \times 11) + (200 \times 9) + (250 \times 7) + (150 \times 6) + (50 \times 4)}{950}$$
$$\Rightarrow \frac{3300 + 1800 + 1750 + 900 + 200}{950} = \underline{\underline{8.36 \text{ Cycle}}}$$

$$\text{Cycle Time} = \frac{1}{\text{clock frequency}} = \frac{1}{1.5 \text{ Ghz}} \Rightarrow \frac{1}{1.5} \text{ nsec} \Rightarrow \underline{\underline{0.66 \text{ nsec}}}$$

$$\text{Average Instruction ET} = 8.36 \times 0.66 \text{ nsec}$$

$$\text{Average Instruction ET} = \underline{\underline{5.51 \text{ nsec}}}$$

## Solution for Question 2



MIPS

$$\text{Average Instruction ET} = \underline{5.51 \times 10^{-9} \text{ sec}}$$

In 1 second → # Instruction

$$= \frac{1}{5.5 \times 10^{-9}} \text{ Instruction / second}$$

$$= \frac{1}{5.51} \times 10^9 \text{ Instruction / second}$$

$$= \boxed{0.1814 \times 10^9} \text{ Instruction / second}$$

$$= 181.4 \times 10^6 \text{ Instruction / second}$$

$$= \underline{\underline{181.4 \text{ MIPS}}}$$

$$\frac{F}{CPI_{avg} \times 10^6}$$

$$= \frac{1.5 \times 10^9}{8.36 \times 10^6} \text{ MIPS}$$

**Solution for Question 3**

Total Program ET = # Instruction / Program × Average Instruction ET

$$\begin{aligned} &= \underline{\underline{950}} \times \underline{\underline{5.51}} \text{ nsec} \\ &= 5234.5 \text{ nsec} \end{aligned}$$

Program ET = 5.234 nsec

Q.

Consider the execution of a program that results in the execution of 2 million instructions on a 400-MHz processor. The program consists of four major types of instruction. The instructions mix and the CPI for each instruction type are given below, based on the result of a program trace experiment:

| Instruction Type                   | CPI | Instruction Mix (%) |
|------------------------------------|-----|---------------------|
| ① Arithmetic and logic             | 1   | <u>60%</u>          |
| ② load/store with cache hit        | 2   | 18%                 |
| ③ Branch                           | 4   | 12%                 |
| ④ Memory reference with cache miss | 8   | 10%                 |

The average CPI when the program is executed on a uniprocessor with the above trace results is  $\text{CPI} = \underline{0.6 + (2 \times 0.18) + (4 \times 0.12) + (8 \times 0.1)} = 2.24$ . The corresponding MIPS rate is  $(400 \times 10^6) / (2.24 \times 10^6) \approx 178$

$$1 \times 60 + 2 \times 0.18 + 4 \times 0.12 + 8 \times 10$$

OR

$$CPI_{Avg} = 2.24 \text{ cycle}$$

$$MDPS = \frac{F}{CPI \times 10^6} \Rightarrow \frac{400 \times 10^6}{2.24 \times 10^6} \approx 178$$

$$CPT = 2.24 \text{ cycle}$$

$$\text{Cycle time} = \frac{1}{400 \text{ MHz}} \text{ sec}$$

$$\text{Length ET} = \text{---}$$

$$\text{In 1 sec} \text{ ---}$$

$$\underline{\underline{10^6 IPS}}$$

# Pipelining Strategy

Similar to the use of  
An assembly line in a  
Manufacturing plant

To apply this concept  
To instruction  
Execution we must  
Recognize that an  
Instruction has a  
Number of stages



New inputs are  
Accepted at one end  
Before previously  
Accepted inputs  
Appear as output at  
The other end

- Pipelining is a mechanism which is used to improve the performance of the system in which task (Instruction) are executed in overlapping manner.
- Pipelining is a decomposition technique that means the problem is divided into sub problem & Assign the sub problem to the pipes then operate the pipe under the same clock.

- Pipelining means Accepting New Input at one end before the previously accepted ip appear as an output at other end.
- That means New Input are executed with OLD Input in a Overlapping Manner (in a pipelining)

Non Pipeline : Accepting New Input Only After

Completion of OLD Input

OR

Accepting New Input After Previously Accepted Input appears as output at other end.

In Non Pipelining Non overlapping execution.

$I_1, I_2, I_3, I_4$



open

4 Instn ( $I_1, I_2, I_3, I_4$ ) .



Let us consider 4 segment pipeline used to execute 4 instruction the execution sequence as:

Segment/stages =  $[S_1 \ S_2 \ S_3 \ S_4]$

Instruction:  $[I_1 \ I_2 \ I_3 \ I_4]$



$n = 4, t_n = 4$ , Non pipeline

Non-PIPELINE



PIPELINE

 $k = 4$  $n = 4$ **PIPELINE**

Let us consider 4 segment pipeline used to execute 4 instruction the execution sequence as: PIPELINING

Segment/stages =  $[S_1 \ S_2 \ S_3 \ S_4]$

Instruction:  $[I_1 \ I_2 \ I_3 \ I_4]$



$$n = 4, t_n = 4, \text{ Non pipeline}$$



PIPELINE

$k = 4$

$n = 4$

# PIPELINE Design

Pipe has 2 end

Input end  
Output end

Between the these 2 end Multiple pipes are interconnected to functioning of Pipeline.





- These Pipes are called Stage / Segment.
- Between the Stages 'Buffer' are used to Store the Intermediate Result.
- This Buffer is also called as Pipeline Register or Interface Register or Latch or Buffer.
- All the Stage along with the Buffer are Controlled by Common clock.

Let us consider 4 segment pipeline used to execute 4 instruction the execution sequence as: PIPELINING

Segment/stages =  $[S_1 \ S_2 \ S_3 \ S_4]$

Instruction:  $[I_1 \ I_2 \ I_3 \ I_4]$

O/P end



$n = 4, t_n = 4$ , Non pipeline

$$t_n = 4 \text{ cycle}$$

$$n = 4$$

$$ET_{\text{NONPIPE}} = n \cdot t_n \Rightarrow 4 \times 4 = 16 \text{ cycle}$$

$$ET_{\text{NONPIPE}} = n \cdot t_n \Rightarrow 4 \times 4$$

$$ET_{\text{NONPIPE}} = 16 \text{ cycle}$$

$$ET_{PIPELINE} = \left[ k + (n-1) \right] t_p$$

$$= [4 + (4-1)] \times L$$

$$ET_{PIPE} = 7 \text{ cycle}$$

$$k = 4$$

$$n = 4$$

$$t_p = 1 \text{ cycle}$$



PIPELINE

$k = 4$

$n = 4$

$$ET_{PIPE} = (k + (n - 1))tp$$

$$= (4 + (4 - 1) \times 1 \text{ cycle})$$

$$ET_{PIPE} = 7 \text{ cycle}$$

$$tp = 1$$

$$k = 4$$

$$k * tp + (n - 1) tp$$

$$ET_{PIPE} = [k + (n - 1)] tp$$

# PIPELINE



Execution time in Pipeline:

$$ET_{PIPE} = \underline{k * t_p} + \underline{(n-1) t_p}$$

$k$ : #Stages [Segment]

$$ET_{PIPE} = \underline{[k + (n-1)] \text{ cycle}}$$

$n$ : #Task or  
#Instruction

$$ET_{PIPE} = \boxed{[k + (n-1)] t_p}$$

$t_p$ : Cycle time in Pipeline

Each Stage Delay  
in Pipeline

## Execution Time in Non Pipeline.

In Non Pipeline 1 Instructions takes  $t_n$  time.

Time Taken to execute n Inst<sup>n</sup> in Non Pipeline.

$$ET_{NONPIPELINE} = \underline{n t_n}$$

n: #Inst<sup>n</sup> or # Task

$t_n$ : each Inst<sup>n</sup> ET in Non Pipeline.

## PIPELINE Performance Evaluation



Now consider the case where a k-segment pipeline with a clock cycle time  $t_p$ , is used to execute n tasks.

The first task  $T_1$  requires a time equal to  $kt_p$ , to complete its operation since there are k segments in the pipe.

The remaining  $n - 1$  tasks emerge from the pipe at the rate of one task per clock cycle and they will be completed after a time equal to  $(n - 1)t_p$ .

Therefore, to complete n tasks using a k-segment pipeline requires  $k + (n - 1)$  clock cycles.

$K=4$

$n=6$

$$\begin{aligned} ET_{PIPE} &= (K + (n-1)) t_p \\ &= (4 + (6-1)) \times 1 \text{ cycle} \end{aligned}$$

$$ET_{PIPE} = 9 \text{ cycle}$$

Avg

$t_p = 1 \text{ cycle}$

### Non Pipeline

$t_n : 4 \text{ cycle}$

$$ET_{NONPIPE} = n t_n$$

$$= 6 \times 4$$

$$= \underline{\underline{24 \text{ cycle}}}$$

## Space-time diagram for pipeline.

Segment:

|   | 1     | 2     | 3     | 4     | 5     | 6     | 7     | 8     | 9     |  |
|---|-------|-------|-------|-------|-------|-------|-------|-------|-------|--|
| 1 | $T_1$ | $T_2$ | $T_3$ | $T_4$ | $T_5$ | $T_6$ |       |       |       |  |
| 2 |       | $T_1$ | $T_2$ | $T_3$ | $T_4$ | $T_5$ | $T_6$ |       |       |  |
| 3 |       |       | $T_1$ | $T_2$ | $T_3$ | $T_4$ | $T_5$ | $T_6$ |       |  |
| 4 |       |       |       | $T_1$ | $T_2$ | $T_3$ | $T_4$ | $T_5$ | $T_6$ |  |

Clock  
Cycles

For example, the diagram of Fig. 9-4 shows four segments and six tasks. The time required to complete all the operations is  $4 + (6 - 1) = 9$  clock cycles, as indicated in the diagram.

So, to complete  $n$  tasks using a  $k$ -segment pipeline is:

$$ET_{\text{pipe}} = k + (n - 1) \text{ clock cycles}$$

$$ET_{\text{PIPE}} = [k + (n - 1)] t_p$$

$k$ : # Stages or # Segment

$n$ : # Instruction or # Task

$t_p$ : Each Stage Delay in Pipeline

So, to complete n tasks in Non-pipeline is:

$$\text{ET Non-pipeline} = n * t_n$$

$n$ : # Instruction @ # Task

$t_n$ : Each Instn (one Instn) ET  
in Non Pipeline.

# The Performance Gain of Pipeline over Non Pipeline

Performance Gain = Performance of PIPE  
[Speedup Factor]                          Performance of Non Pipe

$$= \frac{\frac{1}{ET_{PIPE}}}{\frac{1}{ET_{NONPIPE}}}$$

Performance of  $\frac{1}{ET}$

$$S = \frac{ET_{NONPIPE}}{ET_{PIPE}}$$

$$S = \frac{n \cdot t_n}{(k + (n-1)) \cdot t_p}$$

RAM: 10 Hours  
SHYAM: 5 Hours

SHYAM Performance Better.

$$(i) \quad k=4 \\ n=4$$

$$S = \frac{n \cdot tn}{[k + (n-1)] \cdot tp} \\ = \frac{4 \cdot tn}{7 \cdot tp} \\ = \frac{4}{7} \cdot \frac{tn}{tp} \\ = 0.56 \cdot \frac{tn}{tp}$$

$$k=4 \\ n=100$$

$$S = \frac{n \cdot tn}{[k + (n-1)] \cdot tp} \\ S = \frac{100 \cdot tn}{103 \cdot tp}$$

$$k=4 \\ n=1000$$

$$S = \frac{n \cdot tn}{[k + (n-1)] \cdot tp} \\ = \frac{1000 \cdot tn}{1003 \cdot tp}$$

$$k=4 \\ n=10000$$

$$S = \frac{n \cdot tn}{[k + (n-1)] \cdot tp} \\ = \frac{10000 \cdot tn}{10003 \cdot tp}$$

$(k + (n-1)) \approx n$   
 When very large No.  
 Instn Executed

$$S = \frac{tn}{tp}$$

$$S = \frac{nt_n}{(k+n-1)t_p}$$

As the number of tasks increases,  $n$  becomes much larger than  $k - 1$ , and  $k + n - 1$  approaches the value of  $n$ . Under this condition, the speedup becomes

When Very Large of  
Instruction executed →

⑥

$n$  Value is Not given

⑦

In Ideal Case.

$$S = \frac{t_n}{t_p}$$

When Very large Number of Task  
are executed then

$$(k+(n-1)) \approx n$$

$$\therefore S = \frac{n}{\frac{n+t_n}{(k+n-1)} t_p}$$

# SUMMARY Till Now.

$$ET_{PIPE} = [k + (n - 1)] t_p$$

$$ET_{NONPIPE} = n t_n$$

$$\text{SPEED UP FACTOR } [S] = \frac{ET_{NP}}{ET_p}$$

$$S = \frac{n t_n}{[k + (n - 1)] t_p}$$

$$\eta = \frac{S}{K}$$

$\eta$ : efficiency

$S$ : speedup factor ,  $K$  : # stage.

When  $n$  value is Not given or  
 In ideal Case or When very  
 Large No. of Task Executed  
 then

$$S = \frac{t_n}{t_p}$$

@eg

ETPIPE for L Instn

ETNONPIPE for L Instn



$$ET_{PIPE} = [k + (n-1)]t_p$$
$$= [4 + (1-1)] \times 2 \text{ ns}$$

$$ET_{PIPE} = 8 \text{ ns}$$

Ans

$$ET_{NONPIPE} = n t_n$$

$$t_n = 2 + 2 + 2 + 2$$

$$t_n = 8 \text{ nsec.}$$

$$ET_{NP} = L \times 8 \text{ ns}$$

$$ET_{NP} = 8 \text{ ns}$$

Ans

Ideal Case

P  
W

When in Pipeline all Stages are Perfectly balanced

then L Task ET in Non PIPE LINE

$$ET_{NP} = K * t_p$$

Clock

So

K = 4 Stage

$t_p = 2 \text{ cycle}$   
 $\text{or}$   
1 cycle



$k = 4$  (#Stages)

$tp = 2$  cycle

$$\begin{aligned}tn &= \underline{2} + \underline{2} + \underline{2} + \underline{2} \\&= \cancel{\textcircled{k}} \times \frac{tp}{2} \\&= 4 \times 2 \\&= 8\end{aligned}$$

$$tn = k * tp$$

this

But Only in One Condition when Pipeline are Perfectly balanced.

$k = 4$  Stage

$tp = 1$  cycle

$$\begin{aligned}tn &= 1 + 1 + 1 + 1 \\&= \cancel{(k)} \times \frac{(tp)}{1}\end{aligned}$$

$$tn = k * tp$$

In Ideal Case

$$S = \frac{ET_{NP}}{ET_P}$$

$$S = \frac{tn}{tp}$$

(when  $n$  is Not given or  
in Ideal Case)

$$S = \frac{k * tp}{tp}$$

In Ideal Case

$$S = k$$

When perfectly  
Balance

In OS

Throughput : # Processes Completed Per time Unit

---

In CN

throughput =  $R_{bb} \times B.W.$

---

Throughput : Rate of output.

Throughput: Number of Task are completed  
per unit of Time

for n Instn  
 $E_{PIPE} = [k + (n-1)] t_p$  time

i.e. In  $\underbrace{[k + (n-1)] t_p}$  Time  $\Rightarrow$  n Instruction @ Task executed

So in 1 Unit [per unit] job time =  $\frac{n}{[k + (n-1)] t_p}$

Throughput  $[T_p] = \frac{n}{[k + (n-1)] t_p}$

## In Pipeline

When Number of Instrn [Task] are Not given **Or**  
In Ideal Case.

Number of Task are Completed Per unit of time <sup>(1 unit)</sup>.

$$\text{Throughput} = \frac{1}{t_p}$$

## Again Summary till Now

$$ET_{PIPE} = [k + (n-1)] t_p$$

$$ET_{Non PIPE} = n \cdot t_n$$

$$\text{SPEED Factor } [S] = \frac{n \cdot t_n}{[k + (n-1)] t_p}$$

When  $n$  is very large  
②  $n$  value Not given  
② Ideal Case

$$S = \frac{t_n}{t_p}$$

$$\text{Throughput} = \frac{n}{[k + (n-1)] t_p}$$

When  $n$  value Not given  
or in Ideal Case

$$\text{Throughput} = \frac{1}{t_p}$$

$$\eta = \frac{S}{K}$$

$\eta$  = efficiency

S: Speedup @ Performance GAIN

K: # Stages @ # segment

n: # of Instructions @ # of Tasks.

$t_n$ : Each Instruction ET in Non Pipeline

$t_p$ : Each Stage Delay in Pipeline

## Type of PIPELINE

- ① UNIFORM Delay
  - Linear Pipeline
  - Non Linear Pipeline
- ② Non Uniform Delay

UNIFORM Delay Pipeline : In Uniform Delay each Stage taking the same amount of time to complete the assigned Task to perform task.



# UNIFORM Delay

Task ET in Pipeline:

$$ET_{PIPE} = [k + (n-1)] t_p$$

$$k=4, n=1$$

$$t_p: [\text{Stage Delay}] = 2 \text{ ns}$$

$$\begin{aligned} ET_{PIPE} &= [k + (n-1)] t_p \\ &= [4 + (1-1)] \times 2 \text{ ns} \end{aligned}$$

$$ET_{PIPE} = 8 \text{ ns. } \underline{\text{Ans}}$$

1 Task ET in Non Pipeline

$$ET_{NONPIPE} = n t_n$$

$$n=1$$

$$t_n = 2+2+2+2$$

$$t_n = 8 \text{ ns}$$

$$\begin{aligned} ET_{NP} &= n \times t_n \\ &= 1 \times 8 \end{aligned}$$

$$ET_{NONPIPE} = 8 \text{ nsec}$$

1 Task ET in PIPELINE  $\equiv$  1 Task ET in Non PIPE.

In Pipeline .

If Buffer Delay is given in the Question

✓

$$t_p = \text{Stage Delay} + \text{Buffer Delay}$$

Assume Buffer Delay = 1 ns

$$t_p = 2+1$$

$$t_p = 3 \text{ nsec}$$

# Non UNIFORM Delay PIPELINE

In Non Uniform  
Delay Pipeline

Each Stages taking Different amount of time to complete the assigned operation.



# Non uniform Delay PIPELINE

$n=1, k=4$

1 Task ET in PIPELINE

$$ET_{PIPE} = [k + (n-1)] t_p$$

$$t_p = \max [\text{Stage Delay}]$$

$$= \max [2, 8, 4, 2]$$

$$t_p = 8 \text{nsec}$$

$$ET_{PIPE} = [k + (n-1)] t_p$$
$$= [4 + (1-1)] \times 8 \text{nsec}$$

for 1 Task

$$ET_{PIPE} = 32 \text{nsec}$$

1 Task ET in Non Pipeline

$n=1$

$$ET_{NONPIPE} = n t_n$$

$$t_n = 2 + 8 + 4 + 2$$

$$t_n = 16 \text{nsec}$$

$$ET_{NP} = n t_n$$

for 1 Task =  $1 \times 16$

$$ET_{NONPIPE} = 16 \text{nsec}$$

∴ Buffer Delay is given in the Question

$$t_p = \max \left[ (\text{Stage} + \text{Buffer}) \text{Delay} \right] \text{Delay}$$

## Non Uniform Delay

$T_f \quad n = 1000$  [for 1000 Instruction].

$$ET_{PIPE} = [k + (n-1)] t_p$$

$k=4, \quad n=1000$

$t_p: \max(\text{Stage Delay}) = \max(2, 8, 4, 2)$

$t_p = 8 \text{nsec}$

$$\begin{aligned} ET_{PIPE} &= [k + (n-1)] t_p \\ &= [4 + (1000-1)] \times 8 \\ &= 1003 \times 8 \end{aligned}$$

$ET_{PIPE} = 8024 \text{nsec}$

For 1000 Instn in Non Pipeline

$$ET_{NP} = n \times t_n$$

$t_n = 2 + 8 + 4 + 2 = t_n = 16$

$ET_{NP} = 1000 \times 16$

$$ET_{NON PIPE} = 16000 \text{nsec.}$$

## Important Points

① When Pipeline are Perfectly balanced [Uniform Delay] then  
1 Task [Inst<sup>n</sup>] ET in Pipeline is equal to L Task ET in Non Pipeline

② When Pipeline are Not Perfectly balanced [Non Uniform Delay]  
then One Task ET in Pipeline is greater than L Task ET in Non Pipeline.

only for  
1 Task

$$T_1 \geq T_2$$

1 Task ET  
in Pipeline

1 Task ET  
in Non Pipeline

- ③ But When the Number of Instruction (Task) increased  
then PIPELINE Performance is Best in  
Non Uniform & Uniform Delay Pipeline.
- ④ Bubble Delay is included Only in Pipeline. Because in  
Non Pipeline we are Not Storing the Intermediate Result.  
So Only in PIPELINE Bubble Delay include if given in Question.

Non Uniform  
Delay  $\rightarrow$

$$t_p = \text{Stage Delay} + \text{Bubble Delay}$$

for  
Non Uniform  
Delay

$$t_p = \text{Maximum} \left[ (\text{Stage Delay} + \text{Bubble Delay}) \right]$$

Q. 1 Consider on instruction pipeline which has speed up factor 20 while operate with 80% efficiency. What could be the number of stages in the pipeline?

**Q. 2**

4 segment pipeline have the respective stage delay of 10ns, 15ns, 20ns and 30ns. What is the efficiency of the pipeline when very large number of task are executed?

Q. 3 A 4 segment instruction pipeline has the respective stage delay of 8ns, 11ns 20ns, 2ns respectively. The interface register are used between the stages have a delay of 2ns. What is the approx. speed up factor when very large number of task are pipelined?

**Q. 4**

Consider 4 stage pipeline with the respective stage delay of 20ns, 80ns, 50ns and 10ns. In the enhancement process of the pipeline largest stage is split into 2 equal stage delay.

- (i) What is speed up between new and old design?
- (ii) What is the clock frequency of new pipeline?

Q. 1 A 4-stage pipeline has the stage delays as 150, 120, 160 and 140 nanoseconds respectively. Registers that are used between the stages have a delay of 5 nanoseconds each. Assuming constant clocking rate, the total time taken to process 1000 data items on this pipeline will be

[GATE-2004: 2 Marks]

- A 120.4 microseconds
- B 160.5 microseconds
- C 165.5 microseconds
- D 590.0 microseconds

**Q. 2** We have two designs D1 and D2 for a synchronous pipeline processor. D1 has 5 pipeline stages with execution times of 3 nsec, 2 usec, 4 nsec, 2 nsec and 3 nsec while the design D2 has 8 pipeline stages each with 2 nsec execution time.

How much time can be saved using design D2 over design D1 for executing 100 instructions? **[GATE-2005: 2 Marks]**

- A** 214 nseconds
- B** 202 nseconds
- C** 86 nseconds
- D** -200 nsecond

Q. 3

A nonpipelined single cycle processor operating at 100 MHz is converted into a synchronous pipelined processor with five stages requiring 2.5 nsec, 1.5 nsec, 2 nsec, 1.5 nsec and 2.5 nsec, respectively. The delay of the latches is 0.5 nsec. The speedup of the pipeline processor for a large number of instructions is [GATE-2008: 2 Marks]

- A 4.5
- B 4.0
- C 3.33
- D 3.0

Q. 4

Consider an instruction pipeline with four stages (S1, S2, S3 and S4) each with combinational circuit only. The pipeline registers are required between each stage and at the end of the last stage. Delays for the stages and for the pipeline registers are as given in the figure



What is the approximate speed up of the pipeline in steady state under ideal conditions when compared to the corresponding non-pipeline implementation?

[GATE-2011: 2 Marks]

- A 4.0
- B 2.5
- C 1.1
- D 3.0

A five-stage pipeline has stage delays of 150,120,150,160 and 140 nanoseconds. The registers that are used between the pipeline stages have a delay of 5 nanoseconds each.

The total time to execute 100 independent instructions on this pipeline, assuming there are no pipeline stalls, is \_\_\_\_\_ nanoseconds.

[GATE-2021(Set-1)-CS: 2M]

Consider a 3-stage pipelined processor having a delay of 10 ns (nanoseconds), 20 ns, and 14 ns, for the first, second, and the third stages, respectively. Assume that there is no other delay and the processor does not suffer from any pipeline hazards. Also assume that one instruction is fetched every cycle.

The total execution time for executing 100 instructions on this processor is  
\_\_\_\_\_ ns.

[GATE-2023-CS: 1M]

Instruction execution in a processor is divided into 5 stages. Instruction Fetch (IF), Instruction Decode (ID), Operand Fetch (OF), Execute (EX) and Write Back (WB). These stages take 5, 4, 20, 10 and 3 nanoseconds (ns) respectively. A pipelined implementation of the processor requires buffering between each pair of consecutive stages with a delay of 2 ns. Two pipelined implementations of the processor are contemplated:

- (I) a naive pipeline implementation (NP) with 5 stages and
- (II) an efficient pipeline (EP) where the OF stage is divided into stages OF1 and OF2 with execution times of 12 ns and 8 ns respectively.

The speedup (correct to two decimal places) achieved by EP over NP in executing 20 independent instructions with no hazards is \_\_\_\_.

[GATE-2017(Set-1)-CS: 2M]

**THANK  
YOU!**

