



# Computer Organization and Architecture

Pipelining and Vector Processing, **Part-1**

# ABOUT ME: MURALIKRISHNA BUKKASAMUDRAM

- M.Tech with 20 years of experience teaching GATE aspirants and engineering college students
- Topper of the IIT NPTEL course in Theory of Computation (96%)
- IGIP certified (International Engineering Educator certification)
- GATE qualified
- Trained more than 50,000 students across the country
- Areas of expertise: TOC, OS, COA, CN, DLD



# Pipelining and vector processing Part-1

Phases in the instruction cycle:

- (1) FI: fetch instruction
- (2) DI: decode instruction
- (3) FO: fetch operands
- (4) EX: execute instruction



[Figure: Non-Pipelining vs. Pipelining]
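One way to compare the two cases is to count clock cycles for n instructions. The sketch below assumes that each of the k phases takes exactly one clock cycle and that nothing stalls the pipeline. The function names are illustrative and do not come from the slides.

```python
def non_pipelined_cycles(n: int, k: int) -> int:
    # Assumption: one cycle per phase; instruction i+1 starts only after
    # instruction i has finished all k phases.
    return n * k


def pipelined_cycles(n: int, k: int) -> int:
    # Assumption: no stalls. The first instruction needs k cycles to fill
    # the pipeline; after that, one instruction completes every cycle.
    return k + (n - 1)


if __name__ == "__main__":
    k = 4  # FI, DI, FO, EX
    for n in (1, 4, 100):
        print(f"n={n}: non-pipelined={non_pipelined_cycles(n, k)}, "
              f"pipelined={pipelined_cycles(n, k)}")
```

For 4 instructions the pipelined run needs 7 cycles instead of 16. For a single instruction, pipelining gives no gain.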




[Figure: 4-stage (FI, DI, FO, EX) timing diagram, Non-Pipeline vs. Pipe-Lining]
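The pipelined timing diagram is a space-time table. Rows are instructions and columns are clock cycles. Below is a small sketch that rebuilds such a table. It assumes one cycle per stage and no stalls, and the helper names are hypothetical.

```python
def space_time_diagram(n_instructions: int, stages: list[str]) -> list[list[str]]:
    """Rows = instructions, columns = clock cycles; a cell holds a stage name or ''."""
    k = len(stages)
    total_cycles = k + n_instructions - 1
    grid = [["" for _ in range(total_cycles)] for _ in range(n_instructions)]
    for i in range(n_instructions):
        for s, name in enumerate(stages):
            grid[i][i + s] = name  # instruction i is in stage s during cycle i + s
    return grid


def render(grid: list[list[str]]) -> str:
    header = "     " + "".join(f"{c + 1:>4}" for c in range(len(grid[0])))
    rows = [f"I{i + 1:<4}" + "".join(f"{cell:>4}" for cell in row)
            for i, row in enumerate(grid)]
    return "\n".join([header] + rows)


if __name__ == "__main__":
    print(render(space_time_diagram(4, ["FI", "DI", "FO", "EX"])))
```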



## Speedup of a pipeline

Assume that a 5-stage pipeline has a clock cycle time of 10 ns. If the non-pipeline clock has the same duration, what is the speedup of the pipeline at an efficiency of 80%?

Sol :- The five stages $s_1, s_2, s_3, s_4, s_5$ each take one cycle (10 ns).

$$\text{Speed up} = \frac{\text{Time taken by one instruction in non-pipelining}}{\text{Time taken by one instruction in pipelining}} = \frac{5 \times 10 \text{ ns}}{1 \times 10 \text{ ns}} = \frac{50}{10} = 5$$

This is the ideal speedup, equal to the number of stages $k = 5$. Efficiency is the actual speedup divided by the ideal speedup, so at 80% efficiency:

$$\text{Speed up} = 0.8 \times 5 = 4$$
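The same two steps can be written as a short sketch. It treats efficiency as actual speedup divided by the ideal speedup $k$, and it assumes one instruction completes per cycle once the pipeline is full. The function names are illustrative.

```python
def ideal_speedup(k: int, cycle_ns: float) -> float:
    # Non-pipelined: one instruction needs all k cycles.
    # Pipelined (steady state, assumed): one instruction completes per cycle.
    non_pipelined_ns = k * cycle_ns
    pipelined_ns = 1 * cycle_ns
    return non_pipelined_ns / pipelined_ns


def speedup_at_efficiency(k: int, efficiency: float) -> float:
    # efficiency = actual speedup / ideal speedup (= k)
    return efficiency * k


if __name__ == "__main__":
    print(ideal_speedup(5, 10.0))
    print(speedup_at_efficiency(5, 0.80))
```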



## Speedup with pipeline register delay

Example :- A 5-stage pipeline has IF, ID, OF, EX and WB stages. The stage delays are 5 ns, 6 ns, 8 ns, 10 ns and 7 ns respectively. Pipeline registers with a delay of 2 ns are used between the stages. What is the speedup of the pipeline compared to non-pipelined execution?

Sol :-

Non-Pipeline: one instruction passes through every stage.

$$5 + 6 + 8 + 10 + 7 = 36 \text{ ns}$$

Pipeline: the clock is set by the slowest stage plus the register delay.

$$\max(5, 6, 8, 10, 7) = 10 \text{ ns}$$

$$\text{Speed up} = \frac{36}{10 + 2} = \frac{36}{12} = 3$$
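The calculation generalizes to any list of stage delays. The sketch below assumes a large number of instructions, so that a pipelined instruction effectively costs one clock. It also assumes the non-pipelined design has no register delay. The name `pipeline_speedup` is illustrative.

```python
def pipeline_speedup(stage_delays_ns: list[float], register_delay_ns: float = 0.0) -> float:
    # Non-pipelined: one instruction runs through every stage, no latches.
    non_pipelined_ns = sum(stage_delays_ns)
    # Pipelined: the clock must fit the slowest stage plus the register delay.
    cycle_ns = max(stage_delays_ns) + register_delay_ns
    return non_pipelined_ns / cycle_ns


if __name__ == "__main__":
    print(pipeline_speedup([5, 6, 8, 10, 7], register_delay_ns=2))
```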




## Pipeline Hazards

- (1) Structural Hazards
- (2) Data Hazards
- (3) Control Hazards



## Example of Structural Hazard

$I_1$ : ADD $R_1, R_2, X$ ; $R_1 \leftarrow R_2 + M[X]$  
$I_2$ : MUL $R_3, R_4, Y$ ; $R_3 \leftarrow R_4 \times M[Y]$  
$I_3$ : SUB $R_5, R_6, Z$ ; $R_5 \leftarrow R_6 - M[Z]$  
$I_4$ : Next instructions.

| Clock → | 1  | 2  | 3     | 4     | 5  | 6  | 7  | 8  | 9  | 10 |
|---------|----|----|-------|-------|----|----|----|----|----|----|
| $I_1$   | FI | DI | FO    | EX    | WB |    |    |    |    |    |
| $I_2$   |    | FI | DI    | FO    | EX | WB |    |    |    |    |
| $I_3$   |    |    | stall | stall | FI | DI | FO | EX | WB |    |
| $I_4$   |    |    |       |       |    |    |    |    |    |    |

Each instruction reads a memory operand (X, Y, Z) in its FO stage. Assuming a single memory is shared by instruction fetch and operand fetch, $I_3$ cannot be fetched in cycle 3 ($I_1$ is fetching M[X]) or in cycle 4 ($I_2$ is fetching M[Y]), so its FI is delayed to cycle 5.

Penalty = 2 stall cycles
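The stall count can be reproduced with a simplified scheduling model. It is a sketch, not a full pipeline simulator, and it assumes one memory port shared by FI and FO, in-order issue, one cycle per stage and no other hazards. `schedule_fetches` is a hypothetical helper.

```python
STAGES = ["FI", "DI", "FO", "EX", "WB"]
FO_OFFSET = STAGES.index("FO")  # FO happens 2 cycles after FI


def schedule_fetches(uses_memory_operand: list[bool]) -> tuple[list[int], int]:
    """Return the FI cycle of each instruction and the total stall cycles.

    Assumed model: one memory port shared by FI and FO, in-order issue,
    one cycle per stage, no data or control hazards.
    """
    memory_busy: set[int] = set()  # cycles in which the memory port is taken
    fi_cycles: list[int] = []
    stalls = 0
    next_fi = 1
    for uses_mem in uses_memory_operand:
        fi = next_fi
        while fi in memory_busy:  # fetch waits for a free memory cycle
            fi += 1
            stalls += 1
        fi_cycles.append(fi)
        memory_busy.add(fi)  # the instruction fetch itself uses memory
        if uses_mem:
            memory_busy.add(fi + FO_OFFSET)  # operand fetch uses memory later
        next_fi = fi + 1
    return fi_cycles, stalls


if __name__ == "__main__":
    fi, stalls = schedule_fetches([True, True, True])  # I1..I3 all read memory
    print(fi, stalls)
```

With all three instructions reading memory, the model fetches $I_3$ in cycle 5 after 2 stall cycles, matching the table. If no instruction had a memory operand, there would be no stalls.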

## Throughput

Throughput: the number of instructions that can be completed in a given unit of time.

Example: a 4-stage pipeline with stage delays of 10, 5, 15 and 8 ns (no pipeline register overhead).

Pipeline: the clock cycle is set by the slowest stage, $\max(10, 5, 15, 8) = 15$ ns, and one instruction completes every cycle.

$$\text{In } 15 \text{ ns} = 1 \text{ instruction}$$

$$1 \text{ ns} = \frac{1}{15} \text{ instruction}$$

$$1 \text{ sec} = \frac{10^9}{15} \text{ instructions} = \frac{1000}{15} \text{ MIPS} \approx 66.67 \text{ MIPS}$$

Non-Pipeline

$$1 \text{ instruction} = 10 + 5 + 15 + 8 = 38 \text{ ns}$$

$$1 \text{ ns} = \frac{1}{38} \text{ instruction}$$

$$1 \text{ sec} = \frac{10^9}{38} \text{ instructions} = \frac{1000}{38} \text{ MIPS} \approx 26.32 \text{ MIPS}$$
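The unit conversion (1 s = $10^9$ ns, 1 MIPS = $10^6$ instructions per second) can be packaged as one helper. This is a sketch that assumes steady-state pipelined operation (one instruction per clock). The stage delays 10, 5, 15 and 8 ns are read from the sums in the example. `throughput_mips` is an illustrative name.

```python
def throughput_mips(stage_delays_ns: list[float], pipelined: bool,
                    register_delay_ns: float = 0.0) -> float:
    """Instructions completed per second, in millions (MIPS)."""
    if pipelined:
        # Assumed steady state: one instruction per clock;
        # clock = slowest stage + register delay.
        time_per_instruction_ns = max(stage_delays_ns) + register_delay_ns
    else:
        # Each instruction passes through every stage before the next starts.
        time_per_instruction_ns = sum(stage_delays_ns)
    instructions_per_second = 1e9 / time_per_instruction_ns  # 1 s = 10^9 ns
    return instructions_per_second / 1e6


if __name__ == "__main__":
    delays = [10, 5, 15, 8]
    print(f"pipelined:     {throughput_mips(delays, True):.2f} MIPS")
    print(f"non-pipelined: {throughput_mips(delays, False):.2f} MIPS")
```

With the delays 5, 4, 8, 10 and 3 ns, the same helper gives 33.33 MIPS (non-pipelined) and 100 MIPS (pipelined).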

## Speedup with stall cycles

→ Consider a 6-stage instruction pipeline where all stages are perfectly balanced. Assume that there is no cycle-time overhead of pipelining. When an application is executing on this 6-stage pipeline, what is the speedup achieved with respect to non-pipelined execution if 25% of the instructions incur 2 pipeline stall cycles?

Non-Pipeline

$$1 \text{ instruction} = 6 \text{ cycles}$$

Pipeline: ideally 1 cycle per instruction, plus on average $0.25 \times 2$ stall cycles per instruction.

$$\begin{aligned} 1 \text{ cycle} + 0.25 \times 2 &= 1 + 0.5 \\ &= 1.5 \text{ cycles} \end{aligned}$$

$$\text{Speed up} = \frac{6}{1.5} = 4$$
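The reasoning reduces to a ratio of average cycles per instruction (CPI). The sketch below assumes balanced stages and no cycle-time overhead, as the question states, so the clock period cancels. `speedup_with_stalls` is an illustrative name.

```python
def speedup_with_stalls(k: int, stall_fraction: float, stall_cycles: int) -> float:
    # Non-pipelined CPI = k (every instruction uses all k balanced stages).
    # Pipelined CPI = 1 ideal cycle + average stall cycles per instruction.
    cpi_pipelined = 1 + stall_fraction * stall_cycles
    return k / cpi_pipelined


if __name__ == "__main__":
    print(speedup_with_stalls(6, 0.25, 2))
```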

## Peak clock frequency

→ Consider the following processors (ns stands for nanoseconds). Assume that the pipeline registers have zero latency.

**P1:** Four-stage pipeline with stage latencies (1 ns, 2 ns, 2 ns, 1 ns)

**P2:** Four-stage pipeline with stage latencies (1 ns, 1.5 ns, 1.5 ns, 1 ns)

**P3:** Five-stage pipeline with stage latencies (0.5 ns, 1 ns, 1 ns, 0.6 ns, 1 ns)

**P4:** Five-stage pipeline with stage latencies (0.5 ns, 0.5 ns, 1 ns, 1 ns, 1.1 ns)

Which processor has the highest peak clock frequency?

- (A) P1      (B) P2      (C) P3      (D) P4

Sol :- The clock cycle time of a pipeline is the latency of its slowest stage, and

$$\text{clock cycle time} = \frac{1}{\text{frequency}}$$

so the smallest cycle time gives the highest frequency.

$$P_1: \max(1, 2, 2, 1) = 2 \text{ ns}$$

$$P_2: \max(1, 1.5, 1.5, 1) = 1.5 \text{ ns}$$

$$P_3: \max(0.5, 1, 1, 0.6, 1) = 1 \text{ ns}$$

$$P_4: \max(0.5, 0.5, 1, 1, 1.1) = 1.1 \text{ ns}$$

P3 has the smallest cycle time (1 ns), so its peak clock frequency, $1 / 1 \text{ ns} = 1$ GHz, is the highest. Answer: (C) P3.
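The comparison can be checked mechanically. This sketch uses the zero-latency pipeline registers stated in the question, so the clock period is just the slowest stage. Since 1 / (period in ns) is a frequency in GHz, no extra scaling is needed. The names are illustrative.

```python
PROCESSORS = {
    "P1": [1, 2, 2, 1],
    "P2": [1, 1.5, 1.5, 1],
    "P3": [0.5, 1, 1, 0.6, 1],
    "P4": [0.5, 0.5, 1, 1, 1.1],
}


def peak_frequency_ghz(stage_latencies_ns: list[float]) -> float:
    # Zero-latency registers: clock period = slowest stage latency (ns).
    return 1 / max(stage_latencies_ns)


if __name__ == "__main__":
    for name, latencies in PROCESSORS.items():
        print(f"{name}: {peak_frequency_ghz(latencies):.3f} GHz")
    best = max(PROCESSORS, key=lambda p: peak_frequency_ghz(PROCESSORS[p]))
    print("highest peak frequency:", best)
```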

## Throughput of non-pipelined vs. pipelined execution

→ A 5-stage instruction pipeline has IF, ID, OF, EX and WO stages. The stage delays are 5, 4, 8, 10 and 3 nanoseconds respectively. Assume that there is no additional buffer overhead in pipelining. What are the throughputs of non-pipelining and pipelining respectively?

- A. 33.33 MIPS, 100 MIPS
- B. 100 MIPS, 33.33 MIPS
- C. 30 MIPS, 25 MIPS
- D. None of these

Non-Pipeline

$$(5 + 4 + 8 + 10 + 3) = 30 \text{ ns}$$

In $30 \text{ ns} = 1 \text{ instruction}$

$$1 \text{ ns} = \frac{1}{30} \text{ instruction}$$

$$1 \text{ sec} = \frac{10^9}{30} \text{ instructions} = \frac{1000}{30} \text{ MIPS} = 33.33 \text{ MIPS}$$

Pipeline

$$\max\{5, 4, 8, 10, 3\} = 10 \text{ ns}$$

In $10 \text{ ns} = 1 \text{ instruction}$

$$1 \text{ ns} = \frac{1}{10} \text{ instruction}$$

$$1 \text{ sec} = \frac{10^9}{10} \text{ instructions} = 100 \text{ MIPS}$$

Answer: A (33.33 MIPS, 100 MIPS).

## Speedup with pipeline buffer delay

→ A 4-stage pipeline has IF, ID, EX and WO stages. The stage delays are 5, 12, 8 and 10 nanoseconds respectively. The pipeline buffer delay is 2 nanoseconds. What is the speedup of the pipeline compared to the corresponding non-pipelined execution?

- A. 2
- B. 3
- C. 4
- D. 2.5

Non-Pipeline

$$5 + 12 + 8 + 10 = 35 \text{ ns}$$

Pipeline

$$\max(5, 12, 8, 10) = 12 \text{ ns}$$

$$12 + 2 = 14 \text{ ns}$$

$$\text{Speed up} = \frac{35}{14} = 2.5$$

Answer: D.
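The value 35/14 is the speedup for a long instruction stream. For a finite number of instructions n, the pipeline also needs $k - 1$ fill cycles. The sketch below assumes $(k + n - 1)$ pipelined cycles and no buffer delay in the non-pipelined design, and it shows the speedup approaching 2.5 as n grows. `speedup_for_n` is an illustrative name.

```python
def speedup_for_n(stage_delays_ns: list[float], buffer_ns: float, n: int) -> float:
    k = len(stage_delays_ns)
    # Non-pipelined: no buffers; each instruction takes the sum of all stages.
    non_pipelined_ns = n * sum(stage_delays_ns)
    # Pipelined: clock = slowest stage + buffer; k cycles to fill, then 1 per cycle.
    cycle_ns = max(stage_delays_ns) + buffer_ns
    pipelined_ns = (k + n - 1) * cycle_ns
    return non_pipelined_ns / pipelined_ns


if __name__ == "__main__":
    for n in (1, 10, 1000, 10**6):
        print(n, round(speedup_for_n([5, 12, 8, 10], 2, n), 4))
```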

## Throughput increase after splitting a stage

$$10^3 \text{ ms} = 1 \text{ sec}$$

$$10^6\ \mu\text{s} = 1 \text{ sec}$$

$$10^9 \text{ ns} = 1 \text{ sec}$$

$$10^{12} \text{ ps} = 1 \text{ sec}$$

→ The stage delays in a 4-stage pipeline are 800, 500, 400 and 300 picoseconds. The first stage (with delay 800 picoseconds) is replaced with a functionally equivalent design involving two stages with respective delays 600 and 350 picoseconds. What is the throughput increase of the pipeline, in percent?

Case I: (800, 500, 400, 300), cycle time = max = 800 ps

$$1 \text{ instruction} = 800 \text{ ps}$$

$$1 \text{ ps} = \frac{1}{800} \text{ instruction}$$

$$1 \text{ sec} = \frac{10^{12}}{800} \text{ instructions} = \frac{10000}{8} \text{ MIPS} = 1250 \text{ MIPS}$$

Case II: (600, 350, 500, 400, 300), cycle time = max = 600 ps

$$1 \text{ instruction} = 600 \text{ ps}$$

$$1 \text{ ps} = \frac{1}{600} \text{ instruction}$$

$$1 \text{ sec} = \frac{10^{12}}{600} \text{ instructions} = \frac{10000}{6} \text{ MIPS} = 1666.67 \text{ MIPS}$$

Throughput increase:

$$\frac{1666.67 - 1250}{1250} \times 100 = 33.33\%$$
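The two cases and the percentage can be recomputed in picoseconds. This sketch assumes steady-state pipelined throughput (one instruction per clock, clock = slowest stage, no register delay) and uses 1 s = $10^{12}$ ps. The helper names are illustrative.

```python
def throughput_mips_from_ps(stage_delays_ps: list[float]) -> float:
    # Assumed steady state: one instruction per clock; clock = slowest stage.
    cycle_ps = max(stage_delays_ps)
    return 1e12 / cycle_ps / 1e6  # 1 s = 10^12 ps; 1 MIPS = 10^6 instr/s


def percent_increase(old: float, new: float) -> float:
    return (new - old) / old * 100


if __name__ == "__main__":
    before = throughput_mips_from_ps([800, 500, 400, 300])
    after = throughput_mips_from_ps([600, 350, 500, 400, 300])
    print(f"{before:.2f} MIPS -> {after:.2f} MIPS "
          f"(+{percent_increase(before, after):.2f}%)")
```

Because only the slowest stage sets the clock, the ratio $800 / 600 = 4/3$ gives the same 33.33% increase directly.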