

# COMPUTER SCIENCE

## Computer Organization and Architecture

Instruction Pipelining

Lecture\_05



Vijay Agarwal sir



**TOPICS  
TO BE  
COVERED**



o1

## Pipelining Hazards

PIPELINE Concept

& Execution time.

& Speed up factor, throughput

Without Stalls | Extra cycle | Bubbles

## In The RISC Pipeline 5 Stages:

1. Instruction Fetch {IF Stage}
2. Instruction Decode{ID Stage}
3. Execute {EX Stage}
4. Memory Access {MA Stage}
5. Write Back {WB Stage}

1. **Instruction Fetch {IF Stage}:** In this stage Instruction is fetched from Memory.
2. **Instruction Decode{ID Stage} :** In this Stage 2 operation are performed:
  - (i) Decode the instruction
  - (ii) Operand loading(fetching) from the register file .  
This stage also contain comparator circuit to evaluate the branch condition.

3. **Execute {EX Stage}**: In this stage Data Processing (ALU Operations) are performed
4. **Memory Access {MA Stage}** : In this Stage Operand (Data) will be accessed from memory(load or store).
5. **Write Back {WB Stage}** : In this stage Register write (Operand storing into reg. file) operation performed.

May be we have.

- 4 Stage
- 5 Stage
- 6 Stage      OR
- 7 Stage      OR

Given in  
the Question.

## Additional Stages

### Fetch Instruction (FI)

- ❖ Read the next expected Instruction into a buffer.

### Decode Instruction (DI)

- ❖ Determine the opcode and the operand specifiers.

### Calculate operands(CO)

- ❖ Calculate the effective address of each source operand.
- ❖ This may involve displacement, register indirect or other forms of address calculations.

### Fetch Operands(FO)

- ❖ Fetch each operand from memory.
- ❖ Operands in register need not be fetched.

### Executed Instruction(EI)

- ❖ Perform the indicated operation and store the result, if any, in the specified destination operand location

### Write Operand(WO)

- ❖ Store the result in memory

|               | Time → |    |    |    |    |    |    |    |    |    |    |    |    |    |
|---------------|--------|----|----|----|----|----|----|----|----|----|----|----|----|----|
|               | 1      | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10 | 11 | 12 | 13 | 14 |
| Instruction 1 | FI     | DI | CO | FO | EI | WO |    |    |    |    |    |    |    |    |
| Instruction 2 |        | FI | DI | CO | FO | EI | WO |    |    |    |    |    |    |    |
| Instruction 3 |        |    | FI | DI | CO | FO | EI | WO |    |    |    |    |    |    |
| Instruction 4 |        |    |    | FI | DI | CO | FO | EI | WO |    |    |    |    |    |
| Instruction 5 |        |    |    |    | FI | DI | CO | FO | EI | WO |    |    |    |    |
| Instruction 6 |        |    |    |    |    | FI | DI | CO | FO | EI | WO |    |    |    |
| Instruction 7 |        |    |    |    |    |    | FI | DI | CO | FO | EI | WO |    |    |
| Instruction 8 |        |    |    |    |    |    |    | FI | DI | CO | FO | EI | WO |    |
| Instruction 9 |        |    |    |    |    |    |    |    | FI | DI | CO | FO | EI | WO |

Timing Diagram for Instruction pipeline operation

# Pipeline Hazards

Occur when the Pipeline, or some portion of the pipeline must stall Because conditions Do not permit Continued execution

There are three Types of hazards:

- Resource
- Data
- Control



Also referred to as a Pipeline bubble

In general, there are three major difficulties that cause the instruction pipeline to deviate from its normal operation.

1. *Resource conflicts* caused by access to memory by two segments at the same time. Most of these conflicts can be resolved by using separate instruction and data memories.
2. *Data dependency* conflicts arise when an instruction depends on the result of a previous instruction, but this result is not yet available.
3. *Branch difficulties* arise from branch and other instructions that change the value of PC.

## Hazards/Dependencies In the pipeline

- Dependency is a major problem in the pipeline, causes extra cycle.
- Cycle in the pipeline without new input is called as extra cycle. Also named as “Stall”.
- When stall is present in the pipeline then CPI  $\neq 1$ .
- There are 3 kinds of dependencies possible in the pipeline-
  - I. Structural dependency/ Structural Hazards
  - II. Data dependency/ Data Hazards
  - III. Control dependency/ Control Hazards

# ① Structural Dependency.

## Structural Dependency

- When 2 Stages Require the Same Resource at the same time [Same Cycle]
- Due to Resource Conflict

Resource May AW, Functional Unit @ others

# Structural Dependency

I<sub>1</sub>:

| CC1         | CC2 | CC3         | CC4 | CC5 | CC6 | CC7 |
|-------------|-----|-------------|-----|-----|-----|-----|
| (IF)<br>MEM | ID  | (EX)<br>ALU | MEM | WB  |     |     |

I<sub>2</sub>:

|     |    |     |     |    |
|-----|----|-----|-----|----|
| MEM | ID | ALU | MEM | WB |
|-----|----|-----|-----|----|

I<sub>3</sub>:

|     |    |     |     |    |
|-----|----|-----|-----|----|
| MEM | ID | ALU | MEM | WB |
|-----|----|-----|-----|----|

I<sub>4</sub>



Resource Conflict (Sharing the same resource in CC4)

cc [Clock Cycle]

CC4

ID & IDy

Accessing  
Some Resource.

# Structural Dependency

I<sub>1</sub>:

|                  | CC1         | CC2 | CC3         | CC4 | CC5 | CC6 | CC7 |
|------------------|-------------|-----|-------------|-----|-----|-----|-----|
|                  | (IF)<br>MEM | LD  | (EX)<br>ALU | MEM | WB  |     |     |
| I <sub>1</sub> : |             |     |             |     |     |     |     |

I<sub>2</sub>:

|  |     |    |     |     |    |
|--|-----|----|-----|-----|----|
|  | MEM | ID | ALU | MEM | WB |
|--|-----|----|-----|-----|----|

I<sub>3</sub>:

|  |     |    |     |     |    |
|--|-----|----|-----|-----|----|
|  | MEM | ID | ALU | MEM | WB |
|--|-----|----|-----|-----|----|

I<sub>4</sub>



# Structural Dependency

Solution of Structural Dependency.

→ Resource Replication

→ Divide the Memory into Independent Module



# Structural Dependency

Solution:

|                  | CC1 | CC2 | CC3         | CC4 | CC5 | CC6 | CC7 | CC8 |
|------------------|-----|-----|-------------|-----|-----|-----|-----|-----|
| I <sub>1</sub> : | CM  | ID  | (Ex)<br>ALU | DM  | WB  |     |     |     |
| I <sub>2</sub> : | CM  | ID  | ALU         | DM  | WB  |     |     |     |
| I <sub>3</sub> : | CM  | ID  | ALU         | DM  | WB  |     |     |     |
| I <sub>4</sub>   |     |     | CM          | ID  | ALU | DM  | WB  |     |

No Stall.



# DATA Dependency.

# Data Dependency

$I_1 : \text{ADD } r_1 \ r_2 \ r_3 ;$        $r_1 \leftarrow r_2 + r_3$

$I_2 : \text{MUL } r_4 \ r_1 \ r_5 ;$        $r_4 \leftarrow r_1 * r_5$



$I_2$  is trying to use the result of  $I_1$  as a operand.  
So Create Data Dependency.

- Data Dependency: is created in the Pipeline, When Next Instruction [ $I_{i+1}, I_2, I_j$ ] tries to Use the Result of Previous Instruction [ $I_i, I_1, I_i$ ] as a operand | DATA .

# Data Dependency

Assume 5 Stage

TF & ID

OF

EX

MA

WB

$T_L$ : IF

OF

EX

MA

WB

$T_2$ :

IF

OF

EX

MA

WB.

Data Dependency

# Data Dependency

Data dependency will be occurred when the instruction "J" try to read the data before instruction "I" writes it.

**"Read-Before-write"**

Ex.-

$I_1: \text{Add } r_0, r_1, r_2 : r_0 \leftarrow r_1 + r_2$   
 $I_2: \text{MUL } r_3, r_0, r_2 : r_3 \leftarrow r_0 * r_2$

Here  $I_2$  tries to Read the Value of Register  $r_0$  before  $I_1$  Write it.

## Data Dependency



*Not create problem  
in Non Pipeline*

When the above instruction are executed in a Non-pipelined system then data dependency condition does not occur because " $I_2$ " executing after ' $I_1$ '

So,  $I_2$  reads the register ( $r_o$ ) data after  $I_1$  writes it [Read-After- write].

Because In Non Pipelining 'Non Overlapping execution'.

i.e In Non Pipelining New Input Insert Only After Completion of old input.

↳ But in Pipelining overlapping execution So Data Dep. creates a Problem in Pipelining.

# Data Dependency

- But When the above instruction are executed in a pipeline system then data dependency condition will be occurred because  $I_2$  will be executing along with  $I_1$  so  $I_2$  reads the  $r_0$  data before  $I_1$  writes it. Therefore,  $I_2$  incorrectly get the old value from  $r_0$  [Data loss].



- When we tries to Read the Value ab  $I_0$  (by  $I_2$ ) before  $I_1$  Write it then  $I_2$  Reading the OLD value  $\textcircled{O}$  Incorrect Value.
- So for getting the Correct Value Instruction  $I_2 (I_{i+1}, I_j)$  must Wait Untill the Completion of Instruction  $I_1 [I_i]$ .

## Data Dependency

Assume 5 stage

TF&DD

OF

EX

ma

WB

$T_L$ :

CC1 | CC2

CC3

CCW

CC

5

CCG

CC7

CCS

CC

1

MA  
T<sub>2</sub>:

11

The logo consists of three vertical columns. Each column features a series of five pink diagonal lines at the top, followed by a thick, wavy pink line that spans all three columns horizontally. Below this graphic, the word "Stally" is written in a bold, pink, sans-serif font.

OF

E

X

mA

WR

This waiting creates Extra cycle/stalls/bubbles in the Pipeline.

Q. How to check there is a Dependency  
between the Consecutive Instruction ?

How to Detect Dep b/w  $I_i$  &  $I_2$   
 $(I_i \text{ } \& \text{ } I_{i+1})$   
 $(I_i \text{ } \& \text{ } I_j)$

See  
→ P.T.O (Next Page)

# Data Dependency

- To handle above problem, some logic will be maintained in the “ID” stage of pipeline to stop the accessing of a dependent data.
- Structure of a logic is as follows:

“TOMASULO ALGO”

$$T_1: r_0 \in r_1 + r_2$$

$$T_2: r_3 \in r_0 * r_4$$

| S/No           | Function Unit | Destination    | Independent Source 1 | Independent Source 2 | Dependent Source 1               | Dependent Source 2 | Status | Reg. file access                                                        |
|----------------|---------------|----------------|----------------------|----------------------|----------------------------------|--------------------|--------|-------------------------------------------------------------------------|
| I <sub>1</sub> | ADD           | R <sub>0</sub> | r <sub>1</sub>       | r <sub>2</sub>       | -                                | -                  | 0      | r <sub>1</sub> = value<br>r <sub>2</sub> = value<br>EX- Stage Allocated |
| I <sub>2</sub> | MUL           | r <sub>3</sub> | -                    | r <sub>4</sub>       | r <sub>0</sub> (I <sub>1</sub> ) | -                  | 0      | r <sub>2</sub> = value<br>EX- Stage not allocated                       |

“0” execution yet not complete

“1” execution complete

$$I_i: \gamma_0 \in \gamma_1 + \gamma_2$$
$$I_j: \gamma_3 \in \gamma_0 * \gamma_4$$

Domain : Source operand

Range : (Destination)  
(in which trying to write)

If  $\text{Range}(I_i) \cap \text{Domoin}(I_j) \neq \emptyset$

[Something  
then Data Dep. Common]

eg.

$I_1$

$I_2$

$(I_{i+1})$

$I_j$

$I_i: \underline{\gamma_0} \in \gamma_1 + \gamma_2$

$\gamma_3 \in \underline{\gamma_0} * \underline{\gamma_4}$

I. Domain : Source operand  
(Trying to Read)

II. Range : (Destination)  
(trying to write)

| Domain ( $I_i$ )                             | Range ( $I_i$ )        |
|----------------------------------------------|------------------------|
| $\underline{\gamma_1}, \underline{\gamma_2}$ | $\underline{\gamma_0}$ |
| Domain ( $I_{i+1}$ )                         | Range ( $I_{i+1}$ )    |
| $\underline{\gamma_0} \underline{\gamma_4}$  | $\underline{\gamma_3}$ |

If Range ( $I_i$ )  $\cap$  Domoin ( $I_j$ )  $\neq \emptyset$

$\underline{\gamma_0}$   
Data Dependency  
is there.

$(\underline{\gamma_0}) \cap (\underline{\gamma_0}, \underline{\gamma_4})$   
 $(\underline{\gamma_0})$  Having Common

then Data Dep. [Something  
Common]

## Data Dependency

- By using the above logic, we need to stop the access of dependent data unit the data becomes available.

This waiting creates Extra Cycle/Bubble/stall in the pipeline that is described below :

# Data Dependency

Assume 5 Stage

IF & ID

OF

EX

MA

WB

$T_1$ :

|         | CC1 | CC2     | CC3 | CC4 | CC5     | CC6 | CC7 | CC8 | CC9 |
|---------|-----|---------|-----|-----|---------|-----|-----|-----|-----|
| IF      |     |         |     |     |         |     |     |     |     |
| ID & OF |     |         |     |     |         |     |     |     |     |
| EX      |     |         |     |     |         |     |     |     |     |
| MA      |     |         |     |     |         |     |     |     |     |
| WB      |     |         |     |     |         |     |     |     |     |
| $T_2$ : | IF  |         |     |     | DD & OF | EX  | MA  | WB. |     |
|         |     | Stalls. |     |     |         |     |     |     |     |

This waiting creates Extra cycle/stalls/bubbles in the Pipeline.

## Topic : Dependencies In the pipeline

|                | CC1     | CC2                                                                      | CC3 | CC4 | CC5                                        | CC6                                    | CC7 |
|----------------|---------|--------------------------------------------------------------------------|-----|-----|--------------------------------------------|----------------------------------------|-----|
| I <sub>1</sub> | IF      | ID                                                                       | EX  | MA  | WB<br>$r_0 = \text{updated}$<br>Status = 1 |                                        |     |
| I <sub>2</sub> | IF<br>✓ | ID<br>S: $r_0, r_2$<br>(D) ( $I_0$ )<br>$r_2 = \underline{\text{value}}$ | ID  | ID  | STALL'S                                    | ID<br>$r_0 = \underline{\text{value}}$ | EX  |
| I <sub>3</sub> |         | IF                                                                       | IF  | IF  |                                            | IF                                     | ID  |
| I <sub>4</sub> |         |                                                                          | =   | =   |                                            | =                                      | IF  |

7 cycles - 4 i/ps = 3 stalls

Read  
-After  
Write

# Solution Of Data Dependency

## Hardware Interlocks

The most straightforward method is to insert hardware interlocks. An interlock is a circuit that detects instructions whose source operands are destinations of instructions further up in the pipeline.

*Lock All the Stage for  
Dependent & Further Instn.*

## Operand Forwarding

Another technique called operand forwarding uses special hardware to detect a conflict and then avoid it by routing the data through special paths between pipeline segments. For example, instead of transferring an ALU result into a destination register, the hardware checks the destination operand, and if it is needed as a source in the next instruction, it passes the result directly into the ALU input, bypassing the register file.

## Solution of Data Dependency

- The minimize the data dependency stall in the pipeline hardware technique is used i.e. operand forwarding. Also named as By-passing or Short –circuit
  
- This mechanism state that use the buffer between the stage to hold the intermediate result so that, dependent instruction will be accessing the new data from the buffer before update register file.

# Data Dependency

eg. Consider the following program segment, executed on a RISC pipeline with a operand forwarding technique.

I<sub>1</sub>: Add r0, r1, r2  
I<sub>2</sub>: Sub r3, r0, r2  
I<sub>3</sub>: MUL r4, r3, r0  
I<sub>4</sub>: DIV r5, r4, r0



Non- Adjacent dependency

- ① True Data Dep.
- ② Data Dep.

Note: (1) Adjacent data dependency is named as true data dependency i.e.

$$\begin{cases} I_2 - I_1 \\ I_3 - I_2 \\ I_4 - I_3 \end{cases}$$

## Data Dependency

(2) Non-Adjacent data dependency is named as data dependency only.

i.e.

$$\begin{array}{rcl} I_3 & = & I_1 \\ I_4 & = & I_1 \end{array}$$

$$I_3 - I_1$$

$$I_4 - I_1.$$

# Data Dependency

(without Operand Forwarding)

Assume 5 Stage

IF & ID

OF

EX

MA

WB

$I_1$ : IF

ID&OF

EX

MA

WB

$I_2$ :

IF



DD&OF

EX

MA

WB.

This waiting creates Extra cycle/stalls/bubbles in the Pipeline.

# Data Dependency

(with operand forwarding)

P  
W

Assume 5 stage



# With Operand Forwarding

|                | CC1 | CC2 | CC3                     | CC4                                        | CC5                                                           | CC6 |
|----------------|-----|-----|-------------------------|--------------------------------------------|---------------------------------------------------------------|-----|
| I <sub>1</sub> | IF  | ID  | EX                      | MA                                         | WB<br>$r_0 = \text{update}$                                   |     |
| I <sub>2</sub> |     | IF  | ID<br>(I <sub>1</sub> ) | EX                                         | MA                                                            | WB  |
| I <sub>3</sub> |     |     | IF                      | ID<br>(I <sub>2</sub> ), (I <sub>1</sub> ) | EX                                                            | MA  |
| I <sub>4</sub> |     |     |                         | IF                                         | $r_0 = \text{value}$<br>ID (I <sub>1</sub> , I <sub>3</sub> ) | EX  |
| I <sub>5</sub> |     |     |                         |                                            | IF                                                            | ID  |
| I <sub>6</sub> |     |     |                         |                                            |                                                               | IF  |

6 cycles – 6 i/p's = 0 stalls

Q.

Consider a pipelined processor with the following four stages

P  
W

IF : Instruction Fetch

ID : Instruction Decode and Operand Fetch

EX : Execute

WB : Write Back

The IF, ID and WB stages take one clock cycle each to complete the operation. The number of clock cycles for the EX stage depends on the instruction. The ADD and SUB instructions need 1 clock cycle and the MUL instruction need 3 clock cycles in the EX stage. Operand forwarding is used in the pipelined processor. What is the number of clock cycles taken to complete the following sequence of instructions?

1F

(i) Without Overhead Forwarding.

IDfor  
Ex →  
WB

|   |     |     |     |    |                         |
|---|-----|-----|-----|----|-------------------------|
| 1 | ADD | R2  | R1  | R0 | $R2 \leftarrow R1 + R0$ |
| 3 | MUL | R4, | R3, | R2 | $R4 \leftarrow R3 * R2$ |
| 1 | SUB | R6, | R5, | R4 | $R6 \leftarrow R5 - R4$ |

Aug(12) without open and forwarding.

[GATE-2007 : 2 Marks]

- A  
B  
C  
D

**IF Other Stack (ii) Without operand Forwarding.**

|                      |     |     |     |    |                         |
|----------------------|-----|-----|-----|----|-------------------------|
| IDF or $I_1$         | ADD | R2  | R1  | R0 | $R2 \leftarrow R1 + R0$ |
| EX $\rightarrow I_2$ | MUL | R4, | R3, | R2 | $R4 \leftarrow R3 * R2$ |
| WB $\rightarrow I_3$ | SUB | R6, | R5, | R4 | $R6 \leftarrow R5 - R4$ |

Ans(12) Without operand forwarding.

[IGATE-2007 : 2 Marks]



TF

TD40r

**EX →**

ωβ

2

|     |     |     |    |                         |
|-----|-----|-----|----|-------------------------|
| ADD | R2  | R1  | R0 | $R2 \leftarrow R1 + R0$ |
| MUL | R4, | R3, | R2 | $R4 \leftarrow R3 * R2$ |
| SUB | R6, | R5, | R4 | $R6 \leftarrow R5 - R4$ |

## With operand forwarding (8)

[GATE-2007 : 2 Marks]

A

7

CC1 CC2 CC3 CC4 CC5 CC6 CC7 CC8 CC9 CC10 CC11 CC12 CC13

B

8

4

TF ID EX WB

C

1

1

IF ID<sup>↓</sup> EX EX EX WB

D

1

I<sub>2</sub>

DF ID OF ↓

P  
W

IF  
ID for  
Ex →  
WB  
other approach

I

With Operand Forwarding.

|   |     |     |     |    |                         |
|---|-----|-----|-----|----|-------------------------|
| 1 | ADD | R2  | R1  | R0 | $R2 \leftarrow R1 + R0$ |
| 3 | MUL | R4, | R3, | R2 | $R4 \leftarrow R3 * R2$ |
| 1 | SUB | R6, | R5, | R4 | $R6 \leftarrow R5 - R4$ |

With operand forwarding (8).

[IGATE-2007 : 2 Marks]



# IF Another Approach ① with operand forwarding

|   |     |     |     |    |                         |
|---|-----|-----|-----|----|-------------------------|
| 1 | ADD | R2  | R1  | R0 | $R2 \leftarrow R1 + R0$ |
| 3 | MUL | R4, | R3, | R2 | $R4 \leftarrow R3 * R2$ |
| 1 | SUB | R6, | R5, | R4 | $R6 \leftarrow R5 - R4$ |



By other Approach.

In Pipeline  $CPI = 1$

But Some Insts take More cycle in the Stage.

So we Count Extra Cycle [Stalls].

Total Cycle =  $ET_{PIPE}$  + Extra Cycle [Stalls]

Q.

Consider a pipelined processor with the following four stages

P  
W

IF : Instruction Fetch

ID : Instruction Decode and Operand Fetch

EX : Execute

WB : Write Back

The IF, ID and WB stages take one clock cycle each to complete the operation. The number of clock cycles for the EX stage depends on the instruction. The ADD and SUB instructions need 1 clock cycle and the MUL instruction need 3 clock cycles in the EX stage. Operand forwarding is used in the pipelined processor. What is the number of clock cycles taken to complete the following sequence of instructions?

IF : L

ID : L

EX : L (for ADD & SUB)

WB : L

CPI = L

$$n(\# \text{Inst}) = 3$$

$$k(\# \text{Stages}) = 4$$

Normally

$$\begin{aligned} ET_{PIPE} &= [k + (n-1)] \text{ cycle} \\ \text{without Stalls} &\rightarrow [4 + (3-1)] \end{aligned}$$

$$\begin{aligned} ET_{PIPE} &= 6 \text{ cycle} \\ \text{without Stalls} & \end{aligned}$$

$$\begin{aligned} \text{Extra Cycle} &= 2 \text{ cycle} \\ (\text{Stalls}) & \end{aligned}$$

---

$$\text{Total Cycle} = 8 \text{ cycle}$$

Avg

TF

ID for

EX →

WB

|

|   |     |     |     |    |                         |
|---|-----|-----|-----|----|-------------------------|
| 1 | ADD | R2  | R1  | R0 | $R2 \leftarrow R1 + R0$ |
| 3 | MUL | R4, | R3, | R2 | $R4 \leftarrow R3 * R2$ |
| 1 | SUB | R6, | R5, | R4 | $R6 \leftarrow R5 - R4$ |

TF : ⊥

[GATE-2007 : 2 Marks]

A 7

ID : ⊥

B 8

EX : ADD &amp; SUB &amp; MUL

C 10

WB : ⊥

D 14

 $n =$

TF

DD for

Ex →

WB

|   |     |     |     |    |                         |
|---|-----|-----|-----|----|-------------------------|
| 1 | ADD | R2  | R1  | R0 | $R2 \leftarrow R1 + R0$ |
| 3 | MUL | R4, | R3, | R2 | $R4 \leftarrow R3 * R2$ |
| 1 | SUB | R6, | R5, | R4 | $R6 \leftarrow R5 - R4$ |

[GATE-2007 : 2 Marks]

- A 7
- B 8
- C 10
- D 14

Q.

The performance of a pipelined processor suffer if

P  
W

[GATE-2002 : 2 Marks]

- A the pipeline stages have different delays
- B consecutive instructions are dependent on each other
- C the pipeline stages share hardware resources
- D All of the above

**MCQ**

A 5-stage pipelined processor has Instruction Fetch (IF), Instruction Decode (ID), Operand Fetch (OF), Perform Operation (PO) and Write Operand (WO) stage. The IF, ID, OF and WO stage take 1 clock cycle each for any instruction. The PO stage takes 1 clock cycle for ADD and SUB instructions ,3 clock cycles for MUL instruction, and 6 clock cycles for DIV instruction respectively. Operand forwarding is used in the pipeline. What is the number of clock cycles needed to execute the following sequence of instructions?

**Instruction**

- I<sub>0</sub>: MUL R<sub>2</sub>, R<sub>0</sub>, R<sub>1</sub>
- I<sub>1</sub>: DIV R<sub>5</sub>, R<sub>3</sub>, R<sub>4</sub>
- I<sub>2</sub>: ADD R<sub>2</sub>, R<sub>5</sub>, R<sub>2</sub>
- I<sub>3</sub>: SUB R<sub>5</sub>, R<sub>2</sub>, R<sub>6</sub>

**Meaning of Instruction**

- R<sub>2</sub>  $\leftarrow$  R<sub>0</sub>\*R<sub>1</sub>
- R<sub>5</sub>  $\leftarrow$  R<sub>3</sub>/R<sub>4</sub>
- R<sub>2</sub>  $\leftarrow$  R<sub>5</sub> + R<sub>2</sub>
- R<sub>5</sub>  $\leftarrow$  R<sub>2</sub> - R<sub>6</sub>

**A** 13

**B** 15

**C** 17

**D** 19

[GATE-2010-CS: 2M]

**Q.**

For a pipelined CPU with a single ALU, consider the following situations

P  
W

1. The  $j^{th} + 1^{st}$  instruction uses the result of the  $j^{th}$  instruction as an operand
2. The execution of a conditional jump instruction
3. The  $j^{th}$  and  $j^{th} + 1^{st}$  instructions require the ALU at the same time

Which of the above can cause a hazard?

**[GATE-2003 : 1 Marks]**

- A** 1 and 2 only
- B** 2 and 3 only
- C** 3 only
- D** All the three

Q.

A pipelined processor uses a 4-stage instruction pipeline with the following stages: Instruction fetch (IF), Instruction decode (ID), Execute (EX) and Writeback (WB). The arithmetic operations as well as the load and store operations are carried out in the EX stage. The sequence of instructions corresponding to the statement  $X = (S - R * (P + Q))/T$  is given below. The values of variables P, Q, R, S and T are available in the registers R0, R1, R2, R3 and R4 respectively, before the execution of the instruction sequence.

|       |            |                           |
|-------|------------|---------------------------|
| ADD   | R5, R0, R1 | ; $R5 \leftarrow R0 + R1$ |
| MUL   | R6, R2, R5 | ; $R6 \leftarrow R2 * R5$ |
| SUB   | R5, R3, R6 | ; $R5 \leftarrow R3 - R6$ |
| DIV   | R6, R5, R4 | ; $R6 \leftarrow R5 / R4$ |
| STORE | R6, X      | ; $X \leftarrow R6$       |

The IF, ID and WB stages take 1 clock cycle each. The EX stage takes 1 clock cycle each for the ADD, SUB and STORE operations, and 3 clock cycles each for MUL and DIV operations. Operand forwarding from the EX stage to the ID stage is used. The number of clock cycles required to complete the sequence of instructions is

[GATE-2006: 2 Marks]

- A** 10
- B** 12
- C** 14
- D** 16

Q. Consider the sequence of machine instructions given below:

|     |            |
|-----|------------|
| MUL | R5, R0, R1 |
| DIV | R6, R2, R3 |
| ADD | R7, R5, R6 |
| SUB | R8, R7, R4 |

In the above sequence, R0 to R8 are general purpose registers. In the instructions shown, the first register stores the result of the operation performed on the second and the third registers. This sequence of instructions is to be executed in a pipelined instruction processor with the following 4 stages: (1) Instruction Fetch and Decode (IF), (2) Operand Fetch (OF), (3) Perform Operation (PO) and (4) Write back the Result (WB). The PO state takes 1 clock cycle for ADD or SUB instruction. 3 clock cycles for MUL instruction and 5 clock cycles for DIV instruction. The pipelined processor uses operand forwarding from the PO stage to the OF stage. The number of clock cycles taken for the execution of the above sequence of instructions is \_\_\_\_\_.

[GATE-2015 (Set-2): 2 Marks]

$I_1: MUL R_5 \leftarrow R_0 \times R_1$  → Stage 3  
 $I_2: DIV R_6 \leftarrow R_2 / R_3$  → Stage 5  
 $I_3: ADD R_7 \leftarrow R_5 + R_6$  → Stage 1  
 $I_4: SUB R_8 \leftarrow R_7 - R_4$  → Stage L  
  
 $I_2$  is Not Dependent on  $I_1$

$k = 4$  (Stages)

$n = 4$  (Instrn)

$$\begin{aligned}
 ET_{PIPE} &= [k + (n-1)] \text{ cycle} \\
 \text{without stalls} &\rightarrow (4 + 4-1) \\
 &= 7 \text{ cycle}
 \end{aligned}$$

Postage  
 IF (Fetch & Decode)  
 OF (operand Fetch)  
 PO (Perform operation)  
 WB (Write Back)

TF : L  
 DF : L  
 WB : L



$$\text{Total cycle} = 7 + 6 = 13 \text{ cycle}$$

Note: This Method will work when only one stage taking different Delay & one type of dependency is there.

P  
W

$I_1: MUL R_5 \leftarrow R_0 \times R_1$  → **Postage 3** IF (Fetch & Decode)  
 $I_2: DN R_6 \leftarrow R_2 / R_3$  → **5** OF (operand Fetch)  
 $I_3: ADD R_7 \leftarrow R_5 + R_6$  → **1** PD (Perform operation)  
 $I_4: SUB R_8 \leftarrow R_7 - R_4$  → **L** WB (Write Back)

$I_2$  is Not Dependant On  $I_1$

| WB |       |       |       |       | $I_1$ |       |       |       | $I_2$            | $I_3$            | $I_4$ |
|----|-------|-------|-------|-------|-------|-------|-------|-------|------------------|------------------|-------|
| PD |       |       | $I_1$ | $I_1$ | $I_1$ | $I_2$ | $I_2$ | $I_2$ | $I_2$            | $I_3$            | $I_4$ |
| OF |       | $I_1$ | $I_2$ |       |       |       |       |       | $I_3 \downarrow$ | $I_4 \downarrow$ |       |
| IF | $I_1$ | $I_2$ | $I_3$ | $I_4$ |       |       |       |       |                  |                  | 13    |

P  
W



|       | cc1 | cc2 | cc3 | cc4 | cc5 | cc6 | cc7 | cc8 | cc9 | cc10 | cc11 | cc12 | cc13 |
|-------|-----|-----|-----|-----|-----|-----|-----|-----|-----|------|------|------|------|
| $I_1$ | IF  | OF  | PO  | PO  | PO  | WB  |     |     |     |      |      |      |      |
| $I_2$ |     | IF  | OF  | PO  | PO  | PO  | PO  | PO  | WB  |      |      |      |      |
| $I_3$ |     |     | IF  |     |     |     |     | DE  | PO  | WB   |      |      |      |
| $I_4$ |     |     |     | IF  |     |     |     | OF  | PO  | WB   |      |      |      |

Sometimes Mistake Done by the Students.

$I_1: MUL R_5 \leftarrow R_0 \times R_1$  → **Postage**  
 $I_2: DIV R_6 \leftarrow R_2 / R_3$  → **3**  
 $I_3: ADD R_7 \leftarrow R_5 + R_6$  → **5**  
 $I_4: SUB R_8 \leftarrow R_7 - R_4$  → **L**

**IF (Fetch & Decode)**  
**OF (operand Fetch)**  
**PO (Perform operation)**  
**WB (Write Back)**

I<sub>2</sub> is Not  
 Dep. on I<sub>1</sub> cc1 cc2 cc3 cc4 cc5 cc6 cc7 cc8 cc9 cc10 cc11 cc12 **cc13**

|       | IF | OF | PO | PO | PO | WB |    |    |    |    |  |  |
|-------|----|----|----|----|----|----|----|----|----|----|--|--|
| $I_1$ | IF | OF | PO | PO | PO | WB |    |    |    |    |  |  |
| $I_2$ |    | IF | OF |    | PO | PO | PO | PO | PO | WB |  |  |
| $I_3$ |    |    | DF |    |    |    |    | OF | PO | WB |  |  |
| $I_4$ |    |    | RF |    |    |    |    | OF | PO | WB |  |  |

13 cycle

The instruction pipeline of a RISC processor has the following stages. Instruction Fetch (IF), Instruction Decode (ID), Operand Fetch (OF), Perform Operation (PO) and Writeback (WB). The IF, ID, OF and WB stages take 1 clock cycle each for every instruction. Consider a sequence of 100 instructions, In the PO stage, 40 instructions take 3 clock cycles each, 35 instructions take 2 clock cycles each, and the remaining 25 instructions take 1 clock cycle each. Assume that there are no data hazards and no control hazards.

The number of clock cycles required for completion of execution of the sequence of instructions is \_\_\_\_.

[GATE-2018-CS: 2M]

DF : 1  
DD : 1  
OF : 1  
PD

WB : L

n = 100

k = 5

$$ET = [k + (n-1)] \\ \Rightarrow 5 + (100-1)$$

without stalls. = 104 cycle

CPL = 1

$$\begin{array}{rcl} \text{Extern} & \rightarrow & 40 \times 2 = 80 \\ \text{Cycle} & & \\ & + 35 \times L & + 35 \\ & + 25 \times O & + 0 \\ & & \hline & & 115 \end{array}$$

$$\begin{array}{l} \text{Total Cycle} = 104 + 115 \\ = \underline{\underline{219 \text{ Cycle}}} \end{array}$$

DF : 1

DD : 1

OF : 1

PD

$40 \rightarrow 3, 35 \rightarrow 2, 25 \rightarrow L$

WB : L

n = 100

k = 5

$$ET = [k + (n-1)] \\ \Rightarrow 5 + (100-1)$$

without stalls. = 104 cycle

CPL = 1

$$\begin{array}{rcl} \text{Extern} & \rightarrow & 40 \times 2 = 80 \\ \text{Cycle} & & \\ & & + 35 \times L + 35 \\ & & + 25 \times 0 + 0 \\ & & \hline & & 115 \end{array}$$

$$\begin{array}{l} \text{Total Cycle} = 104 + 115 \\ = \underline{219 \text{ Cycle.}} \end{array}$$

# DATA Dependency | Hazard.

- ① RAW (Read After Write): True Data Dep.
- ② WAR (Write After Read): ANTI Dep.
- ③ WAW (Write After Write): Output/Write Dep.

# DATA Dependency | Hazard.

- ① RAW (Read After Write): True Data Dep.
- ② WAR (Write After Read): ANTI Dep.
- ③ WAW (Write After Write): Output/Write Dep.

Note

Here RAW, WAR, WAW means Execution sequence ie RAW (Read, After Write).  
if follow the execution sequence then No Problem.  
But if Not follow then these Hazard Create.

## RAW [Read, After Write]. | True Data Dep.

$I_1(I_i) : \text{MUL } \delta_0 \leftarrow \delta_1 * \delta_2$

$I_2(I_j) : \text{ADD } \delta_4 \leftarrow \delta_0 + \delta_3$

If first  $I_1(I_i)$  execute then  $I_2(I_j)$  execute [ie  $\delta_0$  Read After  $\delta_1$  Write] then No Hazards (No Problem)

But When  $I_2(I_j)$  tries to Read before  $I_1(I_i)$  write then Hazards

## RAW [Read, After Write]. | True Data Dep.



If  $T_2$  tries to Read  $\gamma_0$  before  $T_1$  Update[write]

$T_2$  Read  $\gamma_0 = 25$  Value

OLD value / Incorrect value

Hazards

But actually  $\gamma_0$  value = 600

## Types of Data Hazard

- Read after write (RAW), or true dependency
  - ❖ An instruction modifies a register or memory location.
  - ❖ Succeeding instruction reads data in memory or register location.
  - ❖ Hazard occurs if the read takes place before write operation is complete.
- Write after read (WAR), or antidependency
  - ❖ An instruction reads a register or memory location.
  - ❖ Succeeding instruction writes to the location.
  - ❖ Hazard occurs if the write operation completes before the read operation takes place.
- Write after write (WAW), or Write/output dependency
  - ❖ Two instruction both write to the same location.
  - ❖ Hazard occurs if the write operations take place in the reverse order of the intended sequence.

# Types of Data Hazard

## RAW



## Types of Data Hazard

WAR | ANTI Deb. [Write, After Read]

(I<sub>i</sub>) I<sub>1</sub>       $R_3 \leftarrow R_1 * R_2$

(I<sub>j</sub>) I<sub>2</sub>       $R_1 \leftarrow R_4 + R_5$

If I<sub>2</sub>(I<sub>j</sub>) Tries to Write the Data before I<sub>j</sub>(I<sub>i</sub>) Read it.

## Types of Data Hazard

WAW [Write After Write]

(I<sub>i</sub>) I<sub>1</sub>      R<sub>3</sub> ∈ R<sub>1</sub> × R<sub>2</sub>

(I<sub>j</sub>) I<sub>2</sub>      R<sub>3</sub> ∈ R<sub>4</sub> + R<sub>5</sub>

If I<sub>j</sub> tries to write before I<sub>i</sub> wrote it then  
Output Dep.

## Types of Data Hazard

- CPU always execute a sequence the program from top to bottom in a sequence called as “in-order” execution.
- In this execution sequence if any instruction is dependent then the remaining instruction are also sharing the stall cycles even, they are independent.

Ex:- Code

|                  |     |                                                  |
|------------------|-----|--------------------------------------------------|
| I <sub>1</sub> : | Add | r <sub>0</sub> , r <sub>1</sub> , r <sub>2</sub> |
| I <sub>2</sub> : | sub | r <sub>3</sub> , r <sub>0</sub> , r <sub>4</sub> |
| I <sub>3</sub> : | MUL | r <sub>4</sub> , r <sub>5</sub> , r <sub>6</sub> |
| I <sub>4</sub> : | DIV | r <sub>3</sub> , r <sub>7</sub> , r <sub>8</sub> |

In- order execution sequence



## Types of Data Hazard

- In the above execution sequence ' $I_2$ ' is data dependent on " $I_1$ " So,  $I_2$  will be waiting until the ' $I_1$ ' execution is completed when the hardware does not support operand forwarding.  
This waiting creates stall in the pipeline this stall are also shared by the  $I_3$  and  $I_4$  instruction even they are independent.
- To utilize this point instruction scheduling concept states that insert the independent instruction first.  
It causes out of order execution {Re-order execution}.
- Out of order execution sequence is  $I_1 - I_3 - I_4 - I_2$
- Out of order execution create two more dependencies in the pipeline ..
  - (i) Anti data dependency
  - (ii) output/Write data dependency

# Types of Data Hazard

# Instruction "J"      Instruction "I"

Read – Before – write [True Data Loss]

Write – Before – Read [ANTI Data loss]

Write – Before – write [O/P DATA Loss]

↓  
AFTER

↓  
Delay

↓  
Stall

↓  
Hazard

## Types of Data Hazard

**Anti Dependency** will be occurred when the instruction ‘J’ try to write the data before instruction ‘I’ reads it.

Example - I<sub>3</sub> executed before I<sub>2</sub> so I<sub>3</sub> modified the register (r<sub>4</sub>) before I<sub>2</sub> reads it, therefore I<sub>2</sub> incorrectly read the new value from (r<sub>4</sub>) [Data loss].

**Output dependency** will be occurred when the instruction “J” tried to write the data before instruction ‘I’ writes.

Ex- I<sub>4</sub> executes before I<sub>2</sub> so, I<sub>4</sub> modifies the register (r<sub>3</sub>) before I<sub>2</sub> writes it therefore destination is incorrectly updated with old value [data loss].

## Types of Data Hazard

**Note:** To handle the above dependency problem hardware technique is used i.e., register renaming.

This technique states that, use the re-order buffer to store the out-of-order instruction output later update the register file with a re-order buffer content after the completion of a dependent instruction therefore data is safe.

# Types of Data Hazard

# Register Re-naming:

Execution

$$I_1 \rightarrow r_0$$

$$\begin{array}{l} I_3 \rightarrow \\ I_4 \rightarrow \end{array} ] \text{ Re-order buffer}$$

$$I_2 \rightarrow r_3 \text{ (Status 1)}$$

Re-order buffer  $\rightarrow$  Reg. file  $\{R_4, R_3\}$

# Types of Data Hazard

## # Hazards:

1. Hazard is a delay.
2. Delay creates stalls in the pipeline
3. Dependency problem causes hazards. These are three kinds
  - I. Structural hazards
  - II. Data hazards
  - III. Control hazards
3. Data hazard is further divided into three types based on the reader and write operation sequence-
  - (i) RAW [Read - after -write] hazard
  - (ii) WAR [write - after -Read] hazard
  - (iii) WAR [write - after -write] hazard

## Types of Data Hazard

- RAW hazard is created when the instruction 'J' tries to reads the data before instruction 'I' writes it. [True data dependency]
- WAR hazard is created when the instruction 'J' tries to write the data before instruction 'I' reads it [Anti data dependency]
- WAW hazard is created when the instruction 'J' tries to write the data before instruction 'I' writes it- [output data Dependency]

Q.

Consider the following code sequence having five instruction I1 to I5. Each of these instructions has the following format.

OP Ri, Rj, Rk

Where operation OP is performed on contents of registers Rj and Rk and the result is stored in register Ri.

- I<sub>1</sub>:** ADD R1, R2, R3
- I<sub>2</sub>:** MUL R7, R1, R3
- I<sub>3</sub>:** SUB R4, R1, R5
- I<sub>4</sub>:** ADD R3, R2, R4
- I<sub>5</sub>:** MUL R7, R8, R9

Consider the following three statements:

- S<sub>1</sub>:** There is an anti-dependence between instruction I<sub>2</sub> and I<sub>5</sub>.
- S<sub>2</sub>:** There is an anti-dependence between instructions I<sub>2</sub> and I<sub>4</sub>.
- S<sub>3</sub>:** Within an instruction pipeline an anti-dependence always creates one or more stalls.

Which one of above statements is/are correct?

[GATE-2015 (Set-3): 2 Marks]

- A** Only S1 is true
- B** Only S2 is true
- C** Only S1 and S3 are true
- D** Only S2 and S3 are true

Q

A pipelined processor uses a 4-stage instruction pipeline with the following stages: Instruction fetch (IF), Instruction decode (ID), Execute (EX) and Writeback (WB). The arithmetic operations as well as the load and store operations are carried out in the EX stage. The sequence of instructions corresponding to the statement  $X = (S - R * (P + Q))/T$  is given below. The values of variable P, Q, R, S and T are available in the registers R0, R1, R2, R3 and R4 respectively, before the execution of the instruction sequence.

|       |            |                           |
|-------|------------|---------------------------|
| ADD   | R5, R0, R1 | ; $R5 \leftarrow R0 + R1$ |
| MUL   | R6, R2, R5 | ; $R6 \leftarrow R2 * R5$ |
| SUB   | R5, R3, R6 | ; $R5 \leftarrow R3 - R6$ |
| DIV   | R6, R5, R4 | ; $R6 \leftarrow R5 / R4$ |
| STORE | R6, X      | ; $X \leftarrow R6$       |

P  
W

The number of Read-After-Write (RAW) dependencies, Write-After-Read (WAR) dependencies, and Write-After-Write (WAW) dependencies in the sequence of instructions are, respectively,

[GATE-2006: 2 Marks]

- (a) 2, 2, 4
- (c) 4, 2, 2

- (b) 3, 2, 3
- (d) 3, 3, 2

**THANK  
YOU!**

