

# COMPUTER SCIENCE

## Computer Organization and Architecture

### Instruction Pipelining

Lecture\_07



Vijay Agarwal sir



A graphic of a construction barrier made of orange and white striped panels, with two yellow bollards on top, positioned on the left side of the slide.

**TOPICS  
TO BE  
COVERED**

**o1**

## **Pipelining Hazards**

- ① Structural Hazards
- ② Data Hazards
- ③ Control Hazards.

# RISC PIPELINE

In The RISC Pipeline 5 Stages:

1. Instruction Fetch {IF Stage}
2. Instruction Decode{ID Stage}
3. Execute {EX Stage}
4. Memory Access {MA Stage}
5. Write Back {WB Stage}

3. **Execute {EX Stage}**: In this stage Data Processing (ALU Operations) are performed
4. **Memory Access {MA Stage}** : In this Stage Operand (Data) will be accessed from memory(load or store).
5. **Write Back {WB Stage}** : In this stage Register write (Operand storing into reg. file) operation performed.

## Additional Stages

- ❑ Fetch Instruction (FI)
  - ❖ Read the next expected Instruction into a buffer.
- ❑ Decode Instruction (DI)
  - ❖ Determine the opcode and the operand specifiers.
- ❑ Calculate operands(CO)
  - ❖ Calculate the effective address of each source operand.
  - ❖ This may involve displacement, register indirect or other forms of address calculations.
- ❑ Fetch Operands(FO)
  - ❖ Fetch each operand from memory.
  - ❖ Operands in register need not be fetched.
- ❑ Executed Instruction(EI)
  - ❖ Perform the indicated operation and store the result, if any, in the specified destination operand location
- ❑ Write Operand(WO)
  - ❖ Store the result in memory

|               | Time → |    |    |    |    |    |    |    |    |    |    |    |    |    |
|---------------|--------|----|----|----|----|----|----|----|----|----|----|----|----|----|
|               | 1      | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10 | 11 | 12 | 13 | 14 |
| Instruction 1 | FI     | DI | CO | FO | EI | WO |    |    |    |    |    |    |    |    |
| Instruction 2 |        | FI | DI | CO | FO | EI | WO |    |    |    |    |    |    |    |
| Instruction 3 |        |    | FI | DI | CO | FO | EI | WO |    |    |    |    |    |    |
| Instruction 4 |        |    |    | FI | DI | CO | FO | EI | WO |    |    |    |    |    |
| Instruction 5 |        |    |    |    | FI | DI | CO | FO | EI | WO |    |    |    |    |
| Instruction 6 |        |    |    |    |    | FI | DI | CO | FO | EI | WO |    |    |    |
| Instruction 7 |        |    |    |    |    |    | FI | DI | CO | FO | EI | WO |    |    |
| Instruction 8 |        |    |    |    |    |    |    | FI | DI | CO | FO | EI | WO |    |
| Instruction 9 |        |    |    |    |    |    |    |    | FI | DI | CO | FO | EI | WO |

Timing Diagram for Instruction pipeline operation

# Pipeline Hazards

Occur when the Pipeline, or some portion of the pipeline must stall Because conditions Do not permit Continued execution

There are three Types of hazards:

- Resource
- Data
- Control



Also referred to as a Pipeline bubble

In general, there are three major difficulties that cause the instruction pipeline to deviate from its normal operation.

1. *Resource conflicts* caused by access to memory by two segments at the same time. Most of these conflicts can be resolved by using separate instruction and data memories.
2. *Data dependency* conflicts arise when an instruction depends on the result of a previous instruction, but this result is not yet available.
3. *Branch difficulties* arise from branch and other instructions that change the value of PC.

## Hazards/Dependencies In the pipeline

- Dependency is a major problem in the pipeline, causes extra cycle.
- Cycle in the pipeline without new input is called as extra cycle. Also named as “Stall”.
- When stall is present in the pipeline then  $CPI \neq 1$ .
- There are 3 kinds of dependencies possible in the pipeline-
  - I. Structural dependency/ Structural Hazards
  - II. Data dependency/ Data Hazards
  - III. Control dependency/ Control Hazards

# Structural Dependency

Resource Conflict.

Solution → Resource Replication

(use Additional H/w)

Divide the Memory



# Data Dependency

I<sub>1</sub>: ADD r<sub>1</sub> r<sub>2</sub> r<sub>3</sub>;      r<sub>1</sub> ← r<sub>2</sub> + r<sub>3</sub>

I<sub>2</sub>: MUL r<sub>4</sub> r<sub>1</sub> r<sub>5</sub>;      r<sub>4</sub> ← r<sub>1</sub> \* r<sub>5</sub>



# Data Dependency

Data dependency will be occurred when the instruction "J" try to read the data before instruction "I" writes it.

## "Read-Before-write"

Ex:-

I<sub>1</sub>: Add r<sub>0</sub>, r<sub>1</sub>, r<sub>2</sub> : r<sub>0</sub> ← r<sub>1</sub> + r<sub>2</sub>

I<sub>2</sub>: MUL r<sub>3</sub>, r<sub>0</sub>, r<sub>2</sub> : r<sub>3</sub> ← r<sub>0</sub> \* r<sub>2</sub>

## Data Dependency

eg. Consider the following program segment, executed on a RISC pipeline with a operand forwarding technique.

I<sub>1</sub>: Add r0 , r1, r2

I<sub>2</sub>: Sub r3 , r0, r2

I<sub>3</sub>: MUL r4 , r3, r0

I<sub>4</sub>: DIV r5 , r4, r0

Non- Adjacent dependency

Note: (1) Adjacent data dependency is named as true data dependency i.e.

$$\begin{cases} I_2 - I_1 \\ I_3 - I_2 \\ I_4 - I_3 \end{cases}$$

## Data Dependency

- (2) Non-Adjacent data dependency is named as data dependency only.

i.e.

$$l_3 = l_1$$

$$l_4 = l_1$$

Q.

Consider a pipelined processor with the following four stages

P  
W

IF : Instruction Fetch

ID : Instruction Decode and Operand Fetch

EX : Execute

WB : Write Back

The IF, ID and WB stages take one clock cycle each to complete the operation. The number of clock cycles for the EX stage depends on the instruction. The ADD and SUB instructions need 1 clock cycle and the MUL instruction need 3 clock cycles in the EX stage. Operand forwarding is used in the pipelined processor. What is the number of clock cycles taken to complete the following sequence of instructions?

|     |     |     |    |                         |
|-----|-----|-----|----|-------------------------|
| ADD | R2  | R1  | R0 | $R2 \leftarrow R1 + R0$ |
| MUL | R4, | R3, | R2 | $R4 \leftarrow R3 * R2$ |
| SUB | R6, | R5, | R4 | $R6 \leftarrow R5 - R4$ |

[GATE-2007 : 2 Marks]

- A 7
- B 8
- C 10
- D 14

Q. The performance of a pipelined processor suffer if

[GATE-2002 : 2 Marks]

- A the pipeline stages have different delays
- B consecutive instructions are dependent on each other
- C the pipeline stages share hardware resources
- D All of the above

**MCQ**

A 5-stage pipelined processor has Instruction Fetch (IF), Instruction Decode (ID), Operand Fetch (OF), Perform Operation (PO) and Write Operand (WO) stage. The IF, ID, OF and WO stage take 1 clock cycle each for any instruction. The PO stage takes 1 clock cycle for ADD and SUB instructions ,3 clock cycles for MUL instruction, and 6 clock cycles for DIV instruction respectively. Operand forwarding is used in the pipeline. What is the number of clock cycles needed to execute the following sequence of instructions?

**Instruction**

- I<sub>0</sub>: MUL R<sub>2</sub>, R<sub>0</sub>, R<sub>1</sub>
- I<sub>1</sub>: DIV R<sub>5</sub>, R<sub>3</sub>, R<sub>4</sub>
- I<sub>2</sub>: ADD R<sub>2</sub>, R<sub>5</sub>, R<sub>2</sub>
- I<sub>3</sub>: SUB R<sub>5</sub>, R<sub>2</sub>, R<sub>6</sub>

**Meaning of Instruction**

- R<sub>2</sub>  $\leftarrow$  R<sub>0</sub>\*R<sub>1</sub>
- R<sub>5</sub>  $\leftarrow$  R<sub>3</sub>/R<sub>4</sub>
- R<sub>2</sub>  $\leftarrow$  R<sub>5</sub> + R<sub>2</sub>
- R<sub>5</sub>  $\leftarrow$  R<sub>2</sub> - R<sub>6</sub>

A 13

B 15

B 15

D 19

[GATE-2010-CS: 2M]

**Q.**

For a pipelined CPU with a single ALU, consider the following situations

1. The  $j^{th} + 1^{st}$  instruction uses the result of the  $j^{th}$  instruction as an operand
2. The execution of a conditional jump instruction
3. The  $j^{th}$  and  $j^{th} + 1^{st}$  instructions require the ALU at the same time

Which of the above can cause a hazard?

**[GATE-2003 : 1 Marks]**

- A** 1 and 2 only
- B** 2 and 3 only
- C** 3 only
- D** All the three

**Q.**

A pipelined processor uses a 4-stage instruction pipeline with the following stages: Instruction fetch (IF), Instruction decode (ID), Execute (EX) and Writeback (WB). The arithmetic operations as well as the load and store operations are carried out in the EX stage. The sequence of instructions corresponding to the statement  $X = (S - R * (P + Q))/T$  is given below. The values of variables P, Q, R, S and T are available in the registers R0, R1, R2, R3 and R4 respectively, before the execution of the instruction sequence.

|       |            |                           |
|-------|------------|---------------------------|
| ADD   | R5, R0, R1 | ; $R5 \leftarrow R0 + R1$ |
| MUL   | R6, R2, R5 | ; $R6 \leftarrow R2 * R5$ |
| SUB   | R5, R3, R6 | ; $R5 \leftarrow R3 - R6$ |
| DIV   | R6, R5, R4 | ; $R6 \leftarrow R5 / R4$ |
| STORE | R6, X      | ; $X \leftarrow R6$       |

The IF, ID and WB stages take 1 clock cycle each. The EX stage takes 1 clock cycle each for the ADD, SUB and STORE operations, and 3 clock cycles each for MUL and DIV operations. Operand forwarding from the EX stage to the ID stage is used. The number of clock cycles required to complete the sequence of instructions is

[GATE-2006: 2 Marks]

- A 10
- B 12
- C 14
- D 16

**Q.**

Consider the sequence of machine instructions given below:

|     |            |
|-----|------------|
| MUL | R5, R0, R1 |
| DIV | R6, R2, R3 |
| ADD | R7, R5, R6 |
| SUB | R8, R7, R4 |

In the above sequence, R0 to R8 are general purpose registers. In the instructions shown, the first register stores the result of the operation performed on the second and the third registers. This sequence of instructions is to be executed in a pipelined instruction processor with the following 4 stages: (1) Instruction Fetch and Decode (IF), (2) Operand Fetch (OF), (3) Perform Operation (PO) and (4) Write back the Result (WB). The PO state takes 1 clock cycle for ADD or SUB instruction. 3 clock cycles for MUL instruction and 5 clock cycles for DIV instruction. The pipelined processor uses operand forwarding from the PO stage to the OF stage. The number of clock cycles taken for the execution of the above sequence of instructions is \_\_\_\_\_.

P  
W

[GATE-2015 (Set-2): 2 Marks]

The instruction pipeline of a RISC processor has the following stages. Instruction Fetch (IF), Instruction Decode (ID), Operand Fetch (OF), Perform Operation (PO) and Writeback (WB). The IF, ID, OF and WB stages take 1 clock cycle each for every instruction. Consider a sequence of 100 instructions. In the PO stage, 40 instructions take 3 clock cycles each, 35 instructions take 2 clock cycles each, and the remaining 25 instructions take 1 clock cycle each. Assume that there are no data hazards and no control hazards.

The number of clock cycles required for completion of execution of the sequence of instructions is 219. Ans

**MCQ**

Consider an instruction pipeline with five stages without any branch prediction: Fetch Instruction (FI), Decode Instruction (DI), Fetch Operand (FO), Execute Instruction (EI) and Write Operand (WO). The stage delays for FI, DI, FO, EI and WO are 5 ns, 7 ns, 10 ns, 8 ns and 6 ns, respectively. There are intermediate storage buffers after each stage and the delay of each buffer is 1 ns. A program consisting of 12 instruction  $I_1, I_2, I_3, \dots, I_{12}$  is executed in this pipelined processor. Instruction  $I_4$  is the only branch instruction and its branch target is  $I_9$ . If the branch is taken during the execution of this program, the time (in ns) needed to complete the program is  
[GATE-2013-CS: 2M]

A 132

B 165

C 176

D 328

## Type of Data Dependency.

- ① RAW (Read After Write) / True Data Dependency  
I<sub>j</sub>      I<sub>i</sub>
- ② WAR (Write After Read) / ANTI Dependency  
Name[False Dep.]
- ③ WAW (Write After Write) / output | Write Dependency.

Note

These words are actual Execution Sequence. if Not follow in that order then Problem Will be Created.

卷之三

RAW (Read After write)



## Types of Data Hazard

- Read after write (RAW), or true dependency
  - ❖ An instruction modifies a register or memory location.
  - ❖ Succeeding instruction reads data in memory or register location.
  - ✓ Hazard occurs if the read takes place before write operation is complete.

- Write after read (WAR), or antidependency
  - ❖ An instruction reads a register or memory location.
  - ❖ Succeeding instruction writes to the location.
  - ✓ Hazard occurs if the write operation completes before the read operation takes place.

- Write after write (WAW), or Write/output dependency
  - ❖ Two instruction both write to the same location.
  - ✓ Hazard occurs if the write operations take place in the reverse order of the intended sequence.

# Types of Data Hazard

## ① RAW (Read After Write) | True Data Dependency.

$$\begin{array}{ll} I_1 & I_i(I_i) \quad R_0 \leftarrow R_1 + R_2 \\ @ & \text{Magnifying glass icon} \\ I_2 & I_j(I_{i+1}) \quad R_3 \leftarrow R_0 + R_4 \end{array}$$

When the Next Instruction ( $I_2, I_j, \otimes I_{i+1}$ ) tries to use the result of Previous Instruction ( $I_1, I_i \otimes I_i$ ) as a operand. (if Not follow the sequence  $I_j \text{ then } I_i$ ) then Problem Will be created.

Logn → [ Hardware Interlock | Operand Forwarding | By bypassing | short Circuit ]

# Types of Data Hazard

## ② WAR (Write After Read) | ANTI Dependency.



When Next Instruction tries to Write the Data,  
Before Previous Inst<sup>(I<sub>i</sub>)</sup> (I<sub>i</sub>) Read it

Now R<sub>1</sub> value Read by I<sub>i</sub> is Incorrect Value.

Why? Assume Suppose I<sub>1</sub> takes More & More time as Compare to I<sub>2</sub>. Then I<sub>2</sub> execute first & update R<sub>1</sub> value

Initial R<sub>1</sub> = 500

Any How But if I<sub>2</sub> execute (work)  
it Update the Value of Register R<sub>1</sub>.

$R_1 = 10 \text{ By } I_2$

# Types of Data Hazard

## ② WAR (Write After Read) | ANTI Dependency.



Here  $I_2$  tries to Update the Value of Register  $R_1$ , before Read this Value ( $R_1$ ) By  $I_1$ . So  $I_1$  Read Incorrect Value.

Why? Assume Suppose  $I_1$  takes More & More time as Compare to  $I_2$ . Then  $I_2$  execute first & update  $R_1$  value

Initial  $R_1 = 500$

Any How But if  $I_2$  execute (work) it update the Value of Register  $R_1$ .

$R_1 = 10$  By  $I_2$ .

# Types of Data Hazard

## ③ WAW [Write After Write] | OUTPUT | write Dependency

$I_i \quad I_i \quad T_i : R_2 \leftarrow R_1 * R_0$

$I_{i+1} \quad I_j \quad \underline{T_2} : R_2 \leftarrow R_3 + R_4$

Assume  $I_i$  taking more & more time compare to  $I_2$  so  $I_2$  execute firstly so if update the Register Value  $R_2$

When the Next Instruction ( $I_2 | I_j | I_{i+1}$ ) tries to write the value, before the Previous Instn ( $I_i | I_i$ ) write it.

# Types of Data Hazard

## ③ WAW [Write After Write] | OUTPUT | write Dependency

$$\begin{array}{l} \text{I}_i \quad \text{I}_j \quad \text{I}_1 : \\ \text{R}_2 \leftarrow \text{R}_1 * \text{R}_0 \\ \text{I}_{i+1} \quad \text{I}_j \quad \underline{\text{I}_2} : \\ \text{R}_2 \leftarrow \text{R}_3 + \text{R}_4 \end{array}$$

Assume  $\text{I}_1$  taking more & more time compare to  $\text{I}_2$  so  $\text{I}_2$  execute firstly so it update the register value  $\text{R}_2$ .

Here  $\text{I}_2$  is executed before the  $\text{I}_1$  so  $\text{I}_2$  update the value

of Register  $\text{R}_2$  before  $\text{I}_1$  updation.

So Incorrect (Wrong Updation)

$$\text{I}_1: \text{R}_1 = 50,000$$

$$\text{I}_2: \text{R}_2 = 25 \text{ lakh}$$

But  $\text{I}_2$  then  $\text{I}_1$   
then  $50,000$

25 lakh

## Types of Data Hazard



ANTI & OUTPUT Dependency does not Create Any Stalls But in ANTI & OUTPUT we are getting the Problem @ Not Reading | Updating the Correct Value .

The Solution of ANTI & OUTPUT Dependency  
is 'Register Renaming'.

Register Renaming: Use Some Temporary Storage Register  
to Store the Result of the Out of Order  
Instruction output.

## ⑧ WHEN & HOW ANT $\Sigma$ Def. & OUTPUT Def. Create Problem ?

(Sofn) (i) If the Previous Instruction taking Max & More time (very High time) Compare to Next Instruction. [UnEven Delay] (multibit)

(ii) When there is Data Dependency between 2 Instruction

then to Minimize the Stalls . We trying to Reschedule the Instruction [out-of-order Instruction execution]

(iii) In Super Scalar System , Multiple Instruction Executed in the Stages So Next Instn if Read|write the Value that was Used in Previous Instn then Create a Problem

$$I_1: \underline{\underline{y_0}} \leftarrow y_1 * y_2$$

$$I_2: \underline{\underline{y_3}} \leftarrow \underline{\underline{y_0}} + y_4$$

$$I_3: \underline{\underline{y_3}} \leftarrow y_6 + y_7$$

$I_4:$

$I_5:$

After  $I_1$  &  $I_2$

When  $I_2$  execute  
in  $y_3$

Wrong Updation

Super Scalar System

$IF(I_1)$   
 $IF(I_2)$   
 $IF(I_3)$

$IF$   
Stage

$I_1$   
 $\underline{\underline{I_2}}$   
 $I_3$

$DD$   
Stage

$I_1$   
 $I_3$

EX  
Stage

$I_1$   
 $I_2$

MA

=

WB

$I_2$  Data Demand On  $I_1$ ,  $I_2$  will wait until to become available

$T_1$   
 $T_2$   
 $T_3$   
 $T_4$

Sequential Order  
[Inorder execution].

# Types of Data Hazard

- CPU always execute a sequence the program from top to bottom in a sequence called as “in-order” execution.
- In this execution sequence if any instruction is dependent then the remaining instruction are also sharing the stall cycles even, they are independent.



*Rescheduled  
Out of order execution*

- I<sub>1</sub>
- I<sub>3</sub>
- I<sub>4</sub>
- I<sub>2</sub>

④ Stalls



## Types of Data Hazard

- In the above execution sequence ' $I_2$ ' is data dependent on " $I_1$ " So,  $I_2$  will be waiting until the ' $I_1$ ' execution is completed when the hardware does not support operand forwarding.  
This waiting creates stall in the pipeline this stall are also shared by the  $I_3$  and  $I_4$  instruction even they are independent.
- To utilize this point instruction scheduling concept states that insert the independent instruction first.  
It causes out of order execution {Re-order execution}.
- Out of order execution sequence is  $I_1 - I_3 - I_4 - I_2$
- Out of order execution create two more dependencies in the pipeline ..
  - (i) Anti data dependency
  - (ii) output/Write data dependency

# Types of Data Hazard



$$I_1 - I_3 - I_4 - I_2$$

$I_1$  : Add  $\gamma_0 \gamma_1 \gamma_2$ ;  $\gamma_0 \in \gamma_1 + \gamma_2$

$I_3$  : MUL  $\gamma_u \gamma_v \gamma_w \gamma_x$ ;  ~~$\gamma_u \in \gamma_v * \gamma_w$~~

$I_4$  : DIV  $\gamma_3 \gamma_7 \gamma_8$ ;  $\gamma_3 = \gamma_7 / \gamma_8$

$I_2$  : SUB  $\gamma_3 \gamma_0 \gamma_4$ ;  ~~$\gamma_3 \in \gamma_0 - \gamma_4$~~

① ANTI Dep  $(I_3 - I_2)$

② OUTPUT Dep.  $(I_4 - I_2)$

## Types of Data Hazard

Anti Dependency will be occurred when the instruction 'J' try to write the data before instruction 'I' reads it.

Example - I<sub>3</sub> executed before I<sub>2</sub> so I<sub>3</sub> modified the register (r<sub>4</sub>) before I<sub>2</sub> reads It, therefore I<sub>2</sub> incorrectly read the new value from (r<sub>4</sub>) [Data loss].

Output dependency will be occurred when the instruction "J" tired to write the data before instruction 'I' writes.

Ex- I<sub>4</sub> executes before I<sub>2</sub> so, I<sub>4</sub> modifies the register (r<sub>3</sub>) before I<sub>2</sub> writes it therefore destination is incorrectly updated with old value [data loss].

## Types of Data Hazard

Soln

Note: To handle the above dependency problem hardware technique is used i.e., register renaming.

This technique states that, use the re-order buffer to store the out-of-order instruction output later update the register file with a re-order buffer content after the completion of a dependent instruction therefore data is safe.



# Types of Data Hazard

# Register Re-naming:

Execution

$$I_1 \rightarrow r_0$$

$$\begin{array}{l} I_3 \rightarrow \\ I_4 \rightarrow \end{array}$$

Re-order buffer

(Temp. Register)

$$I_2 \rightarrow r_3 \text{ (Status 1)}$$



# Types of Data Hazard

## # Hazards:

1. Hazard is a delay.
2. Delay creates stalls in the pipeline
3. Dependency problem causes hazards. These are three kinds
  - I. Structural hazards
  - II. Data hazards
  - III. Control hazards
3. Data hazard is further divided into three types based on the reader and write operation sequence-
  - (i) RAW [Read - after -write] hazard
  - (ii) WAR [write - after -Read] hazard
  - (iii) WAR [write - after -write] hazard

## Types of Data Hazard

- RAW hazard is created when the instruction 'J' tries to reads the data before instruction 'I' writes it. [True data dependency]
- WAR hazard is created when the instruction 'J' tries to write the data before instruction 'I' reads it [Anti data dependency]
- WAW hazard is created when the instruction 'J' tries to write the data before instruction 'I' writes it- [output data Dependency]

Q. Consider the following code sequence having five instruction I<sub>1</sub> to I<sub>5</sub>. Each of these instructions has the following format.

$$\text{OP } R_i, R_j, R_k \quad R_i \in R_j \odot R_k$$

Where operation OP is performed on contents of registers R<sub>j</sub> and R<sub>k</sub> and the result is stored in register R<sub>i</sub>.

- |                  |                        |
|------------------|------------------------|
| I <sub>1</sub> : | ADD R1, R2, R3         |
| I <sub>2</sub> : | MUL <u>R7</u> , R1, R3 |
| I <sub>3</sub> : | SUB R4, R1, R5         |
| I <sub>4</sub> : | ADD <u>R3</u> , R2, R4 |
| I <sub>5</sub> : | MUL <u>R7</u> , R8, R9 |

$S_1 \times \text{ANTI}(I_2 - I_5)$   
 $S_2 \checkmark \text{ANTI}(I_2 - I_4)$

Consider the following three statements:

**S<sub>1</sub>:** There is an anti-dependence between instruction I<sub>2</sub> and I<sub>5</sub>.

**S<sub>2</sub>:** There is an anti-dependence between instructions I<sub>2</sub> and I<sub>4</sub>.

**S<sub>3</sub>:** Within an instruction pipeline an anti-dependence always creates one or more stalls.

Which one of above statements is/are correct?

[GATE-2015 (Set-3): 2 Marks]

- A Only S1 is true
- B Only S2 is true
- C Only S1 and S3 are true
- D Only S2 and S3 are true

Q

A pipelined processor uses a 4-stage instruction pipeline with the following stages: Instruction fetch (IF), Instruction decode (ID), Execute (EX) and Writeback (WB). The arithmetic operations as well as the load and store operations are carried out in the EX stage. The sequence of instructions corresponding to the statement  $X = (S - R * (P + Q))/T$  is given below. The values of variable P, Q, R, S and T are available in the registers R0, R1, R2, R3 and R4 respectively, before the execution of the instruction sequence.

|       |            |                           |
|-------|------------|---------------------------|
| ADD   | R5, R0, R1 | ; $R5 \leftarrow R0 + R1$ |
| MUL   | R6, R2, R5 | ; $R6 \leftarrow R2 * R5$ |
| SUB   | R5, R3, R6 | ; $R5 \leftarrow R3 - R6$ |
| DIV   | R6, R5, R4 | ; $R6 \leftarrow R5 / R4$ |
| STORE | R6, X      | ; $X \leftarrow R6$       |

P  
W

The number of Read-After-Write (RAW) dependencies, Write-After-Read (WAR) dependencies, and Write-After-Write (WAW) dependencies in the sequence of instructions are, respectively,

[GATE-2006: 2 Marks]

- (a) 2, 2, 4
- (c) 4, 2, 2
- (b) 3, 2, 3
- (d) 3, 3, 2

# Control-Dependency.

Control Dependency is created in the Pipeline Due to

- ✓ Branch Instruction
- ✓ Program Control
- ✓ function call
- ✓ Transfer of Control Instn (TOC Instn)

Unconditional TOC

Conditional TOC .

## Control Hazard

- Also known as a branch hazard
- Occurs when the pipeline makes the wrong decision on a branch prediction.
- Brings instructions into the pipeline that must subsequently be discarded
- Dealing with Branches:
  - ❖ Multiple Streams
  - ❖ Prefetch branch target
  - ❖ Loop buffer
  - ❖ Branch prediction
  - ❖ Delayed branch

# Control Dependency:

Assume 5 Stage Pipeline &  $I_2$  is the Branch Instruction. & Assume the target address of the Branch Instruction is available at the end of 'LAST Stage' (5 Stage)



$I_3$   $I_4$   $I_5$  &  $I_6$  is unwanted Instn.  
BT<sub>1</sub> is the wanted Instn

# Control Dependency :

Assume 5 Stage Pipeline &  $I_2$  is the Branch Instruction. & Assume the target address of the Branch Instruction is available at the end of LAST stage (5th stage)



# Control Dependency:

Assume 5 Stage Pipeline &  $I_2$  is the Branch Instruction. 4 Assume the target address of the Branch Instruction is available at the end of 4th Stage.



# Control Dependency :

Assume 5 Stage Pipeline &  $I_2$  is the Branch Instruction. Assume the target address of the Branch Instruction is available at the end of 3rd Stage:

1000  $I_1$

1001  $I_2$  JMP 2000

1002  $I_3$

1003  $I_4$

1004  $I_5$

1005  $I_6$

1006  $I_7$

2000  $BT_1$   
2001  $BT_2$

(WB)  $S_5$

(MA)  $S_4$

(EX)  $S_3$

(ID)  $S_2$

(IF)  $S_1$



2STALL

# Control Dependency:

Assume 5 Stage Pipeline &  $I_2$  is the Branch Instruction. Assume the target address of the Branch Instruction is available at the end of 2<sup>nd</sup> Stage.



# Control Dependency :

Assume 5 Stage Pipeline &  $I_2$  is the Branch Instruction. & Assume the target address of the Branch Instruction is available at the end of 'K Stage'.

1000  $I_1$

1001  $I_2$  [JMP 2000]

1002  $I_3$

1003  $I_4$

1004  $I_5$

1005  $I_6$

1006  $I_7$

2000  $BT_1$

2001  $BT_2$

If the Target Address of Branch Instruction  
is available at ' $\underline{k^{\text{th}}}$ ' stages

$$\# \text{Stalls from Branch Inst} = k - 1$$

## Control Dependency/Control Hazards

generally

Avg ET

Ideal  $CPT = \perp$

But Due to Control Hazard | Control Deb.

$$\text{Avg Inst}^{\text{Hazard}}_{\text{ET}} = \perp + \frac{\# \text{Inst}^{\text{Hazard}} \text{ which Create Stalls}}{\# \text{Stalls from Branch}}$$

$$\text{Avg Inst}^{\text{Hazard}}_{\text{ET}} = 1 + \left( \frac{\text{Branch Inst}^{\text{Hazard}}}{\text{Frequency}} \times \# \text{Stalls Due to Due Inst}^{\text{Hazard}} \right)$$

## Control Dependency :

- Control Dependency is created in the pipeline Due to execution of Branch (TOC Instn) So Program execution will be changed
- In this the Next Sequential Instruction become Wanted @ unwanted, its Depends on the output Previous Instruction decoding.
- If the Previous Instruction decode the Instruction as Simple Instn ( Arithmetic , Data transfer etc ) then the Next Instruction become Wanted Instruction
- If the Previous Instn decide the Instruction as Branch Instn . Unconditional TOC , Conditional TOC with True then Next Seq. Instn become Unwanted.

# Control Dependency:



ID Decoder as Simple (ALU, Data  
then Branch Inst)  
then I<sub>2</sub> is wanted

When I<sub>1</sub> execute then  
I<sub>2</sub> is wanted Instruction

But When I<sub>2</sub> execute (Decode) then  
I<sub>3</sub>, I<sub>4</sub>... is a Unwanted Instruction

Because I<sub>2</sub> Decode as Uncondition Toc (JMP 2000)  
so BT<sub>1</sub> is the Now Wanted Instruction.

## Control Dependency :

- In the pipeline overlapping Execution Happen, that means before Decoding the Previous Inst<sup>n</sup>. Next Sequential Inst<sup>n</sup> Inserted into Pipeline
- If the Next Instruction is Unwanted, then we have to Remove / Flush the Unwanted Inst<sup>n</sup> from the Pipe .

The Process of Removing the Unwanted Instruction is Called Flush / Freeze .

## Control Dependency :

- The Number of Stalls Created Due to Branch Operation is called Branch Penalty.
- This Branch Penalty Value Depends on the outcome of the target Address.

(Note) In the RISC Pipeline Branch Penalty Value is '1' Because Target Address is available at the end of 2<sup>nd</sup> Stage.

$$\text{Branch Penalty} = K - 1 \Rightarrow 2 - 1 = 1$$

# Control Dependency :

Note

Generally Target address is available in decoding stage|phase but Condition checking [Evaluation] done in execution Phase|Stage.

Note

If Target is available at 'k' Stage.

$$\text{Branch Penalty} = k - l$$

$$\text{Branch Penalty} = \left( \text{At what stage Target Address Available} \right) - l$$

# Control Dependency

$$\text{BranchInst}^n = \frac{\# \text{Branch}}{\text{Freq. Inst}^n}$$

Avg  $CPI = 1$

But Due to Control Hazards

$$CPI = 1 + \left( \frac{\text{BranchInst}^n \times \# \text{Stalls Due to Branch}}{\text{Freq.}} \right)$$

# Control Dependency :

Q. Suppose we have 100 Instruction. Out of 100 Instruction 30 Instr<sup>n</sup> are Branch type, which Create 3STALL Cycle then Avg CPI ?

$$\frac{\text{Total # Stalls}}{100 \text{ Instn}} = \frac{\text{Number of Branch Instn}}{\text{Branch Instn}} \times \frac{\# \text{ Stalls Due to Branch Instn}}{\text{Branch Instn}}$$

100 Instn

$$ET = 100 \times L = 100$$

$$\Rightarrow 30 \times 3$$

$$\# \text{ Stalls} = 90$$

$$\begin{aligned}\text{Total for 100 Instn} &= 100 + 90 \\ &= 190\end{aligned}$$

$$\text{Avg CPI} = \frac{190}{100} = 1.9 \text{ Avg}$$

# Control Dependency :

Q. Suppose we have 100 Instruction. Out of 100 Instruction 30 Instr<sup>n</sup> are Branch type, which Create 3STALL Cycle then Avg CPI ?

$$\frac{\text{Avg Instn}}{\text{ET}} = 1 + \left( \frac{\text{Branch Instn}}{\text{frequency}} \times \frac{\# \text{Stalls Due}}{\text{to Branch Instn}} \right)$$

$$\begin{aligned}\frac{\text{Avg Instn}}{\text{ET}} &= 1 + ( \cdot 30 \times 3 ) \\ &\Rightarrow 1 + 0.9 \\ &= 1.9 \underline{\text{Avg}}\end{aligned}$$

100  $\Rightarrow$  30 Branch  
Instn  
30.1. Branch  
3STALL's

# Control Dependency :

Q Suppose we have LOO Instruction. Out of LOO Instruction  $\frac{30 \text{ Instr}}{\text{Insr}}$  are Branch type, which Create 3STALL Cycle then Avg CPI ?



$$\# \text{Stalls/Instn} = .70 \times 0 + .30 \times 3$$

$$\# \text{Stall/Instr} = 0.9$$

$$\text{Avg Instn ET} = [1 + \# \text{Stalls/Instn}]$$

$$\Rightarrow [1 + 0.9]$$

$$\text{Avg Instn ET} = 1.9$$

Avg

**THANK  
YOU!**

