

# COMPUTER SCIENCE

## Computer Organization and Architecture

### Instruction Pipelining

Lecture\_06



Vijay Agarwal sir



A graphic of a construction barrier made of orange and white striped panels, with two yellow bollards on top, positioned on the left side of the slide.

**TOPICS  
TO BE  
COVERED**

**o1**

## **Pipelining Hazards**

① Structural Hazards

② Data Hazards

③ Control Hazards

# RISC PIPELINE

In The RISC Pipeline 5 Stages:

1. Instruction Fetch {IF Stage}
2. Instruction Decode{ID Stage}
3. Execute {EX Stage}
4. Memory Access {MA Stage}
5. Write Back {WB Stage}

1. **Instruction Fetch {IF Stage}:** In this stage Instruction is fetched from Memory.
2. **Instruction Decode{ID Stage} :** In this Stage 2 operation are performed:
  - (i) Decode the instruction
  - (ii) Operand loading(fetching) from the register file .  
This stage also contain comparator circuit to evaluate the branch condition.

3. **Execute {EX Stage}**: In this stage Data Processing (ALU Operations) are performed
4. **Memory Access {MA Stage}** : In this Stage Operand (Data) will be accessed from memory(load or store).
5. **Write Back {WB Stage}** : In this stage Register write (Operand storing into reg. file) operation performed.

Stages can varies given in the Question

4 Stage

5 Stage

6 Stage

etc.

## Additional Stages

- ❑ Fetch Instruction (FI)
  - ❖ Read the next expected Instruction into a buffer.
- ❑ Decode Instruction (DI)
  - ❖ Determine the opcode and the operand specifiers.
- ❑ Calculate operands(CO)
  - ❖ Calculate the effective address of each source operand.
  - ❖ This may involve displacement, register indirect or other forms of address calculations.
- ❑ Fetch Operands(FO)
  - ❖ Fetch each operand from memory.
  - ❖ Operands in register need not be fetched.
- ❑ Executed Instruction(EI)
  - ❖ Perform the indicated operation and store the result, if any, in the specified destination operand location
- ❑ Write Operand(WO)
  - ❖ Store the result in memory

|               | Time → |    |    |    |    |    |    |    |    |    |    |    |    |    |
|---------------|--------|----|----|----|----|----|----|----|----|----|----|----|----|----|
|               | 1      | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10 | 11 | 12 | 13 | 14 |
| Instruction 1 | FI     | DI | CO | FO | EI | WO |    |    |    |    |    |    |    |    |
| Instruction 2 |        | FI | DI | CO | FO | EI | WO |    |    |    |    |    |    |    |
| Instruction 3 |        |    | FI | DI | CO | FO | EI | WO |    |    |    |    |    |    |
| Instruction 4 |        |    |    | FI | DI | CO | FO | EI | WO |    |    |    |    |    |
| Instruction 5 |        |    |    |    | FI | DI | CO | FO | EI | WO |    |    |    |    |
| Instruction 6 |        |    |    |    |    | FI | DI | CO | FO | EI | WO |    |    |    |
| Instruction 7 |        |    |    |    |    |    | FI | DI | CO | FO | EI | WO |    |    |
| Instruction 8 |        |    |    |    |    |    |    | FI | DI | CO | FO | EI | WO |    |
| Instruction 9 |        |    |    |    |    |    |    |    | FI | DI | CO | FO | EI | WO |

Timing Diagram for Instruction pipeline operation

# Pipeline Hazards

Occur when the Pipeline, or some portion of the pipeline must stall Because conditions Do not permit Continued execution

There are three Types of hazards:

- Resource
- Data
- Control



Also referred to as a Pipeline bubble

In general, there are three major difficulties that cause the instruction pipeline to deviate from its normal operation.

1. *Resource conflicts* caused by access to memory by two segments at the same time. Most of these conflicts can be resolved by using separate instruction and data memories.
2. *Data dependency* conflicts arise when an instruction depends on the result of a previous instruction, but this result is not yet available.
3. *Branch difficulties* arise from branch and other instructions that change the value of PC.

## Hazards/Dependencies In the pipeline

- Dependency is a major problem in the pipeline, causes extra cycle.
- Cycle in the pipeline without new input is called as extra cycle. Also named as “Stall”.
- When stall is present in the pipeline then  $CPI \neq 1$ .
- There are 3 kinds of dependencies possible in the pipeline-
  - I. Structural dependency/ Structural Hazards
  - II. Data dependency/ Data Hazards
  - III. Control dependency/ Control Hazards

# Structural Dependency

↓  
Created Due to 'Resource Conflict'.

Resource: May be Registers, Memory or  
Functional Unit.

# Structural Dependency

## Soln of Structure Def.

- ↳ ① Resource Replication  
(Use Additional Resource)
  - ↳ ② Divide the Memory
- 
- The diagram illustrates the division of memory into two main components: CM (Code Memory) and DM (DATA Memory). CM is further divided into Code and Data sections. Arrows point from the text labels to their corresponding parts in the diagram.
- ```
graph LR; A[② Divide the Memory] --> B[CM | DM]; B --> C[Code]; B --> D[Data]
```

# Data Dependency

$I_1 : \text{ADD } r_1 \ r_2 \ r_3 ; \quad r_1 \leftarrow r_2 + r_3$

$I_2 : \text{MUL } r_4 \ r_1 \ r_5 ; \quad r_4 \leftarrow r_1 * r_5$

when Next Instn ( $I_j$ ) using the Result of Previous Instn ( $I_i$ ), as a operand.

## Data Dependency

- ① → RAW Data Dep. (True Data Dep.)
- ② → ANTI Dep. (WAR Dep.)
- ③ ↘ OUTPUT/WRITE Dep. (WAW Dep.)

## Data Hazard

A data hazards is a situation in which the pipeline is stalled because the data to be operated on are delayed for some reason. Or Any condition that cause the pipeline to stalls is called as hazards.



## Data Dependency

eg. Consider the following program segment, executed on a RISC pipeline with a

I<sub>1</sub>: Add r0, r1, r2       $r_0 \leftarrow r_1 + r_2$   
I<sub>2</sub>: Sub r3, r0, r2  
I<sub>3</sub>: MUL r4, r3, r0  
I<sub>4</sub>: DIV r5, r4, r0

Non- Adjacent dependency

Note: (1) Adjacent data dependency is named as true data dependency i.e.

$$\begin{cases} I_2 - I_1 \\ I_3 - I_2 \\ I_4 - I_3 \end{cases}$$

## Data Dependency

- (2) Non-Adjacent data dependency is named as data dependency only.

i.e.

$$l_3 = l_1$$

$$l_4 = l_1$$

W: Write  
R: Read

## Data Dep.

- ① RAW Dep. (DATA Dep.) | Flow Dep.
  - ② WAR Dep. (ANTI Dep)
  - ③ WAW Dep. (OUTPUT | write Dep.)
- } Name & Folge Dep.

## Data/RAW Dependency

### RAW [Read After Write]

Data dependency: creates in the pipeline when an instruction I<sub>j</sub> (Next Instruction) depends on the result of a I<sub>i</sub> (previous instruction),.

$$\begin{array}{ll} I_1: \text{Add } r0, r1, r2 & I_i \\ I_2: \text{Sub } r3, r0, r2 & I_j \end{array}$$

$r_0 \leftarrow r_1 + r_2$

$r_3 \leftarrow \underline{r_0} * r_2$



### RAW [Read After. Write]

## 1

# Data Dependency / RAW Dependency

① Read after write (RAW), or Data dependency

- ❖ An instruction modifies a register or memory location.
- ❖ Succeeding instruction reads data in memory or register location

I<sub>1</sub>: Add r0, r1, r2

I<sub>2</sub>: Sub r3, r0, r2

$$\begin{array}{ll} I_i & \textcircled{r_0} \in r_1 + r_2 \\ I_j & r_3 \in \underline{\textcircled{r_0}} * r_2 \end{array}$$

RAW [Read After Write]

## 2. Anti Dependency / WAR Dependency

②

Write after read (WAR), or anti dependency :

- ❖ An instruction reads a register or memory location.
- ❖ Succeeding instruction writes to the location.



Uneven Delay = WAR [Write After Read]

### 3. Output Dependency / WAW Dependency

- ③ Write after write (WAW), or Write/output dependency:
- ❖ Two instruction both write to the same location.

I<sub>1</sub>: MUL r3, r1, r2  
I<sub>2</sub>: ADD r3, r4, r5

$\stackrel{\mathcal{D}_i}{\mathcal{I}_1} @ \stackrel{\mathcal{D}_j}{\mathcal{I}_2} : \begin{array}{l} \text{r3} \leftarrow r_1 * r_2 \\ \text{r3} \leftarrow r_4 + r_5 \end{array}$

WAW [Write After Write].

## Types of Data Hazard

- Read after write (RAW), or true dependency
  - ❖ An instruction modifies a register or memory location.
  - ❖ Succeeding instruction reads data in memory or register location.
  - ❖ Hazard occurs if the read takes place before write operation is complete.
- Write after read (WAR), or antidependency
  - ❖ An instruction reads a register or memory location.
  - ❖ Succeeding instruction writes to the location.
  - ❖ Hazard occurs if the write operation completes before the read operation takes place.
- Write after write (WAW), or Write/output dependency
  - ❖ Two instruction both write to the same location.
  - ❖ Hazard occurs if the write operations take place in the reverse order of the intended sequence.

## Data/RAW Dependency

***Data dependency:*** creates in the pipeline when an instruction  $I_j$  (Next Instruction) depends on the result of a  $I_i$  (previous instruction),.

$I_1$ : Add r0, r1, r2

$I_2$ : Sub r3, r0, r2

## Data Hazard

A data hazard is a situation in which the pipeline is stalled because the data to be operated on are delayed for some reason.



Data Dep.

# Data Dependency & Hazards

5stage (IF, ID, EX)  
MEM, WB

P  
W

eg1.

Consider the following program segment, executed on a RISC pipeline:

Q.1 How Many RAW(Data Dependency ) Dependency?

$I_1: Add r0, r1, r2$   
 $I_2: Sub r3, r0, r2$   
 $I_3: MUL r4, r0, r5$   
 $I_4: DIV r0, r6, r7$   
 $I_5: MUL r8, r0, r9$

Def.

$I_2 - I_1$

$I_3 - I_1$

$I_5 - I_4$

$I_1 - I_2$

$I_1 - I_3$

$I_4 - I_5$

$I_5 - I_1$  X

Q. How to check RAW Deb. Creates Hazard  
Or Not ?



DBMS  $\Rightarrow$  RR No Problem  
Not Conflict Free

OS  $\Rightarrow$  Reader ✓ Reader ✓

OA: Read & Read No Problem

# Data Dependency & Hazards

eg2.

Consider the following program segment, executed on a RISC pipeline:

Q.1 Check Here RAW(Data Dependency ) Dependency Create RAW Hazards?

$$\cancel{r_0} = r_1 + r_2$$

I<sub>1</sub>: Add r0, r1, r2

I<sub>2</sub>: Sub r3, r4, r5

I<sub>3</sub>: MUL r6, r0, r7

I<sub>4</sub>: FAdd r7, r5, r8

I<sub>5</sub>: ADD r9, r0, r10

RAW Deb. : 2 Data Def.

\* I<sub>1</sub> - I<sub>3</sub>      @ I<sub>3</sub> - I<sub>1</sub>  
 \* I<sub>1</sub> - I<sub>5</sub>      @ I<sub>5</sub> - I<sub>1</sub>

RAW Hazard: I<sub>3</sub> - I<sub>1</sub>

Q How to CHECK RAW Def. Create  
RAW Hazards @ Not ?

Soln Pipeline Execution then Check if Result  
is Required earlier @ Not by Next Instruction.

P  
W

~~To available After 05~~



Dependency may @ May Creates Hazards.

# Data Dependency & Hazards

eg3.

Consider the following program segment, executed on a RISC pipeline:

Q.1 How Many RAW(Data Dependency) ?

Q.2 How Many RAW Hazards ?

for i = 0 to n

I<sub>1</sub>: Add r0, r1, r2

I<sub>2</sub>: Sub r3, r0, r4

I<sub>3</sub>: MUL r5, r0, r6

I<sub>4</sub>: XOR r7, r0, r8

I<sub>5</sub>: MUL r9, r0, r10

Data(RAW) Deb.

I<sub>2</sub> - I<sub>1</sub>

I<sub>3</sub> - I<sub>1</sub>

I<sub>4</sub> - I<sub>1</sub>

I<sub>5</sub> - I<sub>1</sub>

4 RAW Deb.

3 RAW Hazards

- $I_1$ : Add r0, r1, r2  
 $I_2$ : Sub r3, r0, r4  
 $I_3$ : MUL r5, r0, r6  
 $I_4$ : XOR r7, r0, r8  
 $I_5$ : MUL r9, r0, r10

After CC5  $\Rightarrow$   $r0$  Available

|       | CC1 | CC2 | CC3 | CC4 | CC5 | CC6 | CC7 | CC8 | CC9 | CC10 | CC11 | CC12 | CC13 |
|-------|-----|-----|-----|-----|-----|-----|-----|-----|-----|------|------|------|------|
| $I_1$ | IF  | ID  | EX  | MA  | WB  |     |     |     |     |      |      |      |      |
| $I_2$ | IF  | IF  | EX  | MA  | WB  |     |     |     |     |      |      |      |      |
| $I_3$ | IF  | IF  | EX  | MA  | WB  |     |     |     |     |      |      |      |      |
| $I_4$ | IF  | IF  | EX  | MA  | WB  |     |     |     |     |      |      |      |      |
| $I_5$ | IF  | ID  | EX  | MA  | WB  |     |     |     |     |      |      |      |      |

4 RAW Dependency  
3 RAW Hazards

Q

A pipelined processor uses a 4-stage instruction pipeline with the following stages: Instruction fetch (IF), Instruction decode (ID), Execute (EX) and Writeback (WB). The arithmetic operations as well as the load and store operations are carried out in the EX stage. The sequence of instructions corresponding to the statement  $X = (S - R * (P + Q))/T$  is given below. The values of variable P, Q, R, S and T are available in the registers R0, R1, R2, R3 and R4 respectively, before the execution of the instruction sequence.

|       |            |
|-------|------------|
| ADD   | R5, R0, R1 |
| MUL   | R6, R2, R5 |
| SUB   | R5, R3, R6 |
| DIV   | R6, R5, R4 |
| STORE | R6, X      |



RAW Dep. 4 ( $I_2 - I_1, I_3 - I_2, I_4 - I_3, I_5 - I_4$ )

WAR Dep : 2 ( $I_3 - I_2, I_4 - I_3$ )

WAW Dep : 2 ( $I_3 - I_1, I_4 - I_2$ )

The number of Read-After-Write (RAW) dependencies, Write-After-Read (WAR) dependencies, and Write-After-Write (WAW) dependencies in the sequence of instructions are, respectively,

[GATE-2006: 2 Marks]

- (a) 2, 2, 4
- (c) 4, 2, 2
- (b) 3, 2, 3
- (d) 3, 3, 2

# Data Dependency.

# Data Dependency

Data dependency will be occurred when the instruction "J" try to read the  
data before instruction "I" writes it.

## "Read-Before-write"

Ex:-

I<sub>1</sub>: Add r<sub>0</sub>, r<sub>1</sub>, r<sub>2</sub> : r<sub>0</sub> ← r<sub>1</sub> + r<sub>2</sub>

I<sub>2</sub>: MUL r<sub>3</sub>, r<sub>0</sub>, r<sub>2</sub> : r<sub>3</sub> ← r<sub>0</sub> \* r<sub>2</sub>

## Data Dependency

- When the above instruction are executed in a Non-pipelined system then data dependency condition does not occur because " $I_2$ " executing after ' $I_1$ '  
So,  $I_2$  reads the register ( $r_o$ ) data after  $I_1$  writes it [Read-After- write].

In Non pipeline : Non overlapping Execution  
No Problem in Data Dep.

## Data Dependency

- When the above instruction are executed in a pipeline system then data dependency condition will be occurred because  $I_2$  will be executing along with  $I_1$  so  $I_2$  reads the  $(r_0)$  data before  $I_1$  writes it. Therefore,  $I_1$  incorrectly get the old value from  $(r_0)$  [Data loss].

|       | CC1 | CC2 | CC3 | CC4 |
|-------|-----|-----|-----|-----|
| $I_1$ | IF  | ID  | EX  |     |
| $I_2$ |     | IF  | ID  |     |
| $I_3$ |     |     |     |     |
| $I_4$ |     |     |     |     |

Read - Before write  
Old (wrong) Data Access {Data loss}

# Data Dependency

- To handle above problem, some logic will be maintained in the “ID” stage of pipeline to stop the accessing of a dependent data.  
 Structure of a logic is as follows: **“TOMASULO ALGO”**

| S/No           | Function Unit | Destination    | Independent Source 1 | Independent Source 2 | Dependent Source 1               | Dependent Source 2 | Status | Reg. file access                                                        |
|----------------|---------------|----------------|----------------------|----------------------|----------------------------------|--------------------|--------|-------------------------------------------------------------------------|
| I <sub>1</sub> | ADD           | R <sub>0</sub> | r <sub>1</sub>       | r <sub>2</sub>       | -                                | -                  | 0      | r <sub>1</sub> = value<br>r <sub>2</sub> = value<br>EX- Stage Allocated |
| I <sub>2</sub> | MUL           | r <sub>3</sub> | -                    | r <sub>2</sub>       | r <sub>0</sub> (I <sub>1</sub> ) | -                  | 0      | r <sub>2</sub> = value<br>EX- Stage not allocated                       |

“0” execution yet not complete  
 “1” execution complete

Dependency ?

Domain & Range

## Data Dependency

- By using the above logic, we need to stop the access of dependent data unit the data becomes available.  
  
This waiting creates Extra Cycle/Bubble/stall in the pipeline that is described below :

# Topic : Dependencies In the pipeline

|                | CC1 | CC2                                                          | CC3 | CC4 | CC5                                        | CC6 | CC7 |
|----------------|-----|--------------------------------------------------------------|-----|-----|--------------------------------------------|-----|-----|
| I <sub>1</sub> | IF  | ID                                                           | EX  | MA  | WB<br>$r_0 = \text{updated}$<br>Status = 1 |     |     |
| I <sub>2</sub> | IF  | ID<br>S: $r_0, r_2$<br>(D) ( $I_0$ )<br>$r_2 = \text{value}$ | ID  | ID  | ID<br>$r_0 = \text{value}$                 | EX  |     |
| I <sub>3</sub> |     | IF                                                           | IF  | IF  | IF                                         | ID  |     |
| I <sub>4</sub> |     |                                                              | =   | =   | =                                          | IF  |     |

7 cycles - 4 i/p/s = 3 stalls

Read  
-After  
Write

## Solution of Data Dependency.

- ↳ ① Software Solution (Compiler Based)
  - ↳ NOP Instruction.
- ↳ ② Hardware type Sol'n
  - ↳ HW Interlock
  - ↳ Operand Forwarding. | By bypassing, Shoot(ing) Kill.

# Solution Of Data Dependency

## Hardware Interlocks

The most straightforward method is to insert hardware interlocks. An interlock is a circuit that detects instructions whose source operands are destinations of instructions further up in the pipeline.

## Operand Forwarding

Another technique called operand forwarding uses special hardware to detect a conflict and then avoid it by routing the data through special paths between pipeline segments. For example, instead of transferring an ALU result into a destination register, the hardware checks the destination operand, and if it is needed as a source in the next instruction, it passes the result directly into the ALU input, bypassing the register file.

## Solution of Data Dependency

- The minimize the data dependency stall in the pipeline hardware technique is used i.e. operand forwarding. Also named as **By-passing or Short –circuit**
- This mechanism state that use the buffer between the stage to hold the intermediate result so that, dependent instruction will be accessing the new data from the buffer before update register file.

$\mathfrak{I}_1 : \text{ADD } \delta_0 \ \delta_1 \ \delta_2 ; \quad \delta_0 \in \delta_1 + \delta_2$

$\mathfrak{I}_2 : \text{MUL } \delta_3 \ \delta_0 \ \delta_4 ; \quad \delta_3 \in \delta_0 * \delta_4$

$\mathfrak{I}_2$  Data Dependant on  $\mathfrak{I}_1$ .

## Without Operand Forwarding

|                  | CC1 | CC2      | CC3 | CC4 | CC5       | CC6 | CC7 | CC8 | CC9 |
|------------------|-----|----------|-----|-----|-----------|-----|-----|-----|-----|
| T <sub>1</sub> : | IF  | TD<br>OF | EX  | MA  | WB        |     |     |     |     |
| T <sub>2</sub> : | IF  | TD       | TD  | TD  | ID<br>OF  | EX  | MA  | WB  |     |
|                  |     |          |     |     | 3 STALL'S |     |     |     |     |

## With Operand Forwarding

|                | CC1 | CC2 | CC3                     | CC4                          | CC5                                                             | CC6 |  |
|----------------|-----|-----|-------------------------|------------------------------|-----------------------------------------------------------------|-----|--|
| I <sub>1</sub> | IF  | ID  | EX                      | MA                           | WB                                                              |     |  |
| I <sub>2</sub> |     | IF  | ID<br>(I <sub>1</sub> ) | EX                           | MA                                                              | WB  |  |
| I <sub>3</sub> |     |     | IF                      | ID<br>(I <sub>2</sub> ), (I) | EX                                                              | MA  |  |
| I <sub>4</sub> |     |     |                         | IF                           | r <sub>0</sub> = value<br>ID (I <sub>1</sub> , I <sub>3</sub> ) | EX  |  |
| I <sub>5</sub> |     |     |                         |                              | IF                                                              | ID  |  |
| I <sub>6</sub> |     |     |                         |                              |                                                                 | IF  |  |

6 cycles – 6 i/p's = 0 stalls

OPERAND - forwarding

Q.

Consider a pipelined processor with the following four stages

IF : Instruction Fetch

ID : Instruction Decode and Operand Fetch

EX : Execute

WB : Write Back

The IF, ID and WB stages take one clock cycle each to complete the operation. The number of clock cycles for the EX stage depends on the instruction. The ADD and SUB instructions need 1 clock cycle and the MUL instruction need 3 clock cycles in the EX stage. Operand forwarding is used in the pipelined processor. What is the number of clock cycles taken to complete the following sequence of instructions?

Ex

|   |                |     |     |     |    |              |
|---|----------------|-----|-----|-----|----|--------------|
| 1 | T <sub>1</sub> | ADD | R2  | R1  | R0 | R2 ← R1 + R0 |
| 3 | T <sub>2</sub> | MUL | R4, | R3, | R2 | R4 ← R3 * R2 |
| 1 | T <sub>3</sub> | SUB | R6, | R5, | R4 | R6 ← R5 - R4 |

[GATE-2007 : 2 Marks]

- A 7
- B 8
- C 10
- D 14

# without operand Forwarding

|                                  | cc1 | cc2       | cc3       | cc4       | cc5       | cc6       | cc7       | cc8       | cc9       | cc10      | cc11 | cc12 | cc13 |
|----------------------------------|-----|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|------|------|------|
| I <sub>1</sub> :                 | IF  | ID        | EX        | WB        |           |           |           |           |           |           |      |      |      |
| I <sub>2</sub>                   | IF  | ID        | ID        | <u>ID</u> | EX        | EX        | EX        | WB        |           |           |      |      |      |
| I <sub>3</sub>                   | IF  | <u>ID</u> | EX   | WB.  |      |
| without operand forwarding = 12. |     |           |           |           |           |           |           |           |           |           |      |      |      |

# without operand Forwarding

|       | $I_1$ | $I_2$ | $I_3$ | $I_4$ | $I_5$ | $I_6$ | $I_7$ | $I_8$ | $I_9$ | $I_{10}$ | $I_{11}$ | $I_{12}$ | $I_{13}$ |
|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|----------|----------|----------|----------|
| $I_1$ | IF    | ID    | EX    | WB    |       |       |       |       |       |          |          |          |          |
| $I_2$ | IF    | IF    | ID    | EX    | EX    | EX    | WB    |       |       |          |          |          |          |
| $I_3$ | IF    | "     | "     | "     | "     | "     | "     | ID    | EX    | WB       |          |          |          |

without operand forwarding = ~~I2~~

# without operand Forwarding :



without operand  
Forwarding

With operand forwarding.

# With operand Forwarding

|    | cc1 | cc2 | cc3 | cc4 | cc5 | cc6 | cc7 | cc8 | cc9 | cc10cc11 | cc12 | cc13 |
|----|-----|-----|-----|-----|-----|-----|-----|-----|-----|----------|------|------|
| T1 | IF  | ID  | EX  | WB  |     |     |     |     |     |          |      |      |
| T2 |     | IF  | ID  | EX  | EX  | EX  | WB  |     |     |          |      |      |
| T3 |     |     | IF  |     |     |     |     |     |     |          |      |      |
|    |     |     |     |     |     |     |     |     |     |          |      |      |

with operand forwarding = ⑧ Avg

# with operand Forwarding



With operand Forwarding = 8 .

OR

Another Approach

without timing Diagram  
(Not preferable)

By Debant

$$\text{In Pipeline} \quad \boxed{\text{CPI} = 1}$$

Restriction

only one type of  
Deb.  
Then this method  
work.

But Some Instructions takes More Cycle  
(Extra cycle) in some Stage.

$$\text{Total ET} = \text{ET}_{\substack{\text{PIPE} \\ \text{Normal}}} + \text{Stalls} \\ [\text{ExtraCycle}]$$

Q.

Consider a pipelined processor with the following four stages

IF : Instruction Fetch

ID : Instruction Decode and Operand Fetch

EX : Execute

WB : Write Back

The IF, ID and WB stages take one clock cycle each to complete the operation. The number of clock cycles for the EX stage depends on the instruction. The ADD and SUB instructions need 1 clock cycle and the MUL instruction need 3 clock cycles in the EX stage. Operand forwarding is used in the pipelined processor. What is the number of clock cycles taken to complete the following sequence of instructions?

$k=4$

4 Stage

$n=3$

3 Instruction ( $I_1, I_2, I_3$ )

IF :  $\perp$

ID :  $\perp$

EX : ③ (2 Extra Cycle  
[Stall])

WB :  $\perp$

$$ET_{PIPE} = [k + (n-1)] + p$$

$$\Rightarrow [4 + (3-1)]$$

$$= \underline{\underline{6 \text{ cycle}}}$$

$$\text{Total ET} = ET_{PIPE} + \text{Stalls}$$

$$\begin{aligned} &\Rightarrow 6 + 2 \\ &= \underline{\underline{8 \text{ cycle long}}} \end{aligned}$$

$$\text{Stalls} = \underline{\underline{2 \text{ cycle}}}$$

$n=3, k=4$

|       |     |     |     |    |                         |
|-------|-----|-----|-----|----|-------------------------|
| $T_1$ | ADD | R2  | R1  | R0 | $R2 \leftarrow R1 + R0$ |
| $T_2$ | MUL | R4, | R3, | R2 | $R4 \leftarrow R3 * R2$ |
| $T_3$ | SUB | R6, | R5, | R4 | $R6 \leftarrow R5 - R4$ |

[GATE-2007 : 2 Marks]

- (A) 7
- (B) 8
- (C) 10
- (D) 14

Q.

The performance of a pipelined processor suffer if

P  
W

[GATE-2002 : 2 Marks]

- A the pipeline stages have different delays
- B consecutive instructions are dependent on each other
- C the pipeline stages share hardware resources
- D All of the above

A 5-stage pipelined processor has Instruction Fetch (IF), Instruction Decode (ID), Operand Fetch (OF), Perform Operation (PO) and Write Operand (WO) stage. The IF, ID, OF and WO stage take 1 clock cycle each for any instruction. The PO stage takes 1 clock cycle for ADD and SUB instructions ,3 clock cycles for MUL instruction, and 6 clock cycles for DIV instruction respectively. Operand forwarding is used in the pipeline. What is the number of clock cycles needed to execute the following sequence of instructions?

**Instruction**

- I<sub>0</sub>: MUL R<sub>2</sub>, R<sub>0</sub>, R<sub>1</sub>
- I<sub>1</sub>: DIV R<sub>5</sub>, R<sub>3</sub>, R<sub>4</sub>
- I<sub>2</sub>: ADD R<sub>2</sub>, R<sub>5</sub>, R<sub>2</sub>
- I<sub>3</sub>: SUB R<sub>5</sub>, R<sub>2</sub>, R<sub>6</sub>

**Meaning of Instruction**

- R<sub>2</sub>  $\leftarrow$  R<sub>0</sub>\*R<sub>1</sub>
- R<sub>5</sub>  $\leftarrow$  R<sub>3</sub>/R<sub>4</sub>
- R<sub>2</sub>  $\leftarrow$  R<sub>5</sub> + R<sub>2</sub>
- R<sub>5</sub>  $\leftarrow$  R<sub>2</sub> - R<sub>6</sub>

**A** 13**B** 15**C** 17**D** 19

[GATE-2010-CS: 2M]

**Q.**

For a pipelined CPU with a single ALU, consider the following situations

1. The  $j^{th} + 1^{st}$  instruction uses the result of the  $j^{th}$  instruction as an operand
2. The execution of a conditional jump instruction
3. The  $j^{th}$  and  $j^{th} + 1^{st}$  instructions require the ALU at the same time

Which of the above can cause a hazard?

**[GATE-2003 : 1 Marks]**

- A** 1 and 2 only
- B** 2 and 3 only
- C** 3 only
- D** All the three

Q.

A pipelined processor uses a 4-stage instruction pipeline with the following stages: Instruction fetch (IF), Instruction decode (ID), Execute (EX) and Writeback (WB). The arithmetic operations as well as the load and store operations are carried out in the EX stage. The sequence of instructions corresponding to the statement  $X = (S - R * (P + Q))/T$  is given below. The values of variables P, Q, R, S and T are available in the registers R0, R1, R2, R3 and R4 respectively, before the execution of the instruction sequence.

|       |       |         |            |
|-------|-------|---------|------------|
| ADD   | $I_1$ | $\perp$ | R5, R0, R1 |
| MUL   | $I_2$ | 3       | R6, R2, R5 |
| SUB   | $I_3$ | $\perp$ | R5, R3, R6 |
| DIV   | $I_4$ | 3       | R6, R5, R4 |
| STORE | $I_5$ | $\perp$ | R6, X      |

;  $R5 \leftarrow R0 + R1$   
;  $R6 \leftarrow R2 * R5$   
;  $R5 \leftarrow R3 - R6$   
;  $R6 \leftarrow R5 / R4$   
;  $X \leftarrow R6$

| WB |       |       | $T_1$ |       | $T_2$ | $T_3$ |       | $T_4$ | $T_5$ |       |       |       |    |    |    |     |
|----|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|----|----|----|-----|
| EX |       |       | $T_1$ | $T_2$ | $T_2$ | $T_2$ | $T_3$ | $T_4$ | $T_4$ | $T_4$ | $T_4$ | $T_5$ |    |    |    |     |
| TD |       |       | $T_1$ | $T_2$ |       | $T_3$ |       | $T_4$ |       | $T_5$ |       |       |    |    |    |     |
| TF | $T_1$ | $T_2$ | $T_3$ | $T_4$ | $T_5$ |       |       |       |       |       |       |       |    |    |    |     |
|    | 1     | 2     | 3     | 4     | 5     | 6     | 7     | 8     | 9     | 10    | 11    | (12)  | 13 | 14 | 15 | 16. |

Another Approach

$$\# \text{Inst}^n = 5 \quad (n=5)$$

$$\text{Stage} = 4 \quad (k=4)$$

IF      |  
ID      |  
EX      |  
WB      |

Extra

|              |               |   |
|--------------|---------------|---|
| MUL :        | 3             | 2 |
| DIV :        | 3             | 2 |
|              | ④             |   |
| Total Extra: | ④<br>(Stalls) |   |

$$\begin{aligned} ET_{PIPE} &= [k + (n-1)] \text{ cycle} \\ &\Rightarrow [4 + 5 - 1] \\ &= 8 \text{ cycle} \end{aligned}$$

$$\begin{aligned} ET &= ET_{PIPE} + \text{Stalls} \\ &\Rightarrow 8 + 4 \end{aligned}$$

$$\begin{aligned} &\Rightarrow 12 \text{ Cycle} \end{aligned}$$

The IF, ID and WB stages take 1 clock cycle each. The EX stage takes 1 clock cycle each for the ADD, SUB and STORE operations, and 3 clock cycles each for MUL and DIV operations. Operand forwarding from the EX stage to the ID stage is used. The number of clock cycles required to complete the sequence of instructions is

[GATE-2006: 2 Marks]

- A 10
- B 12
- C 14
- D 16

Q. Consider the sequence of machine instructions given below:

|     |              |              |            |
|-----|--------------|--------------|------------|
| MUL | $\text{I}_1$ | $\text{P}_0$ | R5, R0, R1 |
| DIV | $\text{I}_2$ | 5            | R6, R2, R3 |
| ADD | $\text{I}_3$ | 1            | R7, R5, R6 |
| SUB | $\text{I}_4$ | 1            | R8, R7, R4 |

$$\begin{aligned}
 R_5 &\leftarrow R_0 * R_1 \\
 R_6 &\leftarrow R_2 / R_3 \\
 R_7 &\leftarrow R_5 + R_6 \\
 R_8 &\leftarrow R_7 - R_4
 \end{aligned}$$

In the above sequence, R0 to R8 are general purpose registers. In the instructions shown, the first register stores the result of the operation performed on the second and the third registers. This sequence of

instructions is to be executed in a pipelined instruction processor with the following 4 stages: (1) Instruction Fetch and Decode (IF), (2) Operand Fetch (OF), (3) Perform Operation (PO) and (4) Write back the Result (WB). The PO state takes 1 clock cycle for ADD or SUB instruction. 3 clock cycles for MUL instruction and 5 clock cycles for DIV instruction. The pipelined processor uses operand forwarding from the PO stage to the OF stage. The number of clock cycles taken for the execution of the above sequence of instructions is 13 cycle.

Timing diagram showing tasks  $T_1$ ,  $T_2$ ,  $T_3$ , and  $T_4$  over 13 time slots. The tasks are represented by vertical columns of events:

- $T_1$ : IF, OF, PO, PO, PO, WB
- $T_2$ : IF, OF, PO, PO, PO, WB
- $T_3$ : DF, DF
- $T_4$ : DF, DF

The diagram highlights specific events with pink circles and indicates transitions with green arrows:

- $T_1$  and  $T_2$  both have two overlapping periods of PO and WB.
- $T_3$  and  $T_4$  both have two periods of DF.
- A green arrow points from the end of the first DF period of  $T_3$  to the start of the second DF period of  $T_4$ .
- A green arrow points from the end of the first DF period of  $T_4$  to the start of the second DF period of  $T_3$ .

| WB |                |                |                | I <sub>1</sub> |                |                |                | I <sub>2</sub>   | I <sub>3</sub>   | I <sub>4</sub> |    |
|----|----------------|----------------|----------------|----------------|----------------|----------------|----------------|------------------|------------------|----------------|----|
| PD |                | I <sub>1</sub> | I <sub>1</sub> | I <sub>1</sub> | I <sub>2</sub> | I <sub>2</sub> | I <sub>2</sub> | I <sub>2</sub>   | I <sub>3</sub>   | I <sub>4</sub> |    |
| OF |                | I <sub>1</sub> | I <sub>2</sub> |                |                |                |                | I <sub>3</sub> ↓ | I <sub>4</sub> ↓ |                |    |
| IF | I <sub>1</sub> | I <sub>2</sub> | I <sub>3</sub> | I <sub>4</sub> |                |                |                |                  |                  |                |    |
|    | 1              | 2              | 3              | 4              | 5              | 6              | 7              | 8                | 9                | 10             | 11 |

13 cycle

13 cycle

By Another Approach

$n=4$  ( $T_1 T_2 T_3 T_4$ )

$k=4$  (4 Stage)

TF ⊥

OF ⊥

PO ⊥

WB ⊥

$$ET_{PIPE} = [k + (n-1)] t_p$$

$$\Rightarrow [4 + (4-1)] \text{ cycle}$$

$$ET_{PIPE} = 7 \text{ cycle}$$

(CPI = 1)

|     |  | MUL | 3 | 2 | Extaq |
|-----|--|-----|---|---|-------|
| DIV |  | 5   | 4 |   |       |
| ADD |  |     | 1 |   |       |
| SUB |  |     | 0 |   |       |

6 Stalls

$$ET = ET_{PIPE} + \text{Stalls.}$$

$$\Rightarrow 7 + 6$$

$$= 13 \text{ cycle Avg}$$

..

The instruction pipeline of a RISC processor has the following stages. Instruction Fetch (IF), Instruction Decode (ID), Operand Fetch (OF), Perform Operation (PO) and Writeback (WB) The IF, ID, OF and WB stages take 1 clock cycle each for every instruction. Consider a sequence of 100 instructions, In the PO stage, 40 instructions take 3 clock cycles each, 35 instructions take 2 clock cycles each, and the remaining 25 instructions take 1 clock cycle each. Assume that there are no data hazards and no control hazards.

The number of clock cycles required for completion of execution of the sequence of instructions is \_\_\_\_.

[GATE-2018-CS: 2M]

$$k=5, n=100$$

IF : 1

ID : 1

OF : 1

PO : ⊥

WB : 1

$$n=100, k=5$$

$$ET_{PIPE} = (k + (n-1)) \text{ cycle}$$

$$\Rightarrow [5 + (100-1)]$$

$$= 104 \text{ cycle}$$

|                |     | (Stalls)<br>Extra |                                                           |
|----------------|-----|-------------------|-----------------------------------------------------------|
| PO : 40 Inst^n | : 3 | 2                 |                                                           |
| 35 Inst^n      | : 2 | 1                 | Total Stalls $\Rightarrow 40 \times 2$<br>$+ 35 \times 1$ |
| 25 Inst^n      | : 1 | 0                 | $\Rightarrow 80 + 35$<br><u>= 115 Stalls.</u>             |

$$\text{Total ET} = ET_{PIPE} + \text{Stalls}$$

$$\Rightarrow 104 + 115$$

$$= 219 \text{ Avg}$$

$k=5, n=100$

IF : 1.

ID : 1.

OF : 1.

PD :

WB :  $\frac{1}{4}$

PO: 40 Instn : 3

35 Instn : 2

25 Instn : 1

PO Stage

$$= 40 \times 3 + 35 \times 2 + 25 \times 1$$

$$= 120 + 70 + 25$$

= 215 cycle

Total ET = 4 + PO Stage Total

$$\Rightarrow 4 + 215$$

$\Rightarrow 219 \text{ cycle Avg}$

∴

**MCQ**

Consider an instruction pipeline with five stages without any branch prediction: Fetch Instruction (FI), Decode Instruction (DI), Fetch Operand (FO), Execute Instruction (EI) and Write Operand (WO). The stage delays for FI, DI, FO, EI and WO are 5 ns, 7 ns, 10 ns, 8 ns and 6 ns, respectively. There are intermediate storage buffers after each stage and the delay of each buffer is 1 ns. A program consisting of 12 instruction  $I_1, I_2, I_3, \dots, I_{12}$  is executed in this pipelined processor. Instruction  $I_4$  is the only branch instruction and its branch target is  $I_9$ . If the branch is taken during the execution of this program, the time (in ns) needed to complete the program is

[GATE-2013-CS: 2M]

A 132

B 165

C 176

D 328

**THANK  
YOU!**

