

Lec-38

## MIPS32 instruction cycle

Instruction execution cycle — 5 stages

- (i) instruction fetch (IF): fetching instruction from memory
- (ii) ID: Instruction Decode / Register fetch: decode of code & find what instruction it is.
- (iii) EX: Execution / Effective Address calculation: we execute instruction or for some instruction, we have to compute effective address size.

eg: LD RS, 5B(R12)



- (iv) MEM: Memory access / Branch completion — read or write

- (v) ~~WB~~ WB: register write-back

eg: ADD R1 R2 R3



Now this result of  $R2 + R3$  value needs to be stored in R1. This happens in Register write-back stage.

(in these steps, there are various steps → they are micro-operations)

## (a) PC - Program counter

### Instruction fetch (IF):

(a)

Here the instruction pointed to by PC is fetched from memory & also next value of PC is computed.

\* PC always contains the address of next instruction to be executed.

- Every MIPS32 instruction is of 32 bits.
- Every memory word is of 32 bits and has a unique address.
- for a branch instruction, new value of the PC may be the target address. So PC is not updated in this stage; new value is stored in a register NPC.



\* for byte addressable memory, PC has to be incremented by 4 bytes (32 bits).

### Instruction Decode (ID):

- instruction already fetched in IR is decoded.
- OPCODE is 6-bits (bits 31:26)
- first source operand as (bits 25:21), second source operand rt (bits 20:16).
- 16 bit immediate data (bits 15:0)
- 26-bit immediate data (bits 25:0)
- decoding is done in parallel with reading the register operands rs and rt.
- Possible because these fields are in a fixed location in the instruction format.

$A, B, Imm, Imm1 \rightarrow$  temp registers.

Date \_\_\_\_\_  
Page \_\_\_\_\_

ID:  $A \leftarrow \text{Reg}[rs]$

$B \leftarrow \text{Reg}[rt]$

$Imm \leftarrow (IR_{15})^{16} \# \# IR_{15..0}$

$Imm1 \leftarrow (IR_{25})^6 \# \# IR_{25..0}$

// sign extension of  
16-bit immediate  
field.

(// sign extension of 26-bit  
immediate field)

What is actually happening

16-bit immediate field

~~It is a complement signed no.~~

It is signed no. in 2's complement

If no. positive  $\rightarrow$  starts with 0.

If no. negative  $\rightarrow$  starts with 1.

32 bits.



If no. is +ve  $\rightarrow$  pad zeroes to the left

## (C) EX: Execution / Effective address computation

- In this step, the ALU is used to perform some calculation
  - the exact operation depends on instruction that is already decoded during IF.
  - ALU operates on operands that have been already made ready in the previous cycle
    - A, B, Imm, etc.
  - we show micro instructions corresponding to the type of instructions.

~~(eg)~~ eg LW R3 , 100(R8). (memory referenced)

$$\text{ALU out} \leftarrow A + \text{Imm}$$

eg: SUB R2, RS, R12

$$\text{ALU out} \leftarrow A - \text{func } B$$

(reg-reg: ALU instruct.)

(Register-Immediate ALU instructions)

eg: SUBL R2, RS, 524.

$$\text{ALU out} \leftarrow A - \text{func. Imm}$$

(Branch)

eg BEQZ R2, 1abd

$$\text{ALU out} \leftarrow \text{NPC} + \text{Imm}; \quad (\text{op is } =)$$

$$\text{cond} \leftarrow (A \text{ op } 0);$$

LMD → load memory data

↳ it is just at a temporary register.

## (d) MEM : memory access / Branch completion

- the only instructions that make use of this step are loads, stores and branches.
- the load and store instructions access the memory.
- the branch instructions update PC depending upon the outcome of the branch condition.

Load instruction

 $PC \leftarrow NPC;$  $LMD \leftarrow Mem[ALUout];$ 

(Store instruction)

 $PC \leftarrow NPC;$  $Mem[ALUout] \leftarrow B;$ 

Branch instruction :

if (cond)  $PC \leftarrow ALUout;$   
else  $PC \leftarrow NPC;$ 

Other instructions

 $PC \leftarrow NPC$ 

## (e)

## WB : Register Write Back

→ in this step, result → written back to register  
(Result may come from ALU)

(Result may come from the memory system (via LOAD instruction)).

→ the position of destination register in the instruction word depends on instruction → "already known after decoding" has been done".



register register ALU instruction:

Reg [rd]  $\leftarrow$  ALUout;

Register - immediate ALU instruction

Reg [rt]  $\leftarrow$  ALUout;

load instruction

Reg [st]  $\leftarrow$  LMD;

complete steps of micro operations

eg: ADD R2, R5, R10

|     |                                                  |
|-----|--------------------------------------------------|
| IF  | IR $\leftarrow$ Mem[PC]                          |
|     | NPC $\leftarrow$ PC + 1;                         |
| ID  | A $\leftarrow$ Reg[rs]<br>B $\leftarrow$ Reg[rt] |
| EX  | ALUout $\leftarrow$ A + B;                       |
| MEM | PC $\leftarrow$ NPC                              |
| WB  | Reg[rd] $\leftarrow$ ALUout                      |

eg: ADDI R2, R5, 150

|     |                                                                                                        |
|-----|--------------------------------------------------------------------------------------------------------|
| IF  | IR $\leftarrow$ Mem[PC]                                                                                |
|     | NPC $\leftarrow$ PC + 1                                                                                |
| ID  | A $\leftarrow$ Reg[rs]<br>Imm $\leftarrow$ (IR <sub>15:0</sub> ) <sup>16</sup> #### IR <sub>15:0</sub> |
| EX  | ALUout $\leftarrow$ A + Imm;                                                                           |
| MEM | PC $\leftarrow$ NPC;                                                                                   |
| WB  | Reg[rt] $\leftarrow$ ALUout;                                                                           |



Lec-39

## Pipelined implementation of a processor (3).

- each stage must finish its execution every clock cycle.
- We should be able to start a new instruction every clock cycle.

| instruction | 1  | 2  | 3  | 4   | 5   | 6   | 7   | 8  |
|-------------|----|----|----|-----|-----|-----|-----|----|
| i           | IF | ID | EX | MEM | WB  |     |     |    |
| i+1         |    | IF | ID | EX  | MEM | WB  |     |    |
| i+2         |    |    | IF | ID  | EX  | MEM | WB  |    |
| i+3         |    |    |    | IF  | ID  | EX  | MEM | WB |

Instruction i+1 finishes.  
Instruction i finishes at

- We give time to fill up pipe after that every clock cycle one instruction finishes.



## Some naming conventions



IF-ID-IR

→ most of the temporary regs. required in data path are included as part of inter-stage latches.

eg       $IR \leftarrow \text{Mem [PC]}$

### (a) PIPELINE STAGE - IR

$IF-ID-IR \leftarrow \text{Mem [PC]};$

$IF-ID-NPC; PC \leftarrow \text{if } ((EX-MEM-IR[\text{opcode}] == \text{branch}) \text{ then } EX-MEM-ALUOUT \text{ else } \{ \text{PC} + 1 \})$

else



IF  
=

## (b) Micro operations for pipeline stage ID

$ID\_EX\_A \leftarrow \text{Reg}[\text{IF\_ID\_IR}(rs)];$   
 $ID\_EX\_B \leftarrow \text{Reg}[\text{IF\_ID\_DR}(re)];$   
 $ID\_EX\_NPC \leftarrow \text{IF\_ID\_NPC};$   
 $ID\_EX\_IR \leftarrow \text{IF\_ID\_IR};$   
 $ID\_EX\_Imm \leftarrow \text{sign\_extend}(\text{IF\_DR\_IR}_{15-0});$



## (c) Micro op. for EX pipeline stage

## → R-R ALU

$EX\_MEM\_IR \leftarrow ID\_EX\_IR;$

$EX\_MEM\_ALUout \leftarrow ID\_EX\_A \text{ func } ID\_EX\_B;$

## → R-M ALU

$EX\_MEM\_IR \leftarrow ID\_EX\_IR$

$EX\_MEM\_ALUout \leftarrow ID\_EX\_A \text{ func } ID\_EX\_Imm$

→ Load/Store:

$$EX\_MEM\_IR \leftarrow ID\_EX\_IR$$

$$EX\_MEM\_ALUout \leftarrow ID\_EX\_A + ID\_EX\_Imm$$

$$EX\_MEM\_B \leftarrow DD\_EX\_B;$$

→ Branch:

$$EX\_MEM\_ALUout \leftarrow ID\_EX\_NPC + ID\_EX\_Imm$$

$$EX\_MEM\_cond \leftarrow (ID\_EX\_A == 0);$$



(d) Microoperations for pipeline stages MEM

$$MEM\_WB\_IR \leftarrow EX\_MEM\_IR; \quad \{ ALU \}$$

$$MEM\_WB\_ALUout \leftarrow EX\_MEM\_ALUout; \quad \}$$

$$MEM\_WB\_IR \leftarrow EX\_MEM\_IR; \quad \{ \text{load} \}$$

$$MEM\_WB\_EMD \leftarrow Mem[EX\_Mem\_ALU.out]; \quad \cancel{\text{store}}$$

$$MEM\_WB\_IR \leftarrow EX\_MEM\_IR \quad \{ \text{store} \}$$

$$MEM[EX\_MEM\_ALU.out] \leftarrow EX\_MEM\_B;$$

7.5P Stage



(c) Micro operations for Pipeline stage WB

Reg [MEM-WB-IR[rd]]  $\leftarrow$  MEM-WB-ALUout; for R-RALU

Reg [MEM-WB-IR[rd]]  $\leftarrow$  MEM-WB-ALUout; R-M-ALU

Reg [MEM-WB-IR[rd]]  $\leftarrow$  MEM-WB-LMD;  $\rightarrow$  LOAD



After combining all  $\rightarrow$  MIPS 32

### HAZARDS in pipelines!

(a) Structural hazard: two instructions which are already there in the pipeline, they are in two different stages, but they are trying to use same hardware resource

(b) Data Hazard  $\rightarrow$  arises due to instruction dependency



Lec-41

Verilog modeling of the processor - Part 1

→ behavioural verilog code to implement pipeline.

→ Two special 1-bit variables are used:

→ HALTED: Set after a halt instruction executes & reached WB stage.

→ TAKEN\_BRANCH: Set after a decision to take a branch is known required to disable the instruction that have already entered the pipeline from making any state changes.

How Pipeline works?

i : IF ID EX MEM WB

i+1 : IF ID\* EX MEM WB

i+2 : IF ID EX MEM WB

i+3 : IF ID EX MEM WB

{ Suppose we have a sequence of instructions to be executed and after that we have HLT instruction, the processor is assumed to stop now.

Suppose instruction  $i+1$  is HLT instruction. Then are

$i, i-1, i-2^{th}$  instruction earlier as well.

The  $i+1$  HLT instruction decoded during ID stage\*.

During the ID stage of  $(i+1)^{th}$  instruction, instruction  $i$  is in EX stage. So we can not stop the processor right here because previous instructions ie  $i, i-1, i-2$  are not completed. So we wait till WB stage of  $i+1$  stage for that HALTED is used.

As soon as ~~the~~ HLT instruction is decoded, we set the HALTED to 1.

and before executing any further instructions, the flag HALTED is checked if it is equal to 1. Therefore  $i+1$  and  $i+2$  will not be carried out.