

# Pipeline Implementation of RISC architecture processors: MIPS

- a) 32-general purpose registers R0 to R31 (32-bit each)
- b) Program Counter register PC - 32-bit.

No flag registers & Only few addressing modes  
every word is memory addressable and 32-bit word size.

Below are the instructions considered.

Load & store instructions

LW R2, 12(R8) //  $R2 = \text{Mem}[R8 + R4]$

SW R5, 10(R25) //  $\text{Mem}[R5 - 10] = R5$

Arithmetic & logic instructions

ADD R1, R2, R3 //  $R1 = R2 + R3$

ADD R1, R2, R0 //  $R1 = R2 + 0$

SUB R12, R10, R8 //  $R12 = R10 - R8$

AND R20, R1, R5 //  $R20 = R1 \& R5$

OR R11, R5, R6 //  $R11 = R5 | R6$

MUL R5, R6, R7 //  $R5 = R6 * R7$

SLT R5, R11, R12 // If  $R5 < R12$ , R5=1; else R5=0

SLT  $\Rightarrow$  Set Less Than Instruction

ADDI R1, R2, 25 //  $R1 = R2 + 25$

SUBI R5, R1, 150 //  $R5 = R1 - 150$

SLTI R2, R10, 10 // If  $R10 < 10$ , R2=1; else R2=0

Branch Instructions

BEQZ R1, Loop // Branch to loop if  $R1 = 0$

BNEQZ R5, Label // Branch to label if  $R5 \neq 0$

Jump Instruction

J loop // Branch to loop unconditionally

Miscellaneous Instruction

HLT // Halt execution.

MIPS 32 instruction encoding : 3 types

R-type (Register type)

I-type (Immediate type)

J-type (Jump type)

R-type:



Rtype instruction considered with opcode

Instruction

opcode

ADD

000000

SUB

000001

AND

000010

OR

000011

SLT

000100

MUL

000101

HLT

1111

example for opcode instruction with 32-bit registers

SUB R5, R12, R25

000001 01100, 11001, 00101, 00000 000000

= 05992800 (in hex).

↳ Broke above instruction into hexadecimal.

I-type: contains a 16-bit immediate data field.

has one source & one destination field.



## I-type instructions with opcode

| <u>Instruction</u> | <u>Opcode</u> |         |
|--------------------|---------------|---------|
| LW                 | 001000        | //load  |
| SW                 | 001001        | //store |
| ADDI               | 001010        |         |
| SUBI               | 001011        |         |
| SLTI               | 001100        |         |
| BNEQZ              | 001101        |         |
| BEQZ               | 001110        |         |

example for immediate instruction

LW R20, 84(R9)  
 001000 01001, 1010000000101000... [84 in bin]  
 = 0B40054 (in hexadecimal) missing fields

example 2:

BEQZ R25, label  
 001110 11001, 00000, 4444444444444444  
 = 3B20YYYY (in hexadecimal)

J-type: Contains 26-bit jump address



| <u>Instruction</u> | <u>Opcode</u> |
|--------------------|---------------|
| J                  | 010000        |

\*\* Jump instruction is not implemented for RISC processor

Instruction cycle: time taken to execute one complete instruction

MIPS 32 instruction cycle :- 5 steps

- a) IF : Instruction Fetch
- b) ID : Instruction Decode / Register Fetch
- c) EX : Execution / Effective address calculation
- d) MEM : Memory access / Branch completion
- e) WB : Register Write-Back

### a) Instruction Fetch:

PC contains next instruction address in memory

→ Every MIPS 32 instruction is 32 bit

→ Every memory word is 32 bits and unique

IF:  $IR \leftarrow \text{Mem}[PC]$ ; //instruction register  
 $NPC \leftarrow PC + 4$ ; //New PC (Prog Counter)

### b) Instruction Decode:

Instruction Fetched in IR is decoded in this stage

Opcode :- 6 bits (31:26)

first source operand rs (25:21), second source operand  
rt (20:16); 16-bit immediate data (15:0); 26-bit  
immediate data (25:0)

\* Decoding is done in parallel with reading register operands

rs & rt      ID:  $A \leftarrow \text{reg}[rs]$

$B \leftarrow \text{reg}[rt]$

$Imm \leftarrow (IR)_{15..0}^{16} \# \# IR_{15..0}$

$Imm1 \leftarrow (IR)_{25..0}^{16} \# \# IR_{25..0}$

16-bit immediate field is signed number in 2's complement  
Sign extension is to extend 16 bit to 32 bit

easy in 2's complement as it is to pad sign  
bit to the left (MSBS) until 32 bits

c) Execution / Effective address calculation

ALU for some calculations

ALU inputs are from stage (b) (A, B, Imm, etc)  
examples:

Memory reference:  $ALU_{out} \leftarrow A + Imm$

ex: LW R3, 100(R8)

Reg-Reg ALU :  $ALU_{out} \leftarrow A \text{ func } B$

ex: SUB R2, R5, R12

Reg-Imm ALU :  $ALU_{out} \leftarrow A \text{ func Imm}$

ex: SUBI R2, R5, 524

Branch :  $ALU_{out} = NPC + Imm$ ; Cond  $\leftarrow (A \text{ op } 0)$

ex: BEQZ R2, Label

d) Memory Access / Branch Completion.

Only load, store, use this step

Load & Store access memory

Branch Updates PC depending upon outcome  
Load instruction:

$PC \leftarrow NPC;$

$LMD \leftarrow Mem[ALU_{out}];$

Store instruction:

$PC \leftarrow NPC;$

$Mem[ALU_{out}] \leftarrow B;$

Branch instruction:

if (Cond)  $PC \leftarrow ALU_{out};$

else  $PC \leftarrow NPC;$

Others:

$PC \leftarrow NPC;$

### e) Write Back to Register (WB)

result is written back to register

↳ may come from ALU

↳ may come from memory system via LOAD instruction  
position of destination register varies wrt type of instruction  
(R-type / I-type)

for Reg-Reg

$\text{Reg}[\text{rd}] \leftarrow \text{ALUOut}$

Reg-Imm

$\text{Reg}[\text{rt}] \leftarrow \text{ALUOut}$

Load-Instr

$\text{Reg}[\text{rt}] \leftarrow \text{LMD};$

Basic example for instruction execution step-by-step

en: ADD R2, R5, R10

// Registered Instruction example.

IF:  $\text{IR} \leftarrow \text{Mem}[\text{PC}];$

$\text{NPC} \leftarrow \text{PC} + 1;$

ID:  $A \leftarrow \text{Reg}[\text{RS}]$

$B \leftarrow \text{Reg}[\text{rt}]$

Ex:  $\text{ALUOut} \leftarrow A + B;$

Mem:  $\text{PC} \leftarrow \text{NPC};$

WB:  $\text{Reg}[\text{rd}] \leftarrow \text{ALUOut};$

// immediate instruction example

en: ADDI R2, R5, 150

IF:  $\text{IR} \leftarrow \text{Mem}[\text{PC}];$

$\text{NPC} \leftarrow \text{PC} + 1;$

ID:  $A \leftarrow \text{Reg}[\text{rs}];$

$\text{Imm} \leftarrow (\text{IR})_{15}^{16} \# \# \text{IR}_{15..0}$

Ex:  $\text{ALUOut} \leftarrow A + \text{Imm};$

Mem:  $\text{PC} \leftarrow \text{NPC};$

WB :  $Reg[rt] \leftarrow ALUout$  ;

// Load instruction example.

en: LW R2, 200(R6)

IF :  $IR \leftarrow Mem[PC]$ ;

$NPC \leftarrow PC + 1$ ;

ID :  $A \leftarrow Reg[rs]$

$Imm \leftarrow (IR_{15..0})^{16} \# IR_{15..0}$

EX :  $ALUout \leftarrow A + Imm$

Mem :  $PC \leftarrow NPC$  ;

$LMD \leftarrow Mem[ALUout]$

WB :  $Reg[rt] \leftarrow LMD$  ;

// Store Instruction example

en: SW R3, 25(R10)

IF :  $IR \leftarrow Mem[PC]$ ;

$NPC \leftarrow PC + 1$  ;

ID :  $A \leftarrow Reg[rs]$ ;  $B \leftarrow Reg[rt]$ ;

$Imm \leftarrow (IR_{15..0})^{16} \# IR_{15..0}$

EX :  $ALUout = A + Imm$ ;

Mem :  $PC \leftarrow NPC$  ;

$Mem[ALUout] \leftarrow B$  ;

WB :

en: BEQZ R3, label.

IF :  $IR \leftarrow Mem[PC]$ ;

$NPC \leftarrow PC + 1$

ID :  $A \leftarrow Reg[rs]$

$Imm \leftarrow (IR_{15..0})^{16} \# A..IR_{15..0}$

EX :  $ALUout \leftarrow NPC + Imm$

$Cond \leftarrow (A == 0)$

Mem :  $PC \leftarrow NPC$ ; if ( $Cond$ )  $PC \leftarrow ALUout$

WB :

MIPS 32 Non Pipelined Architecture  
(DATA PATH)



## Pipelined Architecture:

- New Instructions every clock cycle
- Each stage mentioned before becomes pipeline stage  
(IF, ID, EX, MEM, WB)
- Each stage should finish execution in one clock cycle.



## clock cycles of pipelined architecture

| inst | 1  | 2  | 3  | 4   | 5   | 6   | 7   | 8  |
|------|----|----|----|-----|-----|-----|-----|----|
| i    | IF | ID | EX | MEM | WB  |     |     |    |
| i+1  |    | IF | ID | EX  | MEM | WB  |     |    |
| i+2  |    |    | IF | ID  | EX  | MEM | WB  |    |
| i+3  |    |    |    | IF  | ID  | EX  | MEM | WB |



Micro operations of MIPS 32 architecture.

IF\_ID : Latch between IF & ID

ID\_EX : latch between ID & EX

EX\_MEM : latch between EX & MEM

MEM\_WB: latch between MEM & WB

Now using above naming conventions we name the processes, micro functions as below:

a) Instruction Fetch:

IF\_ID\_TR  $\leftarrow \text{REGM}[\text{PC}]$ ;

IF\_ID\_NPC, PC  $\leftarrow \begin{cases} \text{if } (\text{EX-MEM\_IR}[\text{OPCODE}] == \text{branch}) \\ \text{f EX-MEM\_Cond} \end{cases}; \{ \text{EX-MEM\_All} \}$

hardware!



else {PC+1};

a: EX-MEM\_All but

b: EX-MEM\_Cond.

IF-ID

b) Instruction Decode Stage:

$$ID\_EX\_A \leftarrow \text{Reg} [IF\_ID\_IR[rs]]$$

$$ID\_EX\_B \leftarrow \text{Reg} [IF\_ID\_IR[rt]]; \quad \text{NPC}$$

$$ID\_EX\_NPC \leftarrow IF\_ID\_NPC;$$

$$ID\_EX\_IR \leftarrow IF\_ID\_IR; \quad A$$

$$ID\_EX\_Imm \leftarrow \text{Sign extend}(IF\_ID\_IR_{15\dots0});$$



MEM\_WB\_ALU Out (or) MEM\_WB\_LMD

MEM\_WB\_ALU[IR[rd]]

c) Execution stage

RR ALU

$$EX\_MEM\_IR \leftarrow ID\_EX\_IR;$$

$$EX\_MEM\_ALUOut \leftarrow ID\_EX\_A \text{ func } ID\_EX\_B$$

R-T ALU

$$EX\_MEM\_IR \leftarrow ID\_EX\_IR;$$

$$EX\_MEM\_ALUOut \leftarrow ID\_EX\_A \text{ func } ID\_EX\_Imm \\ LD/SW$$

$$EX\_MEM\_IR \leftarrow ID\_EX\_IR;$$

$$EX\_MEM\_ALUOut \leftarrow ID\_EX\_A + ID\_EX\_Imm;$$

$$EX\_MEM\_B \leftarrow ID\_EX\_B;$$

## EX-MEM Cycles

Branch!

$$\begin{aligned} \text{EX-MEM\_ALUOut} &\leftarrow \text{ID-EX\_NPC} + \text{ID\_EX\_Imm}; \\ \text{EX-MEM\_Cond} &\leftarrow (\text{ID-EX\_A} == 0); \end{aligned}$$



d) Memory Access Stage!

$$\begin{aligned} \text{MEM\_WB\_IR} &\leftarrow \text{EX-MEM\_IR}; \\ \text{MEM\_WB\_ALUOut} &\leftarrow \text{EX-MEM\_ALUOut}; \end{aligned} \quad \text{GALU}$$

$$\begin{aligned} \text{MEM\_WB\_IR} &\leftarrow \text{EX-MEM\_IR}; \\ \text{MEM\_WB\_LMD} &\leftarrow \text{EX-MEM\_ALUOut} \quad \text{MEM} \end{aligned} \quad \text{Load.}$$

$$\begin{aligned} \text{MEM\_WB\_IR} &\leftarrow \text{EX-MEM\_IR}; \\ \text{MEM\_WB\_ALUOut} &\leftarrow \text{EX-MEM\_B}; \end{aligned} \quad \text{Store.}$$

IF {  
Stage}



e) Register Write Back Stage:

$\text{Reg}[\text{MEM\_WB\_IR}[rd]] \leftarrow \text{MEM\_WB\_ALUout}; R-R$   
 $\text{Reg}[\text{MEM\_WB\_IR}[rt]] \leftarrow \text{MEM\_WB\_ALUout}; R-R$   
 $\text{Reg}[\text{MEM\_WB\_IR}[rt]] \leftarrow \text{MEM\_WB\_LMD}; \text{Load.}$



Putting everything above together we have as below,



Hazards: 1) Structural Hazards  
 2) Data Hazards.  
 3) Control Hazards.

Verilog modelling of MIPS32 processor.  
When a halt instruction is encountered we need to set a flag to stop other instructions from entering pipeline. This is Halted flip flop.

When a branch instruction is encountered we need to consider branching for which we need a flip flop to discard that instruction. This flip flop is called Branch taken-Branch.

```
module pipe_mips32(clk1, clk2);
    input clk1, clk2;
    reg [31:0] PC, IF_ID, IR, IF_ID_NPC;
    reg [31:0] ID_EX_IR, ID_EX_NPC, ID_EX_A, ID_EX_B,
                ID_EX_Mem;
    reg [2:0] ID_EX_type, EX_MEM_type, MEM_WB_type;
    reg [31:0] EX_MEM, IR, EX_MEM_ALUout, EX_MEM_B;
    reg EX_MEM_Cond;
    reg [31:0] MEM_WBIR, MEM_WB_ALUout, MEM_WB_LMD;
    reg [31:0] Reg[0:31]; // register bank
    reg [31:0] Mem[0:1023]; // memory 1024x32
```

parameters ADD = 6'b000000, SUB = 6'b000001,  
AND = 6'b000010, OR = 6'b000011, SLT = 6'b000100,  
MUL = 6'b000101, SLLT = 6'b101111, LW = 6'b001000,  
SW = 6'b001000, ADD1 = 6'b001010, SUB1 = 6'b001011,  
SLTI = 6'b001100, BNQ1 = 6'b001101, BNQ2 = 6'b001110;

parameters RR\_ALU = 3'b000, RM\_ALU = 3'b001, LOAD = 3'b010,  
STORE = 3'b011, BRANCH = 3'b100, HALT = 3'b101;

reg HALTED;  
reg TAKEN\_BRANCH;

### //IF stage

always @ (posedge clk1)  
if (HALTED == 0)  
begin  
if (((EX-MEM-IR[31:26] == BEQZ) && (EX-MEM-  
IR[25:24] == 1)) || ((EX-MEM-IR[31:26] == BNEZ) &&  
(EX-MEM-Cond == 0)))  
begin

IF-ID-IR <= #2 Mem[EX-MEM\_ALUout];  
TAKEN-BRANCH = #2 1'b1;  
IF-ID-NPC <= #2 EX-MEM\_ALUout;  
PC <= #2 EX-MEM\_ALUout + 1;

end

else

begin

IF-ID-IR <= #2 Mem[pc];

IF-ID-NPC <= #2 PC + 1;

PC <= #2 PC + 1;

end

end

### //ID stage

always @ (posedge clk2)  
if (HALTED == 0)  
begin

If (IF\_ID\_IR[25:2] == 5'b00000)

ID\_EX\_A <= 0;

else

ID\_EX\_A <= #2 Reg [IF\_ID\_IR[25:2]];

If (IF\_ID\_IR[20:16] == 5'b00000)

ID\_EX\_B = 0;

else

ID\_EX\_B <= #2 Reg [IF\_ID\_IR[20:16]];

ID\_EX\_NPC <= #2 IF\_ID\_NPC;

ID\_EX\_IR <= #2 IF\_ID\_IR;

ID\_EX\_Imm <= #2 \${\\$16\\$IF\_ID\_IR[15:0]}, IF\_ID\_IR  
[15:0] } } ;

Case (IF\_ID\_IR[31:26])

ADD, SUB, AND, OR, SLT, MUL: ID\_EX\_type <= #2 RRAU;

ADT, SUBI, SLTI: ID\_EX\_type <= #2 RM\_ALU;

LW: ~~ID\_EX\_type~~ <= #2 LOAD;

SW: ~~ID\_EX\_type~~ <= #2 STORE;

BEQZ, BNEQZ: ID\_EX\_type <= #2 BRANCH;

HLT: ID\_EX\_type <= #2 HALT;

Default: ID\_EX\_type <= #2 HALT;

endCase

end

// EX stage

always @ (posedge CLK)

If (HALTED == 0)

begin

$EX\_MEM\_type \leq \#2 ID\_EX\_type;$

$EX\_MEM\_IR \leq \#2 ID\_EX\_IR;$

$TAKEN \leq BRANCH \leq \#2 0;$

~~Case (ID-EX-type)~~

Case (ID-EX-type)

RR\_ALU: begin

Case (ID\_EX\_IR[31:26])

ADD: EX\_MEM\_ALUout  $\leq \#2 ID\_EX\_A + ID\_EX\_B$

SUB: EX\_MEM\_ALUout  $\leq \#2 ID\_EX\_A - ID\_EX\_B$

AND: EX\_MEM\_ALUout  $\leq \#2 ID\_EX\_A \& ID\_EX\_B$

OR: EX\_MEM\_ALUout  $\leq \#2 ID\_EX\_A | ID\_EX\_B$

SLT: EX\_MEM\_ALUout  $\leq \#2 ID\_EX\_A < ID\_EX\_B$

MUL: EX\_MEM\_ALUout  $\leq \#2 ID\_EX\_B * ID\_EX\_B$

Default: EX\_MEM\_ALUout  $\leq \#2 32'hXXXXXXXX$

end case

end

RM\_ALU: begin

Case (ID\_EX\_IR[31:26])

ADDI: EX\_MEM\_ALUout  $\leq \#2 ID\_EX\_A + ID\_EX\_imm$

SUBI: EX\_MEM\_ALUout  $\leq \#2 ID\_EX\_A - ID\_EX\_imm$

SLTI: EX\_MEM\_ALUout  $\leq \#2 ID\_EX\_A < ID\_EX\_imm$

Default: EX\_MEM\_ALUout  $\leq \#2 32'hXXXXXXXX$

end case

end

LOAD,STORE: begin

EX\_MEM\_ALUout  $\leq \#2 ID\_EX\_A + ID\_EX\_imm$

EX\_MEM\_Busout  $\leq \#2 ID\_EX\_B$

end

BRANCH: begin

EX-MEM-ALUout <= #2 ID-EX-NPC + ID-EX-Imm;

EX-MEM-Cond <= #2 (ID-EX-A == 0)

end.

endcase

end.

//MEM stage

always @ (posedge CLK2)

If (HALTED == 0)

begin

MEM-WB-Type <= EX-MEM-type;

MEM-WB-IR <= EX-MEM-IR;

Case (EX-MEM-type)

RR\_ALU, RM\_ALU;

MEM-WB-ALUout = #2 EX-MEM-ALUout;

LOAD D: MEM-WB-LMD <= #2 Mem[EX-MEM-ALUout];

STORE: If (TAKEN-BRANCH == 0)

Mem[EX-MEM-ALUout] <= #2 EX-MEM-B;

endcase

end

//WB stage

always @ (posedge CLK1)

begin

If (TAKEN-BRANCH == 0)

Case (MEM-WB-type)

RR\_ALU: Reg[MEM-WB-IR[15:1]] <= #2 MEM-WB-ALUout;

RM\_ALU: Reg[MEM-WB-IR[20:16]] <= #2 MEM-WB-ALUout;

LOAD: Reg[MEM-WB-IR[20:16]] <= #2 MEM-WB-LMD;

```

HALT: HALTED <= #2 1'b1;
endcase
end
endmodule

```

Using the above module described, execute programs to test functionality: Examples as below: (think & extend further)

ex: 1: ADD three numbers and store in processor registers  
ie set R1=10, R2=20, R3=30 & store sum in R4;  
module mips32test;

reg CLK1, CLK2;

integer k;

pipe\_mips'32 mips(.CLK1(CLK1), .CLK2(CLK2));

initial begin

CLK1=0; CLK2=0;

repeat(20)

#5 CLK1=1; #5 CLK1=0;

#5 CLK2=1; #5 CLK2=0;

end

end

initial begin

for(k=0; k<31; k=k+1)

mips.Reg[k]=k;

// Load all memory locations

// with the hex code of the

// program to add three numbers

// and store results. Add dummy

// instructions to avoid data hazards.

```

mips::HALTED = 0;
mips::PC = 0;
mips::TAKEN_BRANCH = 0;
#280
for (k=0; k<6; k=k+1)
    $display ("R%d - %d", k, mips::Reg[k]);
end
initial
begin
    $dumpfile("mips.vcd");
    $dumpvars(0, mips32test);
#300 $finish;
end
endmodule

```

ex:2: load a word stored in memory location 120, add 46 to it  
 and store result in memory location 121

ex:3 Calculate factorial of Number with loop instruction.

\* The test bench inputs are the hex codes of our programs  
 Remaining parts are the same vary the hex codes and  
 observe the outputs with different examples.

As a whole, the high level programming languages are converted to lower levels and fed to processors and they are in fact test benches.