

Sequencing  
Logic  
control mem

→ Computer is Six level machine

(1) Level 5 Problem Oriented language level  
(High level language)

→ Applications were working only in one machine, that app. is called "Desktop App"  
↓ Translation (compiler)

(2) Level 4 Assembly language level  
↓ Translation (Assembler)

(3) Level 3 Operating System Machine level  
↓ Partial Interpretation (O.S.)

(4) Level 2 Instruction set Architecture level  
(ISA)  
↓ Interpretation (Micro program)

(5) Level 1 Micro Architecture level  
↓ Direct execution (Hard wired)  
↓ Hardware (MAL)

(6) Level 0 Digital logic level

⇒ computer



→ Computer functions -

→ Data Movement

→ Data control

⇒ The Basic Computer Model —

⇒ The von-Neumann Model —

⇒ "Fetch-decode Execution cycle."

⇒ Instruction set Architecture —



→ compiler makes code machine independent.

→ Instruction set is the interface b/w hardware & software

→ Instruction set Design -

- Central part of any system design
- Allows abstraction, independence.

- Machines with new ISA should allow old instruction to be execute.
- 64-bit machine means 64-bits data will transfer through data bus at a time.
- for ex. we have 16 bit machine.  
Then 8 bits data will be stored in one address and other 8 bits data will be stored in other 2nd address.



- Memory Model -
- Exceptional conditions -
- Opcodes (Machine Language)
- MAL which is the set of processor design techniques used to implement the instruction set.

⇒ Elements of an Instruction -

- Operation code (Opcode)
- Source Operand reference
- Result Operand reference
- Next Instruction reference

⇒ Instruction Types -

- Data processing.
- Data storage
- Data movement
- Program flow control
  - (i) conditional and unconditional branches
  - (ii) call and return.

⇒ ISA Architecture types -

- (1) Stack architecture. → Implicitly on the top of stack.
- (2) Accumulator " → One Operand is implicitly the accumulator
- (3) General purpose registers architecture

Register Memory

load R<sub>1</sub>, A

Add R<sub>1</sub>, B

store C, R<sub>1</sub>

Load store (A memory → register)

load R<sub>1</sub>, A

load R<sub>2</sub>, B

ADD R<sub>3</sub>, R<sub>2</sub>, R<sub>1</sub>

store C, R<sub>3</sub>

→ Recently we are using Load store Architecture, because we store data in Registers. So, Execution time is less.

⇒ GPR Architecture -

(1) Register - Register ( $0 \text{ Memory} + 3 \text{ Reg} = \text{Total } 3$ )

example :- ALPHA, MIPS, SPARC, Power-Pc

(2) Register - Memory ( $1 \text{ Memory} + 1 \text{ Reg} = \text{Total } 2$ )

example :- Intel 80x86, Motorola 68000.

(3) Memory - Memory ( $3 \text{ Memory} + 0 \text{ Reg} = \text{Total } 3$ )

Example: VAX. (find why VAX is still in market?)

⇒ No. of Operands -

$$Y = (a-b) / (c + (d * e))$$

(1) Three Operands instruction -

Sub Y, a, b. (3-operands)

MUL t, d, e.

Add t, t, c.

Div Y, Y, t

(2) Two Operands instruction -

Sub a, b.

MUL d, e.

Add c, d.

Div a, c.

⇒ One Operand instructions -

→ In fewer operands, the fetch-execution cycle is faster.

→ From 4 GB memory  $2^{32}$  bytes can be accessed. and  $2^{30}$  words (32 bytes) can be accessed.

⇒ Big Endian - When MSB is at least address.  
e.g. MIPS.

⇒ Little Endian - When LSB is at least address.

→ It's a reversing of bytes not a number.

⇒ iAX and x86 <sup>have</sup> ~~are~~ CISC Instruction set.

⇒ Powerpc processes RISC Instruction set.

CISC: MOV AX, 10  
MOV BX, 5  
MULT AX, BX

RISC: MOV Ax, 0.  
MOV bx, 10  
MOV cx, 5

Begin: Add ax,bx  
Loop Begin

→ Here, for CISC  $(2 \text{ moves} \times 1 \text{ cycle}) + (1 \text{ mult} \times 3 \text{ cycles}) = 82 \text{ cycles.}$

→ for RISC.  $(3 \text{ moves} \times 1 \text{ cycle}) + (5 \text{ adds} \times 1 \text{ cycle}) + (\cancel{+ 1 \text{ loop}} \times 5 \text{ cycles}) = 13 \text{ cycles.}$

⇒ MIPS (Microprocessor without Interlocked Pipeline stages).

→ It has RISC ISA. (Big Endian)

→ MIPS has 3 Operands instruction.

  
2 source      1 destination.

but for  $A = B + C + D + E.$

Add F, B, C.

Add G, D, E.

Add A, F, G.

$$\rightarrow F = (G + H) - (I + J)$$

OR

Add K, G, H  
 Add L, I, J.  
 SUB F, K, L.

Add G, G, H.  
 Add I, I, J.  
 SUB F, G, I.

$\rightarrow$  MIPS has a  $32 \times 32$  bit register file.

~~we have~~

$\rightarrow$  If  $\downarrow$  32-bit processor, then 32-bits will move from data-bus.

$\rightarrow$  In MIPS, there is only \$t0, \$t1 ... \$t9 temporary values to store values.

$\rightarrow$  and \$s0, \$s1 ... \$s7 for saved variables.

$\Rightarrow$  MIPS Operand - Registers —

| Name        | Register No. | Usage                                       |
|-------------|--------------|---------------------------------------------|
| \$zero      | 0            | the constant value 0                        |
| \$v0 - \$v1 | 2-3          | Values of results and expression evaluation |
| \$a0 - \$a3 | 4-7          | arguments.                                  |
| \$t0 - \$t7 | 8-15         | temporaries.                                |
| \$s0 - \$s7 | 16-23        | Saved                                       |
| \$t8 - \$t9 | 24-25        | more temporaries.                           |
| \$gp        | 28           | global pointer                              |
| \$sp        | 29           | stack pointer                               |
| \$fp        | 30           | frame pointer                               |
| \$ra        | 31           | return address                              |

Note:

NO SUBI instruction is available in  
MIPS Micro-processor.

CLASSMATE  
Date \_\_\_\_\_  
Page \_\_\_\_\_

- Register 1 (\$at) is reserved for the assembler, 26-27 called \$K0-\$K1, for O.S.
- Require 5-bits to select one register.
- R-type instruction format —

| OP                 | rs       | rt              | rd      | shiftamt      | funct  |
|--------------------|----------|-----------------|---------|---------------|--------|
| Opcode             | first    | Second register | shift   | function      |        |
| Operation register | Register | destination     | amount  | field selects |        |
| source             | source   | Opand           | 00000   | variant of    |        |
| Opand              | Opand    | Opand           | for now | Operation     |        |
|                    |          |                 |         | extends       | opcode |
| 6 bits             | 5 bits   | 5 bits          | 5 bits  | 6 bits        |        |

⇒ I-Type —

destination  
register operand

| OP     | rs     | rt     | constant or Address |
|--------|--------|--------|---------------------|
| 6 bits | 5 bits | 5 bits | 16 bits             |

e.g.  $C = C + 4$ .

$$\$S_3 = \$S_3 + 4$$

ADDI  $\$S_3, \$S_3, 4$ .

↓  
(Immediate)

⇒ Load instruction :-

e.g.  $A = (A) + \text{Array}[100]$ .

$\text{LW } \$t_0, 100(\$s_2)$

↑  
base Address.  
~~~~~  
Loading

Add  $\$t_0, \$s_1, \$s_3, \$t_0$

→  $G = H + A[8]$

$\text{LW } \$t_0, 32(\$s_3)$  ( $\because$  each word occupies 4 bytes).

→  $A[i_2] = H + A[8];$

~~$\text{LW } \$t_0, 32(\$s_3)$~~   
 ~~$\text{SLW } \$t_0, 32(\$s_4)$~~   
~~Add  $\$t_0, \$t_0$~~

~~$\text{LW } \$t_0, 32(\$s_3)$~~   
 ~~$\text{LW } \$t_1, 48(\$s_4)$~~   
~~Add  $\$t_1, \$t_0$~~   
 ~~$\text{SW } 48(\$s_4), \$t_1$~~

$\text{LW } \$t_0, 32(\$s_1)$

Add  $\$t_1, \$t_0, \$s_3$  ||  $\$s_3 = H$ .

~~$\text{SW } 48(\$s_1), \$t_1$~~

Where,  $\$s_1$  = Base Add. of A.

$\$s_2$  = H content

$\$t_0, \$t_1$

= temporary register

$\Rightarrow \$S_0 = 0$  —

$\Rightarrow \text{add } \$t_0, \$S_0$

~~nor~~  $\$t_1, \$t_0$

~~and~~  $\$t_1, \$t_1, \$t_0$

~~sw~~  $\$t_1, \$S_0$ .

OR and  $\$S_0, \$S_0, \$zero$ .

OR sub  $\$S_0, \$S_0, \$S_0$

OR addi  $\$S_0, \$zero, a$  (constant).

$\Rightarrow$  MIPS conditional instructions —

Branch to a labeled instn if a condition is True.

Otherwise continue sequentially..

e.g. Beq  $H_3, LT, L_1$

IF ( $H_3 == LT$ )

(IF this is true then only it will go to  $L_1$  location otherwise it will go to next instn.)

e.g. IF ( $i == j$ )  $f = g + h$ .

else  $f = g - h$ .

Beq

~~if~~  $(\$S_3 == \$S_4)$

Add.  $\$S_0, \$S_1, \$S_2 // f = g + h$ .

~~else~~ Bne

Sub.  $\$S_0, \$S_1, \$S_2 // f = g - h$

Both eq.



(1) Bcq \$s\_3, \$s\_4, LAdd  
Sub \$s\_0, \$s\_1, \$s\_2.

LAdd : \$s\_0, \$s\_1, \$s\_2.

(2) Bnc \$s\_3, \$s\_4, LSub

Add \$s\_0, \$s\_1, \$s\_2.

J Exit

// to jump to LSub.

LSub : \$s\_0, \$s\_1, \$s\_2

Exit :

conditional instructions with loop —

loop:  $g = g + A[i]$

$i = i + j$

if ( $i \neq h$ ) goto loop.

: add \$t\_1, \$s\_0, \$s\_0 //  $t_1 = i + i = 2i$

add \$t\_1, \$t\_1, \$t\_1 //  $t_1 = 2i + 2j = 4i$ .

add \$t\_1, \$t\_1, \$s\_5 //  $\$s_5 = \text{Base Add of } A.$   
 $\$t_1 = A[i]$ .

lw \$t\_0, 0(\$t\_1)

add \$s\_1, \$s\_1, \$t\_0 //  $g = g + A[i]$ .

add \$s\_3, \$s\_3, \$s\_4 //  $i = i + j$

bnc \$s\_3, \$s\_2, Loop.

Ques 1 while ( save[i] == k )  
 $i = i + j;$

$i \leftarrow \$s_3, j = \$s_4, k = \$s_5.$  base Add. =  $\$s_6$

while:    Add  $\$t_1, \$s_3, \$s_3$     //  $\$t_1 = i + i = 2i$   
             Add  $\$t_1, \$t_1, \$t_1$     //  $\$t_1 = 4i$   
             Add  $\$t_1, \$t_1, \$s_6$   
         LW  $\$t_0, 0(\$t_1)$     //  $\$t_0 = \text{Save}[i]$

~~beq~~  $\$t_0, \$s_5, LAdd$   
 Exit ..

LAdd:  $\$s_3, \$s_3, \$s_4$   
 $j \text{ while}$     // jump while loop

Ques 1 while ( save[i] == k ),  $i += 1;$

~~Ques 2~~ switch(k) {

case 0:  $f = i + j;$  break;

case 1:  $f = g + h;$  break;

case 2:  $r = g - h;$  break;

case 3:  $f = i - j;$  break;

}

$f = \$s_0, i = \$s_1, j = \$s_2, g = \$s_3, h = \$s_4,$   
 $k = \$s_5.$

> For branch instruction format we will use I-type of instruction format.

Jump(j) Addressing —



26 bits address is concatenated with upper bits

j1 Addressing.



In I-type base Add. is for lw, sw,  
and immediate Addressing mode is for  
Add i.

# Micro Architecture level.

CLASSMATE  
Date \_\_\_\_\_  
Page \_\_\_\_\_

GOAL: Data path and control Unit.

→ Processor implementation style —

- (1) single cycle.
- (2) Multi cycle.



Cycle time = 20 ns

Cycle time = 5 ns

max time  
require that  
will be cycle time

⇒ Unpipelined. —



→ skip topics which are titled with  
"red" colour.



→ In lw rs will be base address of Array.  
and rd will be destination.



⇒ Data Path Executing Add -





$4 \times 1000 \times 1000 \times 1000$ 

Add Rd, Rs, Rt, Fd



## → control Signals -

| Signal Name   | Effect when<br>Deasserted                                                     | Effect when<br>asserted.                                                           |
|---------------|-------------------------------------------------------------------------------|------------------------------------------------------------------------------------|
| (1) Req.Dst   | The reg. destination number for the write req. comes from the rd field.       | The register destination number for the write req. comes from the rd field.        |
| (2) Req.write | None                                                                          | The req. on the write req. input is written with the value on the write data input |
| (3) ALUsrc    | The Second ALU operand comes from the second operand is the req. file output. | The second ALU sign-extended lower 16-bits of the instruction                      |
| (4) PCsrc     | The pc is replaced by the output of the adder that computes the value of Pct4 | The pc is replaced by the output of the adder that computes the branch target.     |

(c) Monhead

None

Data memory content  
designated by the  
addr. input  $A_1 P_1$ ,  
on the first head  
data output.

(c) Monwrite

None

Data



Memory write to reg

1 X X

Memory write

O O - O

Memory read

O - O O

PC SRC

ALU SHC

Reg. write

O - - O

Reg. Dst.

1 O X X

MUX is not  
used

Instruction name

R-type

LW SW

T-type  
(Bq, Bne)



## Multi-cycle Processor.

⇒ Disadvantages of Single-cycle P.C. -

- (1) Clock cycle must have the same length for every instruction.
- (2) Several inst. could fit in a shorter clock-cycle.

⇒ Goal: Break inst. into steps and apply same clock cycle to each step.

→ functional units can be shared between diff. cycles within one instruction.

⇒ Breaking instructions into steps -

→ Each step takes one clock cycle.

IF (1) Instruction fetch and PC increment

ID (2) Instn. decode and Register fetch

EX (3) Execution, memory computation, branch

MEM (4) Memory Access on R-type instru. comp.

WB (5) Memory read completion  
(write Back)

Steps

Step-3 R-type :  $ALUout = A \text{ op } B$ .

branch : IF ( $A == B$ ),  $PC = ALUout$ .

Jump :

Memory Ref :  $ALUout + \text{signextended} \#1A^{15:1}$

In 5<sup>th</sup> step branch and jump instru. are overl.

Read from 4s (Page) in PPT.

→ skip topics from b/w pages.

classmate

Date \_\_\_\_\_

Page \_\_\_\_\_

→ for (a), we require all 5 steps.

→ We have to write all the control signals for all steps.

## Pipelining

Goal: → clock cycle per instruction  $CPI \approx 1$ .  
→ want 100% hardware utilization.  
→ cycle time is not dependent on  
   the slowest instruction.

→ MIPS in RISC Type of Micro-processor  
so, Most instru are of same length.

→ Data flowing Right to left may cause  
   "Hazard".

⇒ Graphical Representation of Pipelining →

(1) Data Hazard

(2) Structure Hazard

(3) Control Hazard.

→ Clock cycle = No. of instructions + pipeline stages - 1.

(1) → When same registers are used for different steps like EX, FD... then it will create a problem is called a "Data Hazard".

e.g. Add \$50, \$51, \$52.

Add \$52, \$50, \$53

Add \$50, \$51, \$52.

(2) → In Multicycle Processor we can't use <sup>same</sup> memory with data and instruction.

e.g. LD \$50, 16(\$53)

Add \$54, \$51, \$55

Add \$52, \$55, \$56

Add \$54, \$55, \$56.

(3)