

## Arch 21

### Overview

ISA → instruction Set Architecture.

Def

Fact → Part of computer system

Theory → used to explain entire system (C/C++)

### Classes of Computer

(I) Personal mobile Device, e.g.: Smart phone

(II) Desktop

own Roberts  
own Comp. Arch  
Depend on it.

ISA

### Microarchitecture

Register-Transfer level

(MP related)

### Slide - 19

Book - 2 (2021)

Def

Parallelism

DLP

mean write DMR Data at a time

longer time for write operation

TIP

Task for one step

Parallelism is a major point } Task level.

## Arch. vs Micro Arch

also known as  
ISA

also known as Organization

## Why we need Arch?

(i) Improve Performance & (ii) Abilities

Arch. is a common future trend for fast comp.

20, fast comp. 20 years

20 years old comp. becomes obsolete comp.

future 20 years fast in old comp. future comp.

## Moore's Law

Memory & capacity is increasing, processor

capacity is increasing

IRAM

Abstraction → user can hide complexity

Arch, C-3Computer Organization & designInternal design of computerSystem SW

Basic difference between Application SW & system SW

OS: HW resource management & system management

1. Processor
2. Memory
3. I/O

Compiler

उत्तम वाक्य लंग. (जिसे तुलना लंग आ  
करते हैं) (Translator)



Computer का component

5 Br, (i) I/P (ii) O/P

Ex: keyboard, Direction: PC तक संदेश देने वाले device  
Mouse, ex: projector

(iii) memory (iv) Data path

Control

one kind of signal

(i) Processor } कम्प्यूटर device  
(ii) Memory } कम्प्यूटर 21  
21 basic criteria

## ISA (Instruction Set Architecture)

~~System set~~ ↗

## Application Binary Interface (ABI)

Application program & OS interface layer ↗  
may be

12

17

equations

## Networking C-9

### FTP

File Transfer Protocol.

Client - Server (2nd) file (txt) / (zip),  
Server -

FTP-client - local file system

→ Client - Server model.

→ Server port: 21.

(establish standard protocol)  
port use 20 or 21

## Arch, c-4

CPU Time = I. count  $\times$  CPI  $\times$  Clock Cycle Time

$$\text{CPU Time} = \frac{\text{I. count} \times \text{CPI}}{\text{clock Rate}}$$

$$\text{CPU clock cycles} = \sum_{i=1}^n (\text{CPI}_i \times C_i)$$

$$\begin{aligned}\text{CPU clock cycles}_1 &= 2 \times 1 + 1 \times 2 + 2 \times 3 \\ &= 10 \text{ cycles}\end{aligned}$$

$$\begin{aligned}\text{CPU clock cycles}_2 &= 4 \times 1 + 1 \times 2 + 1 \times 3 \\ &= 9 \text{ cycles}\end{aligned}$$

|     | A | B | C |
|-----|---|---|---|
| CPI | 1 | 2 | 3 |

|          | A | B | C |
|----------|---|---|---|
| Clock s. | 2 | 1 | 2 |
|          | 2 | 1 | 1 |

CPI: clock cycle per Instruction



## Wafer - die related Math

Wafer 30 cm ~~radius~~. diameter  
die side : 1.5 cm / 1.0 cm



$$\# \text{die per wafer} = \frac{\text{wafer Area}}{\text{Die Area}} = \frac{\pi \times \text{wafer Diameter}}{\sqrt{2 \times \text{Die Area}}}$$

$$\text{so, Dies per wafer} = \frac{\pi \times \left(\frac{30}{2}\right)^2}{(1.5)^2} = \frac{\pi \times 30}{\sqrt{2 \times 2.25}} = 270$$

For 2 cm?

$$\text{Dies per wafer} = \frac{\pi \times \left(\frac{30}{2}\right)^2}{(2)^2} = \frac{\pi \times 30}{\sqrt{2 \times 1}} = 640.8$$

no. of die  $\propto$  area  $\propto$  cost per die  $\propto$  wafer cost



### Home Assignment

- Chip → How it works / manufacturer
- Memory →
- Processor →

(বই পাঠের chapter  
1 পর্যন্ত)

Hard Drive vs SSD  
USB-B vs USB-C

display  
performFly  
performQuack

```
void performFly () throws CannotFly {  
    if (flyBehaviour == null)  
        throw new CannotFly ();  
    flyBehaviour.fly ();  
}  
}
```

② Arch, C = 5

CHAPTER 2: (COD: HSI)

Instructions: the language of the computer

ISA: Instruction Set, Arch

$a = b + c$  instruction  $\Rightarrow$  corresponding binary value  $\Rightarrow$  after parser via ISA.

## Compiler vs Interpreter

MIPS in ISA.

Millions of instruction  
Per Second.

Computer → Converts instruction (HLL or assembly binary) to appropriate 2's complement internal architecture.

ISA -

Natural Language (High Level)

(MID)

Programming  
Lang

Compiler

O/I

NL/AI or symbolic

(Low Level)

(HLL)

Compiler → Converts:

source lang

converts Lang

→ convert

→ target lang:

C lang or Assembly

→ target lang

Compiler Phase Step:

$i = i + 5 \rightarrow$   
add \$50, \$50, #5

(I) Lexical Analysis

(II) Parsing

(III) Semantic Analysis

(IV) Optimization

(V) Code Generation

(i) character by character (for meaningful word शब्द संक्षिप्त) (word identify <sup>परिभ्रमा</sup>)  
NL or word (for Compiler वा त्रैतीय लोगों के लिए)  
शब्द वर्णन।  $a = b + c;$   
 $\uparrow \uparrow \uparrow \uparrow \uparrow \uparrow$   
tokens.

(ii) Parsing (grammatically valid वाक्य  
or check में, language वा grammar  
"CSE 329 To welcome" <sup>संज्ञान</sup> X <sup>वाक्य</sup> valid नहीं

(iii) welcome meaning एट़ ?  
internal symmetric meaning find <sup>परिभ्रमा</sup>

(iv) optimize रूपरूप,  
 $b = 0$   
 $a = b * c,$   
 $| b = 0$   
 $| a = 0$  } multiplication optimize <sup>प्रैग्मान</sup>

Unused variable optimize रूपरूप,

(v) Code generate रूपरूप,

$$a = b + c * 5.0$$

(I)  $\langle \text{id}, 1 \rangle \Leftrightarrow \langle \text{id}, 2 \rangle \oplus \langle \text{id}, 3 \rangle \times \langle \text{id}, 4 \rangle$

(II)  $\langle \text{id}, 1 \rangle \Leftrightarrow \begin{array}{c} t_1 \\ + \\ \langle \text{id}, 2 \rangle \\ \times \\ \langle \text{id}, 3 \rangle \end{array}$

$t_1 = \text{int to float } (5.0)$

(III)  $\left. \begin{array}{l} t_2 = \text{id}_3 + t_1 \\ t_3 = \text{id}_2 + t_2 \end{array} \right\} \text{intermediate code generation}$   
 ~~$\text{id}_1 = t_3.$~~   
 ~~$\text{id}_1 = t_3.$~~

(IV)  $\left. \begin{array}{l} t_1 = \text{id}_3 + 5.0 \\ \text{id}_1 = \text{id}_2 + t_1 \end{array} \right\} \text{optimized}$

11  $r_2, \text{id}_3$  (load)  
mult  $r_2, r_2, \# 5.0$  (multiplication)  
10  $ld \# r_1, \text{id}_2$  (load)  
addt  $r_1, r_2, r_2$  (add)  
sft  $\text{id}_1, r_2$  (sftn)

Arch, C6Chapter 2, class - 2Instruction Set:

The vocabulary of commands understood by a given Architecture.

Example :

(i) MIPS    (ii) ARM    (iii) X86    (iv) RISC.

Stored Program Concept:

The idea that instruction and data of many types can be stored in memory as numbers.



→ After instruction (in binary string) convert  
to ISA. ISA एवं define करने वाले compiler ने define किया ISA.

→ ISA एवं concept एवं stored program concept  
सम्बन्धित हैं।

→ ISA एवं Concept depend हैं।

$$a = a + b + c + d$$



## ② MIPS Operand

32 Reg:

\$ \$0 - \$ \$7, \$ \$10 - \$ \$19, \$ zero, \$ \$00 - \$ \$99,  
\$ \$v0 - \$ \$v1, \$ gp, \$ fp, \$ sr, \$ ra, \$ at.

(32 bit register 2^32)

for general purpose (\$ \$0 - \$ \$7)

for temporary value save. (\$ \$10 - \$ \$19)

## MIPS Assembly Language:

### (I) Arithmetic Operation:

add \$ \$0, \$ \$1, \$ \$2.

sub \$ \$0, \$ \$1, \$ \$2

addi \$ \$0, \$ \$1, 40.

କ୍ରମିକ computer  
program କୁ ସାରି କାଟେ  
general instruction  
use କରା.

### (II) Data Transfer:

lw \$ \$1, 20(\$ \$2)

sw \$ \$1, 20(\$ \$2)

load word

store word

(iii) Logical operation:

and \$s1, \$s2, \$s3  
or  
nor

④ Conditional Branch:

beq \$s1, \$s2, 25

bne \$s1, \$2, 25

slt \$s1, \$2, \$3.

Unconditional Jump:

## ⑤ Principle of designing ISA:

(i) Simple Design

(ii) Smaller and faster.

\$zero register 0 and 0 means zero

mov \$s1, \$s2

add \$s1, \$s2, \$zero

} same instruction

$$\begin{aligned}
 & (s+1)^2 \left( \frac{s-2}{2} \right) \\
 & = 2 \cdot \frac{s^2 - 2s + 4}{s^2} \\
 & = \frac{s^2 - 2s + 4}{2(s+1)^2(s-2)}
 \end{aligned}$$

### Arch. C-7

ISA

MIPS Assembly Language & 5 BT category of instructions

মনে রেখা।

$a = b + c$  |       $d = a - e$  |      MIPS A2 & convert করুন

Add add a, b, c }      Sub d, a, e }      instructions প্রদর্শন

from এবং বিভিন্ন Register level এ ফর্ম করি, তাহলে

variable দ্বারা Register এ load করা হবে এবং যোগ

করি,

$a \rightarrow \$50$ ,  $b \rightarrow \$51$ ,  $c \rightarrow \$52$

$d \rightarrow \$53$ ,  $e \rightarrow \$54$ .

then, add \$50, \$51, \$52  
sub \$53, \$50, \$54.

প্রদর্শন করে।

instruction:  $f = (g+h) - (i+j)$

$\downarrow$        $\downarrow$        $\downarrow$        $\downarrow$        $\downarrow$   
 \$s0      \$s1      \$s2      \$s3      \$s4

design principle  
 simple is better  
 As smaller as faster

### MIPS AL:

```

add  $t0, $s1, $s2
add  $t1, $s3, $s4
sub  $s0, $t0, $t1.
  
```

### B Data Transfer:



MIPS 2 24 command words

LW : memory (24) word register 24 274 0000

SW : register (24) memory 274 0000

1 word = 32 bit = 4 byte

$$g = h + A[8]$$

LW \$t0, 8(\$s3)

base: \$s3

(lw \$t0, 8(\$s3))

8th index: 8(\$s3).

(lw \$t0, 32(\$s3))

MIPS 24 (274) Address 2 (274)

step 4 bit 274,

Binary to Decimal:

$$\begin{array}{r} 1011 \\ \times 2^0 + 0 \times 2^1 + 1 \times 2^2 + 1 \times 2^3 \\ \hline 1011 \end{array}$$

$$= 1 \times 2^0 + 0 \times 2^1 + 1 \times 2^2 + 1 \times 2^3$$

$$> 8 + 0 + 2 + 1$$

$$= 11.$$

$$\boxed{\phantom{000000} \dots 0001011}$$

least significant bit

most sign.

unsigned 32 bit after possible.  $2^{32}$ .

$$(x_{31} \times 2^{31}) + (x_{30} \times 2^{30}) + \dots + (x_0 \times 2^0) \Rightarrow$$

(MAX value) binary  
 $\Rightarrow 11111111111111111111111111111111$ .

negative result  $2^{31} -$

$$(-x_{31} \times 2^{31}) + (x_{30} \times 2^{30}) + \dots + (x_0 \times 2^0) \Rightarrow$$

$$\begin{array}{cccccccccc} 1111 & 1111 & 1111 & 1111 & 1111 & 1111 & 1111 & 1111 & 1100 & \dots \\ \cancel{1} & \cancel{0} & \dots \\ (-x_{31} \times 2^{31}) + (x_{30} \times 2^{30}) + \dots + (x_0 \times 2^0) = ? \end{array}$$

$$0000 \ 0000 \ 0000 \ 0000 \ 0000 \ 0000 \ 0000 \ 0010 = 2$$

$$\begin{array}{cccccccccc} 1111 & 1111 & 1111 & 1111 & 1111 & 1111 & 1111 & 1101 & \dots \\ \cancel{1} & \cancel{0} & \dots \\ 1111 & 1111 & 1111 & 1111 & 1111 & 1111 & 1111 & 1111 & \dots \\ \hline 0000 & 0000 & 0000 & 0000 & 0000 & 0000 & 0000 & 0010 & = 2 \\ 1111 & 1111 & 1111 & 1111 & 1111 & 1111 & 1111 & 1101 & = (\text{negative}) \\ \hline 0000 & 0000 & 0000 & 0000 & 0000 & 0000 & 0000 & 0010 & + 1 \\ 1111 & 1111 & 1111 & 1111 & 1111 & 1111 & 1111 & 1101 & = (\text{positive}) \end{array}$$

add \$20, \$51, \$52 encode ৩২টি ২৮৫ ৩২ bit

পি. ২৭৩ \$ binary code

চেকসি করা আছে

main code (1 add ৩২ bit)

|  |      |   |   |   |      |
|--|------|---|---|---|------|
|  | 5bit | 5 | 5 | 5 | 6bit |
|--|------|---|---|---|------|

opcode  
6 bit

register ১৫  
৩২ bit

32 bit register so, 5 bit রয়ে।

|   |    |    |   |   |    |
|---|----|----|---|---|----|
| 0 | 17 | 18 | 8 | 0 | 32 |
|---|----|----|---|---|----|

add \$51 \$52 \$51 ← MIPS code ৬  
convert ৩২ bit

000000 100001 → (add 0 00000 100000) ৩২ bit word

10010  
01000

finally memory (or input  
and ১১ ০১ instructions)

copy করা অর্থ কে ০ register এর মধ্যে পার্শ্ব রয়ে।

2 bit এর প্রয়োগ ২৮৫ Number (১৩১ হলো),

থেকেন instruction set ৩২ bit রিচে represent

ব্যাখ্যা : basic concept of MIPS Arch.

| OP | rs | rt | rd | shamt | funct |
|----|----|----|----|-------|-------|
| 6  | 5  | 5  | 5  | 5     | 6     |

rs = source operand , ১মst register

rt = ২nd register .

rd = Register destination

shamt = shift amount , funct = function

function defined on the interval  $[0, \pi]$ ,

$$\left\{ \begin{array}{l} \text{so, } t_0 = 8 \\ \Rightarrow \text{if } f(x) \end{array} \right.$$

Laplace, C-

$$\Rightarrow \text{If } f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} (a_n \cos nx + b_n \sin nx)$$

then prove that (i)  $a_0 = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) dx$

$$(ii) a_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos nx dx$$

$$(iii) b_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin nx dx$$

Prove:  $\frac{a_0}{2} + \sum_{n=1}^{\infty} (a_n \cos nx + b_n \sin nx) \quad (1)$

$f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} (a_n \cos nx + b_n \sin nx)$  multiplying both sides of (1) by  $dx$  and integrating from  $x = \pi$  to  $x = -\pi$ , we get,

Date: / /

### Arch, c=8

$$A[300] = h + A[300]$$

$t_1$

\$52

byte mis of F PSET

$so, 300 \times 4 = 1200$

lw \$t\_0, 1200(\$t\_1)

add \$t\_0, \$52, \$t\_0,

sw \$t\_0, 1200(\$t\_1)

| Op | RS | Rt | Rd | address/shift | func |  |
|----|----|----|----|---------------|------|--|
| 35 | 9  | 8  |    | 1200 (16 bit) |      |  |
| 0  | 28 | 8  | 8  | 0 8           | 32   |  |
| 43 | 9  | 8  |    | 1200          |      |  |

MIPS Architecture

R-type operation 2005 - (1) no. 18 270.

MIPS Code BT 270,

Architecture BT 270

MIPS Arch (or I,J-type) instruction/opert

- 1 9225 16 bit fully instruction info. (hi, lo) 16 bit reg Address to indicate ans

For (1), MIPS machine code:

100011 01001 01000 0000 010010110000

00000 10010 0100001000 00000 100000

10101101001 01000 0000010010110000

\*  $S_0 = 16$  bits

\*  $t_0 = 8$  bits

\* last 5 bit (lo) decide बाटू 270 के लिए type instruction

Exm  $A[300] = h + A[300]$   $\Rightarrow$  machine  
Code & com  
प्र० 270

|      |    |    |                |    |                     |  |
|------|----|----|----------------|----|---------------------|--|
| type | op | rn | rt             | rd | Address/short funct |  |
| type | op | ns | rt             |    | Address/immediate   |  |
| type | op |    | Target Address |    |                     |  |

Code number $\$zero \rightarrow 0$  $\$at \rightarrow 1$  (reserved, we use MIPS 30-32 regis  
value  $v_0 - \$v_1 \rightarrow 2-3$  can't use it. Another 25 name. $\$t0 - \$t3 \rightarrow 4-7$  $\$t4 - \$t7 \rightarrow 8-15$  $\$s0 - \$s7 \rightarrow 16-23$  (Saved) $\$t8 - \$t9 \rightarrow 24-25$  (Saved) $\$k0 - \$k1 \rightarrow 26-27 \rightarrow$  special purpose  
△ os use them $\$gp \rightarrow 28$  $\$sp \rightarrow 29$  $\$fp \rightarrow 30$  $\$ra \rightarrow 31$ 

global pointer

stack "

frame "

returned Approach

## Arch, C-9

Logical Operant:

shift left logical (SLL)

shift right logical (SRL)

SLL \$t2, \$s0, 4      so per 4-2<sup>4</sup> bit left shift

| op | rs | rt | rd | shamt | funct |
|----|----|----|----|-------|-------|
| 0  | 0  | 16 | 10 | 4     | 0     |

(operations mode  
define 32)

32 位数 Normal

Add

op I type

operation type is I(15) 位数 (625 maximum  
possible address & 255 declare register 16 bit  
26 bit 地址)

Hexadecimal Cpu Architecture & INT 200

008f--- (hex)

↓  
0000 0000

256 bit 的 value 从 0 到 255 为决定操作  
I型 operation.

and \$t<sub>0</sub>, \$t<sub>1</sub>, \$t<sub>2</sub>.

or

nor

### MIPS code Assembly

(i) bne \$s<sub>3</sub>, \$s<sub>4</sub>, Else

(ii) add \$s<sub>0</sub>, \$s<sub>1</sub>, \$s<sub>2</sub>

(iii) j Exit

(iv) Else: sub \$s<sub>0</sub>, \$s<sub>1</sub>, \$s<sub>2</sub>

(v) Exit:

{

$$f = g + h$$
$$\downarrow \quad \downarrow \quad \downarrow$$
$$s_0 \quad s_1 \quad s_2$$

}

else

{

$$f = g - h;$$

}

(i) branch (not equal),

BS3, ~~\$s<sub>4</sub>~~ Else unconditional jump

loop:

while ( save[i] == k )

{

=

$$i = 1$$

}

Base Address of save stored in  
sc.

Loop: sll \$t<sub>1</sub>, \$s<sub>3</sub>, 2  
add \$t<sub>1</sub>, \$t<sub>1</sub>, \$s<sub>6</sub>  
lw \$t<sub>0</sub>, 0(\$t<sub>1</sub>)  
bne \$t<sub>0</sub>, \$s<sub>5</sub>, Exit  
add \$s<sub>3</sub>, \$s<sub>3</sub>, 1  
j loop

Exit:

(ii) \$t1 stores  $\$53 \xrightarrow{\text{Add 16 bit}} 2^2$  bit shift  $\Rightarrow$  (1)

at Address  $\text{vr}$

(iii)  $\$t1 \leftarrow$  offset address  $\text{mz}$  16 bits  
 $\$56 \xrightarrow{\text{Address}} \text{base} (\text{base}) \xrightarrow{\text{mz}} \text{actual}$   
 physical address, store  $\text{t1}$  loop  $\xrightarrow{\text{t1 = 0}}$   
 $\text{and } \text{Save[i]}$

(iv) ~~\$t1~~  $\$t1 \leftarrow \text{Save[i]}$  16 value  $\text{mz}$ ,  
 $\text{mz}$ ,  $\text{OFFSET 0, mzn, Address - vr}$

Q. target address  $\xrightarrow{\text{mz}}$   $\text{mz}$ ,

•  $t1$  address  $\leftarrow$   $\text{mz}$  - value  $\text{mz}$  of  
 load  $\xrightarrow{\text{mz}}$   $t1$



C code  $\rightarrow$  MIPS Assembly code  $\rightarrow$  Numeric value

↓  
0 1

Date: / /

- ② Today computers are built on Two "Key" principles:
- (i) Instructions are represented as Numbers.
  - (ii) Programs are stored in memory to be read or written, just like data.



Arch, C → 10MIPS Assembly Lang

most used  
② Jump and Link  
jal  $\langle\text{procedureAddress}\rangle$

③ Jump Register  
jr \$tca  $\langle\text{special register, return address}\rangle$

④ Jump  
j  $\langle\text{address}\rangle$

⑤ \$a0 - \$a3  $\rightarrow$  Parameters.

⑥ \$v0 - \$v1  $\rightarrow$  Return Values.

⑦ \$ra  $\rightarrow$  Return Address.

⑧ \$sp  $\rightarrow$  Stack Pointer

int leaf-example (int  $g^{\text{a0}}$ , int  $h^{\text{a1}}$ , int  $i^{\text{a2}}$ , int  $j^{\text{a3}}$ )

{

int f;  
 $f = (g + h) - (i + j)$

return f;

}

procedures in C  
(function, MIPS Arch)  
△ convert arr!

MIPS Arch

Leaf-example:

addi \$sp, \$sp, -12

sw \$t1, 8(\$sp)

t<sub>1</sub>, to, so (return  
earlier value store)

sw \$t0, 4(\$sp)

earlier  
value store

sw \$s0, 0(\$sp)

place 1

add \$t0, \$a0, \$a1

add \$t1, \$a2, \$a3

sub \$s0, \$t0, \$t1

add \$v0, \$s0, \$zero

lw \$s0, 0(\$sp)

load zero  
previous value  
cont.

lw \$t0, 4(\$sp)

lw \$t1, -8(\$sp)

add \$sp, \$sp, 12

jr \$ra



21st 12 bytes for \$sp, cont 13th address  
last 8 bytes \$sp = 20.

(V12), return 1st value v<sub>0</sub> to store  
2<sup>nd</sup>, v<sub>0</sub> to Value stack function returns  
2<sup>nd</sup> stack top address

```
int fact(int n)
```

```
{
    if (n < 1) return 1;
    else return n * fact(n-1);
}
```

fact:

addi \$sp, \$sp, -8.

sw \$ra, 4(\$sp) || currently, return address is  $a_0$

sw \$a0, 0(\$sp) || value stored

*set  
shift less  
than  
immediate* slti \$t0, \$a0, 1 ||  $a_0 < 1$  then to  $t_0$

beq \$t0, \$zero, L1 || to zero value goes to L1 - branch on equal

addi \$v0, \$zero, 1

addi \$sp, \$sp, 8  $a_0 > 1$  then L1 jumps

jr \$ra.

L1: addi \$a0, \$a0, -1  $a_0 = a_0 - 1$

jal fact || simultaneously reg. add in update part,  $a_0 = a_0 - 1$

lw \$a0, 0(\$sp) || load val from stack

lw \$ra, 4(\$sp)

addi \$sp, \$sp, 8

mul \$v0, \$a0, \$v0  $a_0 \otimes 5$  from part,  $s \times f(4)$

jr \$ra.

$n=3$ .

Sltj :-

$$a_0 = 3, < 1? \text{ No. } t_0 = 0$$



net add.

so, beg: jump L1

L1:

$$a_0 = a_0 - 31$$

$$24 - 1 = 3.$$



will again go fact.