

# CSE 305

## Computer Architecture

### Instructions

Prepared by  
**Madhusudan Basak**  
Assistant Professor  
**CSE, BUET**

\*Some modifications done by Saem Hasan



# Instructions

- ❑ A computer is run by instructions
- ❑ Instruction Set
  - All possible instructions for a CPU



# Instruction Set Architecture

- ❑ Instruction set architecture is the **attributes of a computing system** as seen by the assembly language programmer or compiler.
  - Instruction Set (**what operations can be performed?**)
  - Instruction Format (**how are instructions specified?**)
  - Data storage (**where is data located?**)
  - Addressing Modes (**how is data accessed?**)
  - Exceptional Conditions (**what happens if something goes wrong?**)



# Design Philosophy

*It is easy to see by formal-logical methods that there exist certain [instruction sets] that are in abstract adequate to control and cause the execution of any sequence of operations. ... The really decisive considerations from the present point of view, in selecting an [instruction set], are more of a practical nature: simplicity of the equipment demanded by the [instruction set], and the clarity of its application to the actually important problems together with the speed of its handling of those problems.*

Burks, Goldstine, and von Neumann, 1946 ▶



# MIPS Architecture

- Acronym of “Microprocessor without Interlocked Pipelined Stages”
  - Is a **RISC** (Reduced Instruction Set Computer) **ISA** (Instruction Set Architecture)
  - Developed by MIPS Technologies (then MIPS Computer Systems) in 1985



শ্রেণী  
পি  
যাবাব

যোনো decision নিলে যদি যোন design  
principle follow করে।

# Design Principles

- *Design Principle 1:* Simplicity favors regularity.
- *Design Principle 2:* Smaller is faster.
- *Design Principle 3:* Good design demands good compromises.



# Simplicity favors regularity

- ❑ The instructions of MIPS are fixed and rigid
- ❑ Rigidity ensures Regularity
- ❑ Simplicity favors regularity



# R-format



## Example Instruction

- *Add* and *subtract* instructions always follow the following fixed format

|          |                    |                    |                         |
|----------|--------------------|--------------------|-------------------------|
| add      | add \$s1,\$s2,\$s3 | \$s1 = \$s2 + \$s3 | Three register operands |
| subtract | sub \$s1,\$s2,\$s3 | \$s1 = \$s2 - \$s3 | Three register operands |

$$a = b + c + d + e$$

→ subtraction হবে প্রতিরুপ

$$a = b + c + d + e$$

$$\{ \cdot a = b + c + d + e$$

add a, b, c  
add a, a, d  
add a, a, e

# The sum of b and c is placed in a  
# The sum of b, c, and d is now in a  
# The sum of b, c, d, and e is now in a



# Example Instruction

- *Add* and *subtract* instructions always follow the following fixed format

|          |                    |                      |                         |
|----------|--------------------|----------------------|-------------------------|
| add      | add \$s1,\$s2,\$s3 | $\$s1 = \$s2 + \$s3$ | Three register operands |
| subtract | sub \$s1,\$s2,\$s3 | $\$s1 = \$s2 - \$s3$ | Three register operands |

a = b + c;  
d = a - e;



add a, b, c  
sub d, a, e

UP

~~fun ot~~



f = (g + h) - (i + j);

add t0,g,h # temporary variable t0 contains g + h

add t1,i,j # temporary variable t1 contains i + j

sub f,t0,t1 # f gets t0 - t1, which is (g + h) - (i + j)





# MIPS Instruction Properties

- ❑ Arithmetic operations only takes register values as operands
- ❑ Size of a register : 32 bit
- ❑ Number of registers:  $32 \rightarrow 2^5$  (এটি ৫ register representation (ন ৫ bit লাগব)
- Satisfies *Design Principle 2*: Smaller is faster.
  - Large number of registers -> longer time for electronic signal to travel -> Increased access time (বেশি লাগব)
  - Large number of registers -> Increased number of control bits
- ❑ A number is dedicated to a particular register.
  - 5 bit are reserved to indicate each register when the count is 32
  - Compiler maps a program variable to a register
  - There is a number assigned against each register



# MIPS Instruction Properties

## □ Example



৪  
x31

int 4

32

R11

## MIPS Instruction Properties

### □ What about data structures (Arrays and Structures)?

- These are kept in memory and a particular element is transferred to registers whenever required

starting address দ্বারা যথেষ্ট offset  
দিয়ে একই স্থানে access করা যায়।

### □ Compiler

- places an array/structure into memory
- keeps the starting address of the memory in a register
- tracks which register is used for which array/structure



### □ Data Transfer Instructions

- load word:  $lw$       \$t0, 8(\$s3)      A[2] আনতে চাই
- store word:  $sw$       \$t0,



fppt.com

base address

$*(\text{arr} + \text{4} \times i)$  → offset. Base element এবং বাত দুরে পাও।

integer:

$(\text{arr} + \text{4} \times i)$

char :

1 byte করে add করো।

যদি operation register এ হয়।

Array থাকে memory টো,

$$A[2] = A[2] + 5$$

A এর starting address register

\$s3 টো থাকে

base register - memory -এর starting  
addr. রাখে।

$A[2] \rightarrow$  offset হবে 8.

$\$s3\ 8$

to টো load  $\rightarrow$

$\$t0, 8(\$s3)$

lw \$reg, offset(\$base reg)

sw \$reg, offset (\$base reg)

2<sup>32</sup> operation

i) load word - memory-এর addr থেকে  
register এ আনি

ii) store word - register থেকে  
memory টো রেখে দ্যাবে।

# Data Transfer Instructions

□ Example  $\rightarrow \$s2$

□  $A[12] = h + A[8]$

A এর base register  $\$s3$

i)  $A[8]$  load to  $t0$   
 $\$t0, 32(\$s3)$   
 $8 \times 4$

```
lw $t0,32($s3) # Temporary reg $t0 gets A[8]
add $t0,$s2,$t0 # Temporary reg $t0 gets h + A[8]
sw $t0,48($s3) # Stores h + A[8] back into A[12]
        4x12
```

\ add \$dest-reg, \$source-reg1 , \$source-reg2



! যখন 32 টেক বেশি variable থাবে তখন যানুলো বেশি use হয় যোগুলো register-  
map করে রাখে। যাবিগুলো memory তে রাখে।

## Spilling Registers



- What if the number of variables and data structures is more than 32?

- The process of putting less commonly used variables into memory is called Spilling Registers.
- Keeps the most frequently used variables in registers and less frequent ones in memory
- Moves variables around registers and memory



# Operation with Constants

## □ Add immediate or addi instruction

- Takes one register and one constant as input

addi      \$s3,\$s3,4  
add immediate

$$\# \$s3 = \$s3 + 4$$

~~\$s3~~    ~~+~~    ~~4~~    ~~imm~~ +

register এর সাথে যথন value add  
বয়া লাগব। →

addi  
ori  
subi

immediate operation

## □ Constant 0 is used very often (e.g., for transfer operations)

- A register \$zero is dedicated for this purpose

add      \$s2,\$s3,\$zero  
machine code ( 5 bit ) convert হয়।

## □ Is an instruction represented using register names?

register এ বাইটগুলো অণুজড়ি হয়।

$a=b$   
mov a,b } assembly টে। MIPS টে  
add দিয়ে বয়া লাগে।



add, sub এরগুলো কিভাবে? — instruction গুলো number ( 5 bit ) convert হয়।

machine code — memory ରେ ଥାଏ :

ଯେତେ କମି ଥାଏ ଏକବେ ବନ୍ଦ ନା ।

# Representing Instructions in the Computer

## □ Key principles of Today's computer instructions

- Instructions are represented as numbers.
- Program are stored in memory to be read or written, just like data

- instruction memory → machine code convert ହେଲେ load ହୁଏ ।
- data memory → data ଥାଏ ।



# Representing Instructions in the Computer

- An instruction is a sequence of bits
- Each register is mapped to a number
  - There is a convention to map register names into numbers (e.g., registers \$s0 to \$s7 map onto registers 16 to 23) → এমন একটি নিয়ম রয়েছে।
- Each instruction is 32 bit long (size of a word)
  - 32 বit রয়েছে।
  - Satisfies *Design Principle 1: Simplicity favors regularity.*      > 32 বit রয়েছে।
- Instruction Format: format used by an instruction

question → 32 bit রয়েছে। তবে design principle follow করে?



# An MIPS Instruction Format

- R-Format → add, sub, or, and, NOR (যেগুলোতে register লাগে)
  - Deals with arithmetic operations with registers



- *op*: shorthand of opcode, denotes operation type and format type, যেন function করবে,
- *rs*: Source Register1
- ✓ *rt*: Source Register 2
- *rd*: Destination Register
- *shamt*: Shorthand of shift amount
- *funct*: shorthand of function code, denotes the specific variant of an operation

add    \$t0, \$t1, \$t2  
funct    rd    rs    rt



op code — 000000 (always)  
R format ↘  
যোকায়

অন্য format'এ funct নাকে extra bit  
নাই। অন্য op code দিয়ে operation type'ও  
যুক্ত।

# An MIPS Instruction Format



Would it be good to use just one format for all kind of operations?

No!

**Design Principle 3:** Good design demands good compromises.



`add $t0,$t1,5`    { R-format কো যাবে না।  
`lw $t0,32($s3)`

total 32 bit, first 6 bit read → I-format  
then 5 bit read → 1st register এর value

## Another MIPS Instruction Format

### □ I-Format — addi, or i, subi, lw, sw

- Deals with immediate and data transfer operations



func বলে যান্তি bit নাই। op code দিয়েই format এবং operation type ঘোষায়।



# MIPS Instruction Encoding (Simplified)

| Instruction     | Format | op                | rs  | rt  | rd   | shamt | funct             | address  |
|-----------------|--------|-------------------|-----|-----|------|-------|-------------------|----------|
| add             | R      | 0                 | reg | reg | reg  | 0     | 32 <sub>ten</sub> | n.a.     |
| sub (subtract)  | R      | 0                 | reg | reg | reg  | 0     | 34 <sub>ten</sub> | n.a.     |
| add immediate   | I      | 8 <sub>ten</sub>  | reg | reg | n.a. | n.a.  | n.a.              | constant |
| lw (load word)  | I      | 35 <sub>ten</sub> | reg | reg | n.a. | n.a.  | n.a.              | address  |
| sw (store word) | I      | 43 <sub>ten</sub> | reg | reg | n.a. | n.a.  | n.a.              | address  |

$$A[300] = h + A[300];$$

lw \$t0,1200(\$t1) # Temporary reg \$t0 gets A[300]  
 add \$t0,\$s2,\$t0 # Temporary reg \$t0 gets h + A[300]  
 sw \$t0,1200(\$t1) # Stores h + A[300] back into A[300]

Source

| Op | rs | rt | rd | address/<br>shamt | funct |
|----|----|----|----|-------------------|-------|
|----|----|----|----|-------------------|-------|

|    |       |      |      |               |    |
|----|-------|------|------|---------------|----|
| 35 | 9 t1  | 8 t0 |      | 1200 (offset) |    |
| 0  | 18 s2 | 8 t0 | 8 t0 | 0             | 32 |
| 43 | 9 t1  | 8 t0 |      | 1200 (offset) |    |

|        |       |       |                     |       |        |
|--------|-------|-------|---------------------|-------|--------|
| 100011 | 01001 | 01000 | 0000 0100 1011 0000 |       |        |
| 000000 | 10010 | 01000 | 01000               | 00000 | 100000 |
| 101011 | 01001 | 01000 | 0000 0100 1011 0000 |       |        |

R format's op-code  
always zero.

\$t0 = 8

\$t1 = 9

\$s2 = 18

xml MIPS  
এই operation থাবলী SW  
থাবলী



# Summary

## MIPS machine language

$\$S1 = 17$ ,  $\$S2 = 18$ ,  $\$S3 = 19$

| Name       | Format | Example |        |        |         |        |        | Comments                               |
|------------|--------|---------|--------|--------|---------|--------|--------|----------------------------------------|
| add        | R      | 0       | 18     | 19     | 17      | 0      | 32     | add \$s1,\$s2,\$s3                     |
| sub        | R      | 0       | 18     | 19     | 17      | 0      | 34     | sub \$s1,\$s2,\$s3                     |
| addi       | I      | 8       | 18     | 17     | 100     |        |        | addi \$s1,\$s2,100                     |
| lw         | I      | 35      | 18     | 17     | 100     |        |        | lw \$s1,100(\$s2)                      |
| sw         | I      | 43      | 18     | 17     | 100     |        |        | sw \$s1,100(\$s2)                      |
| Field size |        | 6 bits  | 5 bits | 5 bits | 5 bits  | 5 bits | 6 bits | All MIPS instructions are 32 bits long |
| R-format   | R      | op      | rs     | rt     | rd      | shamt  | funct  | Arithmetic instruction format          |
| I-format   | I      | op      | rs     | rt     | address |        |        | Data transfer format                   |



# Logical Operations

## MIPS assembly language

| Category              | Instruction         | Example            | Meaning               | Comments                            |
|-----------------------|---------------------|--------------------|-----------------------|-------------------------------------|
| Logical<br>R format { | and                 | and \$s1,\$s2,\$s3 | \$s1 = \$s2 & \$s3    | Three reg. operands; bit-by-bit AND |
|                       | or                  | or \$s1,\$s2,\$s3  | \$s1 = \$s2   \$s3    | Three reg. operands; bit-by-bit OR  |
|                       | nor                 | nor \$s1,\$s2,\$s3 | \$s1 = ~(\$s2   \$s3) | Three reg. operands; bit-by-bit NOR |
|                       | and immediate       | andi \$s1,\$s2,20  | \$s1 = \$s2 & 20      | Bit-by-bit AND reg with constant    |
|                       | or immediate        | ori \$s1,\$s2,20   | \$s1 = \$s2   20      | Bit-by-bit OR reg with constant     |
|                       | shift left logical  | sll \$s1,\$s2,10   | \$s1 = \$s2 << 10     | Shift left by constant              |
|                       | shift right logical | srl \$s1,\$s2,10   | \$s1 = \$s2 >> 10     | Shift right by constant             |



No NOT Operation?

↪ NOR দিয়েই এম যায়।

$\$s3 \rightarrow$  zero register করে  
দিয়ো।

sll  $\$s1, \$s2, 10$   
register      constant



# Shift Left

- `sll $t2,$s0,4 # reg $t2 = reg $s0 << 4 bits`  
↳ shift amount ৪ যাবে।

| op | rs | rt | rd | shamt | funct |
|----|----|----|----|-------|-------|
| 0  | 0  | 16 | 10 | 4     | 0     |

shift operation  
হচ্ছে বেকার্য



# Branching Operations

branch equal

beq register1, register2, L1

Label

bne register1, register2, L1

branch not equal

if (i==j)

f = g + h; → add \$s0, \$s1, \$s2

else

f = g - h;

↳ sub \$s0, \$s1, \$s2

Else: sub \$s0, \$s1, \$s2

Exit:

slt \$t0, \$s3, \$s4 # \$t0 = 1 if \$s3 < \$s4

slti \$t0, \$s2, 10 # \$t0 = 1 if \$s2 < 10

sltu \$t1, \$s0, \$s1 # unsigned comparison

slt → set less than.

design principle: Smaller is faster

if-else

a+b

c+d

go to L1

jump

L1 : dte

Label (assembly টে  
label দিয়ে বাজ  
হয়)

MIPS Assembly Code:

bne \$s3, \$s4, Else # go to Else if i ≠ j

add \$s0, \$s1, \$s2 # f = g + h

j Exit

Else: sub \$s0, \$s1, \$s2

# f = g - h

mention করে দিয়ে

i = \$s3

j = \$s4

f = \$s0

g = \$s1

h = \$s2

যোগো branching হচ্ছেনা।

Why not blt?



blt → একটি যোগো operation নাই। 2 step করি-

1. slt দিয়ে check

2. beq দিয়ে \$zero এর মাঝে check

if ( $i == j$ )

$f = g + h; \rightarrow \text{add } \$s_0, \$s_1, \$s_2$

else

$f = g - h;$

$\hookrightarrow \text{sub } \$s_0, \$s_1, \$s_2$

bne  $\$s_3, \$s_4, L_1 \rightarrow i, j \text{ equal?}$

add  $\$s_0, \$s_1, \$s_2$

→  $L_1 \hookrightarrow \text{jump}$ .

j Exit

L1: sub  $\$s_0, \$s_1, \$s_2$

unconditional jump

Exit:

or.

beq  $\$s_3, \$s_4, L_1$

sub  $\$s_0, \$s_1, \$s_2$

j Exit

L1: add  $\$s_0, \$s_1, \$s_2$

Exit.

if ( $i < j$ )

$f = g + h$

else

$f = g - h$

slt  $\$t_0, \$s_3, \$s_4$

beq  $\$t_0, \$zero, L_1$

add  $\$s_0, \$s_1, \$s_2$

j Exit

L1: sub  $\$s_0, \$s_1, \$s_2$

Exit:

# Branching Operations

## MIPS assembly language

| Category           | Instruction                      | Example             | Meaning                                     | Comments                            |
|--------------------|----------------------------------|---------------------|---------------------------------------------|-------------------------------------|
| Conditional branch | branch on equal                  | beq \$s1,\$s2,25    | if (\$s1 == \$s2) go to PC + 4 + 100        | Equal test; PC-relative branch      |
|                    | branch on not equal              | bne \$s1,\$s2,25    | if (\$s1 != \$s2) go to PC + 4 + 100        | Not equal test; PC-relative         |
|                    | set on less than                 | slt \$s1,\$s2,\$s3  | if (\$s2 < \$s3) \$s1 = 1;<br>else \$s1 = 0 | Compare less than; for beq, bne     |
|                    | set on less than unsigned        | sltu \$s1,\$s2,\$s3 | if (\$s2 < \$s3) \$s1 = 1;<br>else \$s1 = 0 | Compare less than unsigned          |
|                    | set less than immediate          | slti \$s1,\$s2,20   | if (\$s2 < 20) \$s1 = 1;<br>else \$s1 = 0   | Compare less than constant          |
|                    | set less than immediate unsigned | sltiu \$s1,\$s2,20  | if (\$s2 < 20) \$s1 = 1;<br>else \$s1 = 0   | Compare less than constant unsigned |
| Unconditional jump | jump                             | j 2500              | go to 10000                                 | Jump to target address              |
|                    | jump register                    | jr \$ra             | go to \$ra                                  | For switch, procedure return        |
|                    | jump and link                    | jal 2500            | \$ra = PC + 4; go to 10000                  | For procedure call                  |



## branch operation ৰে constant থাবে

| Instruction         | Example            | Meaning                                     | Comments                        |
|---------------------|--------------------|---------------------------------------------|---------------------------------|
| branch on equal     | beq \$s1,\$s2,25   | if (\$s1 == \$s2) go to PC + 4 + 100        | Equal test; PC-relative branch  |
| branch on not equal | bne \$s1,\$s2,25   | if (\$s1 != \$s2) go to PC + 4 + 100        | Not equal test; PC-relative     |
| set on less than    | slt \$s1,\$s2,\$s3 | if (\$s2 < \$s3) \$s1 = 1;<br>else \$s1 = 0 | Compare less than; for beq, bne |

এটা instruction পৰে jump কৰা লাগবে

per instruction 4 byte.

$\therefore 25 * 4 \rightarrow 100$  byte পৰে jump.

• PC — program counter (instruction এখন load কৰা)

by default PC+4 কৰা  
লাগে,

jump compute কৰবলৈ PC+4 এর  
মালেকো। কাৰণ PC+4 by default কৰাই থাকে,





# jr \$ra (return address)

→ function থেকে return এর অবস্থা সে। ২য়। ra

register এ direct jump, function এর কাজ শেষে

শেষায় back করবে ra তে থাকে।

# jal 2500 → \$ra = PC + 4 (return করা লগিক next instruction)

jump & link ← 10000 এ যাবে।

যদি instruction এর  
address এ প্রতি part  
থাকবে, এই প্রতি 2-bit  
address এ দেখ না। ফলে  
jump এর range বাড়বে।

# Memory Allocation for Program and Data

- ❑ Reserved Memory
- ❑ Text Segment: Program Code
- ❑ Static Data Segment:
  - Contains static variables, constants, arrays
- ❑ Dynamic Data Segment (Heap):
  - Contains linked lists, variable length data
- ❑ Stack
  - Contains function local variables, parameters





Stack → function এর instruction থাকে e.g. দেখায় return.

# Stack During Procedure Call

- Stores/saves the values of registers and Restores those later
- Either \$sp and \$fp combination, or only \$sp is used
- The allocated stack area by a procedure is known as activation record or procedure frame

```
int non_leaf()
{
    int a=10, b=5, c=2, d=7, x;
    x= leaf(a,b,c,d);
    return x;
}

int leaf(int g, int h, int i, int j)
{
    int f;
    f=(g+h)-(i+j);
    return f;
}
```





non-leaf থেকে leaf যাওয়ার পথযায়। leaf থেকে return এর পথ, আর থাকবে না। non-leaf থেকে leaf func.

call →

Step 1. non leaf func এর argument stack গুলো store.

2. non leaf function বেছায় return কর্তৃত মৌলিক store.

3. এক func থেকে অন্য func যাবে then return এর পথ আগের func. এর স্বয়় যেন saved থাকে।

fp → certain function এর জন্য (stack এর বেছায় আছে)

↳ starting address

sp → stack pointer (currently বেছায় আছে)

fp to sp → activation range

$\$V_0 - \$V_1 \rightarrow$  return value store

আধো যেনি value return করতে থাকে  
stack use.

# Stack During Procedure Call

- Stores the values of registers

→ tmp register ও store করা নাগাতে পারো

| Name      | Register number | Usage                                        | Preserved on call? |
|-----------|-----------------|----------------------------------------------|--------------------|
| \$zero    | 0               | The constant value 0                         | n.a.               |
| \$v0-\$v1 | 2-3             | Values for results and expression evaluation | no                 |
| \$a0-\$a3 | 4-7             | Arguments (Parameter passing)                | no                 |
| \$t0-\$t7 | 8-15            | Temporaries                                  | no                 |
| \$s0-\$s7 | 16-23           | Saved                                        | yes                |
| \$t8-\$t9 | 24-25           | More temporaries                             | no                 |
| \$gp      | 28              | Global pointer                               | yes                |
| \$sp      | 29              | Stack pointer                                | yes                |
| \$fp      | 30              | Frame pointer                                | yes                |
| \$ra      | 31              | Return address                               | yes                |

preserve  
করাই  
নাগাতে



Question  
DATA SHEET

# Procedures in MIPS

## □ Six steps during a procedure call

1. Put parameters in a place where the procedure can access them.
2. Transfer control to the procedure.
3. Acquire the storage resources needed for the procedure.
4. Perform the desired task.
5. Put the result value in a place where the calling program can access it.
6. Return control to the point of origin, since a procedure can be called from several points in a program.



# Procedures in MIPS

## ❑ Six steps during a procedure call

1. Put parameters in a place where the procedure can access them.
2. Transfer control to the procedure.
3. Acquire the storage resources needed for the procedure.
4. Perform the desired task.
5. Put the result value in a place where the calling program can access it.
6. Return control to the point of origin, since a procedure can be called from several points in a program.

১০ - ১৩  
আর্কো এন্ড লাগল  
register এ ফিলে।

## ❑ Place for parameters:

- \$a0-\$a3: four argument registers in which to pass parameters
- Memory (e.g., stack) can be used if more values to be passed



# Procedures in MIPS

## ❑ Six steps during a procedure call

1. Put parameters in a place where the procedure can access them.
2. Transfer control to the procedure.
3. Acquire the storage resources needed for the procedure.
4. Perform the desired task.
5. Put the result value in a place where the calling program can access it.
6. Return control to the point of origin, since a procedure can be called from several points in a program.

## ❑ Transfer Control:

### ➤ Jump-and-link instruction (jal)

jal *ProcedureAddress* → \$ra ← PC + 4 store ← PC

- Keeps the return address for the procedure in \$ra and jumps to the *ProcedureAddress*



# Procedures in MIPS

## ❑ Six steps during a procedure call

1. Put parameters in a place where the procedure can access them.
2. Transfer control to the procedure.
3. Acquire the storage resources needed for the procedure.
4. Perform the desired task.
5. Put the result value in a place where the calling program can access it.
6. Return control to the point of origin, since a procedure can be called from several points in a program.

## ❑ Acquiring the storage resources

- Different register values (manipulated by caller) are stored in the stack
- Prepares the registers for operations
- Can use the memory for data storage



# Procedures in MIPS

## □ Six steps during a procedure call

1. Put parameters in a place where the procedure can access them.
  2. Transfer control to the procedure.
- 3. Acquire the storage resources needed for the procedure.**
4. Perform the desired task.
  5. Put the result value in a place where the calling program can access it.
  6. Return control to the point of origin, since a procedure can be called from several points in a program.

| Name      | Register number | Usage                                        | Preserved on call? |
|-----------|-----------------|----------------------------------------------|--------------------|
| \$zero    | 0               | The constant value 0                         | n.a.               |
| \$v0-\$v1 | 2-3             | Values for results and expression evaluation | no                 |
| \$a0-\$a3 | 4-7             | Arguments                                    | no                 |
| \$t0-\$t7 | 8-15            | Temporaries                                  | no                 |
| \$s0-\$s7 | 16-23           | Saved                                        | yes                |
| \$t8-\$t9 | 24-25           | More temporaries                             | no                 |
| \$gp      | 28              | Global pointer                               | yes                |
| \$sp      | 29              | Stack pointer                                | yes                |
| \$fp      | 30              | Frame pointer                                | yes                |
| \$ra      | 31              | Return address                               | yes                |



# Procedures in MIPS

## ❑ Six steps during a procedure call

1. Put parameters in a place where the procedure can access them.
  2. Transfer control to the procedure.
  3. Acquire the storage resources needed for the procedure.
- 4. Perform the desired task.**
5. Put the result value in a place where the calling program can access it.
  6. Return control to the point of origin, since a procedure can be called from several points in a program.

## ❑ Performing Tasks

- Can perform all the operations allowed by the MIPS instruction set



# Procedures in MIPS

## Six steps during a procedure call

1. Put parameters in a place where the procedure can access them.
  2. Transfer control to the procedure.
  3. Acquire the storage resources needed for the procedure.
  4. Perform the desired task.

**5. Put the result value in a place where the calling program can access it.**

  6. Return control to the point of origin, since a procedure can be called from several points in a program.

## □ Result storing by the callee

- \$v0-\$v1: two value registers in which to return values
  - Memory (e.g., stack) can be used if more values to be returned

47 return \$V0  
27 return \$V1



# Procedures in MIPS

## ❑ Six steps during a procedure call

1. Put parameters in a place where the procedure can access them.
2. Transfer control to the procedure.
3. Acquire the storage resources needed for the procedure.
4. Perform the desired task.
5. Put the result value in a place where the calling program can access it.
6. Return control to the point of origin, since a procedure can be called from several points in a program.

## ❑ Procedure returned

- Stack is adjusted using \$sp and \$fp pointers
- An unconditional jump to the address from where the caller will resume execution

*jr \$ra*



Online

C code to MIPS

specialized  
register.

$a_0 - a_3 \rightarrow$  parameter  
 $v_0 - v_1 \rightarrow$  return

normal register এর মাত্র যে কোন ধরণের না।

T

# An Example

non\_leaf:

2. function call

```
int non_leaf()
{
    int a, b, c, d, x;
    ...
    x=leaf(a,b,c,d);
    ...
}

int leaf(int g, int h, int i, int j)
{
    int f;
    f=(g+h)-(i+j);
    return f;
}
```

3. stack

4. function

করা

বয়া।

5. Return

6. stack থেকে

আগের value  
read করে আলাদা।

...  
add \$a0, \$s1, \$zero  
add \$a1, \$s2, \$zero  
add \$a2, \$s3, \$zero  
add \$a3, \$s4, \$zero

jal leaf  
add \$s1, \$zero, \$v0

leaf: addi \$sp, \$sp, -12

sw \$t1, 8(\$sp)  
sw \$t0, 4(\$sp)  
sw \$s0, 0(\$sp)

add \$t0,\$a0,\$a1  
add \$t1,\$a2,\$a3  
sub \$s0,\$t0,\$t1

add \$v0,\$s0,\$zero

lw \$s0, 0(\$sp)  
lw \$t0, 4(\$sp)  
lw \$t1, 8(\$sp)  
addi \$sp,\$sp,12

jr \$ra

1.  $a_0 - a_3$  টি argument দেয়া।

3yq

# adjust stack to make room for 3 items  
# save register \$t1 for use afterwards  
# save register \$t0 for use afterwards  
# save register \$s0 for use afterwards

# register \$t0 contains g + h  
# register \$t1 contains i + j  
#  $f = t0 - t1$ , which is  $(g + h) - (i + j)$

# Set return Value

# restore register \$s0 for caller  
# restore register \$t0 for caller  
# restore register \$t1 for caller  
# adjust stack to delete 3 items

# jump back to calling routine

stack থেকে read করা লেখ

আগের \$sp কে আগের address ক  
set করে দিবো।

Put parameters in a place where the procedure can access them.

Transfer control to the procedure.

Acquire the storage resources needed for the procedure

Perform the desired task.

Put the result value in a place where the calling program can access it.

Return control to the point of origin, since a procedure can be called from several points in a program.

$$t = (g+h) - (i+j)$$

3 ৰି register use কରିବୋ, ৱେଳେ value store କରିଲାଗିବୁ।

12 byte write କରିଲାଗିବୁ।

value insert  $\rightarrow$  higher  $\rightarrow$  lower  
address ଏହାଙ୍କି।

- non-leaf function ପରିବାଞ୍ଚିତ କରାଯାଇଥାଏ  $t_0, t_1$  ଏର  
value change ହିଁ ଯାଏ, ତାହାର ଆଗେର value  
store କରିଲାଗେ (stack)।

- ଏହା order କି store କରିବୋ, ତାହାର reverse  
order କି load କରିବୋ।  
 $\hookrightarrow$  lower  $\rightarrow$  higher

t.f: କିମ୍ବା optimized ହେବେ register use କରିଯାଇବୁ ?

ଏହା ଏଥିରେ function ପରିବାଞ୍ଚିତ କରିବୁ ହେଲାବୋ, then store load ଏରି କାହାରେ।



\$sp 100 ଟି ଛିନ୍ତି 3 ৰି register  
save କରାଯାଇଥାଏ  $(100 - 12) / 8 = 88$  ଟି  
\$sp କେବେ ଆମାରି, Then \$sp  
base address ଏରି ଦ୍ୱାରା offset add  
କରିବୁ value ମୁଣ୍ଡିଲେ store କରିବୋ।

(Code 4)

return address  
Recursion

বারবার change হয়।

# Nested Call

```
int fact(int n)
{
    if(n<1)
        return 1;
    else
        return n*fact(n-1);
}
```

\$sp ধাগের  
point এ নে  
দিও  
n-1 call করা  
পাস দেও arg  
করা -1 করা  
দিও।

|         |                     |                                                    |
|---------|---------------------|----------------------------------------------------|
| fact:   | addi \$sp, \$sp, -8 | # adjust stack for 2 items                         |
|         | sw \$ra, 4(\$sp)    | # save the return address                          |
|         | sw \$a0, 0(\$sp)    | # save the argument n                              |
| if-else | slti \$t0,\$a0,1    | নতুন                                               |
|         | beq \$t0,\$zero,L1  | func.                                              |
|         | addi \$v0,\$zero,1  | call হয়                                           |
| L1:     | addi \$a0,\$a0,-1   | n টাই রা                                           |
|         | jal fact            | change করা                                         |
|         | lw \$a0, 0(\$sp)    | নটুন প্রট                                          |
|         | lw \$ra, 4(\$sp)    | ra load                                            |
|         | addi \$sp, \$sp, 8  | বরি নটুন                                           |
|         | mul \$v0,\$a0,\$v0  | # n >= 1: argument gets (n - 1)                    |
|         | jr \$ra             | jal fact # call fact with (n - 1)                  |
|         |                     | value টা থকা # return from jal: restore argument n |
|         |                     | fact এর # restore the return address               |
|         |                     | # adjust stack pointer to pop 2 items              |
|         |                     | # return n * fact (n - 1)                          |
|         |                     | # return to the caller                             |

function call হয়েছে তাই ra load

করলাম। আর arg. change করে n-1

বরেছি। আলোর arg. stack টিল যেটা  
load করবো।

2.  $\text{SLT}$  ক্ষিতি preserve করা আছে,

• argument  
• return address.

$4 \times 2 \rightarrow 8$  byte storage আছে।

$\text{SLT} \rightarrow$  set less than immediate  
const. আছে এই

$$\$ a_0 = n$$

$$\$ t_0 = 1 \quad n < 1$$

$$\$ t_0 = 0 \quad n \geq 1$$

## Pseudo direct addressing

J < 5000

(jump)

# J-Format

- To support long jump to a remote procedure address

32 bit



6 bits

opcode

↳ (J-Format)

26 bits

Jumping Address

আরো 6 bit লাগবে

$26 \ll 2$  [last 2 bit - 0 থাবে]

যাই যদি দিবে এখন  
(right shift)]

28 bit

যাবি  $32 - 28 = 4$  bit পাবে PCতে

PC - 31 30 29 28 (27 - 0)

MSB 4 bit



current execution এর  
address থাকবে



→  $2^{28}$  256 MB হৈব একটা program যাবি রাখত থাকে না।

32 bit হৈব যদি আবেও address register তোথে jr \$r দিবো  
গুলা থাকে।

lw  
sw } 4 byte

char → 1 byte এর জন্য cmd লাগবো

# ASCII representation of characters

- ASCII stands for *American Standard Code for Information Interchange*
- Uses 1 byte to represent a character

| ASCII value | Character | ASCII value | Character | ASCII value | Character | ASCII value | Character | ASCII value | Character | ASCII value | Character |
|-------------|-----------|-------------|-----------|-------------|-----------|-------------|-----------|-------------|-----------|-------------|-----------|
| 32          | space     | 48          | 0         | 64          | @         | 80          | P         | 96          | ~         | 112         | p         |
| 33          | !         | 49          | 1         | 65          | A         | 81          | Q         | 97          | a         | 113         | q         |
| 34          | *         | 50          | 2         | 66          | B         | 82          | R         | 98          | b         | 114         | r         |
| 35          | #         | 51          | 3         | 67          | C         | 83          | S         | 99          | c         | 115         | s         |
| 36          | \$        | 52          | 4         | 68          | D         | 84          | T         | 100         | d         | 116         | t         |
| 37          | %         | 53          | 5         | 69          | E         | 85          | U         | 101         | e         | 117         | u         |
| 38          | &         | 54          | 6         | 70          | F         | 86          | V         | 102         | f         | 118         | v         |
| 39          | '         | 55          | 7         | 71          | G         | 87          | W         | 103         | g         | 119         | w         |
| 40          | (         | 56          | 8         | 72          | H         | 88          | X         | 104         | h         | 120         | x         |
| 41          | )         | 57          | 9         | 73          | I         | 89          | Y         | 105         | i         | 121         | y         |
| 42          | *         | 58          | :         | 74          | J         | 90          | Z         | 106         | j         | 122         | z         |
| 43          | +         | 59          | :         | 75          | K         | 91          | [         | 107         | k         | 123         | {         |
| 44          | ,         | 60          | <         | 76          | L         | 92          | \         | 108         | l         | 124         |           |
| 45          | -         | 61          | =         | 77          | M         | 93          | ]         | 109         | m         | 125         | }         |
| 46          | -         | 62          | >         | 78          | N         | 94          | ^         | 110         | n         | 126         | _         |
| 47          | /         | 63          | ?         | 79          | O         | 95          | -         | 111         | o         | 127         | DEL       |



# 1 Byte Memory Operation

- ASCII demands 1-byte memory operation
- MIPS supports

*signed* {  
lb \$t0,0(\$sp) #Reads 1 byte from memory and stores in the lowest (rightmost) byte of \$t0  
sb \$t0,0(\$gp) #Reads 1 byte from the lowest (rightmost) byte of \$t0 and stores in the memory

Unsigned Version:

Load Byte: lbu

Store Byte: Not available in MIPS

যদি a load করি

ম্যট তে 1 থাকি 24 bit ।

হয়ে যাবে, so value change

হয়ে যাবে। তখন lbu করবা।

load করে 4 byte / 32 bit register এ write করিঃ।



sign extension

হব

MSB তে যে bit থাকে ম্যট  
extend করে।



sbu দেই।  $\rightarrow$  store করার পদ্ধতি signed/unsigned  
এর মানে কানেক্স দেই। অখন  
কোনো operation নেই। just 8 bits store.

-2 load করা হলো  $\rightarrow$



থাকি 24 bit এ 1 হব।

# Dealing with Strings

## ❑ Three commonly available strategies

1. ➤ the first position of the string is reserved to give the length of a string
2. ➤ an accompanying variable has the length of the string
3. ➤ the last position of a string is indicated by a character used to mark the end of a string — ଲେଟ୍ ଫୋଲୋ ଏଣ୍ଟି ନ୍ୟୁ,



# Dealing with String: Example

- ASCII demands 1-byte memory operation
- MIPS supports

lb \$t0,0(\$sp) #Reads 1 byte from memory and stores in the lowest (rightmost) byte of \$t0

sb \$t0,0(\$gp) #Reads 1 byte from the lowest (rightmost) byte of \$t0 and stores in the memory

Unsigned Version:

Load Byte: lbu

Store Byte: Not available in MIPS



# Dealing with String: An Example

```
void strcpy (char x[], char y[])
{
    int i;
    i = 0;
    while ((x[i] = y[i]) != '\0')
        i += 1;
}
```

x,y ଟାର୍ଗେ ଅର୍ଥ ମେମୋରି  
ତେ ଏହାକିମୁଁ ମେମୋରି ଦେଖାଯାଇଛି  
read ହେଉଥାନେ write କରି  
ଲାଗିବା.

string  
operation

L1:

strcpy: addi \$sp,\$sp,-4  
 sw \$s0, 0(\$sp)  
 add \$s0,\$zero,\$zero  
 L1: add \$t1,\$s0,\$a1  
lbu \$t2, 0(\$t1) → y ପରି ବେସ # \$t2 = y[i]  
 add \$t3,\$s0,\$a0 address  
 sb \$t2, 0(\$t3) ← ଏହାକିମୁଁ  
 beq \$t2,\$zero,L2  
 addi \$s0, \$s0,1 x ଏହାକିମୁଁ  
 j L1 address

L2:

lw \$s0, 0(\$sp)  
 addi \$sp,\$sp,4  
 jr \$ra

# adjust stack for 1 more item  
 # save \$s0  
 #  $i = 0 + 0$   
 # address of  $y[i]$  in \$t1  
 #  $t2 = y[i]$   
 # address of  $x[i]$  in \$t3  
 #  $x[i] = y[i]$   
 # if  $y[i] == 0$ , go to L2  
 #  $i = i + 1$   
 # go to L1. While loop continues  
 # End of string. Restore old \$s0  
 # pop 1 word off stack  
 # return

# String Copy Example

- MIPS code:

```
strcpy:  
    addi $sp, $sp, -4      # adjust stack for 1 item  
    sw   $s0, 0($sp)       # save $s0  
    add  $s0, $zero, $zero # i = 0  
L1: add  $t1, $s0, $a1      # addr of y[i] in $t1  
    lbu  $t2, 0($t1)       # $t2 = y[i]  
    add  $t3, $s0, $a0      # addr of x[i] in $t3  
    sb   $t2, 0($t3)       # x[i] = y[i]  
    beq  $t2, $zero, L2    # exit loop if y[i] == 0  
    addi $s0, $s0, 1        # i = i + 1  
    j    L1                # next iteration of loop  
L2: lw   $s0, 0($sp)       # restore saved $s0  
    addi $sp, $sp, 4        # pop 1 item from stack  
    jr  $ra                # and return
```

# Dealing with Multiple Languages

- ❑ Unicode, a universal encoding, supports alphabets of most human languages
  - ❑ Java uses Unicode → 2 byte ~~MTCT~~
  - ❑ Requires 16 bits to represent a character
  - ❑ Operations required to load and store 16bits (halfwords)
    - Load halfword: lh                          4 byte word
    - Load halfword unsigned: lhu                2 byte halfword
    - Store halfword: sh
- shu → শু



# Constant Size: Is larger feasible?

- The I-format is used by both immediate and memory data transfer operations



- More than 16 bit long constant?

- We have two options in hand
  - Supporting short constant in 1 instruction and deal long constant with additional instructions
  - Supporting long constant in 2 instructions ১st add (16 bits), 2nd add (16 bit)
- We opted for the first option (to exploit the benefit of common case fast)

- Are we limited to use 16 bit constants (-32,768 to 32,767)?

- No ☺ Use two instructions to populate a register with 32 bit constant



# Manipulating Larger Constant

## □ Populating a register with 32 bit constant

- Load Upper Immediate (lui) : Loads upper 16 bits
- Or Immediate (ori): : Loads lower 16 bits

## □ Example: $x = y + 4000000$

4000000 = 0000 0000 0011 1101 0000 1001 0000 0000

lui \$s0, 61      upper part load      // 61 = 0000 0000 0011 1101

ori \$s0, 2304    পুরো 16 bit add    // 2304 = 0000 1001 0000 0000

add \$s1, \$t0, \$s0

No sign extension for logical operations but it happens for arithmetic operations

MSB load  
(16bit)



## □ Think how to use 32 bit address to access data from memory



addi \$s0, \$zero, 2304 → left G 0 দিয়ে fill  
negative value 2304 → left G 1 দিয়ে fill. } sign extension

# Addressing in Branches I format

- PC is 32 bit but the address in conditional branching (if-else, loop etc.) is 16 bit  
*bne \$s0,\$s1,Exit #go to Exit if \$s0 ≠ s1*

| op     | rs     | rt     | constant or address |
|--------|--------|--------|---------------------|
| 6 bits | 5 bits | 5 bits | 16 bits             |

- Forces the program instruction count within  $2^{16}$  instructions ( $2^{18}$  Bytes=128 KB)
- Most of the conditional branches jump nearby (weather forward or backward)
- Use PC relative addressing to access forward (PC + relative constant address), or backward (PC - relative constant address) location
- Can support branching up to  $\pm 2^{15}$  relative constant address value
- PC normally increments 1 word (4 bytes) after every instruction
- The addressing should be (PC+4 + relative constant address) for forward access and (PC + 4 - relative constant address) for backward access

# Addressing in Jumps

❑ PC is 32 bit but the address in Jumps is 16 bit

❑ PC is 32 bit but the address in Jumps is 16 bit

➤ Forces the program instruction count within  $2^{26}$  instructions ( $2^{28}$ )

❑ Replaces the 28 rightmost byte of PC

➤ Known as pseudodirect addressing

➤ A program is not placed across an address boundary of 256 MB

➤ Otherwise, a jump must be replaced by a jump register instruction preceded by other instructions to load the full 32-bit address into a register

opcode

Jumping Address

➤ Forces the program instruction count within  $2^{26}$  instructions ( $2^{28}$ )



# Addressing in Jumps

- PC is 32 bit but the address in Jumps is 16 bit

- ❑ PC is 32 bit but the address in Jumps is 16 bit
    - Forces the program instruction count within  $2^{26}$  instructions ( $2^{28}$ )
  - ❑ Replaces the 28 rightmost byte of PC
    - Known as pseudodirect addressing
    - A program is not placed across an address boundary of 2.56 MB
    - Otherwise, a jump must be replaced by a jump register instruction preceded by other instructions to load the full 32-bit address into a register

## opcode

# Jumping Address

- Forces the program instruction count within  $2^{26}$ instructions ( $2^{28}$ )

- Replaces the 28 rightmost byte of PC

- Known as pseudodirect addressing
  - A program is not placed across an address boundary of 256 MB
  - Otherwise, a jump must be replaced by a jump register instruction preceded by other instructions to load the full 32-bit address into a register



# Food for Thought

- ❑ Can you reverse engineer a binary?
- ❑ Is your IP secured?

?



# Jump Addressing

- Jump (`j` and `jal`) targets could be anywhere in text segment
  - Encode full address in instruction



- (Pseudo)Direct jump addressing
  - Target address =  $PC_{31\dots 28} : (address \times 4)$



ক্ষম হৈ ফিট আৰে,

Code কৃতি corresponding machine code

# Target Addressing Example

- Loop code from earlier example
  - Assume Loop at location 80000

Loop: s11 \$t1, \$s3, 2 80000  
add \$t1, \$t1, \$s6 80004  
lw \$t0, 0(\$t1) 80008  
bne \$t0, \$s5, Exit<sup>PC</sup> 80012  
addi \$s3, \$s3, 1 <sup>PC+4</sup> 80016  
j Loop 80020  
Exit: ... 80024

| Machine code  | \$t2=10 \$s2             |
|---------------|--------------------------|
| 0 0 19 9 4 0  |                          |
| 0 9 22 9 0 32 |                          |
| 35 9 8 0 0 0  |                          |
| 5 8 21 2 0 0  | ← PC+4 এর respect        |
| 8 19 19 1 0 0 |                          |
| 2 0 0 0 0 0   |                          |
| 2 0 0 0 0 0   | 20000                    |
| 1 0 0 0 0 0   |                          |
| 2 0 0 0 0 0   |                          |
| ins. count    | প দ্বাৰা divide কৰে বোনা |

80012 execute হওয়াৰ ক্ষমতা

PC 80016 টে point (by default next ins. execute হোতে গাঈ)

# Branching Far Away

- If branch target is too far to encode with 16-bit offset, assembler rewrites the code
- Example

```
beq $s0,$s1, L1  
      ↓  
bne $s0,$s1, L2  
j L1  
L2: ...
```

↳ beq পিছে লিখান

assembler যখন দুর্বল 16bit একেন তখন

ins. change হয়ে bne করে নিবে।

beq রেভেন্যুটে bne করে।

26bit সাথে অনেক addr যথার জন্য

যান্ত্র বড় 1ddr রলে jump register



Xm ৰ অংশতে

explanation  
diagram  
example

# Addressing Mode Summary



# Synchronization

- ❑ Two processors sharing an area of memory
  - P1 writes, then P2 reads
  - Data race if P1 and P2 don't synchronize
    - Result depends of order of accesses
- ❑ Hardware support required
  - Atomic read/write memory operation
  - No other access to the location allowed between the read and write
- ❑ Could be a single instruction
  - E.g., atomic swap of register  $\leftrightarrow$  memory
  - Or an atomic pair of instructions



# Synchronization in MIPS

- Load linked: **ll rt, offset(rs)**
- Store conditional: **sc rt, offset(rs)**

- Succeeds if location not changed since the **ll**
  - Returns 1 in rt
- Fails if location is changed
  - Returns 0 in rt

- Example: atomic swap (to test/set lock variable)

```
try: add $t0,$zero,$s4 ;copy exchange value
      ll $t1,0($s1)    ;load linked
      sc $t0,0($s1)    ;store conditional
      beq $t0,$zero,try ;branch store fails
      add $s4,$zero,$t1 ;put load value in $s4
```



OS - (virtual memory এবং physical  
memory কেন্দ্র)

# Translation and Startup



এটি program কে abstraction দেওয়া হয় যে তার জন্য full memory আছে।  
— Virtual memory

MIPS က ၂၂၁၇၃ — machine ဘဲ convertible

# Assembler Pseudoinstructions

- ❑ Most assembler instructions represent machine instructions one-to-one
  - ❑ Pseudoinstructions: figments of the assembler's imagination

MIPS<sub>L</sub> { move \$t0, \$t1 → add \$t0, \$zero, \$t1  
 লাই blt \$t0, \$t1, L → slt \$at, \$t0, \$t1  
 এবং ins.  
 bne \$at, \$zero, L } Higher level assembly  
 lang L এবং ins থেকে  
 ➤ \$at (register 1): assembler temporary  
 ↗ lower level (MIPS)



obj file ഒരു machine code + ഖാലീ വിവരങ്ങൾ ആകും



# Producing an Object Module

- ❑ Assembler (or compiler) translates program into machine instructions
- ❑ Provides information for building a complete program from the pieces
  - Header: described contents of object module — procedure name, data size (memory at)
  - Text segment: translated instructions — machine code പാട്ട്  
process ഫോറ്മാറ്റ്
  - Static data segment: data allocated for the life of the program  
space നിബിഡ്,
  - Relocation info: for contents that depend on absolute location of loaded program A procedure B ക്ക് call ചെയ്തിരുന്ന്, B യുടെ addr. പെഖാനേതാവാൻ
  - Symbol table: global definitions and external refs — code പാട്ട് symbol അക്കേ ഇവ നിയേ
  - Debug info: for associating with source code  
Hash table (detailed)



# Linking Object Modules

- ❑ Produces an executable image
  1. Merges segments
  2. Resolve labels (determine their addresses)
  3. Patch location-dependent and external refs
- ❑ Could leave location dependencies for fixing by a relocating loader
  - But with virtual memory, no need to do this
  - Program can be loaded into absolute location in virtual memory space



# Object 1

| Object file header     |           |                    |                              |
|------------------------|-----------|--------------------|------------------------------|
|                        | Name      | Procedure A        |                              |
|                        | Text size | 100 <sub>hex</sub> |                              |
|                        | Data size | 20 <sub>hex</sub>  | memory තු ගැනීමේ space ඇත්තා |
| Text segment           | Address   | Instruction        |                              |
|                        | 0         | lw \$a0, 0(\$gp)   |                              |
|                        | 4         | jal 0              | → x load                     |
|                        | ...       | ...                |                              |
| Data segment           | 0         | ( X )              |                              |
|                        | ...       | ...                |                              |
| Relocation information | Address   | Instruction type   | Dependency                   |
|                        | 0         | lw                 | X                            |
|                        | 4         | jal                | B                            |
| Symbol table           | Label     | Address            |                              |
|                        | X         | -                  |                              |
|                        | B         | -                  |                              |

# Object 2

| Object file header     |           |                    |             |
|------------------------|-----------|--------------------|-------------|
|                        | Name      | Procedure B        |             |
|                        | Text size | 200 <sub>hex</sub> |             |
|                        | Data size | 30 <sub>hex</sub>  |             |
| Text segment           | Address   | Instruction        |             |
|                        | 0         | sw \$a1, 0(\$gp)   | → Y ↗ store |
|                        | 4         | jal 0 A            |             |
|                        | ...       | ...                |             |
| Data segment           | 0         | (Y)                |             |
|                        | ...       | ...                |             |
| Relocation information | Address   | Instruction type   | Dependency  |
|                        | 0         | sw                 | Y           |
|                        | 4         | jal                | A           |
| Symbol table           | Label     | Address            |             |
|                        | Y         | -                  |             |
|                        | A         | -                  |             |



# ମୋବ କ୍ଷତ୍ର ବିଜ୍ଞାନ ଏବଂ Virtual memory ବିଜ୍ଞାନ

$\$sp \rightarrow 7fff\ ffffc_{hex}$

data seg start addr ହାତ ଉପରେ ଥାଇ  
 $(8000_{hex} \text{ ଲେବ୍})$

data seg. ହାତ starting  $\rightarrow 1000\ 8000_{hex}$   
 addr.

text seg. starting  
 pc  $\rightarrow 0040\ 0000_{hex}$



# Merged Object

| Executable file header |  | Text size          | $300_{hex}$                                     | $200_{hex} + 100_{hex}$                   |
|------------------------|--|--------------------|-------------------------------------------------|-------------------------------------------|
| Text segment           |  | Data size          | $50_{hex}$                                      | $20_{hex} + 30_{hex}$                     |
|                        |  | Address            | Instruction                                     |                                           |
|                        |  | $0040\ 0000_{hex}$ | <code>lw \$a0, 8000<sub>hex</sub> (\$gp)</code> | প্রথম না<br>থেক্স machine<br>code থাক     |
| A                      |  | $0040\ 0004_{hex}$ | <code>jal 40 0100<sub>hex</sub></code>          | $\rightarrow$ B প্রথম starting<br>address |
|                        |  | ...                | $0099_{hex}$                                    |                                           |
|                        |  | $0040\ 0100_{hex}$ | <code>sw \$a1, 8020<sub>hex</sub> (\$gp)</code> |                                           |
|                        |  | $0040\ 0104_{hex}$ | <code>jal 40 0000<sub>hex</sub></code>          |                                           |
|                        |  | ...                | ...                                             |                                           |
| Data segment           |  | Address            | (X)                                             | $0($gp) \rightarrow X$<br>B(0)            |
|                        |  | $1000\ 0000_{hex}$ | ...                                             |                                           |
|                        |  | ...                | $1000\ 0019_{hex}$                              |                                           |
|                        |  | $1000\ 0020_{hex}$ | (Y)                                             |                                           |
|                        |  | ...                | ...                                             |                                           |
|                        |  |                    |                                                 |                                           |

(0000 - 0099)<sub>hex</sub> → procedure A  
 0 - 19<sub>hex</sub> → procedure A এর জন্য  
 20 - 49<sub>hex</sub> → procedure B এর জন্য  
 (300<sub>hex</sub>) B → procedure B এর জন্য  
 (90 + 80)<sub>hex</sub> ≠ 170<sub>hex</sub> → IMP hex এ add করা মানবে

$(90 + 80) \neq 170_{hex}$  →  $0($gp) \rightarrow X$   
 A(0)

IMP hex এ add করা  
 মানবে





$JP \rightarrow$  higher address

$\overline{8000}$

$\times \overline{1000}$

$\rightarrow 8000$  hex

$\swarrow$  16's complement  $\overline{8000}$

হবে।

$= 8000_{\text{hex}}$

$$\begin{array}{r} 8000 \\ - 20 \\ \hline 7980 \end{array}$$

$\circlearrowleft - 7980 \rightarrow$  16's complement 8020

exam: obj file দেওয়া থাবারে merge করা লাগবে।

# Loading a Program

## ❑ Load from image file on disk into memory

1. Read header to determine segment sizes
2. Create virtual address space
3. Copy text and initialized data into memory
  - Or set page table entries so they can be faulted in
4. Set up arguments on stack
5. Initialize registers (including \$sp, \$fp, \$gp)
6. Jump to startup routine
  - Copies arguments to \$a0, ... and calls main
  - When main returns, do exit syscall



library file  
→ যদি copy করে না এনে address রাখে so update করা  
মোটা পাবে।

## Dynamic Linking

### Only link/load library procedure when it is called

- Requires procedure code to be relocatable
- Avoids image bloat caused by static linking of all (transitively) referenced libraries
- Automatically picks up new library versions

xm: static, dynamic linking difference

ঘন্টবিধি → updated version অথবা procedure যদি মিল না হয় এবং  
name change রয়ে থাকে  
— DLL not found.



# Dynamically Linked Libraries (DLL)

- Library routines that are linked to a program during execution
  - To incorporate the updated version of library routines
- In **lazy procedure linkage**, each routine is linked only when it is called
  - Provides a level of indirection → যাই procedure call ক্ষয়াতি আছে তাঁর addr. মুক্তু কপি করে। অন্ধির তে।  
copy
  - Nonlocal routine calls a set of dummy entries at the end of the program, with one entry per nonlocal routine. These dummy entries each contain an indirect করে না, jump
  - The Dynamic Linker/Loader finds the desired routine, remaps it, and changes the address in the indirect jump location to point to that routine
  - From then, the call to the library routine jump indirectly to routine without the extra hops



TF ৬ - অ্যাম্বিল

routine এর পাতা - library পাতা  
address থেকে বের করা

# Lazy Linkage

library address রাখে  
Indirection table  
না থাকলে address বের  
করার জন্য routine থাকে।  
Stub: Loads routine ID,  
Jump to linker/loader

Linker/loader code

Dynamically  
mapped code





# Starting a Java Program

- Java programs ensure **portability** sacrificing some **performance**
  - Compiled first into an easy-to-interpret instruction set: **Java bytecode**
  - A software interpreter, called **Java Virtual Machine (JVM)** can execute Java byte code
    - This process is slow
  - **Just In Time Compiler (JIT)** makes it faster
    - Statistically identify the commonly used (hot) methods →  
convert করে যেখেন্দৰ  
বাকিগুলো interpret করে
    - Compiles these methods into native instruction set
- commonly used method গুলো  
machine code করে যেখেন্দৰ  
বাকিগুলো interpret করে
- make the common case faster



# Starting Java Applications





# C Sort Example

- Illustrates use of assembly instructions for a C bubble sort function
- Swap procedure (leaf)

```
void swap(int v[], int k)
{
    int temp;
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
}
```

➤ v in \$a0, k in \$a1, temp in \$t0

ତଫ୍ଟା ପାଇଁ

# The Procedure Swap

```
swap: sll $t1, $a1, 2    # $t1 = k * 4
      add $t1, $a0, $t1 # $t1 = v+(k*4)
                          #   (address of v[k])
      lw $t0, 0($t1)     # $t0 (temp) = v[k]
      lw $t2, 4($t1)     # $t2 = v[k+1]
      sw $t2, 0($t1)     # v[k] = $t2 (v[k+1])
      sw $t0, 4($t1)     # v[k+1] = $t0 (temp)
      jr $ra              # return to calling routine
```



# The Sort Procedure in C

- ❑ Non-leaf (calls swap)

```
void sort (int v[], int n)
{
    int i, j;
    for (i = 0; i < n; i += 1) {
        for (j = i - 1;
             j >= 0 && v[j] > v[j + 1];
             j -= 1) {
            swap(v, j);
        }
    }
}
```

➤ v in \$a0, k in \$a1, i in \$s0, j in \$s1



# The Procedure Body

```

move $s2, $a0          # save $a0 into $s2
move $s3, $a1          # save $a1 into $s3
move $s0, $zero         # i = 0
for1tst: slt $t0, $s0, $s3      # $t0 = 0 if $s0 ≥ $s3 (i ≥ n)
    beq $t0, $zero, exit1 # go to exit1 if $s0 ≥ $s3 (i ≥ n)
    addi $s1, $s0, -1     # j = i - 1
for2tst: slti $t0, $s1, 0       # $t0 = 1 if $s1 < 0 (j < 0)
    bne $t0, $zero, exit2 # go to exit2 if $s1 < 0 (j < 0)
    sll $t1, $s1, 2        # $t1 = j * 4
    add $t2, $s2, $t1        # $t2 = v + (j * 4)
    lw $t3, 0($t2)          # $t3 = v[j]
    lw $t4, 4($t2)          # $t4 = v[j + 1]
    slt $t0, $t4, $t3        # $t0 = 0 if $t4 ≥ $t3
    beq $t0, $zero, exit2 # go to exit2 if $t4 ≥ $t3
    move $a0, $s2            # 1st param of swap is v (old $a0)
    move $a1, $s1            # 2nd param of swap is j
    jal swap                # call swap procedure
    addi $s1, $s1, -1        # j -= 1
    j for2tst               # jump to test of inner loop
exit2: addi $s0, $s0, 1        # i += 1
    j for1tst               # jump to test of outer loop

```

Move  
params

Outer loop

Inner loop

Pass  
params  
& call

Inner loop

Outer loop

# The Full Procedure

```
sort:    addi $sp,$sp, -20      # make room on stack for 5 registers
        sw $ra, 16($sp)       # save $ra on stack
        sw $s3,12($sp)        # save $s3 on stack
        sw $s2, 8($sp)         # save $s2 on stack
        sw $s1, 4($sp)         # save $s1 on stack
        sw $s0, 0($sp)         # save $s0 on stack
...
...
exit1:   lw $s0, 0($sp)        # restore $s0 from stack
        lw $s1, 4($sp)         # restore $s1 from stack
        lw $s2, 8($sp)         # restore $s2 from stack
        lw $s3,12($sp)        # restore $s3 from stack
        lw $ra,16($sp)        # restore $ra from stack
        addi $sp,$sp, 20       # restore stack pointer
        jr $ra                 # return to calling routine
```



# Effect of Compiler Optimization

Compiled with gcc for Pentium 4 under Linux



ins. count বাড়লো



CPI কম



# Effect of Language and Algorithm



# Example: Clearing an Array

```

clear1(int array[], int size) {
    int i;
    for (i = 0; i < size; i += 1)
        array[i] = 0;
}

```

```

clear2(int *array, int size) {
    int *p;
    for (p = &array[0]; p < &array[size];
         p = p + 1)
        *p = 0;
}

```

$6 \times 10$   
 $= 60$   
 $6 \text{ ins.}$   
 $\text{shift loop এর ভিত্তি}$   
 $(\text{প্রতি operation } 4 \text{ পুর্ণ})$

```

move $t0,$zero      # i = 0
loop1: sll $t1,$t0,2   # $t1 = i * 4 (4 byte কাটা)
       add $t2,$a0,$t1  # $t2 =
                           # &array[i]
       sw $zero, 0($t2)  # array[i] = 0
       addi $t0,$t0,1     # i = i + 1
       slt $t3,$t0,$a1    # $t3 =
                           # (i < size)
       bne $t3,$zero,loop1 # if (...) # goto loop1

```

$4 \times 10$   
 $faster$   
 $4 \text{ ins.}$   
 $\text{shift বাটুরে}$   
 $\text{থাক}$

```

move $t0,$a0      # p = & array[0]
sll $t1,$a1,2    # $t1 = size * 4
add $t2,$a0,$t1  # $t2 = array র first addr
                  # &array[size]
loop2: sw $zero,0($t0) # Memory[p] = 0
       addi $t0,$t0,4    # p = p + 4
       slt $t3,$t0,$t2    # $t3 =
                           #(p < &array[size])
       bne $t3,$zero,loop2 # if (...) # goto loop2

```

# Comparison of Array vs. Ptr

- ❑ Multiply “strength reduced” to shift
- ❑ Array version requires shift to be inside loop
  - Part of index calculation for incremented i
  - c.f. incrementing pointer
- ❑ Compiler can achieve same effect as manual use of pointers
  - Induction variable elimination
  - Better to make program clearer and safer



# MIPS (RISC) Design Principles In Summary

- ❑ Simplicity favors regularity
  - fixed size instructions
  - small number of instruction formats
  - opcode always the first 6 bits
- ❑ Smaller is faster
  - limited instruction set
  - limited number of registers
  - limited number of addressing modes
- ❑ Good design demands good compromises
  - Three instruction formats
- ❑ Make the common case fast
  - arithmetic operands using the registers
  - Saving the commonly used registers into stack
  - PC-relative addressing
  - allow instructions to contain immediate operands

tf: ফান্সি decision পিলে কোন design pattern  
follow করছো।



# Lessons Learnt

- ❑ Instruction count and CPI are not good performance indicators in isolation
- ❑ Compiler optimizations are sensitive to the algorithm
- ❑ Java/JIT compiled code is significantly faster than JVM interpreted
  - Comparable to optimized C in some cases
- ❑ Nothing can fix a dumb algorithm!



# Alternative Architectures

## □ Design alternative

- provide more powerful operations
- goal is to reduce number of instructions executed
- danger is a slower cycle time and/or a higher CPI



# ARM & MIPS Similarities

- ARM: the most popular embedded core
- Similar basic set of instructions to MIPS

|                       | ARM                       | MIPS                      |
|-----------------------|---------------------------|---------------------------|
| Date announced        | 1985                      | 1985                      |
| Instruction size      | 32 bits                   | 32 bits                   |
| Address space         | 32-bit flat               | 32-bit flat               |
| Data alignment        | Aligned                   | Aligned                   |
| Data addressing modes | 9                         | 3                         |
| Registers             | $15 \times 32\text{-bit}$ | $31 \times 32\text{-bit}$ |
| Input/output          | Memory mapped             | Memory mapped             |



# Compare and Branch in ARM

- ❑ Uses condition codes for result of an arithmetic/logical instruction
  - Negative, zero, carry, overflow
  - Compare instructions to set condition codes without keeping the result
- ❑ Each instruction can be conditional
  - Top 4 bits of instruction word: condition value
  - Can avoid branches over single instructions



# Instruction Encoding

|                   | 31                                                                                   | 28               | 27               | 20                                 | 19                  | 16                  | 15               | 12 | 11 | 4 | 3 | 0 |
|-------------------|--------------------------------------------------------------------------------------|------------------|------------------|------------------------------------|---------------------|---------------------|------------------|----|----|---|---|---|
| Register-register | ARM                                                                                  | Opx <sup>4</sup> | Op <sup>8</sup>  | Rs1 <sup>4</sup>                   | Rd <sup>4</sup>     | Opx <sup>8</sup>    | Rs2 <sup>4</sup> |    |    |   |   |   |
|                   | MIPS                                                                                 | Op <sup>6</sup>  | Rs1 <sup>5</sup> | Rs2 <sup>5</sup>                   | Rd <sup>5</sup>     | Const <sup>5</sup>  | Opx <sup>6</sup> |    |    |   |   |   |
| <hr/>             |                                                                                      |                  |                  |                                    |                     |                     |                  |    |    |   |   |   |
| Data transfer     | ARM                                                                                  | Opx <sup>4</sup> | Op <sup>8</sup>  | Rs1 <sup>4</sup>                   | Rd <sup>4</sup>     | Const <sup>12</sup> |                  |    |    |   |   | 0 |
|                   | MIPS                                                                                 | Op <sup>6</sup>  | Rs1 <sup>5</sup> | Rd <sup>5</sup>                    | Const <sup>16</sup> |                     |                  |    |    |   |   | 0 |
| <hr/>             |                                                                                      |                  |                  |                                    |                     |                     |                  |    |    |   |   |   |
| Branch            | ARM                                                                                  | Opx <sup>4</sup> | Op <sup>4</sup>  |                                    | Const <sup>24</sup> |                     |                  |    |    |   |   | 0 |
|                   | MIPS                                                                                 | Op <sup>6</sup>  | Rs1 <sup>5</sup> | Opx <sup>5</sup> /Rs2 <sup>5</sup> | Const <sup>16</sup> |                     |                  |    |    |   |   | 0 |
| <hr/>             |                                                                                      |                  |                  |                                    |                     |                     |                  |    |    |   |   |   |
| Jump/Call         | ARM                                                                                  | Opx <sup>4</sup> | Op <sup>4</sup>  |                                    | Const <sup>24</sup> |                     |                  |    |    |   |   | 0 |
|                   | MIPS                                                                                 | Op <sup>6</sup>  |                  | Const <sup>26</sup>                |                     |                     |                  |    |    |   |   | 0 |
|                   |  |                  |                  |                                    |                     |                     |                  |    |    |   |   |   |



# The Intel x86 ISA

## ❑ Evolution with backward compatibility

- 8080 (1974): 8-bit microprocessor
  - Accumulator, plus 3 index-register pairs
- 8086 (1978): 16-bit extension to 8080
  - Complex instruction set (CISC)
- 8087 (1980): floating-point coprocessor
  - Adds FP instructions and register stack
- 80286 (1982): 24-bit addresses, MMU
  - Segmented memory mapping and protection
- 80386 (1985): 32-bit extension (now IA-32)
  - Additional addressing modes and operations
  - Paged memory mapping as well as segments



# The Intel x86 ISA

## ❑ Further evolution...

- i486 (1989): pipelined, on-chip caches and FPU
  - Compatible competitors: AMD, Cyrix, ...
- Pentium (1993): superscalar, 64-bit datapath
  - Later versions added MMX (Multi-Media eXtension) instructions
  - **The infamous FDIV bug**
- Pentium Pro (1995), Pentium II (1997)
  - New microarchitecture (see Colwell, *The Pentium Chronicles*)
- Pentium III (1999)
  - Added SSE (Streaming SIMD Extensions) and associated registers
- Pentium 4 (2001)
  - New microarchitecture
  - Added SSE2 instructions



# The Intel x86 ISA

## ❑ And further...

- AMD64 (2003): extended architecture to 64 bits
- EM64T - Extended Memory 64 Technology (2004)
  - AMD64 adopted by Intel (with refinements)
  - Added SSE3 instructions
- Intel Core (2006)
  - Added SSE4 instructions, virtual machine support
- AMD64 (announced 2007): SSE5 instructions
  - Intel declined to follow, instead...
- Advanced Vector Extension (announced 2008)
  - Longer SSE registers, more instructions

## ❑ If Intel didn't extend with compatibility, its competitors would!

- Technical elegance ≠ market success



# Basic x86 Registers

| Name   | Use                                  |
|--------|--------------------------------------|
| EAX    | GPR 0                                |
| ECX    | GPR 1                                |
| EDX    | GPR 2                                |
| EBX    | GPR 3                                |
| ESP    | GPR 4                                |
| EBP    | GPR 5                                |
| ESI    | GPR 6                                |
| EDI    | GPR 7                                |
| CS     | Code segment pointer                 |
| SS     | Stack segment pointer (top of stack) |
| DS     | Data segment pointer 0               |
| ES     | Data segment pointer 1               |
| FS     | Data segment pointer 2               |
| GS     | Data segment pointer 3               |
| EIP    | Instruction pointer (PC)             |
| EFLAGS | Condition codes                      |



# Basic x86 Addressing Modes

- Two operands per instruction

| Source/dest operand | Second source operand |
|---------------------|-----------------------|
| Register            | Register              |
| Register            | Immediate             |
| Register            | Memory                |
| Memory              | Register              |
| Memory              | Immediate             |

## ■ Memory addressing modes

- Address in register
- Address =  $R_{base} + \text{displacement}$
- Address =  $R_{base} + 2^{\text{scale}} \times R_{index}$  (scale = 0, 1, 2, or 3)
- Address =  $R_{base} + 2^{\text{scale}} \times R_{index} + \text{displacement}$



# x86 Instruction Encoding

a. JE EIP + displacement



b. CALL



c. MOV EBX, [EDI + 45]



d. PUSH ESI



e. ADD EAX, #6765



f. TEST EDX, #42



## □ Variable length encoding

- Postfix bytes specify addressing mode
- Prefix bytes modify operation
  - Operand length, repetition, locking, ...



# Implementing IA-32 (Seg:Off)

- ❑ Complex instruction set makes implementation difficult
  - Hardware translates instructions to simpler microoperations
    - Simple instructions: 1-1
    - Complex instructions: 1-many
  - Microengine similar to RISC
  - Market share makes this economically viable
- ❑ Comparable performance to RISC
  - Compilers avoid complex instructions



# ARM v8 Instructions

- In moving to 64-bit, ARM did a complete overhaul
- ARM v8 resembles MIPS
  - Changes from v7:
    - No conditional execution field
    - Immediate field is 12-bit constant
    - Dropped load/store multiple
    - PC is no longer a GPR
    - GPR set expanded to 32
    - Addressing modes work for all word sizes
    - Divide instruction
    - Branch if equal/branch if not equal instructions



ins. করা বল্ট ফাইবেলি

# Fallacies

## ❑ Powerful instruction $\Rightarrow$ higher performance

- Fewer instructions required
- But complex instructions are hard to implement
  - May slow down all instructions, including simple ones
- Compilers are good at making fast code from simple instructions

## ❑ Use assembly code for high performance

- But modern compilers are better at dealing with modern processors
- More lines of code  $\Rightarrow$  more errors and less productivity



# Fallacies

- Backward compatibility  $\Rightarrow$  instruction set doesn't change
  - But they do accrete more instructions



আগের ins. গুলো থাকবেই  
আগের নতুন ins. আছে

x86 instruction set



# Pitfalls

- Sequential words are not at sequential addresses
  - Increment by 4, not by 1!
- Keeping a pointer to an automatic variable after procedure returns
  - e.g., passing pointer back via an argument
  - Pointer becomes invalid when stack popped



- Design principles
  1. Simplicity favors regularity
  2. Smaller is faster
  3. Make the common case fast
  4. Good design demands good compromises
- Layers of software/hardware
  - Compiler, assembler, hardware
- MIPS: typical of RISC ISAs
  - c.f. x86



# Concluding Remarks

- Measure MIPS instruction executions in benchmark programs
  - Consider making the common case fast
  - Consider compromises

| Instruction class | MIPS examples                        | SPEC2006 Int | SPEC2006 FP |
|-------------------|--------------------------------------|--------------|-------------|
| Arithmetic        | add, sub, addi                       | 16%          | 48%         |
| Data transfer     | lw, sw, lb, lbu,<br>lh, lhu, sb, lui | 35%          | 36%         |
| Logical           | and, or, nor, andi,<br>ori, sll, srl | 12%          | 4%          |
| Cond. Branch      | beq, bne,slt,<br>slti, sltiu         | 34%          | 8%          |
| Jump              | j, jr, jal                           | 2%           | 0%          |



# Acknowledgements

- These slides contain material developed and copyright by:
  - Lecture slides by Dr. Tanzima Hashem, Professor, CSE, BUET
  - Lecture slides by Mehnaz Tabassum Mahin, Assistant Professor, CSE, BUET



Thank You ☺

