

50% → Endsem

20% → Project / Assignment

30% → Midsem

Core → Relevance & Importance  
 ↪ Stream

### Electives



Execute programs — Application (oops/c)

Design program — How to solve problems

↓  
[Application]

(DAA, TOC)

Theoretical

Computer Science

Alan Turing  
 ↪ What should  
the computer do

Prerequisites : Combinatorics, Graph Theory

GPU + System software → Machine Learning  
 ↪ Hardware

# Computer Architecture

Structural → Load

Architecture → Design & Properties

X functional components in the computer — Structural

How they are placed — Architecture

↳ Computation



I/P & O/P → Separate components

↳ Not part of Main processing unit



← Von Newman Architecture /  
Stored memory Architecture

$a = 5$  ← Temporary variable  
 $b = 6$

$a+b \rightarrow$  stored in Registers

ALU : Arithmetic and Logical Unit

CU : Control Unit

8/1/25

calculator



but what for  $a+b+c+d+e?$

We need to store  $a+b$  result to add with  $C$  and so on.

\* we use register for this temporary storage

at most 16 bits are required to store the data

$$a+b \rightarrow r_1, r_1+c \rightarrow r_2 \\ r_2+d \rightarrow r_3, r_3+e \rightarrow r_4$$

O/I - add  
SWO - this is done by Select lines.  
this bits don't come under the output  
it is only for selecting an operation.



processor or RAM which is imp?



\* John Von Neumann.

We used to work in chaotic kind of environment.



- ALU : Circuits for operations which are performed
    - Very fast compared to main memory
  - Registers : Fastest memory , Stores Temporary Data (Intermediate data)  
Smallest Size , e.g. Sequence of bits / flipflops
  - Control Unit :
    - (i) Timing Signal : Order of execution of instructions
    - (ii) Control Signal : Control the read/write operations on registers .
  - Main components are attached by bus.
    - Implemented by Multiplexers
- Word : Memory representational Unit (Along with bytes)
- Byte → Fixed
- Word → Multiple bytes
- Accumulator : Intermediate data is stored temporarily



Hard disk

SSD : Solid State Device

Data goes through RAM, cannot be taken directly from Hard disk.  
(Memory → Referred as RAM)



SSD

HDD

Bottleneck - Hindrance

Magnetic Disk

- System bus:





```

int *p;
int a = 65;
p = &a;
  
```

*variable  
a + b*

$a \rightarrow$  Data Bus  
 $+$  → Control Bus  
 $b \rightarrow$  Data Bus

Fetch the data from  
location & Transfer

Where to fetch from : Address Bus

What to fetch : Data Bus

Hard disk - External Memory

- ↳ Not connected to System Bus
- Connected via SATA cable

System Bus : Embedded into the chip itself

↳ Hardwired into processor

∴ Very fast

Performance Metric : Clock speed

↳ i.e. no. of instructions in one time unit (Hertz)

P : Min. time taken for one instruction

R : Rate ( $1/P$ )

High Level  
Computer  
Language

Instruction : Basic unit, to tell the  
processor what to do

Parameters :

N : No. of instructions executed

M : Avg. time of an instruction

Assembly

Machine  
Language

→ Checking speed

Cache → Permanent in memory but may/may not be used.

- ↳ Size : Less than main memory i.e. subset of main memory
- ↳ Whatever is frequently accessed
- ↳ Consists of levels - Level 1, 2, 3, Not used beyond 3.



- Load : Data is loaded into register from memory
- Store : Data is stored into memory from registers.
- Move : Move data from register / address to register

lscpu  
lsmem

} Commands (Ubuntu-based)

Cache → Cannot be used instead of main memory.  
↪ Temporary



[Ref] TLP : Translational Lookaside Buffer

10/1

Server — More powerful system than computers

Params:

Throughput : Set of programs / tasks to be completed in a point of time

Response time

Performance Metrics:

(i) Clock Cycle time → Capacity of the Processor [Hardware]

Clock cycle : Positive half cycle + Negative Half cycle

CPU time = no. of programs × Clock cycle time

[Program = n × no. of instructions]

$$\Rightarrow \text{CPU time} = \frac{\text{no. of instructions}}{\text{per program}} \times \frac{\text{no. of cycles}}{\text{instructions}} \times \text{clock cycle time}$$

(sec)

CPI : Cycles per instruction [Architecture]

Each instruction — Average of Instruction Set.

$$\frac{\sum_{i=0}^n P_i + n}{n}$$

Moore's Law : Size of a chip / transistor halves every 18 months.

↪ Intel  
[Quantum Computing is independent]

i.e. same job can be done by a smaller semiconductor chip.

Python - User friendly not System friendly

(no. of instructions)  
per program ↑

∴ High level language

CPU is not only dependent on the Hardware capacity.

Multi-processing : Parallel Computing

$$\text{Speed up} = \frac{\text{without parallelism}}{\text{with parallelism}}$$

[ Divide the work & work]

Concurrency → Resources can be parallel

Parallelism → different tasks being done at the same time

- Degree of Multi-programming → Read [diff. b/w Parallelism & Concurrency]

→ Pipelining : Not in Mid Sem

13/1

Intel : Processor (chip manufacturing company)

Xilinx : ARM, Processors, FGPA

[Virtual Machine Types offered by AWS (18 types)]

- CISC : Complex Instruction Set Computer

RISC : Reduced Instruction Set Computer

x86 → 8086

c64 - Alternative

z → Intel & 8086 → Version

↳ Some value

↳ Intel Processor, comes under

High level language

CISC.

(Add  $x_1, x_2$ ) Assembly Language [8086]



Machine Language



Byte based : Point out every byte in memory.

Word based : Point out every word in memory.

Location of 0 → 8 bytes

Location of 1 → 8 bytes

⋮ ⋮ ⋮ ⋮

Location of 4096 → 8 bytes

Can be 2-3 bytes

$$2^{12} \times 2^3 = 2^{15} \text{ bits}$$

## Locations - Undivided further

Byte Addressing : 0 

|   |   |
|---|---|
| A | B |
|---|---|

 X Cannot be addressed individually  
1 

|   |  |
|---|--|
| B |  |
|---|--|

 ✓

$$2^{12} \rightarrow 12 \text{ bits} \rightarrow \text{Address}$$

∴ Address → 12 bits

In Word Addressing system,  
word size = 2 bytes

∴ Cannot go less than 2 bytes :

$$\frac{4096}{2} = 2048 \longrightarrow 2^{11}$$

∴ Need 11 bits to represent

More memory → byte based Addressing becomes a problem.



CISC → Complex

Less instructions ⇒ More load onto CPU



RISC → Simple



(RAM to Register)  
[load data into register]

Powerful Compiler → RISC

Humans manually fetch the operations.

More instructions ⇒ Less clock cycles

4 Operations in RISC = 1 Operation CISC

Easy for us, Complex for processors

Step-by-step

4 Clock cycles

(Assuming 1 operation takes)  
1 clock cycle.

Automatic

Not 1 clock cycles. Requires more than 1

clock cycle.

### CISC

- 1) Complex Instruction Set Computer
- 2) Large Number of Instructions
- 3) Variable Length Instruction format
- 4) Large No. of addressing modes
- 5) Cost is High
- 6) More Powerful
- 7) Several Cycle Instructions
- 8) Manipulation directly in Memory
- 9) Microprogrammed Control Unit
- 10) Examples: Mainframes, Motorola 6800, Intel 8080  
MULT 2:2, 3:3

### RISC

- 1) Reduced Instruction Set Computer
- 2) Less No. of Instructions
- 3) Fixed Length Instruction format
- 4) Few no. of ALU
- 5) Less cost
- 6) Less Powerful
- 7) Single Cycle Instructions
- 8) Only in Registers
- 9) Hardwired Control Unit
- 10) MIPS, ARM, SPARC, Fugaku

LOAD A, 2:2  
LOAD B, 3:3  
PROD A, B  
STORE 2:2, A



- ARM : Advanced RISC machine.

Easier to design : CISC

Only Intel → CISC [∴ 1986 → Software was not powerful]

Converting RISC to CISC is possible, but not useful.

∴ Processor has the same load

Converting CISC to RISC is better.

i.e. High to Low

Processor → Computing Intensive  
RAM → Memory Intensive.

$\begin{cases} \text{GPU} \rightarrow \text{Process parallelly, } \therefore \text{Reduce load} \\ (\text{CPU} + \text{GPU} \rightarrow \text{Gaming}) \end{cases}$

Non Parallelised → CISC

Efficiently Parallelised → RISC

∴ 1 operation → Many operations [1 Operation → 1 Clock cycle]  
∴ Easier to divide)

CISC can be parallelised but difficult for us.

∴ Not efficiently Parallelised.

16/1

MASM → .asm

↳ Assembly code  
↳ Tool in which it is compiled

Debugging → Memory-based error

↳ Debugs line-by-line

ARM → Advanced RISC Machine

Program → Set of Instructions

Assembler → Makes sure Assembly language is executed on processor.

Without math.h :

gcc file.c -o file  
./file

With math.h :

Add -lm  
Add -lcl

Assembly Language - Written precisely

↳ Assume running on processor, no memory involved.

ARM Processor  $\rightarrow$  32 bit processor

Word size, i.e.  $2^{30}$  words can be possible.

30 Addresses

RAM -  $2^{30}$

Primary Memory / Main Memory / Memory

[Variation of ARM : MIPS, X86]

Contains 16 Registers

Program Counter, Stack Pointer, LR, r0, r1, ..., r12

↳ Store the address of the locations  
Next instruction be loaded is stored.



Assembly Language - Generic Term  
↳ Instruction set

- Operations :

- Arithmetic operations :

- (1) ADD

- (2) SUB

- (1) ADD :

- Operands :  $r_1, r_2, r_3$  [Registers]

- $r_1 = r_2 + r_3$

} Instruction Set Specific to ARM Processor

- (2) SUB :

- $r_1 = r_2 - r_3$

Ex. AND  $r_1, r_2, r_3 \equiv r_1 = r_2 \& r_3$

LDR  $r_1, [r_2, #40]$

Load  $\downarrow$  Register  $r_1$   $\xrightarrow{r_2 + 40}$   $[r_2 : \text{base address}]$

$$\therefore r_1 = [r_2 + 40]$$

Content of  $r_2 + 40$  = data of  $r_1$

|     |     |     |     |     |     |
|-----|-----|-----|-----|-----|-----|
| 200 | 204 | 208 | 212 | 216 | 220 |
|-----|-----|-----|-----|-----|-----|

IntArray

Base Address

Automatically 4 bytes are loaded.

LDR : Load Register → Assigning value to the register

STR : Store Register → Store memory content in the address  
↪ Reverse of LDR

For loading only single byte,

LDRB is used.

STR  $r_1, [r_2, \underbrace{\#40}]$

17/1

Lab :

gcc -g -o sample sample.c

./sample → Running the code

gdb ./sample → Debugging CLI

[gdb : gcc debugger]

run → Run the code b/w specified breakpoints & run the code entirely if no breakpoints mentioned

n → Execute the line, display the next line and get ready to debug the next line.

s → Step-into, Works only if the debugger is currently on the line where a function is called.

c → Continue onto the next breakpoint and get ready to debug

b <line number> OR b <function name> → Create a breakpoint at specified LOC

clear → Delete all breakpoints

clear <line number> → Delete the breakpoint at specified LOC.

display <variable name> → Display the value of variable & continue displaying it in the next debugging lines.

print <variable name> → Display the value of variable

quit → Exit from the debugging CLI.

bt → Print backtrace of all stack frames, or innermost COUNT frames

```
advance -- Continue the program up to the given location (same form as args for break command).
attach -- Attach to a process or file outside of GDB.
cancel -- Cancel the current program being debugged, after signal or breakpoint.
detach -- Detach a program from GDB if currently attached.
detach checkpoint -- Detach from a checkpoint (experimental).
detach inferior -- Detach from Inferior ID (or list of IDs).
disconnect -- Disconnect from a target.
exec -- Execute until selected stack frame returns.
handle -- Specify how to handle signals.
inferior -- Use this command to switch between inferiors.
interrupt -- Interrupt the execution of the debugged program.
kill -- Continue program being debugged at specified line or address.
list -- List the selected line of the program being debugged.
kill inferior -- Kill inferior (or list of IDs).
next -- Step program, proceeding through subroutine calls.
nexti -- Step one instruction, but proceed through subroutine calls.
queue-signal -- Queue a signal to be delivered to the current thread when it is resumed.
resume -- Resume the program being debugged.
reverse-continue -- Continue program being debugged but run it in reverse.
reverse-finish -- Execute backward until just before selected stack frame is called.
reverse-next -- Step program backward, proceeding through subroutine calls.
step -- Step one instruction, or step over pages if needed.
stepi -- Step one instruction exactly.
stepi-skip -- Step backward exactly one instruction.
run -- Start debugged program.
signal -- Continue program with the specified signal.
start -- Start the debugged program stopping at the beginning of the main procedure.
starti -- Start the debugged program stopping at the first instruction.
step -- Step program until it reaches a different source line.
stepi -- Step one instruction exactly.
targs -- Apply a command to all threads (ignoring errors and empty output).
target -- Connect to a target machine or process.
```

17/1  
 $r_0, r_1, \dots, r_{12}, PC, SP, LC$

- Arithmetic ADD  $r_1, r_2, r_3 \rightarrow r_1 = r_2 + r_3$

SUB  $r_1, r_2, r_3 \rightarrow r_1 = r_2 - r_3$

Logical AND  $r_1, r_2, r_3 \rightarrow r_1 = r_2 \& r_3$  [Bitwise]

OR  $r_1, r_2, r_3 \rightarrow r_1 = r_2 | r_3$  [Bitwise]

- MOV  $r_1, r_2 \leftarrow$  Syntax, i.e. Content of  $r_1$  is in  $r_2$ .

MOV  $r_1, \#42 \leftarrow$  Example

MOV  $r_2, r_1$

[copy command]

(from register to register)

- Loading content from the memory (location) specified in RHS into the register:

LDR  $r_1, [r_2, \#42]$   
(Load Register)

If  $r_2 = 40,$

$r_2 + 42 \rightarrow 82$

$\downarrow r_1$

Address = Base + Offset  
 $x^2 \quad \downarrow 10$

1 digit of Hexadecimal number  $\rightarrow 4$  bits

$\therefore$  2 digits of Hexadecimal number  $\rightarrow 8$  bits

$\therefore$  8 digits of Hexadecimal number  $\rightarrow 32$  bits

Memory  $\rightarrow$  Always in bytes

Word size  $\rightarrow$  How we access the content



LDRB  $r_1, [r_2, \#42]$

$\hookrightarrow$  Copy one byte of data from the memory location

$r_2 = 2000$

$r_2 + 42 \rightarrow 2042$

- STR  $r_1, [r_2, \#42]$

If  $r_2 = 2000$ ,

$$2000 + 42 = 2042$$

Copy the content of  $r_1$  to 2042  
 ↳ 4 bytes

STRB  $r_1, [r_2, \#42]$

↳ Copy one byte at a time

Q) Add two numbers

48, 62

Sol: MOV  $r_1, 48$

MOV  $r_2, 62$

ADD  $r_3, r_1, r_2$

H.W

Q)  $a + b = c$

↳ 20 bit (Memory location)  
 ↳ 18 bit (Memory location)

Base = 2000 }  
 Offset = 48    a

b → Immediate location [Ex. a → 2042, b → 2046, c → 2050]

Write a code to add.

c → Offset = 50

20/1

Sol: MOV  $r_1, \#2000$

LDR  $r_2, [r_1, \#48]$

      ↳ 2048

LDR  $r_3, [r_1, \#52]$

ADD  $r_4, r_3, r_2$

STR  $r_4, [r_1, \#56]$

2000 is just a number, but in this context,  
 it is an address.

$$\begin{cases} r_2 = a \\ r_3 = b \end{cases}$$

→ ORR  $r_1, r_2, r_3$

MVN  $r_1, r_2$  :  $r_1$  = bitwise negation of  $r_2$

LSR → Left Shift (Cannot be used independently, use MOV or ADD)

ADD  $r_4, r_3, r_2$ , LSR #2  $\Rightarrow r_4 = r_3 + (r_2 \gg 2)$

MOV  $r_4, r_3$ , LSR #2

Q) Base address  $\rightarrow 2000$

Offset  $\rightarrow 48$

$$d = a + b + c$$

$[$  LSL : Left shift  
LSR : Right Shift  $]$

Sol: MOV  $r_1, \#2000$

LDR  $r_2, [r_1, \#48]$

LDR  $r_3, [r_1, \#52]$

LDR  $r_5, [r_1, \#56]$

ADD  $r_4, r_3, r_2$

ADD  $r_2, r_4, r_5$

STR  $r_2, [r_1, \#60]$

- Comparison:

CMP  $r_1, r_2$

Should be used with a label

EQ, GT, GE, LE, LT, BEQ

$\Downarrow$

Branching

$\Downarrow$

Equal

Greater than

Less than or equal to

Ex. CMP  $r_1, r_2$

Eg label

$\} \rightarrow$  Outputs into label if  $r_1 = r_2$

To code efficiently

Instruction / Instruction set

Q) if ( $a < b$ ), add  $a + b + c$

else,  $b + c$

Sol:

MOV  $r_1, \#2000$

LDR  $r_2, [r_1, \#48]$

LDR  $r_3, [r_1, \#52]$

LDR  $r_5, [r_1, \#56]$

ADD  $r_4, r_3, r_2$

CMP  $r_1, r_2$

GT label

label : STR  $r_4, [r_1, \#60]$

ADD  $r_2, r_4, r_5$

STR  $r_2, [r_1, \#60]$

[GNU ARM Tool]

20/1

- MOV r<sub>1</sub>, #4

MOV r<sub>2</sub>, #6

CMP r<sub>1</sub>, r<sub>2</sub> → Flags get activated.

BNE LABEL

LABEL: ADD r<sub>3</sub>, r<sub>1</sub>, r<sub>2</sub>

- MOV r<sub>1</sub>, #4

MOV r<sub>2</sub>, #6

CMP r<sub>1</sub>, r<sub>2</sub>

BNE FIRST

FIRST: ADD r<sub>3</sub>, r<sub>1</sub>, r<sub>2</sub>

MOV r<sub>1</sub>, #4

MOV r<sub>2</sub>, #6

CMP r<sub>1</sub>, r<sub>2</sub>

BNE EXIT

FIRST: ADD r<sub>3</sub>, r<sub>1</sub>, r<sub>2</sub>

EXIT:

Optional

BNE: Not equal to

↳ Branching

MOV r<sub>1</sub>, #4

MOV r<sub>2</sub>, #6

CMP r<sub>1</sub>, r<sub>2</sub>

BNE EXIT

FIRST: ADD r<sub>3</sub>, r<sub>1</sub>, r<sub>2</sub>

EXIT: SUB r<sub>3</sub>, r<sub>1</sub>, r<sub>2</sub>



If a == b, both the FIRST label and the EXIT labels will be executed sequentially.

Fix:

MOV r<sub>1</sub>, #4

MOV r<sub>2</sub>, #6

CMP r<sub>1</sub>, r<sub>2</sub>

BNE EXIT

FIRST: ADD r<sub>3</sub>, r<sub>1</sub>, r<sub>2</sub>

B EXIT

SUB : SUB r<sub>3</sub>, r<sub>1</sub>, r<sub>2</sub>; comment using semicolon

EXIT:



Q)  $a = b = 2000$

offset for  $a = 24$

offset for  $b = 28$

if ( $a > b$ )

$a \gg 2$

else

$b \ll 2$

Sol:

MOV  $r_1, \#2000$

LDR  $r_2, [r_1, \#24]$

LDR  $r_3, [r_1, \#28]$

CMP  $r_2, r_3$

BGT RIGHT

BLE LEFT

RIGHT: MOV  $r_4, r_2 \text{ LSR } \#2$

B EXIT

LEFT: MOV  $r_4, r_3 \text{ LSL } \#2$

EXIT:

$a = b = 2000$

offset for  $a = 24$

offset for  $b = 28$

Offset for  $c = 32$

if ( $a > b$ )

$c = a \gg 2$

else

$c = b \ll 2$

MOV  $r_1, \#2000$

LDR  $r_2, [r_1, \#24]$

LDR  $r_3, [r_1, \#28]$

CMP  $r_2, r_3$

BGT RIGHT

BLE LEFT

RIGHT: MOV  $r_4, r_2 \text{ LSR } \#2$

B EXIT

LEFT: MOV  $r_4, r_3 \text{ LSL } \#2$

EXIT:

STR  $r_4, [r_1, \#32]$

$(r_2 = a)$   
 $(r_3 = b)$

Q)  $r_1 \rightarrow \#4$

$r_2 \rightarrow \#9$

$r_3 \rightarrow \#92$  (offset)

Write Assembly code for:

while ( $r_1 \neq r_2$ ) {

$r_1 ++$

$r_3 ++$

}

Sol:

MOV  $r_4, \#2000$

LDR  $r_1, [r_4, \#4]$

LDR  $r_2, [r_4, \#9]$

LDR  $r_3, [r_4, \#92]$

MAIN: CMP  $r_1, r_2$

BNE BLOCK

B EXIT

BLOCK: CMP r<sub>1</sub>, r<sub>2</sub>

ADD r<sub>1</sub>, r<sub>1</sub>, #1

ADD r<sub>3</sub>, r<sub>3</sub>, #1

B MAIN

EXIT:

Lab: [20/1]

disass <function-name> : Shows assembly code

info f : Give details of the frame

frame <function-name> : Give details of the frame for specified function

layout src : Shows an interface

info register / ir : Show details of all registers

print \$<register name> : Displays value of specified register

disassemble <function-name> : Shows assembly code for the function

info frame : Information about current frame

21/1

Q) k = 4

do {

k = k + 1

}

while (k < 12)

Sol:

MOV r<sub>1</sub>, #4

ADD r<sub>1</sub>, r<sub>1</sub>, #1

MAIN: CMP r<sub>1</sub>, #12

[OR]

BGE EXIT

ADD r<sub>1</sub>, r<sub>1</sub>, #1

B MAIN

EXIT:

MOV r<sub>1</sub>, #4

WHILE: ADD r<sub>1</sub>, r<sub>1</sub>, #1

CMP r<sub>1</sub>, #12

BLT WHILE

→ Registers:

r<sub>0</sub>, r<sub>1</sub>, r<sub>2</sub>, ..., r<sub>12</sub>, pc, sp, lr

General Purpose registers

Stack pointer : Stores the starting address of the stack

4040 MOV r<sub>4</sub>, #2000  
 4044 MOV r<sub>1</sub>, #3  
 4048 MOV r<sub>2</sub>, #2  
 4052 ADD r<sub>1</sub>, r<sub>2</sub>, r<sub>3</sub>  
 4056 BL #5000

### SUBTRACT:

5000 SUB r<sub>1</sub>, r<sub>2</sub>, r<sub>3</sub>  
 5004 STR r<sub>1</sub>, [r<sub>4</sub>, #3]  
 MOV pc, lr

[BL : Branch Link]

Current instruction's next instruction address is stored in lr.

PC → 5000 (Actual Next instruction)

LR → 4060 (Supposed to be Next instruction)

Procedure call → Set of Instructions

Label → Sequential code in separate memory location  
 ↗ Used for looping

- Local variables:

r<sub>0</sub>, r<sub>1</sub>, r<sub>2</sub>, r<sub>3</sub> → Passing the arguments conventionally, as reserve.  
 r<sub>4</sub>, r<sub>5</sub>, ..., r<sub>11</sub> → Store the value and give it back conventionally.

Stack → Required when registers are not available

↗ Temporary → Existing registers are stored in stack.  
 Restored into registers when needed.

Stack pointer: Points to top of the stack



\* when sp updates → top will decrease  
 4000 → 3996 → 3992

22/1

 $lr \rightarrow$  insufficientmain()  $\longrightarrow$  AC()  $\longrightarrow$  BC() $lr \rightarrow 80$ , Hence 42 gets overwritten

BC() will have count of how many variables are there.

B will save old registers.

## Assembly Language Function : Procedure

Stack pointer  $\longrightarrow$  Incremented to create space if we don't know whether space is available or notTo store 3 values : MOV  $r_3, sp$ SUB sp, sp, #12  $\longrightarrow$  Erasing Old dataMOV sp,  $r_3$        $sp \rightarrow 52$ STR  $r_0, [sp, \#8]$        $(0, \dots, Top)$        $sp \rightarrow 48$ STR  $r_1, [sp, \#4]$        $sp \rightarrow 44$ STR  $r_2, [sp, \#0]$        $sp \rightarrow 40$  $sp \rightarrow$  location of the top of the stack

Use LDR to restore the values.

24/1

push  $\longrightarrow$  decrease add  $\rightarrow sp$ pop  $\longrightarrow$  increase adds  $\rightarrow sp$ MOV  $r_1, \#40$ MOV  $r_1, \#40$ MOV  $r_2, \#46$ MOV  $r_2, \#46$ MOV  $r_3, \#80$ MOV  $r_3, \#80$ 

SUB sp, sp, #12

LDR  $r_2, [sp, \#8]$ STR  $r_2, [sp, \#8]$ LDR  $r_3, [sp, \#4]$ STR  $r_3, [sp, \#4]$ LDR  $r_4, [sp, \#0]$ STR  $r_4, [sp, \#0]$ 

ADD sp, sp, #12

P1 :

2000 MOV r<sub>5</sub>, #6

2004 ADD r<sub>4</sub>, r<sub>5</sub>, #4

2008 BL P2

PC = 2012

LR = 2012

P2 :

3000 MOV r<sub>6</sub>, #6

3004 ADD r<sub>7</sub>, r<sub>6</sub>, r<sub>9</sub>

3008 BL P3

3012 LDR lr, [sp, #0]

3016 MOV pc, lr

PC = 3012

LR = 3012

P3 :

4000 MOV r<sub>7</sub>, #9

4004 ADD r<sub>7</sub>, r<sub>8</sub>, #8

MOV pc, lr

Standard way to end a function &  
return back to previous call  
Update lr → new lr

A

4000 ADD r<sub>4</sub>, r<sub>4</sub>, r<sub>5</sub>

4004 BL B

LR = PC

PC = 5000

(LR = 4008)

B

5000 SUB r<sub>6</sub>, r<sub>6</sub>, r<sub>7</sub>

5004 BL C

PC = 6000

LR = 5008

4008 is being overwritten

C

6000 ADD r<sub>8</sub>, r<sub>9</sub>, r<sub>9</sub>

6004 SUB r<sub>8</sub>, r<sub>9</sub>, r<sub>9</sub>

6008 MOV pc, lr

∴ B

5000 SUB r<sub>6</sub>, r<sub>6</sub>, r<sub>7</sub>

5004 STR lr, [sp, #0]

5008 BL C

5012 LDR lr, [sp, #0]

5016 MOV pc, lr

r<sub>0</sub>, r<sub>1</sub>, r<sub>2</sub>, r<sub>3</sub> → Arguments

↳ Return value which can be stored

A

ADD r<sub>4</sub>, r<sub>4</sub>, r<sub>5</sub>

BL B

ADD r<sub>4</sub>, r<sub>0</sub>, r<sub>4</sub>

SUB sp, sp, #8

SUB r<sub>6</sub>, r<sub>6</sub>, r<sub>7</sub>

MOV r<sub>0</sub>, r<sub>6</sub>

STR lr, [sp, #0]

STR r<sub>0</sub>, [sp, #4]

BL C

LDR lr, [sp, #0]

ADD r<sub>7</sub>, r<sub>7</sub>, r<sub>8</sub>

MOV pc, lr

B

C

ADD r<sub>0</sub>, r<sub>9</sub>, r<sub>9</sub>

SUB r<sub>8</sub>, r<sub>9</sub>, r<sub>9</sub>

MOV r<sub>0</sub>, r<sub>8</sub>

MOV pc, lr

Store r<sub>4</sub> & r<sub>5</sub> stack

MOV r<sub>4</sub>, #4

MOV r<sub>5</sub>, #9

ADD r<sub>4</sub>, r<sub>5</sub>, r<sub>4</sub>

LDR r<sub>4</sub>, [sp, #0]

Store lr  
in stack

D

| A                                                    | B                                                    | C                                                    | D                                                    |
|------------------------------------------------------|------------------------------------------------------|------------------------------------------------------|------------------------------------------------------|
| ADD r <sub>4</sub> , r <sub>4</sub> , r <sub>5</sub> | SUB sp, sp, #8                                       | SUB sp, sp, #8                                       | Store r <sub>4</sub> & r <sub>5</sub> stack          |
| BL B                                                 | SUB r <sub>6</sub> , r <sub>6</sub> , r <sub>7</sub> | STR lr, [sp, #0]                                     | MOV r <sub>4</sub> , #9                              |
| ADD r <sub>4</sub> , r <sub>0</sub> , r <sub>4</sub> | MOV r <sub>0</sub> , r <sub>6</sub>                  | STR r <sub>0</sub> , [sp, #4]                        | MOV r <sub>5</sub> , #9                              |
|                                                      | STR lr, [sp, #0]                                     | ADD r <sub>6</sub> , r <sub>9</sub> , r <sub>9</sub> | ADD r <sub>4</sub> , r <sub>5</sub> , r <sub>4</sub> |
|                                                      | STR r <sub>0</sub> , [sp, #4]                        | SUB r <sub>8</sub> , r <sub>9</sub> , r <sub>9</sub> | LDR r <sub>4</sub> , [sp, #0]                        |
|                                                      | BL C                                                 | MOV r <sub>0</sub> , r <sub>8</sub>                  |                                                      |
|                                                      | LDR lr, [sp, #0]                                     | MOV pc, lr                                           |                                                      |
|                                                      | ADD r <sub>7</sub> , r <sub>7</sub> , r <sub>8</sub> |                                                      |                                                      |
|                                                      | MOV pc, lr                                           |                                                      |                                                      |

Every Procedure has its own frame to store local variables.

Frame → Part of Stack, specific for a procedure call / function

Flags →  $z, c, v, 0, \dots$   
 $\underbrace{\quad}_{\text{32 bits}}$

CMP command updates these flags.

H.W

```
int func(int a) {
    if (a < 1)
        return a
    else {
        a = a - 1;
        return func(a)
    }
}
```

$[a \rightarrow \text{Random +ve number}]$   
 $(a = 4)$

Sol:

```
MOV r1, #4
FUNC: SUB sp, sp, #8 [2008 → 2000]
       STR lr, [sp, #4]
       STR r0, [sp, #0]
       CMP r1, #1
       BGE ELSE
       MOV r0, #1
       MOV pc, lr
ELSE: SUB r1, r1, #1
      BL FUNC
      LDR lr, [sp, #4]
```



When moving to different function

BL: Unconditional function call

```
ADD sp, sp, #8
MOV pc, lr
```

27/11 (Lab)

sudo apt install gcc-arm-linux-gnueabi qemu-user gdb-multiarch -y

QEMU - Virtual Machine emulator

↳ ARM processor based system  
(Server)

• Compilation:

```
arm-linux-gnueabi-gcc -g -o main sample.c  
qemu-arm -g 8080 -L /usr/arm-linux-gnueabi ./main
```

New terminal :

```
gdb-multiarch ./main  
set architecture main           / set arch main  
target remote localhost:8080  
break main                      / b main  
continue                         / c  
ni : next instruction  
b address *address : Go to the location
```

29/11

```
① int func(int a) {  
    if (a < 1)  
        return a  
    else {  
        a = a - 1;  
        return (func(a) + a);  
    }  
}
```

Sol:

```
MOV r1, #4  
FUNC: SUB sp, sp, #8  
STR lr, [sp, #4]  
CMP r1, #1  
BGE ELSE  
MOV r0, #1  
MOV pc, lr
```

```
ELSE: SUB r1, r1, #1  
STR r1, [sp, #0]  
BL FUNC ; Backtracking  
LDR r1, [sp, #0]  
LDR lr, [sp, #4]  
ADD r0, r0, r1  
ADD sp, sp, #8 ; Deallocate stack space  
MOV pc, lr
```

```

④) int func(int k) {
    if (k < 1)
        return 1;
    else {
        k = k + func(k-1);
        return k;
    }
}

```

- Two's Complement:

- ① Sign bit representation:

$$5 \rightarrow 0101$$

$$-5 \rightarrow 1101$$

Issue: +0 : 0000  
-0 : 1000 } Must be same

- ② One's Complement:

$$5 \rightarrow 0101$$

$$-5 \rightarrow 1010$$

$$+0 : 0000$$

$$-0 : 1111$$

- ③ Two's Complement

$$5 \rightarrow 0101$$

$$\begin{array}{r} -5 \rightarrow 1010 \\ \hline 1011 \end{array}$$

$$+0 : 0000$$

$$-0 : 1111$$

10000  
Ignored      sign bit

- Multiplication:

$$\begin{array}{r}
\times \begin{array}{c} 2 \\ 14 \\ 25 \end{array} \\
\hline
70 \\
280 \\
\hline
350
\end{array}
} \text{ Shift + Addition} \quad (3)$$

$$\begin{array}{r}
14 \\
25 \\
\hline
39
\end{array}$$

$n^2 + n + \dots \rightarrow$  Depends on the no. of digits of Multiplicand & Multiplier.

$\times \begin{array}{c} 102304 \\ 000004 \end{array}$  is more complex than  $\times \begin{array}{c} 363 \\ 102 \end{array}$

6x6      Acc. to computer      :: 3x3

## Binary Multiplication:

$$\begin{array}{r}
 101 \longrightarrow 5 \\
 \times 110 \longrightarrow 6 \\
 \hline
 000 \\
 1010 \\
 10100 \\
 \hline
 11110 \longrightarrow 30
 \end{array}$$

Not recommended

Recommended : Booth's Algorithm

Shifting can be used instead of Multiplying  
 ↘ Left shift

SC: Size of Multiplier

sum = 0

B : Base

sum = sum + a[i]

A : 0000 (Initially)

Accumulator

B → use

3/2  
A = 0

QR = Multiplier (Analyzed)

BR = Multiplicand (Operation)

n bits →  $q_{n+1} = 0$  [n → no. of bits of QR]

Sc → n

Ex.  $7 \times 5$

$$\begin{array}{l}
 0101 \longrightarrow +5 \\
 0111 \longrightarrow +7
 \end{array}$$

Works for signed numbers (Booth's Algo)

QR → no. having more continuous 1s or 2s  
 (Analyzed easier)

Append  $q_{n+1} = 0$  @ the end

01110

Compare 2 bits @ a time



Ex. 0111  
→ 01110

0111 0111  
A QR

1 → 11011011 ⇒ 10 (Appended value)  $[01110 \Rightarrow A - BR]$

2 → 1110 1101 ⇒ 11  $[01110 \Rightarrow \text{Shift Right}]$

3 → 1111 0110 ⇒ 11  $[01110 \Rightarrow \text{Shift Right}]$

4 → 0010 0011 ⇒ 01  $[01110 \Rightarrow A + BR]$

A: 1111 (updates)  
BR: 0101 (doesn't update)

32 +2+1 → 35  
0010 0011  
0100 0110

Ex.  $3 \times 5$

$101 \rightarrow 0101 \rightarrow +5$  → BR

$011 \rightarrow 0011 \rightarrow +3$  →  $\overline{QR}$

00110 ←  $Q_{n+1}$

$\overline{BR} \rightarrow 1010$

$\overline{BR} + 1 \rightarrow 1011$

10: A 1011 0011 QR  
 $[SC = 4]$  1101 1001 ← Final

00110

11 : 1110 1100

[sc = 3]

00110

01 : 0011 1100

[sc = 2] 0001 1110

$$\begin{array}{r} 1110 \\ 0101 \\ \hline 10011 \end{array}$$

↙

00110

00 : 0000 1111

[sc = 1]

↙ ∴ 15

[Shifting always]

4/1

Ex.  $6 \times 7$

0110 ↗ 0111

Sol :  $A = 0000$

$QR = 0111$

$00/11 \rightarrow \text{ashr}(A, QR)$

$BR = 0110$

$01 \rightarrow A = A + BR \& \text{ Right Shift \& sc--}$

$\emptyset_{NH} = 0$

$10 \rightarrow A = A + \overline{BR} + 1 \& \text{ Right Shift \& sc--}$

$Sc = \text{no. of bits of } QR = 4$

④  $QR = 01110$

1010 0111

Sc ↓ Right shift  
③ 1101 0011

$QR = 0111$

$A = 0000$

$BR = 0110$

$\Rightarrow A = 1010$

$QR = 01110 \Rightarrow 1110 1001 \quad Sc = ②$

$QR = 01110 \Rightarrow 1111 0100 \quad Sc = ①$

$QR = 01110$

↓

0101 0100

$A = 1111$

$BR = 0110$

$\downarrow 0101$

↓ Right shift

0010 1010  $Sc = ①$

→  $10 + 32 = 42$

$\therefore 6 \times 7 = 42$

Ex.  $-4 \times 5$

$$\begin{array}{l} -4 : 1100 \rightarrow QR \\ A \rightarrow 0000 \\ BR \rightarrow 0101 \end{array}$$

$\begin{array}{r} 0100 \\ 1011 \\ \hline 1100 \end{array}$

$$QR = 11000$$

0000 1100

③ 0000 0110 11000

② 0000 0011 11000

1011 0011

① 1101 1001 11000

⑤ 1110 1100 11000

(-ve) 0001 0100  $\longrightarrow -20$

Ex.  $-3 \times 7$

$$\begin{array}{l} A = 0000 \\ QR = 7 = 0111 \\ \Rightarrow QR = 01110 \end{array}$$

$3 : 0011$

$\begin{array}{r} 1100 \\ 1 \\ \hline 1101 \end{array} \leftarrow BR$

$$Sc = 4$$

Sc 0000 0111

③ 0011 0111 01110  $1101 \rightarrow 0011$

1001 1011

② 1100 1101 01110

① 1110 0110 01110

⑤ 1011 0111 01110

1101 1011

$\downarrow$  -ve

$\begin{array}{r} 1110 \\ 1101 \\ \hline 1101 \end{array}$

$$\Rightarrow \underline{-21}$$

$$\begin{array}{r} 11011011 \\ 00100100 \\ \hline 1 \\ 00100101 \end{array}$$
$$\Rightarrow \underline{21}$$

$$Q) -2 \times -3$$

$$QR = 1110$$

$$Sc = 4$$

$$\rightarrow QR = 11100$$

$$BR = 1101$$

$$A = 0000$$

|                |                |
|----------------|----------------|
| $\frac{-2}{0}$ | $\frac{-3}{0}$ |
| 0010           | 0011           |
| 1101           | 1100           |
| 1              | 1              |
| 1110           | 1101           |

Sol:



$$A + (BR)^{2^S} = \begin{array}{r} 0000 \\ 0011 \\ \hline 0011 \end{array} \quad \begin{array}{r} 1101 \\ 0010 \\ \hline 0011 \end{array}$$

If Starting from 1  $\Rightarrow$  Right Shift is Circular

If Starting from 0  $\Rightarrow$  omit ending digit, start with 0.

$$\therefore -3 \times -2 = (0110)_2 = \underline{\underline{6}}$$

$$\text{Ex. } -13 \times -5$$

5/1

- Another method:

$$Q) 6 : 0110 \text{ (Multiplicand)}$$

$$S : 0101 \text{ (Multiplier)}$$

Sol: 0000 0000

0000 0110  $\leftarrow$  Step 1

0000 0110  $\leftarrow$  Step 2

O : Do Nothing

0001 1110  $\leftarrow$  Step 3

0001 1110  $\leftarrow$  Step 4

$\swarrow$   
 $\searrow$

30

Booth's Algorithm  $\rightarrow$  better

Ex.  $7 \times 8$



00000000 1000

0 Initially :

$$Mr \rightarrow 01\underset{0}{00}$$

$$Md \rightarrow 01110$$

00000000 0100

0 :

$$Mr \rightarrow 001\underset{0}{0}$$

$$Md \rightarrow 11100$$

00000000 0010

0 :

$$Mr \rightarrow 0001\underset{0}{1}$$

$$Md \rightarrow 111000$$

00000000 0001

1 :  $A \rightarrow A + Md$

Then, shift

00111000 0000  
↓ 56

$$\begin{aligned} Md &\rightarrow 0000\underset{0}{1} \\ Mr &\rightarrow 1110000 \end{aligned} \quad \left. \right\} \text{Unnecessary}$$

Steps :

① Add if 1

② Shift →

Right → QR - Mr  
Left → BR - Md

#### • Adding Floating Point numbers :

Bigger value should be converted into smaller value

$$\begin{array}{ccc} 49.323 & \xrightarrow{\text{Bigger}} & 4.9323 \times 10^1 \\ 4.86 & \xrightarrow{\text{smaller}} & 4.86 \times 10^0 \end{array} \quad \left. \right\} \text{Now Add}$$

$$\text{Ex. } 42.50 \Rightarrow 101010.1 \Rightarrow 1.010101 \times 2^5$$

$$31.625 \Rightarrow 11111.101 \Rightarrow 1.1111101 \times 2^4$$

$$1.010101 \times 2^5$$

$$E = b+e$$

$$b = 127$$

$$e = 5$$

$$\Rightarrow E = 132$$

$$\Rightarrow 0 \ 10000100 \ 0101010$$

$$1.1111101 \times 2^4$$

$$E = b+e$$

$$b = 127$$

$$e = 4$$

$$\Rightarrow E = 131$$

$$\Rightarrow 0 \ 10000011 \ 111101$$

Can't add  $\because 2^5 \text{ & } 2^4 \rightarrow \text{Different}$

$$10.10101 \times 2^4$$

$$1.1111101 \times 2^4$$

$$42.50$$

$$10.101010$$

$$\underline{1.1111101}$$

$$31.625$$

$$\underline{74.125}$$

$$100.1010001 \times 2^4$$

$$\Rightarrow 1.001010001 \times 2^6$$

7/2

$$42.50 \Rightarrow 1.0101010 \times 2^5 \Rightarrow 0 \ 10000100 \ 0101010$$

$$31.625 \Rightarrow 1.11111 \times 2^4 \Rightarrow 0 \ 10000011 \ 111110$$

$$10.101010$$

$$1.111110$$

$$\underline{100.101000} \times 2^4 \Rightarrow 1.00101000 \times 2^6$$

$$\underline{\underline{e=6}}$$

Overflow:  $0.1001 \Rightarrow 0.101 \Rightarrow 0.11$  (Round off)

Underflow: Add min no



$$\begin{array}{r} 1000101 \times 2^0 \\ 1.000101 \times 2^6 \\ \hline 1.00011 \Rightarrow 1.0010 \end{array}$$



$$10.0101 \times 2^1$$

$$010100 \times 2^1$$

$$4 : 100.0$$

$$2.5 : 10.1$$

10/2

$$10.10 \times 2^4 \rightarrow \text{Fixed Point number}$$

$$1.010 \times 2^5 \rightarrow \text{Floating Point number}$$

$$-2.5 \Rightarrow 2.5 : 10.1$$

$$4 \Rightarrow 4 : 100.0$$

$$\therefore 2.5 \rightarrow 010.1$$

$$\begin{array}{r} 101.0 \\ - 2.5 \rightarrow 101.1 \\ \hline \end{array}$$

$$\text{Now, } 4 + (-2.5)$$

$$\begin{array}{r} 4 : 100.0 \\ - 2.5 : 101.1 \\ \hline 1001.1 \\ \swarrow \quad \searrow \\ (1.1)_2 \equiv (1.5)_{10} \end{array}$$



Multiplicand  $\rightarrow$  32  $\times$  64 bits

Multiplier  $\rightarrow$  32

Result: 65 bits  $\times$  64 bits



13/2

Revision:

- `int fun(int a) {  
 a++;  
}`

$r_0 - r_3 \rightarrow$  Reused, For storing values.  
 $r_4 - r_{11} \rightarrow$  Not Reused  
 $lr \nearrow$

fun: ADD  $r_1, r_1, \#4$

- `int fun(int a) {  
if (a < 6) {  
 a++;  
}  
}`

`int fun(int a) {  
if (a < 6) {  
 a++;  
}  
else {  
 exit(1);  
}`

fun: ADD  $r_1, r_1, \#4$   
CMP  $r_1, \#6$

fun: ADD  $r_1, r_1, \#4$

CMP  $r_1, \#6$

BLT L1

L1: ADD  $r_1, r_1, \#1$

fun: ADD  $r_1, r_1, \#4$

CMP  $r_1, \#6$

BGE L1

ADD  $r_1, r_1, \#1$

L1:

↓ This is wrong  
( $\because$  Sequential)

```

• int fun(int a) {
    while (a < 6) {
        a++;
    }
    else {
        exit(1);
    }
}

```

|      |                                                |  |
|------|------------------------------------------------|--|
| 4000 | fun : ADD r <sub>1</sub> , r <sub>1</sub> , #4 |  |
| 4004 | WHILE : CMP r <sub>1</sub> , #6                |  |
| 4008 | BGE L1                                         |  |
| 4012 | ADD r <sub>1</sub> , r <sub>1</sub> , #1       |  |
| 4016 | B WHILE                                        |  |
| 4020 | L1:                                            |  |
|      | MOV pc, lr                                     |  |

|      |        |                                   |
|------|--------|-----------------------------------|
| 5000 | MAIN   | PC : 5004                         |
| 5004 | BL fun | PC : 5008 $\Rightarrow$ lr = 5008 |
| 5008 |        | PC : 4000                         |

```

• int fun(int a) {
    while (a < 6) {
        a++;
    }
    return a;
}

```

|      |                                                |  |
|------|------------------------------------------------|--|
| 4000 | fun : ADD r <sub>1</sub> , r <sub>1</sub> , #4 |  |
| 4004 | WHILE : CMP r <sub>1</sub> , #6                |  |
| 4008 | BGE L1                                         |  |
| 4012 | ADD r <sub>1</sub> , r <sub>1</sub> , #1       |  |
| 4016 | B WHILE                                        |  |
| 4020 | L1:                                            |  |
|      | MOV pc, lr                                     |  |

|      |                         |  |
|------|-------------------------|--|
| 5000 | MAIN                    |  |
| 5004 | MOV r <sub>1</sub> , #4 |  |
| 5008 | BL fun                  |  |

```

• int fun(int a) {
    while (a < 6) {
        a++;
    }
    fun2(int b);
    return a;
}

```

```

int main() {
    int a = 4;
    fun(a);
    return 0;
}

```

```

fun2(int b) {
}

```

|                                                                                                                                                                                                                             |                                                       |
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------|
| <pre> 4000  fun : ADD r1, r1, #4 4004 WHILE: CMP r1, #6 4008      BGE L1 4012      ADD r1, r1, #1 4016      B WHILE 4020 L1:   MOV r0, r1           MOV pc, lr           STR lr, [sp, #0]           LDR lr, [sp, #0] </pre> | <pre> 5000  MAIN 5004  MOV r1, #4 5008  BL fun </pre> |
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------|

MIDSEM : No pipelining

18/2

Division:

4 bit - Divisor, Dividend

4 bit - Remainder, Quotient

5/3

5 : 0101

3 : 0011

$$4+1 \longrightarrow \underline{5^{\text{th}}}$$

$\downarrow_n$

8 bit - Divisor

8 bit - Remainder

4 bit - Quotient



| $q$    | Divisor   | Reminder  |
|--------|-----------|-----------|
| 0000   | 0011 0000 | 0000 0101 |
| ① 0000 | 0001 1000 | 0000 0101 |
| ② 0000 | 0000 1100 | 0000 0101 |
| ③ 0000 | 0000 0110 | 0000 0101 |
| ④ 0000 | 0000 0011 | 0000 0101 |
| ⑤ 0001 | 0000 0001 | 0000 0010 |

Quotient: 1      Remainder: 2

Triple Adder  $\rightarrow$  In Syllabus  
 Fixed Point Addition  
 Fixed Point Subtraction

Ex. 6/2

6 : 0110

2 : 0010

$$4+1 \longrightarrow \underline{5^{\text{th}}}$$

$\downarrow_n$

| $q$    | Divisor   | Reminder  |
|--------|-----------|-----------|
| 0000   | 0010 0000 | 0000 0110 |
| ① 0000 | 0001 0000 | 0000 0110 |
| ② 0000 | 0000 1000 | 0000 0110 |
| ③ 0000 | 0000 0100 | 0000 0110 |
| ④ 0001 | 0000 0010 | 0000 0010 |
| ⑤ 0011 | 0000 0001 | 0000 0000 |

Quotient: 3      Remainder: 0

Ex. 11/4

11 : 1011

4 : 0100

$5 + 1 \xrightarrow{\downarrow n} \underline{6^{\text{th}} \text{ Step}}$

| $\varnothing$ | Divisor     | Reminder                                                                          |
|---------------|-------------|-----------------------------------------------------------------------------------|
| Step ① 00000  | 00100 00000 | 00000 01011 $\rightarrow 01\ldots\ldots$                                          |
| ② 00000       | 00010 00000 | 00000 01011 $\rightarrow 01\ldots\ldots$                                          |
| ③ 00000       | 00001 00000 | 00000 01011                                                                       |
| ④ 00000       | 00000 10000 | 00000 01011                                                                       |
| ⑤ 00001       | 00000 00100 | 00000 00011 ( $11 - 8$ ) $\rightarrow 011\ldots\ldots$                            |
| ⑥ 00010       | 00000 00010 | $\underbrace{00000}_{\text{Quotient: 2}} \underbrace{00011}_{\text{Reminder: 3}}$ |

g) -4/2

4 : 0100  $\rightarrow 1011$

2 : 0010

$\frac{1}{1100}$

$\rightarrow -4$

$4 + 1 \xrightarrow{\downarrow n} \underline{5^{\text{th}} \text{ Step}}$

| $\varnothing$ | Divisor   | Reminder                                                                  |
|---------------|-----------|---------------------------------------------------------------------------|
| 0000          | 0010 0000 | 0000 1100 $\rightarrow 01\ldots\ldots$                                    |
| ① 0000        | 0001 0000 | 0000 1100                                                                 |
| ② 0000        | 0000 1000 | 0000 1100                                                                 |
| ③ 0001        | 0000 0100 | 0000 0100                                                                 |
| ④ 0011        | 0000 0010 | 0000 0000                                                                 |
| ⑤ 0110        | 0000 0001 | $\underbrace{0000}_{\text{3 bit}} \underbrace{0000}_{\text{Reminder: 0}}$ |

$\begin{array}{r} 0110 \\ \overline{110} \end{array} \xrightarrow{2^s \text{ Comp.}} \underline{010}$

$\xrightarrow{\text{Quotient: } -2}$

$\begin{array}{r} 110 \\ 001 \\ \hline 010 \end{array}$

H.W

g) -4/-2  $\rightarrow$  Check

No  $2^s$  complement @ the end

19/2

- ①  $\frac{7}{3} \Rightarrow Q: +ve, R: +ve$
- ②  $\frac{-7}{3} \Rightarrow Q: -ve, R: -ve$
- ③  $\frac{7}{-3} \Rightarrow Q: -ve, R: +ve$
- ④  $\frac{-7}{-3} \Rightarrow Q: +ve, R: -ve$

$$\boxed{Q = \text{Div}^d \oplus \text{Div}^r}$$

$$R = \text{Div}^d$$

Q)  $-13/5$ 

Dividend : 01101

Divisor : 00101

Quotient : 00000

$$n+1 = 5+1 = \underline{6}$$

Q

|                        |       |       |                         |              |         |
|------------------------|-------|-------|-------------------------|--------------|---------|
| 00000                  | 00101 | 00000 | 00000                   | 01101        | ① ..... |
| ① 00000                | 00010 | 10000 | 00000                   | 01101        | ① ..... |
| ② 00000                | 00001 | 01000 | 00000                   | 01101        | ① ..... |
| ③ 00000                | 00000 | 10100 | 00000                   | 01101        |         |
| ④ 00000                | 00000 | 01010 | 00000                   | 01101        |         |
| ⑤ 00001                | 00000 | 00101 | 00000                   | 00011        | ① ..... |
| ⑥ <u>00010</u>         | 00000 | 00010 | <u>00000</u>            | <u>00011</u> | ① ..... |
| $\curvearrowleft Q: 2$ |       |       | $\curvearrowright R: 3$ |              |         |
| $\Rightarrow R = -3$   |       |       |                         |              |         |

Can be done with 4 bits - Manually  
 Conventionally & Systematically  $\rightarrow$  5 bits

21/2

Wallace Tree Multiplication:

| $a_3$     | $a_2$     | $a_1$     | $a_0$     |       |
|-----------|-----------|-----------|-----------|-------|
| $b_3$     | $b_2$     | $b_1$     | $b_0$     |       |
| $a_3 b_0$ | $a_2 b_0$ | $a_1 b_0$ | $a_0 b_0$ |       |
| $a_3 b_1$ | $a_2 b_1$ | $a_1 b_1$ | $a_0 b_1$ | X     |
| $a_3 b_2$ | $a_2 b_2$ | $a_1 b_2$ | $a_0 b_2$ | X X   |
| $a_3 b_3$ | $a_2 b_3$ | $a_1 b_3$ | $a_0 b_3$ | X X X |

$$\begin{array}{r}
 1010 \\
 1110 \\
 \hline
 0000
 \end{array}$$

$$\begin{array}{r}
 1010X \\
 1010XX \\
 \hline
 10001100
 \end{array}$$



More partial products  
(4 levels)

Along with carry  $\rightarrow$  4 levels

$\therefore$  Reduce



## $\rightarrow$ Addressing Modes (ISA)

Immediate Addressing Mode :  $MOV r_1, \#42$

Register Addressing Mode :  $MOV r_4, r_1$

Direct Addressing Mode :  $MOV r_1, [\#42]$

Indirect Addressing Mode :  $MOV r_1, \#42$

$MOV r_2, [r_1]$

Similar to :

Index Addressing Mode  
Base Addressing Mode

} Future topics

Floating Point Multiplication — Not in MidSem

Syllabus : Until pipelining

→ Traditional Hardware Multiplication:

| (64 bit)    | (64 bit)     | (32 bit)   |                              |
|-------------|--------------|------------|------------------------------|
| Acc         | Multiplicand | Multiplier |                              |
| 0000 0000   | 0000 0110    | 0100       | (32 bit) Multiplicand : ←    |
| ① 0000 0000 | 0000 1100    | 0010       | (32 bit) Multiplier : →      |
| ② 0000 0000 | 0001 1000    | 0001       |                              |
| ③ 0001 1000 | 0011 0000    | 0000       | Multiplicand : 4 bit { 8 bit |
| ④ 0001 1000 | 0110 0000    | 0000       | Multiplier : 4 bit           |
| <u>24</u>   |              |            | Accumulator : 8 bit          |

→ Steps : No. of bits of Multiplier  
 $\&$  = No. of bits of Multiplicand Initially

$$1 : \text{Acc} = \text{Acc} + \text{Mul}^d$$

Naturally :



Optimization :



21/2



Control Unit : Read the LSB



0111 - Mul<sup>d</sup>  
0101 - Mul<sup>r</sup>

$$\begin{array}{r} 0000 \ 0101 \\ 0000 \ 0010 \\ 0111 \ 0010 \\ 0011 \ 1001 \\ 0001 \ 1100 \\ 0111 \ 0 \\ \hline 1000 \ 1100 \\ \hline 0100 \ 0110 \end{array}$$



4/3

- ① Memory Instructions
- ② Arithmetic and Logical Operations
- ③ Branching

MIPS : Million instructions per second

Microprocessors without Internal pipelining stages





5/3

iVerilog → Software  
GTK → (Analysing Circuits)



STR : Data is transferred from register to location



ALU → doesn't store anything  
↳ Not sequential, i.e. Comb"

State machine

More states ⇒ More speed of clock cycle.

If no. of clock cycles/sec  $\uparrow$ ,  $\Rightarrow$  Capacity of the Hardware  
& complexity of operations  $\uparrow$ ,  $\Rightarrow$  Hinder the system

- ① Manufacturer
- ② Architect  $\rightarrow$  RISC (Smaller  $\rightarrow$  easier to execute)
  - Minimizes the CPI, decides how fast the system is
- ③ Compiler

### 6/3 ASIC : Application specific

Logisim : Realising small circuits

Not Scalable

VLSI  
 $\hookrightarrow$  Very Large Scaling Integration



$\hookrightarrow$  Simple language  $\rightarrow$  Logisim  
Better for design  $\rightarrow$  Verilog

#### • Verilog:

- ① Supports GATE Level Design  
Each operation is a function



$$\text{AND}(k, A, B)$$

$$\text{OR}(Y, k, C)$$

Problem with this 1<sup>st</sup> level :

Interconnections  $\rightarrow$  Complex  
More no. of Gates

- ② Supports Data Flow Model

- Uses Gates directly
- Data flow is decided by commands

$$k = A \& B$$

$$Y = k | C$$

### ③ Behaviour Model

- System translates by itself
- Advantage : No need to worry about no. of gates, design of circuit
- Ex. MUX

MUX  $\equiv$  Switch - Case (C language)

4 Inputs  $\Rightarrow$   $2^4$  testcases

Best way to Realising output : LED light (for small output)  
For complex operations : ?

To analyse in set of outputs  $\rightarrow$  Timing Diagram

GTK  $\rightarrow$  Timing Diagram : Data  $\Rightarrow$  Waveform (Analyse Output)

UVT  $\rightarrow$  TestBench

↳ of Testbench

iVerilog  $\rightarrow$  Compiler (Takes Verilog file and gives output)

Verilog  $\rightarrow$  Just a behaviour Model allowing us to give set of inputs

iVerilog is for basic circuits.

For advanced circuits,

Vivado (IDE) is used to design the circuits.  
(License Based)

gedit sample.v  $\leftarrow$  For iVerilog

```
module mux4to1 (
    input [3:0] data_in,
    input [1:0] select,
    output reg output_y
);
```

If reg is not mentioned,  
then it is a wire.

Wire : Does not store any  
value

```
1 module mux4to1 (
2     input [3:0] data_in, // 4-bit input vector
3     input [1:0] select, // 2-bit select input (renamed
4     from 'sel')
5     output reg output_y
6 );
7     always @(*) begin
8         case(select)
9             0: output_y = data_in[0]; // Select first
10            1: output_y = data_in[1]; // Select second
11            2: output_y = data_in[2]; // Select third
12            3: output_y = data_in[3]; // Select fourth
```

$3:0 \equiv 4:1 \Rightarrow$  Syntax  
 $\curvearrowright$  Array of pins

Output : Always a reg

input wire [3:0] data\_in

$\curvearrowright$  Default, no need  
to explicitly  
mention

always @(data\_in, select) ≡ always @(\*)

→ Whatever input is given, the change of either of the two is monitored.

Asterisk → When change is detected in any of the inputs.

begin → end

→ Start keyword

case → endcase

Select → 2 bits [::1:0]

$2^6 \text{ bits} \equiv 4 \text{ bits}$   
 ↓                  ↓  
 Indicates bits      0110  
 No. of bits



Hexadecimal ⇔ Binary

output\_y → 1 bit

3:0 → 0110  
 ↓↓↓↓  
 3 2 1 0

module → endmodule

No Indentation rules

Commenting: // comment

### • Testcase:

```

1 module tb_mux4to1;
2
3   reg [ : ] data_in_reg;
4   reg [ : ] select_reg;
5   wire output_y_wire;
6
7
8   mux4to1 uut (
9     .data_in(data_in_reg), // Map 'data_in' to
10    'data_in_reg'
11    .select(select_reg), // Map 'select' to
12    'select_reg'
13    .output_y(output_y_wire) // Map 'output_y' to
14    'output_y_wire'
15  );

```

```

8   case(select)
9     2'b00: output_y = data_in[0]; // Select first
10    input (data_in[0])
11    2'b01: output_y = data_in[1]; // Select second
12    input (data_in[1])
13    2'b10: output_y = data_in[2]; // Select third
14    input (data_in[2])
15    2'b11: output_y = data_in[3]; // Select fourth
16    input (data_in[3])
17  end
18

```



```

3   reg [ : ] data_in_reg;
4   reg [ : ] select_reg;
5   wire output_y_wire;
6
7
8
9   mux4to1 ut (
10     .data_in(data_in_reg), // Map 'data_in' to
11     .select(select_reg), // Map 'select' to
12     .output_y(output_y_wire) // Map 'output_y' to
13     'output_y_wire'
14   );
15 // Test stimulus

```

} Mapping to the sample.v

```

// Test stimulus
6 initial begin
7
8   $dumpfile("and_gate.vcd");
9   $dumpvars(0, tb_mux4to1);
10  data_in_reg = 4'b1010; // Set inputs to 1010
11  select_reg = 2'b00; // Initially select the
12    first input
13  #10; // Wait for 10 time units
14
15  select_reg = 2'b01; // Select the second input
16  #10; // Wait for 10 time units
17
18  select_reg = 2'b10; // Select the third input
19  #10; // Wait for 10 time units
20
21  select_reg = 2'b11; // Select the fourth input
22  #10; // Wait for 10 time units
23
24  $finish; // End the simulation
25 end
26
27 // Monitor the signals
28 initial begin
29   $monitor("Time = %t | data_in = %b | select = %b | "
30   "output_y = %b", $time, data_in_reg, select_reg,
31   output_y_wire);
32 end
33
34 endmodule

```

← Initialise the value

→ { 4 : 4  
b : bits  
1010 : input }  
 $3:0 \Rightarrow 3\ 2\ 1\ 0$   
 $1010$

unit : keyword

Outputs are stored in registers.  
For realising output in waveform,

Data is put in the dumpfile

[0 → default value]

iverilog sample.v -o test → Compilation

iverilog sample.v tb-mux4to1.v -o testobj

vvp testobj

→ Testcases file

→ Executes the file



LS : Left Shift

+ve :

4 bits : 0011

8 bits : 0000 0011

-ve :

4 bits : 1011

8 bits : 1111 1011

Offset  $\rightarrow$  Variable

$$\begin{aligned} \hookrightarrow 16 \text{ bits} &\Rightarrow 2^{16} \text{ bytes} \\ &= 2^{10} \times 2^6 \text{ bytes} \\ &= 64 \text{ kilobytes} \end{aligned}$$

Small value of offset  $\implies$  More space in registers

But for operations to be possible : 16 bits  $\Rightarrow$  32 bits

Offset is used in :

(i) Branching

(ii) Memory

(00000) 0  $\leftarrow$  0 (0000)

(00100) 4  $\leftarrow$  1 (0001)

(01000) 8  $\leftarrow$  2 (0010)

(01100) 12  $\leftarrow$  3 (0011)

(01000) 16  $\leftarrow$  4 (0100)

(10100) 20  $\leftarrow$  5 (0101)

$\hookrightarrow$  Requires only 3 bits, but we get 2 bits extra (last two  $\rightarrow$  00)

Locality of Memory  $\rightarrow$  Later Topic

13/3 :

Absent

14/3 :



$I_1$



$I_2$

① 1 of  $I_1$



Best way of arranging 5 Instructions

|       |          |          |          |          |          |          |          |          |
|-------|----------|----------|----------|----------|----------|----------|----------|----------|
|       | 0        | 1        | 2        | 3        | 4        | 5        |          |          |
| $I_1$ | $I_{11}$ | $I_{12}$ | $I_{13}$ | $I_{14}$ | $I_{15}$ |          |          |          |
| $I_2$ | X        | $I_{21}$ | $I_{22}$ | $I_{23}$ | $I_{24}$ | $I_{25}$ |          |          |
| $I_3$ | X        | X        | $I_{31}$ | $I_{32}$ | $I_{33}$ | $I_{34}$ | $I_{35}$ |          |
| $I_4$ | X        | X        | X        | $I_{41}$ | $I_{42}$ | $I_{43}$ | $I_{44}$ | $I_{45}$ |

Total no. of cycles taken for execution :  $k + n - 1$

17/3

n : no. of instructions

k : no. of stages

$$\text{non-pipelined} = n \times k \times t$$

$$\text{pipelined} = (n + k - 1) \times t$$

$$\text{speed up} = \frac{n \times k \times t}{(n + k - 1) \times t}$$

$I_1$  100 | 200 | 200 | 200 | 300

$S_1 = 100 \text{ ns}$

$I_2$  X 100 200 200 200 st 300

$S_2 = 200 \text{ ns}$

$I_3$  X X 100 200 200 200 st st 300

$S_3 = 200 \text{ ns}$

Max = 1600

$S_4 = 200 \text{ ns}$

$S_5 = 300 \text{ ns}$

Stalling

Overhead: Work more instead of actually getting the benefit

Thrashing: Operation itself is less loaded, but fetching it is more load.

CISC - Pipelining is difficult

Sol: Max size of clock cycle & fix for all stages.

$$300 \longrightarrow (3+5-1) \times 300 = 2100$$

But max = 1600

After k stages  $\Rightarrow$  pattern repeats

H.W: Calculate for 10 instructions & check (Manual & Split method)

- Hazard - Threat to system



$I_1$ : ADD  $r_1, r_2, r_3$

$I_2$ : SUB  $r_1, r_1, r_4$

$I_1$ : IF ID AR MEM RW

$I_2$ : X IF ID AR MEM RW

$\therefore$  Data Hazard

Hardware  $\longrightarrow$  Data accessed into buffers (Overlooking)

LOAD  $\longrightarrow$  4<sup>th</sup>/5<sup>th</sup>

AKA Bypassing

ADD  $\longrightarrow$  3<sup>rd</sup>

Bubbles  $\longrightarrow$  Skipping pipeline instruction to align.

Q)  $n = 18$  instructions

$k = 4$  stages

$t = 1$  cycle

Each cycle  $\rightarrow 10$  ns (Wait for 10 ns between each cycle)

for 1 cycle  $\rightarrow 100$  ns

T for pipeline?

$(18 + 4 - 1) \times 100$  without interleaving time

Time for pipelining:

$$(18 + 4 - 1) \times (100 + 10) - 10$$

for the last cycle

$(n + k - 1)$  times

Sol:  $[(18 + 4 - 1) \times 100] + [(18 + 4 - 2) \times 10]$

H.W

Q)  $n = 18$  instructions

$k = 4$  stages

$t = 3, 4, 3, 4, 5$

1 cycle  $\rightarrow 100$  ns

17/3

```
1module d_ff_posedge (
2    input wire clk,           // Clock signal
3    input wire rst_n,         // Active-low reset
4    input wire d,             // Data input
5    output reg q              // Output
6);
7
8    always @{posedge clk} begin
9        if (rst_n)
10            q = 0; // Reset output to 0 when rst_n is low
11        else
12            q = d; // Capture 'd' on-rising edge of clk
13    end
14
15endmodule
```

sudo apt-get install gtkwave

gtkwave counter.vcd

dumpfile nam

- Bypassing AKA forwarding: Skipping part of instructions

|       |       |       |       |       |
|-------|-------|-------|-------|-------|
| $t_1$ | $t_2$ | $t_3$ | $t_4$ | $t_5$ |
| IF    | REG   | ALU   | MEM   | REG   |
| IF    | REG   | ALU   | MEM   | REG   |

Skip the entire instruction  $\rightarrow$  Bubbles

- Hazards:

- Data Hazard
- Structural Hazard
- Control Hazard

- Control Hazard:

CMP r1, #0  
BEQ FACT

IF REG ALU MEM REG  
IF REG ALU MEM REG } Correct

IF REG ALU MEM REQ  
IF REG ALU MEM REQ } Usual, But wrong

$\therefore$  Solution: Bubbles

Solution: Branch prediction (Precalculation)

|                  | IF REG ALU MEM REG           |
|------------------|------------------------------|
| ADD r1, r2, r3   | IF REG {r2, r3} ALU MEM {r1} |
| SUB r4, r1, r2   | IF REG ALU MEM REG           |
| ADD r5, r4, r1   | IF REG ALU MEM REG           |
| SUB r6, r1, r2   | IF REG ALU MEM REG           |
| STR r6, [r7, #4] | IF REG ALU MEM REG           |

### MIPS Pipeline

- Five stages, one step per stage
  - IF: Instruction fetch from memory
  - ID: Instruction decode & register read
  - EX: Execute operation or calculate address
  - MEM: Access memory operand
  - WB: Write result back to register





| A      | B       | 100 |
|--------|---------|-----|
| 10 sec | 100 sec |     |
| 0.4    | 0.6     |     |

$$0.4 \times 100 \times 10$$

Latency : Overhead (For understanding - Delay)

Q) Hit rate

Miss rate : 40%.

hit  $\rightarrow$  1 cycle

RAM  $\rightarrow$  10 cycles

no. of instructions = 100

$$\begin{aligned} EAT &: (100 \times 1) + (0.4 \times 100 \times 10) \\ &= 500 \text{ cycles} \end{aligned}$$

Q) Cache : 10 ns

Memory : 40 ns

hit rate : 90%

$$EAT = \left( \frac{10}{100} \times 40 \right) + (10 \times 1)$$
$$= \underline{\underline{14}}$$

Cache : Has to be small

∴ Expensive,

Sequential processing → More Latency → More time taken to access

→ Fast access for small chunk of data

∴ Two caches : L<sub>1</sub> cache and L<sub>2</sub> cache

L<sub>1</sub> : Nearest to the processor

Speed :

$$L_1 > L_2 > L_3 > L_4$$

Size :

$$L_1 < L_2 < L_3 < L_4$$

ls cpu  
ls mem



L<sub>1</sub> hit, L<sub>1</sub> miss, L<sub>1</sub> Miss penalty, L<sub>1</sub> Hit time

4/4

$$EAT = L_1 \text{ Acc} + (U_{\text{miss}} \times \text{Mem Access}) \Rightarrow$$

↓  
↓  
↓

$$\text{L1 hit rate, } U_{\text{miss rate}}, \quad L_1 \text{ Access, Mem Access} \quad EAT$$
$$L_1 \text{ Acc, hit rate} + (U_{\text{miss}} \times (\text{Mem Access} + L_1 \text{ Access}))$$

Q) L<sub>2</sub> Miss : 2%

L<sub>1</sub> Miss : 4%

L<sub>1</sub> Access : 10 cycles

L<sub>2</sub> Access : 20 cycles

MMAT : 100 cycles

$$(0.2 \times 20) + (0.4 \times 10)$$

Q) L<sub>1</sub> Miss : 3%

L<sub>1</sub> Access time : 10 cycles

Main mem Access time : 100 cycles

$$(1 \times 10) + (0.3 \times 10) + (100 \times 1)$$

$$(L_1 \text{ Miss rate} \times L_1 \text{ access time}) + (L_1 \text{ Miss rate} \times \text{Main mem Access Time})$$

Q) 1.5 cycles  $\rightarrow$  1 memory access

1000 instructions

1 instruction : 3 cycles

$$\left. \begin{array}{l} \text{MAT} = 100 \text{ ns} \\ L_1 \text{ AT} = 10 \text{ ns} \\ L_2 \text{ AT} = 20 \text{ ns} \\ L_1 \text{ M} = 4\% \\ L_2 \text{ M} = 2\% \end{array} \right\} EAT = ?$$

Q) If no L<sub>2</sub> :

$$10 + (4\% \times 2000) = \underline{\underline{90}}$$

3000 cycles : 2000 memory access

If L<sub>2</sub>:

$$10 + 4\% (20 \times 1 + 2\% (100))$$

Q) How many cycles to access L<sub>2</sub> cache?

7/4

Interrupt : External Disturbance

Exception : Internal Disturbance

## Handling Exceptions

In MIPS, exceptions managed by a System Control Coprocessor (CP0)

- Save PC of offending (or interrupted) instruction
- In MIPS: Exception Program Counter (EPC)
- Save indication of the problem
- In MIPS: Cause register
- We'll assume 1-bit
  - 0 for undefined opcode, 1 for overflow

Jump to handler at 8000 00180

## An Alternate Mechanism

Vectored Interrupts

- Handler address determined by the cause

Example:

- Undefined opcode: C000 0000
- Overflow: C000 0020
- ...: C000 0040

Instructions either

- Deal with the interrupt, or
- Jump to real handler

## Exceptions in a Pipeline

Another form of control hazard

Consider overflow on add in EX stage

add \$1, \$2, \$1

- Prevent \$1 from being clobbered
- Complete previous instructions
- Flush add and subsequent instructions
- Set Cause and EPC register values
- Transfer control to handler

Similar to mispredicted branch

- Use much of the same hardware



11/4

128 KB  $\rightarrow$  Cache

Main memory : 64 GB  $\rightarrow 2^6 \times 2^{10} \times 2^{10} \times 2^{10}$

$\hookrightarrow$  1 word = 4 bytes

1 block = 1 word

tag = ?

68, 128, 64, 83, 128  $\rightarrow$  Hit/Miss

Sol<sup>n</sup> :

64 GB  $\rightarrow 2^6 \times 2^{10} \times 2^{10} \times 2^{10}$

= 36 bits

128 KB  $\rightarrow 2^7 \times 2^8 \times 2^2$

$8 + 7 = 15$

$\hookrightarrow$  Index

$36 - 15 - 2 = 19$  bits

$\hookrightarrow$  tag

2 : offset

$\hookrightarrow$  Blocks  $= 2^{15}$

Index : 15

(Multi-word mapping)

64 KB  $\Rightarrow$  Cache

$\Rightarrow 2^6 \times 2^{10} \times$  byte

$\Rightarrow 2^{16} \Rightarrow 16$  blocks

$2^{16}/2 = 2^{15}$  sets

32 bit : 16 + 16 + 0

32 bit : 16 + 2 + 14

$\hookrightarrow$  2 bits

32 bits : 16 + n

$\hookrightarrow$  n ways

Size of cache : constant

$\therefore$  tag  $\rightarrow$  constant

& index  $\rightarrow$  changes

Q) 128 KB → cache

→ 1 byte  
→ 8 way associative mapping

$$2^{17}$$

$$64 \text{ GB} \rightarrow 2^6 \times 2^{10} \times 2^{10} \times 2^{10} \\ = 36 \text{ bits}$$

$$128 \text{ KB} = 2^7 \times 2^{10}$$

$$36 - 14 - 3 = 15 \text{ bits}$$

$$2^{17}/2^3 = 2^{14}$$

→ tag

Index : 14

Set : 3

$2^{14}$  sets

|       | 0 | 1  |
|-------|---|----|
| Set 0 | 4 | 8  |
| Set 1 |   |    |
| Set 2 |   |    |
| Set 3 | 7 | 15 |

$$3 \div 4 = 3$$

$$15 \div 4 = 3$$

$$4 \div 4 = 0$$

|   |   |
|---|---|
| 0 | 8 |
| 1 |   |
| 2 |   |
| 3 | 3 |
| 4 | 4 |
| 5 |   |
| 6 |   |
| 7 | 7 |

tag + set → no. of checks

Q) 3, 4, 15, 8, 14, 16, 11, 16

8 blocks

→ 1 block = 1 byte

2 way set associative mapping

|   |    |
|---|----|
| 0 | 8  |
| 1 |    |
| 2 |    |
| 3 | 3  |
| 4 | 4  |
| 5 |    |
| 6 | 14 |
| 7 | 15 |

$$3 \times 8 = 3 \quad 3 \times 4 = 3$$

$$4 \times 8 = 4 \quad 4 \times 4 = 0$$

$$15 \times 8 = 7 \quad 15 \times 4 = 3$$

$$8 \times 8 = 0 \quad 8 \times 4 = 0$$

$$14 \times 8 = 6 \quad 14 \times 4 = 2$$

$$16 \times 8 = 0 \rightarrow \text{Miss} \leftarrow 16 \times 4 = 0$$

$$11 \times 8 = 3 \quad 11 \times 4 = 3$$

$$8 \times 8 = 0 \rightarrow \text{Hit}$$

$$16 \times 8 = 0 \rightarrow \text{Miss} \quad 16 \times 4 = 0 \rightarrow \text{Hit}$$





15/4

## → Virtual Memory :

For processor → It exists

For RAM → It does not exist

Virtual Address (Pages) → Disk (CPU uses)

Physical Address (Frames) → RAM

TLB (Translational look-aside Buffer)



Index → Page Table

16/4

## Multilevel Caches

Primary cache attached to CPU

- Small, but fast

Level-2 cache services misses from primary cache

- Larger, slower, but still faster than main memory

Main memory services L-2 cache misses

Some high-end systems include L-3 cache

L1 → Fast access

More hit rate ⇒ Better

Nearer to CPU

L2 → Access more data

Valid bit != 1 ⇒ Proper data &  
is read



Dirty bit : Whether the address of content of valid bit is updated or not.  
(written or not)

Tag bit : Block updated in the memory

Coherence

Consistency

Cache consistency : Two caches accessing same shared memory

Snooping : Lock the main memory such that all existing data in the other caches are invalidated.

∴ No Dirty read

16/4



Shared / local  
Data invalidated

Snooping → Cache coherence

AMAT : Average Memory Access Time

EAT : Effective Access Time

Q) L<sub>1</sub>, L<sub>2</sub>

|                        |                                    |
|------------------------|------------------------------------|
| L <sub>1</sub> MR : 4% | L <sub>1</sub> Penalty : 10 cycles |
| L <sub>2</sub> MR : 6% | L <sub>2</sub> Penalty : 80 cycles |

L<sub>1</sub> cache : 2 cycles (Hit)

60% → Access memory

Hit

$$(2 \times 1) + (0.04)(1 \times 10 + 0.06 \times 80)$$

$$\Rightarrow 2 + 0.04(14.8)$$

$$\Rightarrow 2.59 \cdot i$$

Write : 10

## 2-Bit Predictor

Only change prediction on two successive mispredictions



CPU → very fast

Keyboard, Mouse → Moderately slow

Printer, Speaker → Slow

- Control Bus:



- Modes of operations:

- ① Programmed I/O

CPU → breaks in interval → leave & start again

- ② Interrupt - Driven I/O

Requires access to the CPU → I/O device calls CPU

- ③ DMA: (Direct Memory access)

Takes access of the bus, CPU just waits

All operations are done by its own.



Process synchronization

Block ← Lock  
 $n$  memory units

→ DMA:

- (1) Programmed I/O
- (2) Interrupt Driven I/O
- (3) DMA

Peripherals → I/O

Boot loader → ROM  
 Microsoft : MBR  
 Linux : GRUB

Virtual Machine → Ubuntu  
 (LTS)

Linux Distributions

Debian                          RHEL  
 Ex. Ubuntu                      Ex. Fedora



Interrupt Handler [Diagram]

KDE  
GNOME

