

## Mark Distribution

Attendance + Class Performance → 20

Online <sub>(11)</sub> + Offline <sub>(11)</sub> → ( ) will be provided later

Mid <sub>(written)</sub>

Final <sub>(code)</sub>

### Offline:

- Covers Page will be provided

- Write from book

-- in your own way

- Both hand copy and soft copy

-- hand copy will be hand written

## Micro computers



### Memory:

- stores one bit
- 8 bits make a byte
- first memory byte is stored in 0
- address is <sup>fixed</sup> ~~same~~, content is ~~not~~ <sup>same</sup>
- Address is bits
- Content is bytes
- 1 word = 16 bit = 2 byte = 4 nibble



MSB changes  
~~most~~ least

LSB changes  
least

### Operations:

- + Read/Fetch
- + Write/Add

- Bus :  $\rightarrow$  Connection

- ↳ Address Bus
- ↳ Data Bus
- ↳ Control Bus

- Cache Hit : Memory is found immediately in cache  
" Miss ? " " " " not "

-  $2^{10} \rightarrow$  KB

$2^{20} \rightarrow$  MB

$2^{30} \rightarrow$  GB

## CPU:

- Execution Unit
- Bus interface Unit
- while Execution-Unit performs,  
BIU fetches 6 bits of the next instruction  
(in a queue) → Bus Fetch  
↓  
instruction  
queue
- Execution Unit:
  - ~~Circuit~~ ALU has all the instructions
    - AX: Accumulator } Registers
    - BX: Base } Registers
    - CX:
    - DX:
    - SI: Source } Index
    - DI: Destination }
    - BP: Base } Pointers
    - SP: Stack

- FLAG register : reflects result of a computation

- Bidirectional bus → bidirectional bus

↳ connects

→ Transmits

→ CS

DS

ES

SS

IP: Instruction Pointer

bioseg0      bioseg0

postscript

push

pop

jmp

add

sub

mul

div

and

or

not

neg

imul

idiv

imod

inot

ineg

inot

</

TOPIC NAME: \_\_\_\_\_

## I/O Ports:

- Input → Input Port → Do something
- Output ← Output Port ←

## Instruction Execution:



Machine language

(0/1)



Assembly Language

(Converts machine language  
to a more understandable language)



High level language



# Comp Arch

- \* Classroom: what had
- \* "Core courses are important" ~ Miss
- \* "Fully theoretical. Less Math"
- \* 4 Quizzes

## Computer Architecture:





TOPIC NAME :



\* I/O unit  $\neq$  I/O device

\* Cache is fastest

Main Memory is slowest

\* Immediate use = Registers

Frequent " = Cache

Everything else = Main Memory

## Bus: Group of parallel wires



- Each wire in a bus can transfer one bit of information



TOPIC NAME: W-1

## Bus Structures:

\* To solve Single Bus we use single Registers

## Organization of Cache Memory:



does not periodically refresh ← faster

has 6 transistors

SRAM

\* RAMs contain  
Memory Cells with  
Registers which contains  
1 bit

# Access time of Cache is less:

1. Closer to Processor

2. Has Uses SRAM

3. Cache Memory uses pipelining ← will be explained later (last week)

4. Size is less and so searching is easier

## Computer Components: Top Level View:



PC points to the next instruction to be executed

## Running a Program:

1. PC points to the address of next address
2. IR stores the instruction that is currently being executed.
3. MAR stores the address where read or write will be performed
4. MDR stores the data where read or write will

\* How a program is executed by a Comp:

1. Instruction Fetch
2. Decode and Execute
3. Output

Lab has more detailed



### 1. Instruction Fetch Cycle:



2. Decode & Execute: → happens in IR

i) Decodes the instruction

ii) Memory  $\xrightarrow{\text{location}}$  MARMA CU  $\xrightarrow{\text{read signal}}$  MA MemoryMAR  $\xrightarrow{\text{data}}$  MDR

WMFC → Wait for Memory Function Completion

MDR  $\xrightarrow{\text{data}}$  GPR3. Output:

Store in

Registers

Memory Location

MAR  $\xrightarrow{\text{data}}$  MDRCU  $\xrightarrow{\text{write}}$  MemoryMemory  $\xrightarrow{\text{ }}$  MARMemory  $\xrightarrow{\text{write}}$  MDRMDR  $\xrightarrow{\text{write}}$  Memory

WMFC

## A Typical Instruction:

\* MOV LOCA, R0 ← \* lab has dest snc  
snc dest  
 theory " puts value of LOCA in R0

\* Load = Memory in registers → MOV LOCA, R0  
 Store = Registers " Memory → MOV R0, LOCA



LOAD LOCA, R1

STORE R2, LOCB

\* MOV R0, LOCA:

### 1. Instruction Fetch:

i) Memory  $\xrightarrow{\text{instructions}}$  PC  
 for MOV

PC  $\xrightarrow{\text{sets points}} \text{MAR}$   
 at

CU  $\xrightarrow{\text{sends read signal for MOV}}$  Memory

Memory  $\xrightarrow{\text{reads instruction}}$  MAR  $\xrightarrow{\text{sends address of MOV}}$  Memory

Memory  $\xrightarrow{\text{sends data}}$  MDR

MDR  $\xrightarrow{\text{sets instruction}}$  IR

### 2. Decode and Execute:

i) Decodes the instruction  
 ii) Memory  $\xrightarrow{\text{location of LOCA}}$  MAR

CU  $\xrightarrow{\text{sends read signal for LOCA}}$  Memory

Memory  $\xrightarrow{\text{meads data}}$  MAR

MAR  $\xrightarrow{\text{sends data of RO to LOCA}}$  Memory

Memory  $\xrightarrow{\text{moves data}}$  MDR

### 3. Output :

i) ~~MAR~~ <sup>data</sup>  $\rightarrow$  MDR  
of LCA

CU  $\xrightarrow{\text{write signal}}$  Memory

Memory  $\xrightarrow{\text{writes}} \text{MAR}$   
data

MDR  $\xrightarrow[\text{in}]{\text{writes}} \text{LOCA}$

~~WFM~~ WMFC

i) ~~MAR~~ <sup>sets</sup>  $\rightarrow$  M<sub>L</sub>  
address

\* try at home

# MULTI COMPUTER

---

&

---

# MULTI PROCESSOR

Quiz - 1  
on 26th  
Chap - 1

M. Processor



'One memory'

Multiple Processor

M. Computer



Network connected  
Communication through  
messages

## CH - 2

- Signed numbers (Numbering system)

-

### Number System

- 5

1. Sign & Magnitude - 1101
2. 1's complement - 1010 (0101)
3. 2's complement - 1011



#1000

Sign and magnitude:  $1\ 000 = -0$

Or 1's complement:  $0\ 111 = -7$

2's " :  $(0111 + 1) = 1000 = -8$

# 4 - (-2)

$-2 = 0010$ 's 2's complement

$$\begin{array}{r}
 4 \quad | \quad 0100 = 8 \\
 -(-2) \quad | \quad 0010 \\
 \hline
 6 \quad 0110
 \end{array}
 \quad
 \begin{array}{r}
 = 1101 \\
 + \quad \quad \quad 1 \\
 \hline
 1110
 \end{array}$$

$$\begin{array}{r}
 -(-2) = 1110 \text{'s 2's complement} \\
 = 0001 \\
 + \quad \quad \quad 1 \\
 \hline
 0010
 \end{array}$$

# Self Study: why 2's complement is preferred

Overflow:

In 2's complement clock,

$$3 + 5$$

$$= 0011 + 0101$$

$$= 1000 = -8$$

which also counts as overflow



i) Acquired result above range

ii) Opposite signs after adding same signed numbers

iii) MSB Cin  $\neq$  MSB Cout

## Machine Instructions

---

## and Programs

Sign and Mag  $\rightarrow -2^{n-1} + 1 \rightarrow 2^{n-1} - 1$

One's comp  $\rightarrow$

Two's comp  $\rightarrow -2^{n-1} \rightarrow 2^{n-1} - 1$

## Memory Location Address

---

## and Operation

④ There are  $2^n$  bit addresses that can be accessed

$2^{10}$  = kilobyte

$2^{20}$  = megabyte

$2^{30}$  = gigabyte

## Byte-Addressable:

Each byte is given a definite address  
 This addressing is referred to as byte addressable memory

### Big Endian

|       |           |           |           |           |
|-------|-----------|-----------|-----------|-----------|
| 0     | 0         | 1         | 2         | 3         |
| 4     | 4         | 5         | 6         | 7         |
| :     | :         | :         | :         | :         |
| $2^k$ | $2^k - 4$ | $2^k - 3$ | $2^k - 2$ | $2^k - 1$ |

LSB → MSB

### Little Endian

|           |           |           |           |   |
|-----------|-----------|-----------|-----------|---|
| 0         | 3         | 2         | 1         | 0 |
| M         | 7         | 6         | 5         | 4 |
| :         | :         | :         | :         | : |
| $2^k - 1$ | $2^k - 2$ | $2^k - 3$ | $2^k - 4$ |   |

→ MSB → LSB

## Word Alignment:

If words are aligned, ~~word length~~ addresses will be multiple of bytes.

\* Registers  $N_{bit} = \text{Word Length}$

## Instruction and Instruction

### Sequencing

- i) Data transfer between memory and registers
- ii) Arithmetic and Logical Unit Operation
- iii) Sequencing
- iv) I/O Data Transfer



## Basic Instruction Type: $C = A + B$

\* Three address instruction: OP A, B, C

\* 8086 has 20 address bit

\* Two address instruction: Uses 2 addresses

\* One Address

: Uses AC/AX ← is  
↓ used implicitly

LOAD A ; AC ← M[A]

ADD B ; AC ← [AC] + M[B]

STORE C ; M[C] ← [AC]

## CPU Organization

based on  
memory



## Stack Memory.

TOS  
Top of Stack



when ALU uses zero memory instruction, it uses the top 2 data in the stack.

#

$$X = (A + B) * (B \cdot C + D)$$

address

Three instructions:

ADD A, B, R1; ~~M[x]~~<sup>R1</sup>  $\leftarrow [A] + [B]$

ADD C, D, R2; R1  $\leftarrow [C] + [D]$

MUL R1, R2, X; M[x]  $\leftarrow [X] * [R1]$

Two address:

MOV A, R1 ; R1  $\leftarrow [A]$

MOV C, R2 ; R2  $\leftarrow [C]$

ADD B, R1 ; R1  $\leftarrow [B] + [R1]$

ADD D, R2 ; R2  $\leftarrow [D] + [R2]$

MUL R1, R2 ; R2  $\leftarrow [R1] * [R2]$

MOV R2, X

TOPIC NAME :

DAY:

TIME:

DATE:

DATE: / /

One address:

LOAD A ; AC  $\leftarrow [A]$

ADD B ; AC  $\leftarrow [A] + [B]$

~~STORE~~  
MOV X

LOAD C ; AC  $\leftarrow [C]$

ADD D

MUL X

STORE X

GOOD LUCK

TOPIC NAME:

W-4

DAY:

C-8

TIME:

DATE: 22/12/24

\*2000 address :  $x = (A+B) + (C+D)$

PUSH A

8

TE: 22/12/24

TOPIC NAME:

DAY:

TIME:

DATE:

/ /

## RISC Instruction:

- Can use 3 registers in a single instruction
- Only LOAD and STORE

LOAD A, R1  
LOAD B, R2  
LOAD C, R3  
LOAD D, R4  
ADD ~~A, B~~ R1, R2, A1  
ADD R3, R4, R3  
MUL R1, R3, R1  
STORE R1, X

GOOD LUCK!

## Instruction Execution:

- Straight Line Sequencing → reads line by line
- Branching → may skip lines if necessary
  - ↳ Conditioned Branching → goto inside if else

### Straight Line Sequencing:

#### Assumptions:

- One memory operand per instruction
- 32-bit word length → 32 bit = 4 bytes =  $i, i+4, i+8, \dots$
- Memory byte addressable
- Each instruction fits one word

Branching:

To PC can point towards another address that's NOT the immediate next address. The other address is called branch target



Branch Target  
LOOP

Program  
Loop

Condition  
Loop

MOV N, R1

Clear R0

Determine the  
address of  
"Next" numbers are  
odd "Next" numbers  
to R0

Decrement R1

Branch >0 LOOP

Move R0, SUM

SUM  
N

## Condition Codes / Status Flags :

Bits of a special register



$$A : -20 \Rightarrow A = \text{Two's Complement} (110100)$$

$$\begin{array}{r}
 B : 16 \\
 = 101011 \\
 + 1 \\
 \hline
 = 101100
 \end{array}$$

$$-B = \text{Two's complement} (100000)$$

$$\begin{array}{r}
 = 101111 \\
 + 1 \\
 \hline
 110000
 \end{array}$$

TOPIC NAME : \_\_\_\_\_

DAY : \_\_\_\_\_

TIME : \_\_\_\_\_

DATE : / /

$$\begin{array}{r} \therefore A - B = 101100 \\ 110000 \\ \hline 101100 \end{array}$$

*→ This is wrong  
Used 8 bit*



~~1000~~

~~10001100~~

## Addressing Mode

→ Ways to address Operands

### Register Mode:

i) Operand is content of register

Absolute Mode: Operand is in memory

\* Also called direct mode

\* Registers and absolute modes can be used to represent variables

Immediate Mode: Represents constants

- \* Operand is given explicitly in the instruction
- \* ~~#~~ is used before value  $\rightarrow \#500$

\* Effective Address: Actual address of operand

For Registers:  $EA = R_i$   
 " Absolute:  $EA = LOCA$

Indirect Mode: Address is the content of operand

Pointer |  $EA = [R_i] / [LOCA]$

- \* Expressed with  $( ) \rightarrow (R_1)$

$R_1 = B$  implies

$(R_1) \rightarrow$  Operand in  $B$

[Check Page 55 of slide]

$\rightarrow 54: \text{Move } \#NUM1 \text{ R2}$   
 $\rightarrow$  address of  
 $NUM1$

TOPIC NAME

DAY

TIME

DATE

Index Mode: operand is generated by adding constant

$$\begin{aligned} * & X(R_i) \\ * & PA = [R_i] + x \end{aligned}$$

↑      ↓      → offset

GOOD LUCK

## Organization of the IBM

### Personal Computer

\* Protected Virtual Addresses → Address is stored elsewhere

\* 80486 is strongest in 8086 family

### Registers:

|    |    |    |
|----|----|----|
| AX | AH | AL |
| BX | BH | BL |
| CX | CH | CL |
| DX | DH | DL |

16 bit  
is divided  
into 2 8bit

AX: Used Accumulation Register

\* Arithmetic and logic data transfer instr.

Bx: Base Registers

\* Stores addresses

Cx: Count Registers

\* Loop counters

\* CL is used to shift and rotate bits

DX: Data Registers; (stores remainder)

\* Used in MUL and DIV

\* " " I/O Operations

## \* Segmentation :



\* SP → points at the top  
 \* BP → " " any index

\* 64 bit segment can be given as 16 bit numbers  
 \* think DLD encoder/MUX

\* Physical Address = Segment  $\times 10H$  + Offset

\* Always do it in Hex

Logical Address = Segment:Offset

\* Segment registers should not be used for data transfer.

For 12345 h as physical address

1230h → DS → Segment  
0045h → SI → Offset

\* IP contains offset of CS

Flag Registers:

# ASSEMBLY LANGUAGE

## Assemblers :

- \* Not Case sensitive

## Statements :

- \* Each statement is one line

## \* PROC MAIN

- + Can have upto 4 operands

- \* ; to start comments.

## \* name :

### Name Field

- \* Labels, procedures, names and variables can be 1 to 31 char long

### Operation Field

- \* Op codes are used in Operation field
- \* pseudo op codes are used for mentioning stuffs

### Program Data:

- \* for hex, use a digit before numbers starting with A - F (0 ABCh ✓ 0' ABCh X)
- \* "A" = 65d = 41h = 100001 b

Constants: name EQU address/value

LF EQU 0Ah

Funcs:

\*MOV → cannot be performed between  
two addresses → an intermediate  
register is to be used

\*XCHG → swaps the values of two addresses  
→ same cannot do for mem, mem

\*NEG → does two's complement

TOPIC NAME : \_\_\_\_\_

DAY : \_\_\_\_\_

TIME : \_\_\_\_\_

DATE : / /

## Program Structure :

- MODEL SMALL
- STACK 100h
- DATA
- CODE

MAIN PROC

XMAIN ENDP

END MAIN

## I/O

MOV AH, 1 } input ASCII CODE  
INT 21H } ↳ stored in AL

MOV AH, 2 } output ASCII CODE  
INT 21H } ↳ which is stored in DL

MOV AH, 9 } outputs string ↳ strings must  
INT 21H } ↳ which is stored in DX end with \$

# Flags

Status  
(Status of any operation)  
2, 4, 7, 8, 11

Control  
(Controls operation)  
8, 9, 10.



## Status Flags:

Carry flag: 0 when carry 0  
1 " " 1

Parity: 0 when 100 byte has odd no. of 1  
1 " " " even no. " 1

Auxilliary Carry Flag: 0 if BCD addition carry  $\neq 0$   
 1 " " " " " = 1

Zero Flag: 0 if result is (not  $\neq 0$ )  
 1 " " " (0)

Sign Flag: 0 if MSB is 0 (or +ve)  
 1 " " " " 1 (or -ve)

Overflow Flag: 0 if there is overflow

1 " " " "

→ If acquired answer is higher than the range  
 → No  
 → Sign Unsigned → if CF = 1  
 → Signed → if SF changes  
 → Both signed and unsigned

\* OF = Carry In MSB  $\oplus$  Carry Out MSB

## Instructions' effect on flags:

MOV/XCHG — None

ADD/SUB — All

INC/DEC — All except CF

NEG — All

CF = 1 unless result = 0

OF = 1 if word operand = 8000h

or byte " = 80h

2's complement

of -128 is 128

where range is between -127 to 128

\* If CF = 1 with all trailing 0s, -

ZF = 1

## Addressing Mode

Relative Address:  $X(\text{PC}) \rightarrow$  Used for loops

Offset Address  
of instr.

$-16(\text{PC}) \rightarrow$  go to 16 bytes before  
PC's pointing

Auto increment:  $(R1) + \rightarrow [R1] = [R1] + 1$

\* Adds later

Auto decrement:  $-(R1) \rightarrow$   
used in stack

\* Subs first

# Ch - 3

## I/O Stuffs



# I/O Devices or Something.

## Basic Input Output Operation

\* Data In / Data Out  $\rightarrow$  8 bit Buffer register

// In single bus, the speed of processor etc is fast, there is a chance of data getting replaced

\* S IN / S OUT  $\rightarrow$  Control flags



- \* \* When S IN = 1, Data can be read from DATAIN
- \* \* " S OUT = 1, " " sent to DATAOUT

READWAIT

Branch to READWAIT if SIN=0

Input from DATAIN to R1

WRITEWAIT

Branch to WRITEWAIT if SOUT=0

Output from R1  $\xrightarrow{\text{to}}$  DATAOUT

\*this is not efficient ~~and~~ as it keeps the processor busy.

\*SIN and SOUT are kept along with other control/status flags

TOPIC NAME:

To read a character

READWAIT Testbit #3, INSTATUS

Branch = 0 READWAIT

MoveByte DATAIN, R1

WRITEWAIT Testbit #3, OUTSTATUS

Branch = 0 ~~READ~~ WRITEWAIT

MoveByte R1, DATAOUT

To read a line (NOT string)

Move #LCC, R0

READ Testbit #3, INSTATUS

Branch = 0 READ

MoveByte DATAIN, (R0)

#3, OUTSTATUS

ECHO Testbit

Branch = 0 ECHO

MoveByte (R0), DATAOUT

Compare #CR, (R0)+ ; compare A, B = B - A

Branch ≠ 0 Read READ

#CR = Last value of carriage

Return  $\rightarrow$  start of line

\* Subroutine: Thing that needs to be executed multiple times

\* Instruction that performs the branching operation is named as Calling Instruction

\* Link Register is used to store value of PC when branching.

\* Subroutine Linkage → whole method

\* Subroutine Nesting:

→ Subroutine inside subroutine

→ Makes link registers hard

→ It is better to use SP (Stack Pointer) here

→ Stack Pointer:

→ Decreasing Memory Location

→ Push() → SUB #4, SP ; 1020  
MOV NI, (SP)

→ MOV NI, - (SP)

1000 [ TOS ]

1020 [ ]

→ we need to move  
TOS to 1020 and  
move data there

→ Pop() → MOV (SP), LOC }  
ADD #4, SP } MOV (SP)+, LOC

Parameters Passing:Using Register

## Calling Program

Move M, R1 ; R1 is a counter

Move #NUM1, R2 ; R2 pointing to list (first number)

call LISTADD ; call subroutine

Move R0, SUM ; Save result

## Subroutine

LIST ADD  
LOOP

Clear R0 ; Initialize sum as 0

ADD Add (R2)+, R0 ; Add entry from list

Decrement R1

Branch >0 LOOP

Return ; Return

Using Stack

SP at 10204

Move #NUM1, -(SP); 10201

Move N, -(SP); 10126

Call LISTADD; 1012 ← Ret Add

Move 4(SP), SUM; SP is at 1016  
Indexes 1020

Add #8, SP; Back to 1024

LIST ADD Move Multiple R0 → R2, -(SP); 1008 → 1000

Move 16(SP), R1; 1016 (SP is at 1000)

Move 20(SP), R2; 1020 ( " )  
↑ this just offsets

clear R0

Add (R2)+, R0

LOOP

Decrement R1

Branch &gt;0 LOOP

Move R0, 20(SP); Replace 1020

Move Multiple (SP)+, R0 → R2; 1000 → 1008

↳ actually R2 → R0

Return; 1012 is removed → Moves SP

|                |      |
|----------------|------|
| [R2]           | 100  |
| [R1]           | 1004 |
| [R0]           | 1008 |
| Return Address | 1012 |
| #N             | 1016 |
| NUM1           | 1020 |
| .              | 1024 |

↓ stackframe

## Label:

\* Refers to another line

\* Label - Name:

## Conditional Jump:

\* JMP → unconditional jump → no limitations  
 can only jump between 128 bytes

\* JG/JNLE →  $\geq$

JGE/JNL → "  $\geq$ "

JL/JNGE → "  $<$ "

JNE/JGL → "  $\leq$ "

use slide page 10 for  
 $\rightarrow$  6-8

\* Jump → LABEL if true  
 LABEL if false



## CF CMP ; Compare

\* Dest - Src



## Floating Pointers, Stack Pointer

→ points to the address right above  
the return address → Good for indexing



# Self Study : Carol Harrelson book page (78-80)

## ④ Encoding Instruction: Converting instr. into binary

\* 3 bits to represent addressing mode



$$\frac{\text{Move}, 20(\text{R0})}{8}, \frac{\text{R5}}{4} + 6 = 22$$

can only  
be expressed  
in  $2^{10}$  range

10 bits  
bits remaining

\* 1 WORD is used in memory addresses

\* RISC can use <sup>only</sup> 1 word

- To solve the issue of ~~RG~~ using memory locations,  
~~we set~~ → We set memory locations means instructions  
→ We can set the value using relative addresses  
→ We move that to a register using indirect  
addressing  
→ win

## 4. Arithmetic

- $\&$  Multiplication  $\rightarrow$  Right shift  
Multiplication  $\rightarrow$  Left Shift } Until Multiplication is 0

OR

- |                          |               |               |        |
|--------------------------|---------------|---------------|--------|
| * Multiplicand           | $\rightarrow$ | Left shift    | 1000   |
|                          |               |               | x 1001 |
|                          |               |               | <hr/>  |
|                          |               |               | 1000   |
| # For n bit multiplicand |               | 0.000         |        |
| m " multipliers          |               | 0 0 0 0       |        |
|                          |               | <hr/>         |        |
| = 1000 multiplicand      |               | 1 0 0 0       |        |
|                          |               | <hr/>         |        |
|                          |               | 1 0 0 0 0 0 0 |        |

Final shifted multiplicand

will be  $(n+m)$  bits

4) 3 basic steps for each bit

no. of bits in multiplier → no. of bits in multiplier and decide whether

1. Check LSB of multiplicand and decide whether to add the multiplicand to product

If LSB = 1 → add

If LSB = 0 → do nothing

2. Left shift the multiplicand by 1 bit  
" multiplicand by 1 bit

3. Right "

↓ ① (i) (ii) (iii) (iv)

# 1000  
100111  
-----  
↓

100000

0100

↓

1000000

0010

↓

10000000

0601

↓

100000000  
0000

Product

0  
↓ 0 + 1000

0 1000  
↓ Do Nothing

1000  
↓ Do nothing

1000  
↓ 1000 + 1000000

1001000

Hardware:



(i) Control test decides checks LSB

(ii) Lets addition happen

(iii) Write the acquired result in Product

(iv) Left shifts multiplicand

(v) Right shifts Multiplier

TOPIC NAME :

DAY:

TIME:

DATE:



Product



Product = 00100011

# Self Study → Patterson & Hennessy

Ch - 7Basic Unit Processing Unit

② Instruction Set Processor:

Mainly the processor

& Control Unit Generates the signals of instructions



## Single Bus Data path:





## (b) Single bit representation of Register:



\* ALU is a combinational ~~seq~~ Circuit

## CA Quiz - 2



- \* No theoretical part
- \* Code for specific systems
- \* Specified code using specific addressing modes

### Executing an Instruction:

- \* All Instructions

#ADD R1, R2, R3

\* Only one clock cycle is needed  
for internal instructions.

1. R1<sub>out</sub>, Y<sub>in</sub>  
→ Different clock cycles to prevent errors
2. R2<sub>out</sub>, Select Y, Add, Z<sub>in</sub>
3. Z<sub>out</sub>, R3<sub>in</sub>

## Affecting a word from Memory

1.  $R1_{out}$ ,  $MAR_{in}$ } { One clock cycle  
 2. Read\_F<sub>1</sub> can be combined  
 3. WMFC  
 4.  $MDR_{inF}$ } { can be combined { One clock cycle  
 5.  $MDR_{out} \rightarrow R2_{in}$
1.  $R1_{out}$ ,  $MAR_{in}$ , Read  
 2. WMFC,  $MDR_{inF}$   
 3. MDR



~~#MOV R1, (R2)~~

Step 1: ~~R1~~ Place R2 onto the internal bus

- Load contents onto MDR

- Place

Step 2: ~~WMFC~~ ~~W~~

Steps:

1.  $R2 \rightarrow \text{bus} \rightarrow MAR$

$R2_{out}, MAR_{in}$

2.  $R1 \rightarrow \text{bus} \rightarrow MDR$

$R1_{out}, MDR_{in}$

3. ~~WB~~ Activate Write Operation

Write

4. WMFC MDR  $\rightarrow$  Mem  $\rightarrow$  Bus

$MDR_{out F}$

- 5 WMFC

WMFC

$\rightarrow MDR_{in}$  and  $MDR_{out E}$   
& write and WMFC

CANNOT be in the same clock cycle

## #Mov (R1), R2

Steps:

1. R1 → Bus# → MAR
2. Activate Read Operation
3. WMFC
4. Mem bus → MDR
5. MDR → Bus → R2

Control Sequence:Restrictions for combining:

1. Multiple Out switches cannot be open
2. Read/Write and MFC cannot be in the same cycle
3. Same clock cycle cannot have any registers in and out switch activated.

~~PC~~ ADD (R1), R2 → Complete Execution

### Steps :

1. PC → Bus → MAR :  $PC_{out}, MAR_{in}$

2. Free Activate Read Operation : Read

3.  $PC \leftarrow [PC] + 4$  : Select Y, Add,  $Z_{in}$

4. WMFC : WMFC → Can be multiple or single step

5. Mem Bus → MDR :  $MDR_{out}, Y_{in}$

$MDR \rightarrow Bus \rightarrow IR : MDR_{out}, IR_{in}$

6 [5+6. MDR → Y :  $MDR_{out}, Y_{in}$

7.  $R1_{out}, MAR_{in}$

8. Read Signal : Read

9. WMFC : WMFC

10. Mem bus → MDR :  $MDR_{out}$

[5+7.  $R2 \rightarrow Bus; R2_{out}$

7 [5+8. Select Y, ADD operation  
Select Y, Add

[5+9. Result → Z :  $Z_{in}$

8 [5+10. Z → bus → R2 :  $Z_{out}, R2_{in}$

GOOD LUCK

→ IMP → ~~Free~~ Free Branch: Self-study

## Multiple Bus Organization:

### 3-bus Organization:

+ There is an incrementor which automatically stores points to next instrn.



#ADD R4, R5, R6

Steps:1. PC  $\rightarrow$  BUS<sub>B</sub>2. BUS<sub>B</sub>  $\rightarrow$  ALU  $\rightarrow$  BUS<sub>C</sub>  $\rightarrow$  MAR

3. Increment PC

4. Activate Read Operation

5. WMFC for MFC <sup>Wait</sup> <sub>7 and 8 cannot be merged because 7 is instruction fetch and 8 is for operation</sub>6. Mem Bus  $\rightarrow$  MDR <sub>7 is instruction fetch and 8 is for operation</sub>7. MDR  $\rightarrow$  BUS<sub>B</sub>8. BUS<sub>B</sub>  $\rightarrow$  ALU  $\rightarrow$  BUS<sub>C</sub>  $\rightarrow$  IR9. R4  $\rightarrow$  BUS<sub>A</sub>10. R5  $\rightarrow$  BUS<sub>B</sub>11. Select BUS<sub>A</sub>

12. ADD operation

13. Res  $\rightarrow$  R6Control Signals

1. PCout

2. R=B, MAR<sub>in</sub>

3. Inc PC

4. Read

5. WMFC

6. MDR<sub>in</sub>7. MDR<sub>outB</sub>, R=B, IR<sub>in</sub>8. R4<sub>outA</sub>9. R5<sub>outB</sub>

10. select A

11. ADD

12. PG in, END

## Hardwired Control Unit

CU  
↓  
Handwired

Microprogrammed

### HCU:

- Step Counter
- Instruction Register
- Condition Code Flags
- MFC



GOOD LUCK™

PC → Bus → MDR

MA → Bus → MDR

Z:

PC → Bus → MDR : PC, MA, R

2 → # Read

PC & PC7 + 4 → Select 4, Add, 2 in

: MFC

Mem → Bus → MDR : MDR in E



→ To make sure clock cycle doesn't  
increase step counter

## Logic Instructions:

AND: Works like multiplication

$$\rightarrow CF = OF = 0$$

$\rightarrow$  Effects SF, ZF and PF

|            | 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 |
|------------|-----|----|----|----|---|---|---|---|
| 'a' = 97 = | 0   | 1  | 1  | 0  | 0 | 0 | 0 | 1 |
| 'A' = 65 = | 0   | 1  | 0  | 0  | 0 | 0 | 0 | 1 |

to convert = AND 'a', ODFH = 'A'

\* AND is used for clearing Bits

To make

\* Use AND to go lower

\* Use the bits to convert it to hex  
AND OR that hex

OR: Works like

$\rightarrow$  Same effect as AND

\* OR is used to check sign on register value

$$\begin{array}{l} \text{XOR: } 0 \oplus x = x \\ \hline 1 \oplus x = x' \end{array}$$

\* Used for clearing registers

$$x \oplus x = 0$$

NOT: 1's complement any <sup>numbers</sup>

XOR complements bit  
NOT " whole thing

TEST: dest AND src  
→ CMP bit AND

Check whi bit

## Shift Instructions:

$\text{SHL} = \text{Left Shift}$  } For 1(1) shift,  
 $\text{SHR} = \text{Right Shift}$  } we can  
↳ this moves directly

## SHL / SAL

~~SHL~~ SHL dest, src  $\rightarrow$  dest = dest  $\times 2^{\text{src}}$

For Multiplication of non  $2^n$ ,

Do it separately

& shifting multiple times <sup>might</sup> do not give the right  
 CF and OF

**TOPIC NAME :**

DAY

TIME:

DATE: / /

$SAR + SAR:$

$$\begin{aligned} \overbrace{ACF_B} &= LSB \\ A MSB &= 0 \\ \overbrace{ACF} &= \end{aligned}$$

SBR dest, sonc  $\Rightarrow$  dest / 2<sup>sonc</sup>

$$\boxed{SAR} \quad MSB = MSB$$

1000  
1100



## Rotate Instructions:

Rotates AND send value to CF

ROL: CF = MSB  
LSB = MSB

ROR: CF = LSB  
MSB = LSB

RCL: ~~MSB~~ CF = MSB  
~~CF~~ MSB

RCR: MSB = CF  
CF = LSB

## Hand Wired Control Unit



Advantages: Fast

Disadvantages: Complex

## Microprogrammed CU:

- \* Control Signals are generated using Programs
- \* A control word is used where each individual bit represents various control signals.

| Micro Instruction | R <sub>in</sub>   | R <sub>out</sub>  | . | . | : | - | 1 |
|-------------------|-------------------|-------------------|---|---|---|---|---|
| 1                 | 0 if word in step | 1 if word in step |   |   |   |   |   |
| 2                 |                   |                   |   |   |   |   |   |
| 1                 |                   |                   |   |   |   |   |   |
| .                 |                   |                   |   |   |   |   |   |
| 1                 |                   |                   |   |   |   |   |   |
| .                 |                   |                   |   |   |   |   |   |
| 1                 |                   |                   |   |   |   |   |   |

Each sequence of control word is microroutine  
 Micro routines are stored in ~~micro~~ control store



+ Control Signal of instruction → Execute it

# Microoperations

Online - 7AND

$$\begin{array}{r}
 0 \quad 0 \\
 0 \quad 1 \\
 \hline
 1 \quad 0
 \end{array}
 \quad
 \begin{array}{r}
 0 \quad 0 \\
 0 \quad 1 \\
 \hline
 1 \quad 0
 \end{array}
 \quad
 \begin{array}{r}
 \text{XOR} \\
 0 \quad 0 \\
 0 \quad 1 \\
 \hline
 1 \quad 1
 \end{array}$$

$$\begin{array}{r}
 0 \quad 0 \\
 0 \quad 1 \\
 \hline
 1 \quad 0
 \end{array}
 \quad
 \begin{array}{r}
 0 \quad 0 \\
 1 \quad 0 \\
 \hline
 1 \quad 1
 \end{array}$$

$$\begin{array}{r}
 1 \quad 1 \quad | \quad 1 \\
 | \quad | \quad | \quad | \\
 0 \quad 1 \quad | \quad 0 \\
 | \quad | \quad | \quad | \\
 0 \quad 1 \quad 0 \quad 1
 \end{array}$$

$$\begin{array}{r}
 \text{AND} \\
 0 \quad 1 \quad 0 \quad 1
 \end{array}$$

$$\begin{array}{r}
 F_4 \quad 1 \rightarrow 31H
 \end{array}$$

$$\begin{array}{r}
 \text{XOR} \quad X' \\
 0 \quad 1 \quad 0 \quad 1
 \end{array}$$

$$\begin{array}{r}
 2H
 \end{array}$$

$$\begin{array}{r}
 0 \quad 1 \quad 0 \quad 1 \quad 0 \quad 1 \quad 0 \quad 1 \\
 \hline
 5
 \end{array}$$

$$\begin{array}{r}
 9A \rightarrow 39H
 \end{array}$$

$$\begin{array}{r}
 0 \quad 0
 \end{array}$$

$$\begin{array}{r}
 128
 \end{array}$$

$$\begin{array}{r}
 A
 \end{array}$$

$$\begin{array}{r}
 n
 \end{array}$$

$$\begin{array}{r}
 9H
 \end{array}$$

$$\begin{array}{r}
 1000 \quad 0000 \quad 0000 \\
 \hline
 0101 \quad 0101
 \end{array}$$

$$\begin{array}{r}
 0 \quad 1 \quad 1 \quad 1
 \end{array}$$

$$\begin{array}{r}
 0010
 \end{array}$$

$$\begin{array}{r}
 0001
 \end{array}$$

$$\begin{array}{r}
 \leftarrow 18H
 \end{array}$$

$$\begin{array}{r}
 0000
 \end{array}$$

$$\begin{array}{r}
 \leftarrow 1001B
 \end{array}$$

$$\begin{array}{r}
 128 \rightarrow 1000 \quad 0000 \quad 0000 \\
 40H
 \end{array}$$

$$\begin{array}{r}
 0 \quad 1 \quad 1 \quad 0 \\
 0 \quad 0 \quad 1 \quad 0
 \end{array}$$

$$\begin{array}{r}
 0 \quad 1 \quad 1 \quad 0
 \end{array}$$

$$\begin{array}{r}
 9H
 \end{array}$$

$$\begin{array}{r}
 128
 \end{array}$$

$$\begin{array}{r}
 3C
 \end{array}$$

$$\begin{array}{r}
 1000 \quad 0000 \quad 0000 \\
 \times 0 \quad 1 \quad 0 \quad 0 \quad 1 \quad 0 \\
 \hline
 0 \quad 0 \quad 1 \quad 0 \quad 0 \quad 0
 \end{array}$$

$$\begin{array}{r}
 1000 \quad 0010 \quad 1000 \\
 \hline
 \leftarrow
 \end{array}$$

Ques 7 Soln:

$\Rightarrow$  Mov 128D in BH. ( $8H \rightarrow 40H$ )

~~XOR OAAH.~~ OAAH  $\oplus$  BH ( $2AH$ )

$\rightarrow CX, 8$

COUNT

$\rightarrow$  SHL BH, 1;

JZLC ZERO

CONTINUE

ZERO:

INC ZER

CONTINUE

LOOP COUNT.

(Ans.)

## STACK:

- Each instruction reduces SP
- PUSHF → PUSHes all the <sup>flag</sup> value
- Can PUSH # [const]  
PUSH AL/BL(CL/DL...)
- Can PUSH AX

## PROCEDURE:

[Name] PROC [type] → we use small model  
so by default its NEAR

⋮  
⋮

ENDP [Name]

CALL PROC → Return address is stored in stack

2/2  
No.

TOPIC NAME

W - 9

DAY

C - 21 + 22

TIME:

DATE: 6/25

## Ch - 8

### Pipelining

Speed can be increased in two ways

↓  
Faster circuit technology



↓  
Arrange hardware to simultaneously run multiple instruction

Throughput: Instr. exec/~~sec~~ unit time

Latency: Time taken to complete instr.

Throughput improves in pipelining  
latency does not change

GOOD LUCK

## Pipelining:

Overlapping instructions of previous stages

resources  
stages



Ideal case: all resources are used at a time



For fetch and execution

clock cycle      1      2      3      4

Instruction

I<sub>1</sub>

I<sub>2</sub>

I<sub>3</sub>

I<sub>4</sub>



TOPIC NAME:

Four fetch  $\rightarrow$  decode  $\rightarrow$  Execute  $\rightarrow$  Write



C. C 1 2 3 4 5 6 7

Ins

|   | F <sub>1</sub> | D <sub>1</sub> | E <sub>1</sub> | W <sub>1</sub> |  |
|---|----------------|----------------|----------------|----------------|--|
| 1 |                |                |                |                |  |
| 2 | F <sub>2</sub> | D <sub>2</sub> | E <sub>2</sub> | W <sub>2</sub> |  |
| 3 | F <sub>3</sub> | D <sub>3</sub> | E <sub>3</sub> | W <sub>3</sub> |  |
| 4 | F <sub>4</sub> | D <sub>4</sub> | E <sub>4</sub> | W <sub>4</sub> |  |

Throughput is the total instruction / time

So, for  $F \rightarrow E$ : Throughput =  $\frac{1}{2}$  instruction/cycle

Latency =  $2 \xrightarrow{\text{cycles}} 2$  unit time  
for one  $F-E$

### Role of Cache Memory:

\* Cache Memory stores repetitive instructions as fetching them from memory requires much time

\* Pipelining can increase throughput if they all required same time and there is no hazard.

↳ Any interruption to pipelining is hazard "stall"  
A condition which causes the pipeline to "stall" is called hazard.

TOPIC NAME

## Data Hazard : Availability of sequentiality thing

Hazard:

$$A \leftarrow 3 + R_1$$

$$B \leftarrow 4 \times A$$

Occurs when:

\* Dest 1 = Src 2

\* Dependency

Multi  
Add R2, R3, R4

Add R5, R6, R7

Not Hazard:

$$A \leftarrow 5 \times C$$

$$B \leftarrow 20 + C$$

| F              | D <sub>1</sub> | F <sub>1</sub> | W <sub>1</sub>  |
|----------------|----------------|----------------|-----------------|
| F <sub>2</sub> | D <sub>2</sub> |                | P <sub>2A</sub> |
| F <sub>3</sub> |                |                |                 |
|                |                | B              | E <sub>2</sub>  |
| F <sub>4</sub> | D <sub>4</sub> | F <sub>4</sub> | W <sub>4</sub>  |

## Control on Instruction hazard:

If instructions are not available



## Structural hazard:

If mult. ins uses the same resources.



Cache Miss

## Instruction Flowchart

If write miss and the write is write-back



## Handling So Data Hazard

$$A = S + R_1 \rightarrow W \text{ first}$$

$$B = U \times A \rightarrow E \text{ then}$$

We handle it using operand forwarding

But, if we can forward the result obtained in  $E$ , we won't need the  $W-E$  dependency

\* handled using hardware.



# MUL DIV

\* MUL src → only 1 operand  
the other is stored in AX

MUL ; X \* Y  
 MOV AX, X } AX = [AX] \* [Y]  
 MUL Y }

But for words,

DX : AX  
 first 8      last 8  
 bits      digits

# For 16 bit result → AX  
 # " 32 "      " → DX : AX

FFFF  
 \* 1  
 -----  
 FFFF → for unsigned (Use MUL)  
 @@ FFFF FFFF → for signed (Use IMUL)

TOPIC NAME : .....

DAY : .....

TIME : .....

DATE : .....

FFFF

\* 1

---

FFFF FFFF

this happens as DX is filled with MSB of AX

CF and OF = 0 if AX

= 1 " DX:AX

GOOD LUCK™

# DIV / IDIV

For 8 bit

$$\text{Quotient bits} = \text{Remainder bits} = \text{Divisor bits}$$

↓                    ↓                    ↓  
 AL                  AH                  ARL

For 16 bit:

Quotient  $\rightarrow$  AX  
 Remainder  $\rightarrow$  DX  
 Divisor  $\rightarrow$  AX

$$\begin{array}{r}
 01111111 \\
 10000000 \\
 \hline
 10000000 \\
 \hline
 00000011
 \end{array}
 \quad \begin{array}{l}
 \text{FD} \\
 \text{1111101} \\
 00000010 \\
 \hline
 00000011 \\
 3
 \end{array}$$

Remainder sign = Dividend sign

CWD  $\rightarrow$  Convert word to } double word } Moves AX to DX

## Handling Data Dependency: (software edition)

I<sub>1</sub>: MUL R2, R3, R4

NOP } No operation  
NOP } to prevent clogging

I<sub>2</sub>: Add R3, R4, R5

## Super Scalar Operation:

\* Multiple Processors

∴ Increase in throughput

\* Multiple Processors = Multiple Issue handling Processors



- \* Can't handle sequential floating point / integer calculations
- \* High performance = hardware unit can be kept busy  
the ~~most~~ of using compilers



\* Floating point takes 3 cycles  
 \* " " uses three stage pipelining

| Clock Cycle  | 1     | 2     | 3     | 4     | 5     | 6     | 7 |
|--------------|-------|-------|-------|-------|-------|-------|---|
| $I_1$ (FAdd) | $F_1$ | $D_1$ | $E_1$ | $F_1$ | $E_1$ | $W_1$ |   |
| $I_2$ (Add)  | $F_2$ | $D_2$ | $E_2$ | $W_2$ |       |       |   |
| $I_3$ (FSub) | $F_3$ | $D_3$ | $E_3$ | $E_S$ | $E_3$ | $W_3$ |   |
| $I_4$ (Sub)  | $F_4$ | $D_4$ | $E_4$ | $W_4$ |       |       |   |

Quiz - 3

Slide - 5

16/2/25

GOOD LUCK

Quiz - 3 : Wednesday

→ Control Sequence

→ Hand written / Microprogrammed

Unconditional Branching

\* Discard next step ~~and rep~~



|                         | 1              | 2              | 3               | 4              | 5             | 6 |
|-------------------------|----------------|----------------|-----------------|----------------|---------------|---|
| I <sub>1</sub>          | F <sub>1</sub> | D <sub>1</sub> | D <sub>E1</sub> | W <sub>1</sub> | not mandatory |   |
| I <sub>2</sub> (Branch) | F <sub>2</sub> | D <sub>2</sub> | E <sub>2</sub>  | W <sub>2</sub> |               |   |
| I <sub>3</sub>          | F <sub>3</sub> | D <sub>3</sub> | X               | X              |               |   |
| I <sub>n</sub>          | F <sub>n</sub> | D <sub>n</sub> | X               | X              |               |   |
| I <sub>k</sub>          | F <sub>k</sub> | D <sub>k</sub> |                 |                |               |   |
| Branch<br>Penalty = 2   |                |                |                 |                |               |   |

Fetch Unit → Identify Branch instruction  
+ Compute the branch target

1 2 3 4 5 6 7 8

 $I_1$  $I_2 (B_2)$  $I_3$  $I_u$  $F_k$ 

|       |       |       |       |
|-------|-------|-------|-------|
| $F_1$ | $D_1$ | $E_1$ | $w_1$ |
|       |       | $D_2$ |       |
|       |       | $F_3$ |       |

Bottleneck Penalty =

 $\beta$ 

|       |       |       |       |
|-------|-------|-------|-------|
| $F_u$ | $D_u$ | $E_u$ | $w_u$ |
|-------|-------|-------|-------|

102

## Conditional Branching

\* More hazard

\* Dynamic instruction count = (condition) \* (lines to be executed)

→ To handle:

i) Delayed Branch      (ii) Branch Prediction

↳ check sfide

\* The instruction after branch is a branch delay slot

## Branch Prediction

\* Predicts if there ~~is another~~ will be a branch.

### Speculative execution:

executes before the processor is certain that they are correct instruction sequence.

→ Next steps are executed assuming Branch will

be ignored

→ We use some branch instructions to predict

→ If Branch condition at top → better to predict false  
 " " bottom → " " true

- SPARC takes a bit to check

↳ Scalable Processor ARC architecture

# dynamic prediction: Carl Hamacher - Ch - 4.

# Performance

$$T = \frac{N \times S}{R} \quad P_s = \frac{R}{S} = 1 \text{ in ideal case}$$

$$P_p = \frac{R}{T_i} = \frac{R}{1 + S_{\text{miss}}}$$

where,

$T$  = Time of execution

$N$  = dynamic instruction count

$S$  = cycles/instruction = 1 in ideal case

$R$  = Clock Rate =  $\frac{1}{\text{Clock Period}}$

$P_s$  = Throughput

$$S_{\text{miss}} = ((1 - h_i) + d(1 - h_d)) \times M_p$$

↓                      ↓                      ↓                      ↓  
 instr. hit            data hit            cache miss      cycles  
 rate                    percentage        penalty

Average memory  
in  $T_i$   
of instructions with  
data operands

N!

input  $\rightarrow n$ 

$\xrightarrow{\text{mov } n \text{ bx}}$   
~~mul n~~

 $12/10 \rightarrow Q = 1$   
 $R = 2$ 
 $AL \rightarrow 8 \text{ bits}$  $AX \rightarrow 16 \text{ bits}$  $DX:AX \rightarrow 32 \text{ bits}$ 

MUL  
if multiplied with between  
bytes and answer comes out  
word  $\rightarrow OF/CF = 1$   
in signed if MSB(AL)  $\neq AH$ ,  
 $OF/CF = 1$

DIV: Division decides what type of division

Rem : Quotient

Division DB  $\longrightarrow$  ~~AX~~: ALDivision DW  $\longrightarrow$  DX: AX

# Arrays and Addressing Mode

Ans DW   .

$SI + 1 \rightarrow$  byte

$SI + 2 \rightarrow$  word

DUP: Duplicates

name DW times DUP(value)

If offset  $\longrightarrow$  segment  
BX, SI, DI  
BP

DS  
SS

~~ADD~~

pointers

LEA dest  src~~[AX]~~

AX → value

[value]

## Based and Indexed Addressing Mode

### Based Addressing:

Using BP, BX

### Indexed Addressing:

Using SI, DI

MOV AX, W[BX]

PTR → Always moves a byte  
value from address automatically

LABEL → Helps to move stuffs

XLAT → Gets address of AL + BX

## Number of Pipeline Stages

To reduce cache miss, we use secondary cache memory

→ This secondary cache stays between main and primary cache memory

$$S_{\text{miss}} = \left( (1 - h_s) + d(1 - h_d) \right) (h_s \times M_s + (1 - h_s) \times M_p)$$

Online - 10 :

SUM =

[DI]  $\rightarrow$ 

MOV CX, 10

DW DW + DW

String

DF  $\rightarrow$  Direction Flag : Decides direction of storing  
 $\downarrow$  for ABCDE

DF = 0  $\rightarrow$  A (inc)DF = 1  $\rightarrow$  E (dec)

STD  $\rightarrow$  Set Direction Flag : (Set DF to 1)  
 CLD  $\rightarrow$  Clears " " : (Sets DF to 0)

## MOVSB : Mov str Byte

Source

Dest

ES:DT

Dest

Src

DS:SI

↓  
only one byte at a time

if

to counter this

MOVSB

REP: ~~repeats~~ CX times

∴ To replicate n bytes:

LEA SI, ST1

LEA DI, ST2

MOV CX, N

REP MOVSB

For reversing:

STD 1

LEA SI, ST1

ADD SI, N  
MOV CX, N

LOOP:

MOVSB DI

ADD DT, 2  
LOOP

STOSB: Stores String Byte

$AL \rightarrow ES:DI$

LODSB: Loads String Byte

$DS:DI \rightarrow AL$

SCASB: Scan String Byte

checks by comparing AL with  $ES:DI$

if ZF  $\rightarrow$  found

else  $\rightarrow$  not found



REPNE/REPNE: Repeats CX times until equal

REPZ/REPZ:  $\therefore$  if CAN stop before CX is 0

CMPSB: CMP but for strings : 0

TOPIC NAME :

DAY : / /  
TIME : / /  
DATE : / /

General Form :

Check slide



## The Memory System

### Ch - 5

#### Cache :

+ based on "locality of reference"

Temporal  
(currently executed  
is most likely to be  
executed again)

Spatial  
(currently executed  
is close to the  
next one)

→ gets a "block" of memory

contiguous addresses  
locations of some  
size

16 words  $\rightarrow$  1 block

128 blocks  $\rightarrow$  2048 words (cache)  
4k  $\rightarrow$  844  $\times$  (memory)



Headache killing me  
Gd future me



GOOD LUCK

## Protocol

↓  
Write through  
↓  
updates both  
cache and main  
memory

↓  
Write Back or Copy Back

↓  
if a block is replaced,  
if writes the whole  
block to memory  
(Modified bit decides if it should  
be replaced)

## Associative Mapping



cost is higher as every tag needs to be linearly checked

## Set Associative Mapping

Mixture of direct and associative mapping

2 blocks = 1 set; Each block in memory is connected to a set in cache



\* Valid bit: 1 if a block contains valid data  
0 " " .. invalid "

Cache Coherence Problem: In writeback if data in  
memory & ~~data~~ in cache.

Sol<sup>n</sup>: Cache → Mem → Cache

## Replacement Algorithm

\* LRU

→ Oldest "

→ \* Direct Mapping DOES NOT need replacement algorithm

\* Least Recently Used: Least O2

The oldest recent used } 1, 2, 3, 1  
→ If least recently used } ↑  
1, 2, 3, 1, 2, 3  
↑

size & speed

but we want higher speed with higher size

RAM:

Cells



One row = one memory word

All cells are connected to a single line





SRAM  $\rightarrow$  6 transistors

DRAM  $\rightarrow$  1 transistor, capacitor

Latency:

Time taken to transfer a single word or data <sup>to</sup> from memory

Bandwidth:

Time to transfer a single block of data

But,

- since block is can be of any size,

Time to transfer of bits

Number of bits transferred in one second

TOPIC NAME :

DAY:

TIME:

DATE:

ROM:

PROM → Programmable

↳ Write only once

EPROM → Erasable and Programmable

EEPROM → Electrically Erasable Programmable

~~Flash~~

Flash → Similar to EEPROM

but has higher density meaning higher

- capacity

→ read one bit

→ write entire contents



TOPIC NAME :

DAY : \_\_\_\_\_

TIME : \_\_\_\_\_

DATE : / /

## Memory Hierarchy:

| Speed ↑                                                                                                                                                                     | Size ↑                                                                                                                                                              |
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| <ol style="list-style-type: none"><li>1. Registers</li><li>Cache 2. Prim. Cache (L1)</li><li>3. Sec. Cache (L2)</li><li>4. Main memory/RAM</li><li>5. sec. Memory</li></ol> | <ol style="list-style-type: none"><li>1. Sec. Memory</li><li>2. Main memory/RAM</li><li>3. Sec. Cache (L2)</li><li>4. Main Cache(L1)</li><li>5. Registers</li></ol> |

\* SDRAM → Single Inline Memory Module

GOOD LUCK

## Measuring Cache Performance

For ideal:

We assume ~~full memory~~ cache as single memory.

& we assume it will always hit

$$t_{avg} = hC + (1-h)M \quad \begin{matrix} \rightarrow \text{Miss penalty} \\ \text{from memory} \end{matrix}$$

$\downarrow$  Cache access time  $C$

$$t_{avg} = h_1 C_1 + (1-h_1)C_1 + h_2 C_2 + (1-h_2)(1-h_1)M$$

## #Derivation of performance

### Virtual Memory

Main Mem + Part of Secondary Memory



Logical Address = (seg : offset)

↑  
virtual address

blocks of memories from secondary memory are  
called pages.



Ch - 4I/O organization

- \* Each I/O device has a unique address
- Unique addresses are sent through address bus



- \* Parts of memory are reserved for I/O devices
  - Memory Mapped → uses same instruction to access both
- \* Different addresses for I/O device
- I/O mapped → uses less mem space

polls → continuously checks

## 4) Interrupts:

- Sends a signal to pause
- Uses Interrupt Request Line (IRQ)
- \* Interrupt Service Routine → can be excluded from main program
- \* Interrupt Latency: IRQ → <sup>value</sup> save → ISR
  - ↳ save other register values
- ↳ causes time called Interrupt Latency
- ↳ to reduce this, we store PC and Status Flags

Assignment → 4.4, 4.5 |

Interrupt Acknowledgment:

lets the device know there is an interrupt

↓  
i) ACK/IACK  
control signal

↓  
ii) when addressing I/O,  
it lets the other know

Interrupt Enable : For letting I/O interrupt  
.. Disable; " not "

\*Three approaches from slide

TOPIC NAME:

## For Multiple devices:

\* which device calls interrupt:

→ Polling by ISR:

→ checks IRQ for every device

→ Self Identification (vectorized Identification):

sends 4-8 bits of ISR using data bus

↳ this identifies which IRQ is called

\* Can others device interrupt:

ISR time is very small so it doesn't stop

But some devices do

→ Priority Structure:

Priority is set  
Higher priority interrupts lower.

[Fig from slide]

## How to handle simultaneous:

### Polling:

Continuously checks and priority is decided  
on the basis of connection

### Daisy Chain:

Connected like a chain

(Sends ACK like a linked list)

Priority based off of chain position

### Prioritized Daisy Chain

Mixed  
Combined structure of Daisy and Prioritized

Some priority has same daisy chain

## Direct Memory Access.

- \* If I/O ~~is~~ needs to do anything it directly accesses from memory
- \* Processor is not involved
- \* Controllers (DMA Controller) does these instructions of processor
- \* Processor initiates DMA transfers.  
with →
  - i) Starting address of memory location
  - ii) Number of words in block
  - iii) Direction of transfer
- \* IR knows which these things →
  - Something something block state
  - Something something CPU scheduling algorithm
- \* Block st. After DMA's work is finished, it sends interrupt back to processor

DMA has 3 registers:



- \* Priority of DMA > Priority of Processors
  - \* Thus, DMA steals cycle from Processors
- \* Bus Arbitration: decides which ~~bus~~ DMA is bus master

## Interface Circuit

- \* Internal Side → Towards bus device
  - \* External Side → " " device
- ↓  
(Port)



↳ to make parallel into series, we use <sup>in</sup>/output shift registers

Send all bit at once  
↳ <sup>in</sup> sends parallelly

\* 8 bits depending on design  
\* double buffering: at a time 8 bits are used  
is serial interface

\* Something slide

Standards for expansion bus:



ISA

Expansion  
Bus

PCI

SCSI

USB

PCI (Peripheral Component Interconnect):

\* Plug and play capability

\* three address spaces



\* address is kept until target is selected

\* device configuration

## SCSI (Small Computer System Interface):

- \* SCSI controller uses DMA to send data packets



- \* We can overlap data transfer requests to perform high performance



- \* SCSI controller = Initiator
- \* " " requests for bus → using distribution
- \* Selects target → sends Bus Control to Target
- \* Target does data transfer

## SCSI Phases

- i) Arbitration
- ii) Selection
- iii) Information Transfer
- iv) Reselection

~~To read blocks of data from disk drive~~



\* DONT DO: USB, 4.5 → Multiple Clock Cycle