

1 01 0 1

**SIMATS**  
**School of Engineering**

# **Computer Architecture**

**Computer Science and Engineering**



Saveetha Institute of Medical And Technical Sciences,Chennai.

## CSA12 Computer Architecture

| S.NO                                                       | TITLE                                                   | PAGE NO |
|------------------------------------------------------------|---------------------------------------------------------|---------|
| <b>UNIT I BASIC PROCESSOR ARCHITECTURE</b>                 |                                                         |         |
| 1                                                          | FUNCTIONAL UNITS                                        | 1       |
| 2                                                          | FACTORS OF PERFORMANCE                                  | 2       |
| 3                                                          | EVOLUTION OF COMPUTERS                                  | 3       |
| 4                                                          | NUMBER SYSTEMS                                          | 3       |
| 5                                                          | 8085 ARCHITECTURE – INSTRUCTION SET, FORMATS, AND TYPES | 4       |
| 6                                                          | 8086 ARCHITECTURE – INSTRUCTION SET, FORMATS, AND TYPES | 8       |
| <b>UNIT II<br/>COMPUTER ARITHMETIC AND PROCESSING UNIT</b> |                                                         |         |
| 7                                                          | ADDRESSING MODES                                        | 12      |
| 8                                                          | INTEGER ARITHMETIC                                      | 13      |
| 9                                                          | MULTIPLICATION AND DIVISION                             | 14      |
| 10                                                         | FLOATING POINT REPRESENTATION                           | 15      |
| 11                                                         | FLOATING POINT ARITHMETIC OPERATIONS                    | 16      |
| <b>UNIT III<br/>CONTROL DESIGN</b>                         |                                                         |         |
| 12                                                         | FUNDAMENTAL CONCEPTS OF PROCESSING                      | 18      |
| 13                                                         | PROCESSOR ARCHITECTURE                                  | 19      |
| 14                                                         | INSTRUCTION HAZARDS                                     | 20      |
| 15                                                         | SUPER SCALAR PROCESSING                                 | 21      |

| S.NO                                                  | TITLE                                  | PAGE NO |
|-------------------------------------------------------|----------------------------------------|---------|
| <b>UNIT IV<br/>MEMORY ORGANIZATION</b>                |                                        |         |
| 16                                                    | CLASSIFICATIONS OF MEMORY              | 22      |
| 17                                                    | CACHE MEMORY                           | 23      |
| 18                                                    | PERFORMANCE MEASURE                    | 24      |
| 19                                                    | MEMORY HIERARCHY AND MEMORY ALLOCATION | 24      |
| 20                                                    | VIRTUAL MEMORY                         | 25      |
| <b>UNIT IV<br/>I/O ORGANIZATION &amp; INTERFACING</b> |                                        |         |
| 21                                                    | INPUT OUTPUT ORGANIZATION              | 26      |
| 22                                                    | DIRECT MEMORY ACCESS (DMA)             | 27      |
| 23                                                    | BUS ARBITRATION                        | 27      |
| <b>ANALYTICAL PROBLEMS</b>                            |                                        |         |
| 22                                                    | CPU PERFORMANCE                        | 28      |
| 23                                                    | NUMBER SYSTEMS                         | 29      |
| 24                                                    | FLOATING POINT OPERATIONS              | 30      |
| 25                                                    | PIPELINING                             | 31      |
| 26                                                    | VIRTUAL MEMORY                         | 32      |
| 27                                                    | CACHE MEMORY                           | 33      |

# CSA12

## COMPUTER ARCHITECTURE:

Computer architecture is the organisation of the components which make up a computer system and the meaning of the operations which guide its function.

### FUNCTIONAL



### BASIC OPERATION OF COMPUTER:

COMPUTER:



### FUNCTIONAL UNITS

GROUP OF LINES

Connection Path

BUS

STRUCTURE

May be  
line / Wires

Lines carry  
data or  
signals

### SINGLE BUS STRUCTURE:



### BUS STRUCTURES

#### SINGLE

- \* One transfer
- \* Low cost
- \* Low performance
- \* flexible
- \* Can attach peripheral.

#### MULTIPLE

- \* One/more transfer
- \* high cost
- \* Performance high

**BUFFER REGISTER:**  
Holds the information during data transfer.

Prevent high speed processor from being locked.

Allows the processor to switch rapidly from one device to another.

Interworking with several devices.

### DIFFERENCE BETWEEN SYNCHRONOUS AND ASYNCHRONOUS BUS

#### SYNCHRONOUS

- \* Works at a fixed clock rate
- \* Affected by clock skew.
- \* Data transfer takes place in the block

#### ASYNCHRONOUS

- \* Not dependent on a fixed clock rate.
- \* Not affected by clock skew.
- \* Data transfer is character-oriented.

### COMPUTER TYPES:

1. DIGITAL COMPUTER
2. PERSONAL COMPUTER
3. DESKTOP COMPUTER
4. NOTEBOOK COMPUTER
5. SUPER COMPUTER
6. MAINFRAMES

### TYPES OF BUS STRUCTURE:

- \* Address BUS
- \* Data BUS
- \* Control BUS

### Address BUS:

- Carry data
- Bidirectional
- fetch instruction from memory

### Data BUS:

- Carry Data.
- Bidirectional.
- fetch inst → Memory.
- Store in memory.

### Control BUS:

- Memory Read/Write
- I/o Read/Write
- Ready state.

Performance  $\rightarrow$  how fast the computer executes

The best way to measure performance is using time. The following aspects are used to measure the performance of the computer.

### ① Execution time (Response time):

The time between program starts and the completion of the program execution.

Elapsed time:

Time for execution of the program

Processor time:

Period when processor active.

Processor clock: Processor circuit controlled by a timing signal

Clock cycle: Regular time interval

Hertz (Hz): cycles per second

Throughput: total amount of work done in given time

Best performance: i) Less execution time  
ii) More throughput

Million denoted by mega (M)

Billion denoted by Giga (G)

500 million cycles per second = 500 mega Hertz

1250 million cycles/sec = 1.25 Giga Hertz

Clock periods = 2 and 0.8 nano seconds (ns)

### Formula - I FACTORS OF PERFORMANCE

$$\text{Performance} = \frac{1}{\text{Execution time}} \quad \text{--- (1)}$$

Reduction in execution time will increase throughput.

Performance of a computer is directly related to throughput and hence it is reciprocal of execution time.

$$\text{Performance}_A > \text{Performance}_B \quad \text{--- (2)}$$

can be written as

$$\frac{1}{\text{Execution time}_A} > \frac{1}{\text{Execution time}_B}$$

$\Rightarrow$  A is faster than B (or) B is slower than A

$$\frac{\text{Performance}_A}{\text{Performance}_B} = n$$

$$\Rightarrow \text{Performance}_A = n \times \text{Performance}_B$$

$\Rightarrow$  Performance of computer A is n times faster than performance of computer B

$$\frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\frac{1}{\text{Execution time}_A}}{\frac{1}{\text{Execution time}_B}} = \frac{\text{Execution time}_B}{\text{Execution time}_A} = n$$

### Formula - II

CPU execution time / CPU time:

- It is the time spent by the CPU to complete task

CPU time = CPU clock cycles for program  $\times$  clock cycle time

$$\text{Clock cycle time} = \frac{1}{\text{Clock rate}}$$

$$\text{CPU time for a program} = \frac{\text{CPU clock cycle for a program}}{\text{Clock rate}}$$

② Computer A runs a program in 12 sec with 3GHz clock. We have to design a computer B such that it can run the same program within 9 sec. Determine the clock rate for computer B. CPU design of computer B requires 1.2 times as many clock cycles as computer A.

A

12 seconds

3 GHz

CPU clock cycle of program in A

$$= \text{CPU time} \times \text{Clock rate}$$

$$= 12 \times 3 \text{ GHz}$$

$$= 36 \times 10^9 = 36 \text{ GHz}$$

$$\text{CPU clock cycle of B} = \frac{1.2 \times \text{CPU clock cycle of A}}{\text{CPU time B}}$$

$$= \frac{1.2 \times 36 \text{ GHz}}{9}$$

$$= \frac{1.2 \times 36 \times 10^9}{9} = 4.8 \text{ GHz}$$

Problem:

Consider 3 processor P<sub>1</sub>, P<sub>2</sub>, P<sub>3</sub> executing same instruction set

| Processor      | Clock rate | CPI |
|----------------|------------|-----|
| P <sub>1</sub> | 3 GHz      | 1.5 |
| P <sub>2</sub> | 2.5 GHz    | 1   |
| P <sub>3</sub> | 4 GHz      | 0.9 |

i) Which processor has the highest performance?

ii) All the processor execute a program in 10 sec. Find the number of cycles required and number of instruction required for the same program

Example:

i) If a computer A runs a program in 10 sec and computer B runs the same program in 25 sec. Who is faster and how much faster?

$$\frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\frac{1}{\text{Execution time}_B}}{\frac{1}{\text{Execution time}_A}} = n$$

$$= \frac{25}{10} = 2.5$$

A is 2.5 times faster than B

## EVOLUTION OF COMPUTERS.

| Generation | Period         | Technology used     |
|------------|----------------|---------------------|
| 1st Gen.   | 1946 - 1959    | Vacuum tubes        |
| 2nd Gen.   | 1959 - 1965    | Transistors         |
| 3rd Gen.   | 1965 - 1971    | Integrated circuits |
| 4th Gen    | 1971 - 1980    | Microprocessors     |
| 5th Gen    | 1980 - present | AI & VLSI           |

## HISTORY OF COMPUTERS



## GENERATION OF COMPUTER.

### FIRST GENERATION [1940 - 1956]

- \* Vacuum Tubes - Core Technology
- \* Machine language - Low level eg:- ENIAC, UNIVAC, EDVAC, EDSAC, IBM, MARK 1, MARK 2, MARK 3
- \* Calculation in Milli Sec.
- \* Large in size \* High cost
- \* Storage was very less.

### SECOND GENERATION [1956 - 1963]

- \* Transistors
- \* Early version of High level language
- \* COBOL, FORTRAN
- \* Store instructions in Memory
- \* Faster \* Reliable
- \* Calculations in Micro Seconds
- \* Cost was high \* Cooling system.

### THIRD GENERATION [1964 - 1971]

- \* Integrated Circuits [ICs]
- \* Smaller Size than I & II GENERATION
- \* Input - output devices used.
- \* High Level Language such as BASIC, COBOL, PASCAL
- \* Calculations in Nano Seconds
- \* More storage \* less heat
- \* Cooling System.

## EVOLUTION OF COMPUTERS

### FOURTH GENERATION [1971 - 1980]

- \* Micro processor \* Smaller
- \* Powerful \* Highly Reliable
- \* High level language Like C, C++, DBASE, FOXPRO, etc.
- \* Time Sharing, Network based
- \* Reduced heat \* less cost
- \* High performance.

### FIFTH GENERATION [1980 - PRESENT]

- \* ULSI \* AI software
- \* Parallel processing Software
- \* Distributed operating Systems.
- \* HLL includes C, C++, JAVA, .NET, PYTHON, etc.
- \* Eg:- Desktop, Laptop, Tablet, Mobile phone, ultra book, etc.

## DATA REPRESENTATION

The form in which data is stored, processed, and transmitted.

### TYPES OF DATA REPRESENTATION

Decimal - Base 10  $\rightarrow 0, 1, 2, \dots, 9$ .

Eg:-  $(29)_{10}$

Binary - Base 2  $\rightarrow 0, 1$

Eg:-  $(110)_2$

Octal - Base 8  $\rightarrow 0, 1, 2, \dots, 7$  Eg:-  $(17)_8$

Hexadecimal - Base 16  $\rightarrow 0, 1, 2, \dots, 9, A, B, C, D, E, F$   
Eg:-  $(1234)_{16}$

ONE'S Complement  
 $\rightarrow 0-1, 1-0$

Two's Complement  
Add one (1) to one's complement

Bit :-  
A smallest possible unit of data.

Byte  
Group of eight bits is called a byte.

Nibble  
Half a byte is called Nibble.

| Byte Value       | Bit Value    |
|------------------|--------------|
| 1 Byte           | 8 Bits       |
| 1024 Bytes       | 1 Kilobyte   |
| 1024 Kilobytes   | 1 Megabyte   |
| 1024 Megabytes   | 1 Gigabyte   |
| 1024 Gigabytes   | 1 Terabyte   |
| 1024 Terabytes   | 1 petabyte   |
| 1024 petabytes   | 1 Exabyte    |
| 1024 Exabytes    | 1 Zettabyte  |
| 1024 Zettabytes  | 1 Yottabyte  |
| 1024 Yottabytes  | 1 Brontobyte |
| 1024 Brontobytes | 1 Geopbytes  |

### TEXT CODE

A format used to represent alphabets, punctuation marks and other symbols.

• EBCDIC - Extended Binary Coded Decimal Interchange Code.

• ASCII - American Standard Code for Information Interchange.

• Extended ASCII - Specifies character values from 128 to 255.

• Unicode - Uses 4 to 32 bits to represent letters, numbers and symbols.

# MICROPROCESSOR 8085 Pin Diagram & Registers

What is microprocessor?

\* Computer's Central Processing (CP) built on single IC (Integrated circuit) is called a microprocessor.

\* Microprocessor consists of ALU (Arithmetic & Logic unit), Control unit and Registers.

8085 Pin Diagram:-



Some important Pins are:

AD<sub>0</sub>-AD<sub>7</sub> : multiplexed Address & Data lines.

A<sub>8</sub>-A<sub>15</sub> : Higher order Address lines.

ALE : Address latch enable is an output signal. It goes high (1) when operation started.

S<sub>0</sub>, S<sub>1</sub> : status signal used to indicate type of operation.

RD : used to read data from I/O devices on memory.  
Read → Active Low

WR : used to write data on I/O Devices on memory  
write → Active Low

READY : used to check the status of output Devices.  
Low → MP will wait till (1) high (0)

TRAP : After Enabled, restart occurs and execution starts from address 0024H  
→ Highest Priority interrupt  
→ Non maskable interrupt

RSTS.5 } ; \* Maskable interrupt  
RST6.5 } ; \* Low Priority than TRAP  
RST7.5 }

INTR } ; INTR → interrupt request Signal after MP generates INTA (Interrupt Acknowledge) Signal. (Lowest Priority Interrupt)

X<sub>1</sub>, X<sub>2</sub> : Two modes of operation  
1) I/O mode ( $X_1 = 1$ )  
2) memory mode ( $X_1 = 0$ )

Clk(out) : Crystal oscillator used for internal clock generation.  
Frequency divided by two.

RESET IN : Reset the MP by setting the Program Counter to zero.

RESET OUT : Reset all devices which are connected with MP.

HOLD : Another master requesting to use the address & data bus.

HLDA : \* HOLD Acknowledge  
\* Indicates the CPU has received HOLD Request & it will relinquish the bus in next clock cycle.

SOD : HLDA set to Low after HOLD Signal removed.

SID : \* Serial output data lines  
\* Set/Reset Specified by SIM Instruction.

Vcc & Vss : \* Serial Input data lines.  
\* Data is loaded into Accumulator whenever RIM instruction executed

Vcc → ± 5V  
Vss → Ground

status codes of 8085 :

| S <sub>1</sub> | S <sub>0</sub> | operations |
|----------------|----------------|------------|
| 0              | 0              | HALT       |
| 0              | 1              | WRITE      |
| 1              | 0              | READ       |
| 1              | 1              | FETCH      |

Registers of 8085 :

\* Used for temporary storage and manipulation of data & instruction.

\* Data remain in the register till sent to I/O devices or memory.

i) Accumulation (A) :

\* 8 bit Register associated with ALU

\* Hold one operand of ALU operation.

2) General Purpose Registers

\* Contains 6 general purpose registers - (B, C, D, E, H & L)

\* Combination of two 8 bit registers called register pair. (Holds 16 bit data). (B-E, D-F & H-L)

3) Program Counter :-

\* 16 bit special purpose register.

\* Holds address of memory of next instruction to be executed.

\* Track the instruction in a program when executed.

\* Increment the content of next Program Counter (PC) during execution.

4) Stack Pointer (SP) :

\* 16 bit special purpose register used as memory pointer.

\* Portion of RAM (controls the address of stack).

5) Instruction Register (IR) :

\* Holds opcode of the instruction which is being decoded & executed.

6) Temporary Register : (8 bit register)

\* Holds data during ALU operation.

\* Not accessible to programmer.

Flag Register :

\* Five flipflops to serve as status flag.

\* Set/Reset according to ALU operations



## 8085 Architecture & features



## Features of 8085 HP:

- \* 40 pin IC package fabricated on a single chip
  - \* Uses a single  $\pm 5$  v.d.c
  - \* clock speed about 8MHz, clock cycle of 320ns
  - \* 8 bit Data bus
  - \* 16-bit Address Bus (address upto 64 kB)

- \* 16 bit stack pointer
  - \* 16 bit program Counter (pc)
  - \* Six 8 bit registers (B, C, D, E, H, L)  
Register pair (B-C, D-E, H-L)
  - \* consists of 80 basic instruction sets & 246 opcodes

**Application:** Mobile phones, Microwave ovens, etc.

## Architecture :-

- \* Interrupt control unit consists

- a) INTR
  - b) INTA
  - c) RST 5.5
  - d) RST 6.5
  - e) RST 7.5
  - f) TRAP

- Arithmetic & logic unit perform operations such as

- 1) Addition
  - 2) Subtraction
  - 3) Logical AND
  - 4) Logical OR
  - 5) Logical EXOR
  - 6) Logical NOT
  - 7) Increment
  - 8) Decrement
  - 9) shift, etc

#### 4) Status

- a)  $S_0, S_1$   
 b)  $\bar{r}_0 | \bar{M}$

5) DMA

- a) HOLD
  - b) HLDA

## Flag Register

|                   |                |                |                |                |                |                |                |
|-------------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
| 89:-              | AC             |                |                |                |                |                |                |
|                   | $\swarrow$     |                |                |                |                |                |                |
| 1 0 1 1 1 0 1 0   |                |                |                |                |                |                |                |
| + 0 1 1 0 1 0 0 1 |                |                |                |                |                |                |                |
| <hr/>             |                |                |                |                |                |                |                |
| 1 0 0 1 0 0 0 0 1 |                |                |                |                |                |                |                |
| CY                |                |                |                |                |                |                |                |
| B <sub>7</sub>    | B <sub>6</sub> | B <sub>5</sub> | B <sub>4</sub> | B <sub>3</sub> | B <sub>2</sub> | B <sub>1</sub> | B <sub>0</sub> |
| 0                 | 0              | X              | 1              | X              | 0              | X              | 1              |
| S                 | Z              | -              | AC             | -              | P              | -              | CY             |

| Machine cycle                               | Status            |       | Control |                 | INTA            |   |
|---------------------------------------------|-------------------|-------|---------|-----------------|-----------------|---|
|                                             | $\overline{RQIN}$ | $S_1$ | $S_0$   | $\overline{RD}$ | $\overline{WR}$ |   |
| opcode Fetch (OF)                           | 0                 | 1     | 1       | 0               | 1               | 1 |
| Memory Read ( $\overline{MEMR}$ )           | 0                 | 1     | 0       | 0               | 1               | 1 |
| Memory write ( $\overline{MEMW}$ )          | 0                 | 0     | 1       | 1               | 0               | 1 |
| $\overline{I/O}$ Read ( $\overline{IOR}$ )  | 1                 | 1     | 0       | 0               | 1               | 1 |
| $\overline{I/O}$ write ( $\overline{IOW}$ ) | 1                 | 0     | 1       | 0               | 0               | 1 |

# 8085 INSTRUCTION SET

## Instruction format:

\* One Byte Instructions

Eg: RRC, CMA, MOV B, C

\* Two Byte Instructions

Eg: MVI B, 45H

\* Three Byte Instructions

Eg: CALL, JMP

## Instruction set:

### 1) Data transfer Instruction set:

\* No flags are affected

| Instruction set                     | Explanation                     | Addressing | Example  |
|-------------------------------------|---------------------------------|------------|----------|
| MOV R <sub>1</sub> , R <sub>2</sub> | R <sub>1</sub> ← R <sub>2</sub> | Register   | MOV A, B |

|                       |                       |                   |          |
|-----------------------|-----------------------|-------------------|----------|
| I → Machine Cycle(MC) | R <sub>1</sub> ← [HL] | Register Indirect | MOV B, M |
|-----------------------|-----------------------|-------------------|----------|

|                       |                       |                   |          |
|-----------------------|-----------------------|-------------------|----------|
| MOV M, R <sub>1</sub> | [HL] ← R <sub>1</sub> | Register Indirect | MOV M, C |
|-----------------------|-----------------------|-------------------|----------|

|                           |                       |           |            |
|---------------------------|-----------------------|-----------|------------|
| MVI R <sub>1</sub> , Data | R <sub>1</sub> ← data | Immediate | MVI B, 45H |
|---------------------------|-----------------------|-----------|------------|

|                           |                                                                                          |           |              |
|---------------------------|------------------------------------------------------------------------------------------|-----------|--------------|
| LXI R <sub>P</sub> , Data | R <sub>P</sub> ← 16 bit data<br>R <sub>H</sub> ← 8 bit msb<br>R <sub>L</sub> ← 8 bit LSB | Immediate | LXI H, 2000H |
|---------------------------|------------------------------------------------------------------------------------------|-----------|--------------|

|          |            |        |           |
|----------|------------|--------|-----------|
| LDA addr | A ← [addr] | Direct | LDA 2000H |
|----------|------------|--------|-----------|

|          |            |        |           |
|----------|------------|--------|-----------|
| STA addr | [addr] ← A | Direct | STA 3000H |
|----------|------------|--------|-----------|

|           |                                         |        |            |
|-----------|-----------------------------------------|--------|------------|
| LHLD addr | C <sub>L</sub> ← [addr]<br>CH ← [addrH] | Direct | LHLD 2500H |
|-----------|-----------------------------------------|--------|------------|

|           |                                                    |        |            |
|-----------|----------------------------------------------------|--------|------------|
| SHLD addr | [addr] <sub>L</sub> ← L<br>[addr] <sub>H</sub> ← H | Direct | SHLD 2600H |
|-----------|----------------------------------------------------|--------|------------|

|      |             |          |      |
|------|-------------|----------|------|
| XCHG | [HL] ← [0E] | Register | XCHG |
|------|-------------|----------|------|

\* Similar to LDA & STA, LOAX  
for 16 bit notation (2 → MC)

### 2) Arithmetic Instruction:-

| Instruction set | Explanation        | Flags | Addressing        | Example |
|-----------------|--------------------|-------|-------------------|---------|
| ADD A           | A ← A + R          | All   | Register          | ADD B   |
| ADD M           | A ← A + [H-L]      | All   | Register indirect | ADD M   |
| ACC n           | A ← A + n + CY     | All   | Register          | ACC B   |
| ADC M           | A ← A + [H-L] + CY | All   | Register indirect | ADC M   |
| ADI data        | A ← A + Data       | All   | Immediate         | ADI 45H |
| ACI data        | A ← A + data + CY  | All   | Immediate         | ACI 50H |
| DAD RP          | H-LC ← H-L+RP      | CY    | Register          | DAD B   |

Similar to Addition, Subtraction can be expressed as  
SUB R, SVB M, SBBM SVI data, SB1 data

|       |                      |               |                   |       |
|-------|----------------------|---------------|-------------------|-------|
| INR R | R ← R+1              | All except CY | Register          | INR B |
| INR M | [HL] ← E<br>[HL] ← I | All except CY | Register indirect | INRM  |

|        |           |      |          |       |
|--------|-----------|------|----------|-------|
| INX RP | RP ← RP+1 | None | Register | INX H |
|--------|-----------|------|----------|-------|

|     |                    |   |   |     |
|-----|--------------------|---|---|-----|
| DAA | Decimal Adjust Acc | - | - | DAA |
|-----|--------------------|---|---|-----|

Similar to increment, Decrement instruct is  
expressed as DCR R, DCRM, DCX RP.

### 3) Logical Instruction:

| Instruction set | Explanation        | Flags | Addressing        | Example |
|-----------------|--------------------|-------|-------------------|---------|
| AN A R          | A ← (A) AND (R)    | All   | Register          | ANAC    |
| AN A M          | A ← (A) AND ([MC]) | All   | Register indirect | ANAM    |
| ANI data        | A ← (A) AND (data) | All   | Immediate         | ANI 22H |

Similar AND operation, OR and EXOR operations can be expressed as  
1) ORA R, ORAM, ORI data Eg: (A ← (A) OR (R))  
AT&A N (M)  
2) XRA R, XRAM, XRI data Eg: (A ← (A) XOR (R))  
AT ← (A) XOR (M)

|       |                                              |      |          |      |
|-------|----------------------------------------------|------|----------|------|
| CMA   | A ← ~A                                       | None | Implicit | CMA  |
| CNC   | CY ← CY                                      | CY   | Implicit | CNC  |
| STC   | CY ← 1                                       | CY   | Implicit | STC  |
| SETC  | Set carry                                    |      |          |      |
| CMP R | Compare A and R register<br>(A>B, A=CB, A=B) | All  | Register | CMPB |

|          |              |     |           |         |
|----------|--------------|-----|-----------|---------|
| CMP M    | CMP A & M    | All | Indirect  | CMP M   |
| CPI data | CMP A & data | All | Immediate | CPI 22H |

|     |                                                         |    |          |     |
|-----|---------------------------------------------------------|----|----------|-----|
| RLC | AN+1 ← AN<br>A ← A+ <sub>1</sub><br>CY ← A <sub>1</sub> | CY | Implicit | RLC |
|-----|---------------------------------------------------------|----|----------|-----|

|     |                                                         |    |          |     |
|-----|---------------------------------------------------------|----|----------|-----|
| RRC | AN ← AN+1<br>A ← A- <sub>1</sub><br>CY ← A <sub>1</sub> | CY | Implicit | RRC |
|-----|---------------------------------------------------------|----|----------|-----|

|     |                                 |    |          |     |
|-----|---------------------------------|----|----------|-----|
| RAL | AN+1 ← AN<br>CY ← AT<br>AT ← CY | CY | Implicit | RAL |
|-----|---------------------------------|----|----------|-----|

\* Similarly RAR with Reverse steps of RAL

# BRANCHING, LOOPING, COUNTING and INDEXING OPERATIONS

7

## JUMP:

### CONDITIONAL JUMP INSTRUCTIONS:-

| Instruction | Explanation                           | Example  |
|-------------|---------------------------------------|----------|
| JC          | Jump on Carry if Carry flag is '1'    | JC 2000  |
| JNC         | Jump on No Carry If Carry flag is '0' | JNC 2050 |
| JZ          | Jump on Zero if Zero flag is '1'      | JZ 2000  |
| JNZ         | Jump on NO zero, If zero flag is '0'  | JNZ 2050 |
| JPE         | Jump on Parity Even ; PE is '1'       | JPE 2000 |
| JPO         | Jump on Parity ODD ; PF is '0'        | JPO 2050 |
| JM          | Jump on Sign 'minus' If SF is '1'     | JM 2000  |
| JP          | Jump on Sign 'plus' If SF is '0'      | JP 2050  |

## CALL:

### CONDITIONAL CALL INSTRUCTIONS:-

| Instruction | Explanation                   | Example. |
|-------------|-------------------------------|----------|
| CC          | Call on Carry flag CF: '1'    | CC 2000  |
| CNC         | Call on No Carry flag CF: '0' | CNC 2050 |
| CZ          | Call on Zero flag ZF: '1'     | CZ 2000  |
| CNZ         | Call on No Zero flag ZF: '0'  | CNZ 2050 |
| CPE         | Call on Parity Even PF: '1'   | CPE 2000 |
| CPO         | Call on Parity ODD PF: '0'    | CPO 2050 |
| CM          | Call on Minus SF: '1'         | CM 2000  |
| CP          | Call on Plus SF: '0'          | CP 2050  |

## Three Types of Branching:-

- ① JUMP (Conditional & unConditional)
- ② CALL (Conditional & unConditional)
- ③ RET (Conditional & unConditional).

### UNCONDITIONAL JUMP :-

JMP address ; jump to the address  
Ex : JMP 2000.

### UNCONDITIONAL CALL :-

CALL address ; Call to the address

### UNCONDITIONAL RETURN :-

RET address ; Return to the address

Ex: CALL 2000

Ex: RET

### Looping in 8085 :-

→ Instructs MicroProcessor to repeat tasks.

#### ① Continuous Loops

#### ② Conditional Loops.

Continuous Loops :- Repeats a task Continuously.

Conditional Loop :- Repeats a task if Some conditions are satisfied



Flowchart for Continuous loop

## RETURN:-

### CONDITIONAL RET INSTRUCTIONS

| Instruction | Explanation                   | Example |
|-------------|-------------------------------|---------|
| RC          | Return on Carry CF: '1'       | RC      |
| RNC         | Return on No Carry CF: '0'    | RNC     |
| RZ          | Return on Zero ZF: '1'        | RZ      |
| RNZ         | Return on No zero ZF: '0'     | RNZ     |
| RPE         | Return on Parity EVEN PF: '1' | RPE     |
| RPO         | Return on Parity ODD PF: '0'  | RPO     |
| RM          | Return on Minus signFlag: '1' | RM      |
| RP          | Return on Plus signFlag: '0'  | RP      |

### INPUT-OUTPUT & STACK INSTRUCTIONS:-

\* IN Port address ; Input to accumulator from I/O Port

\* OUT Port address ; output from accumulator to I/O Port

\* PUSH Rp ; Push Content of Rp to stack.

\* PUSH PSW ; Push processor status word.

\* POP Rp ; Pop Content from stack.

\* POP PSW ; Pop Processor Status Word.

### MACHINE CONTROL INSTRUCTIONS:-

\* HLT ; Halt

\* XTHL ; Exchange stack stop with H-L.

\* SPHL ; HL-pair to stack pointer.

\* EI ; Enable Interrupt

\* DI ; Disable Interrupt

\* SIM ; Set Interrupt Mask

\* RIM ; Read Interrupt Mask

\* NOP ; no operation.

# 8086 MICROPROCESSOR

## FEATURES

- Design : Intel - 1976 → Updated Version of 8085 Microprocessor
- 16 bit, N-channel High Speed Metal Oxide Semiconductor
- Built on Single semiconductor chip and packed in 40 pin IC pack (Dual Inline Package)
- Uses 20 Address lines & 16 Data lines
- Addresses up to  $2^{20}$  = 1 Mbyte Memory
- Size of I/O :  $2^{16}$  = 64 kB

## MEMORY INTERFACE



→ 8086 Microprocessor consists of two units

### 1. Bus Interface Unit

↳ Contains Segment Registers, Instruction Pointer and 6 Byte Instruction Queue

### 2. Execution Unit

↳ Contains Programmable Registers, Index Registers, Pointer Register and Flag Registers & ALU

## REGISTERS

|    |    |    |
|----|----|----|
| AX | AH | AL |
| BX | BH | BL |
| CX | CH | CL |
| DX | DH | DL |

|    |
|----|
| CS |
| SS |
| DS |
| ES |

|      |
|------|
| FLAG |
| PSW  |

|    |
|----|
| SP |
| BP |
| SI |
| DI |
| IP |

### GENERAL PURPOSE REGISTERS

AX - Accumulator  
BX - Base  
CX - Counter  
DX - Data

CS - Code Segment  
SS - Stack Segment  
DS - Data Segment  
ES - Extra Segment

Two Operating modes : Maximum mode  
Minimum mode

### FLAGS

|    |    |    |    |    |    |   |   |   |   |                |   |   |   |                |   |
|----|----|----|----|----|----|---|---|---|---|----------------|---|---|---|----------------|---|
| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5              | 4 | 3 | 2 | 1              | 0 |
| X  | X  | X  | X  | 0  | 1  | T | S | Z | X | A <sub>c</sub> | X | P | X | C <sub>y</sub> |   |

### Conditional Flags

- \* Carry Flag (C<sub>y</sub>)
- \* Auxiliary Flag (A<sub>c</sub>)
- \* Parity Flag (P)
- \* Zero Flag (Z)
- \* Sign Flag (S)
- \* Overflow Flag (O)

### Control Flags

- \* Trap Flag (T)
- \* Interrupt Flag (I)
- \* Direction Flag (D)



## PIN CONFIGURATION

|                  |    |    |                                  |
|------------------|----|----|----------------------------------|
| GND              | 1  | 40 | V <sub>cc</sub>                  |
| AD <sub>14</sub> | 2  | 39 | AD <sub>15</sub>                 |
| AD <sub>13</sub> | 3  | 38 | A <sub>..</sub> / S <sub>3</sub> |
| AD <sub>12</sub> | 4  | 37 | A <sub>..</sub> / S <sub>4</sub> |
| AD <sub>11</sub> | 5  | 36 | A <sub>18</sub> / S <sub>5</sub> |
| AD <sub>10</sub> | 6  | 35 | A <sub>19</sub> / S <sub>6</sub> |
| AD <sub>9</sub>  | 7  | 34 | BHE / S <sub>7</sub>             |
| AD <sub>8</sub>  | 8  | 33 | MN / MX                          |
| AD <sub>7</sub>  | 9  | 32 | RD                               |
| AD <sub>6</sub>  | 10 | 31 | RD / GT <sub>0</sub> (HOLD)      |
| AD <sub>5</sub>  | 11 | 30 | RQ / GT <sub>1</sub> (HLDA)      |
| AD <sub>4</sub>  | 12 | 29 | LOCK (WR)                        |
| AD <sub>3</sub>  | 13 | 28 | S <sub>2</sub> (MIO)             |
| AD <sub>2</sub>  | 14 | 27 | S <sub>1</sub> (DT/R)            |
| AD <sub>1</sub>  | 15 | 26 | S <sub>0</sub> (DEN)             |
| AD <sub>0</sub>  | 16 | 25 | QS <sub>0</sub> (ALE)            |
| NMI              | 17 | 24 | QS <sub>1</sub> (INTA)           |
| INTR             | 18 | 23 | TEST                             |
| CLK              | 19 | 22 | READY                            |
| GND              | 20 | 21 | RESET                            |

\* V<sub>cc</sub> → Power Supply ⇒ +5 V

\* AD<sub>0</sub> - AD<sub>15</sub> → A 16 Bit Address Data Bus

\* A<sub>16</sub> - A<sub>19</sub> → Higher Order Address lines & Multiplexed with status signals

\* BHE / S<sub>7</sub> → Bus High Enable / Status

\* RD → Read , GND - Ground

\* RESET → System Reset

\* CLK (i/p) → Clock 5, 8, 10 MHz

\* INTR → Interrupt Request

\* NMI → Non Maskable Interrupt Request

# 8086 INSTRUCTION SET

## INSTRUCTION SETS OF 8086 MICROPROCESSOR

### TYPES OF INSTRUCTION SETS

- \* DATA TRANSFER
- \* ARITHMETIC
- \* LOGICAL
- \* STRING MANIPULATIONS
- \* CONTROL TRANSFERS
- \* PROCESSOR CONTROL

### \* DATA TRANSFER

MOV = MOVE

Register / Memory to / from Register

Immediate to Register / Memory

Immediate to Register

Memory to Accumulator

Accumulator to Memory

Register / Memory to Segment Register

Segment Register to Register / Memory

PUSH = Push :

Register / Memory

Register

Segment Register

POP = Pop :

Register / Memory

Register

Segment Register

XCHG = Exchange

Register / Memory with Register

Register with Accumulator

IN = Input from:

Fixed port

Variable port

XLAT = Translate Byte to AL

LEA = Load EA to Register

LDS = Load pointer to DS

LAHF = Load AH with Flags

LES = Load pointer to ES

SAHF = Store AH into Flags

PUSHF = Push Flags

POPF = Pop Flags

### \* ARITHMETIC

ADD = Add :

Reg / Memory with Register to Either

Immediate to Register / Memory

ADC = Add with carry :

Reg / Memory with Register to Either

Immediate to Register / Memory

Immediate to Accumulator

INC = Increment :

Register / Memory

Register

AAA = ASCII Adjust for Addition

DAA = Decimal Adjust for Addition

SUB = Subtract

Reg / Memory and Register to Either

Immediate from Register / Memory

Immediate from Accumulator

SBB = Subtract with Borrow

Reg / Memory and Register to Either

Immediate from Register / Memory

Immediate from Accumulator

DEC = Decrement :

Register / Memory

Register

NEG = Change sign

CMP = Compare :

Register / Memory and Register

Immediate with Register / Memory

Immediate with Accumulator

AAS = ASCII Adjust for Subtract

DAS = Decimal Adjust for Subtract

MUL = Multiply (Unsigned)

IMUL = Integer Multiply (Signed)

AAM = ASCII Adjust Multiply

DIV = Divide (Unsigned)

IDIV = Integer Divide (Signed)

AAD = ASCII Adjust for Divide

CBW = Convert Byte to Word

CWD = Convert Word to Doubleword

### \* LOGICAL

NOT = Invert

SHL / SAL = Shift Logical / Arithmetic

Left

SHR = Shift Logical Right

SAR = Shift Arithmetic Right

ROL = Rotate Left

ROR = Rotate Right

RCL = Rotate through Carry Flag, Left

RCR = Rotate through carry Right

AND = And :

Reg / Memory and Register to Either

Immediate to Register / Memory

Immediate to Accumulator

TEST = And Function to Flags, No Result:

Register / Memory and Register

Immediate Data and Register / Memory

Immediate Data and Accumulator

OR = OR :

Reg / Memory and Register to Either

Immediate to Register / Memory

Immediate to Accumulator

XOR = Exclusive OR :

Reg / Memory and Register to Either

Immediate to Register / Memory

Immediate to Accumulator

### \* STRING MANIPULATIONS

REP = Repeat

MOVSB = Move Byte / word

CMPSB = Compare Byte / word

SCASB = Scan Byte / word

LODSB = Load Byte / word to AL / AX

STOSB = Store Byte / word from AX /

# BRANCHING AND LOOPING OF 8086

## INSTRUCTION SETS OF 8086 MICROPROCESSOR

### \* CONTROL TRANSFER

CALL = call :

Direct within segment

Indirect within segment

Direct Intosegment

Indirect Intosegment

JMP - Unconditional Jump:

Direct within segment

Direct within segment - short

Indirect within segment

Direct Intosegment

Indirect Intosegment

RET = Return from CALL:

within segment

within seg Adding Immediate to sp

Intosegment

Intosegment Adding Immediate to sp

JE/JZ = Jump on equal / zero

JL/JNGE = Jump on less / Not

Greater or Equal

JLE/JNG = Jump on less or  
equal / Not greater

JB/JNAE = Jump on Below / Not Above  
or equal

JBE/JNA = Jump on Below or equal /  
Not above

**JPL/JPE** = Jump on parity | parity even  
**JO** = Jump on overflow  
**JS** = Jump on sign  
**JNE/JNZ** = Jump on Not Equal / Not zero.  
**JNL/JGE** = Jump on Not Less / Greater  
or Equal  
**JNLE/JG** = Jump on Not less or  
equal / Greater  
**JNB/JAE** = Jump on Not Below/Above  
or Equal  
**JNBE/JA** = Jump on Not Below or  
equal / Above  
**JNP/JPO** = Jump on Not pos/par odd  
**JNO** = Jump on Not overflow  
**JNS** = Jump on Not sign  
**LOOP** = LOOP CX Times  
**LOOPZ/LOOPNE** = Loop while zero/  
Equal  
**LOOPNZ/LOOPNE** = Loop while Not  
zero / equal  
**JCXZ** = Jump on CX zero  
**INT** = Interrupt  
**Type Specified**  
**Type 3**  
**INTO** = Interrupt on overflow  
**IRET** = Interrupt Return  
**Eg: INT 02H** 

### \* PROCESSOR CONTROL

**CLC** = clear carry

**CMC** = complement carry

**STC** = set carry

**CLD** = clear direction

**STD** = set direction

**CLI** = clear interrupt

**STI** = set interrupt

**HLT** = Halt

**WAIT** = wait

**ESC** = Escape (to external device)

**LOCK** = Bus Lock prefix

### Operation:

\* **STC** → To set carry flag  
to '1'

\* **CLC** → clear/revert carry  
flag to '0'.

\* **CMC** → put complement  
at the stat of CF

\* **STD & CLD**

\* Set Direction flag to '1'  
→ Clear / revert direction  
flag to '0'.

\* Directing the CPU  
Execution Sequence  
in an order  
\* Controlling CPU

### \* CONDITIONAL BRANCH INSTRUCTION

1. **JZ/JE** → zero

2. **JNZ/JNE** → NOT zero / 6

3. **JS** → sign

4. **JNS** → NOT sign

5. **JO** → overflow

6. **JNO** → NOT overflow

7. **JPL/JPE** → parity / parity even

8. **JNP** → NOT parity

9. **JB/JNAE/JC** → Below, Not Above  
(or) Equal

10. **JNB/JAE/JNC** → Not Above, Above  
(or) equal

11. **JBE/JNA** → Above Below / equal

12. **JNBE/JA** → Above

13. **JL/JNGE** → less, Not greater (or)  
equal

14. **JNL/JGE** → Not less, Greater (or)  
equal

15. **JLE/JNC** → less / equal

16. **JNLE/JE** → Not less / equal

### UNCONDITIONAL CALL JUMP,

### RETURN:

Eg: **Jmp 2050H**

**RET**

**CALL 2000H**

INSTRUCTION SET:

- \* Set of codes that computer processor can understand.
- \* Code is 1s and 0s, or machine language.
- \* Contains instruction or tasks that control the movement of bits and bytes within the processor.

EXAMPLES OF SOME INSTRUCTION SETS -

- ADD - Add two numbers together.
- JUMP - Jump to designated RAM address.
- LOAD - Load information from RAM to the CPU.

TYPES OF INSTRUCTION SET1. Reduced Instruction Set Computer (RISC)

The concept of RISC involves an attempt to reduce execution time by simplifying the instruction set of computers

- Relatively few instructions
- Relatively few addressing modes
- Memory access limited to load and store instructions.
- All operations done within the register of the CPU.
- Single-cycle instruction execution
- Fixed length, easily decoded instruction format.
- Hardwired rather than micro programmed control.

2. Complex Instruction Set Computer (CISC)

CISC is a computer where a single instruction can perform numerous low-level operations like a load from memory and a store from memory, etc.

- A large number of instructions typically from 100 to 250 instructions.
- Some instruction that performs specialized tasks and are used infrequently.
- A large variety of addressing modes typically from 5 to 20 different modes.
- Instruction that manipulate operands in memory.

INSTRUCTION FORMATS

Groups of bits used to perform a particular operation on the data

ELEMENTS OF INSTRUCTIONSGENERAL INSTRUCTION FORMATINSTRUCTION TYPESINSTRUCTION TYPES:

- ⇒ Data processing - ALU instruction
- ⇒ Data storage - Memory instruction
- ⇒ Data Movement - Data transfer instruction
- ⇒ Control - Test and Branch instruction

ACCORDING TO NO. OF ADDRESS\* 3 address instruction

3 operands used

Eg: ADD C, A, B [C ← A + B]

\* 2 address instruction

2 operands used

Eg: ADD A, B [A ← A + B]

\* One address instruction

Only one operand used

Eg: ADD B [AC] ← B + [AC]

AC = Accumulator

\* ZERO ADDRESS INSTRUCTION

Eg: CMA [AC] ← [AC]

MEMORY REFERENCE INSTRUCTION

| MODE  | OPCODE | MEMORY ADDRESS |
|-------|--------|----------------|
| 15 14 | 12 11  | 0              |

OPCODE = 000 to 110

REGISTER REFERENCE INSTRUCTION

| MODE  | OPCODE | REGISTER |
|-------|--------|----------|
| 15 14 | 12 11  | 0        |

OPCODE = 111 AND Addressing Mode I = 0

INPUT & OUTPUT -

| MODE | OPCODE |
|------|--------|
|      | I = 1  |

OPCODE =

INSTRUCTION FORMAT

$$X = (A+B) * (C+D)$$

ZERO ADDRESS FORMAT

PUSH A TOP = A

PUSH A TOP = B

ADD TOP = A + B

PUSH C TOP = C

PUSH D TOP = D

ADD TOP = C + D

MUL TOP = (C \* D) \* (A + B)

POP X M[X] = TOP

ONE ADDRESS FORMAT

LOAD A AC = M[A]

ADD B AC = AC + M[B]

STORE T M[T] = AC

LOAD C AC = M[C]

ADD D AC = AC + M[D]

MUL T AC = AC \* M[T]

STORE X M[X] = AC

TWO ADDRESS FORMAT

MOV R1, A R1 = M[A]

ADD R1, B R1 = R1 + M[B]

MOV R2, C R2 = C

ADD R2, D R2 = R2 + D

MUL R1, R2 R1 = R1 \* R2

MOV X, R1 M[X] = R1

THREE ADDRESS FORMAT

ADD R1, A, B R1 = M[A] + M[B]

ADD R2, C, D R2 = M[C] + M[D]

MUL X, R1, R2 M[X] = R1 \* R2

## ADDRESSING MODES

- The way in which the operand of an instruction is specified.

### Effective address (EA)

- The memory address obtained from the computation dictated by the given addressing mode.

## Types of Addressing Modes:

### i.) Register addressing mode.



\*. Operand is in the register.

\*. Name of register is specified in the instruction.

Eg: MOV R1, R2

- Copies constants of register R2 to R1.  
EA - NO, Use: Fast Access.

### ii.) Absolute or Direct Addressing mode.

\*. Operand address is given explicitly as part of the instruction.



L> Copies the contents in memory location 2000 into register.

### iii.) Immediate Addressing Mode:

→ Operand is explicitly given in the instruction.

→ # symbol indicates value is immediate operand.

Eg: MOV A, #20

↳ Copies Operand 20 into register A.



### iv.) Indirect Addressing mode:

→ Instruct contains address of memory which refers - address of operand.

Eg: MOV A, [1000].

[ ] - Memory use: for pointers



### v.) Register Indirect Addressing Mode:

- Register Contains address of Operand.



### vi.) Displacement Addressing Mode:



- Combines direct addressing and register indirect addressing.

$$EA = \text{Base value} + (R)$$

Contents of referenced register.

## ADDRESSING MODES

Eg:- MOV R2, 20 [R1].

20 + [1000].

$$EA \Rightarrow 1020$$

3 Variations:

↳ Relative addressing

↳ Base register

↳ Index addressing.

### vii.) Relative Addressing Mode:

- referenced register is program counter (Pc) → also known as Pc-relative addressing

$$EA = PC + \text{Address part of instruction}$$

Eg:- JNZ BACK

→ Causes program execution to go to the branch target location (BACK)  
- if condition is satisfied.

### viii.) Base register addressing:

- The referred register contains main memory address

- Address field contains -displacement

$$EA = R + (R)$$

R - Base address

A - Displacement

$$\begin{aligned} \text{Eg: } & \text{MOV B, } [R+8] \\ & R \rightarrow 1000 \\ & \Rightarrow 1000 + 8 = 1008. \end{aligned}$$

### ix.) Index Addressing Mode:

$$EA = \text{memory address} + (R).$$

Eg: MOV R1, [R1 + RI]

↳ Reference register RI gives the displacement.

↓  
Main memory address given by R1.



### x.) Auto Increment Addressing Mode:

- After accessing the operand, the contents of this register are auto incremented.

Eg: MOV R0, (R2) +

↳ Copies R0 into memory location whose address is specified in R2.  
↳ R2 is incremented then.

### xi.) Auto-decrement Addressing Mode:

- Contents of a register specified in the Inst are decremented and then used as EA

Eg: MOV R1, -(R0)

### xii.) Stack Addressing Mode:

→ SP

↳ contains the address of Top of stack (Tos)

↳ Operand stored.

Eg: PUSH R

↳ decrement sp and copies Content of R on Tos pointed by stack pointer.

## ASSEMBLY LANGUAGE

A type of low-level programming language that is intended to communicate directly with a computer's hardware.

## Integer Arithmetic - Addition & Subtraction

**Relation between Addition & Subtraction:**

$$(\pm A) - (\mp B) = (\pm A) + (-B)$$

$$(\pm A) - (-B) = (\pm A) + (+B)$$

**3 major representation of signed integers:-**

- \* Sign and magnitude

- \* One's complement (1's)

- \* Two's complement (2's)

**Binary Signed integer representation:-**

| b <sub>3</sub> b <sub>2</sub> b <sub>1</sub> b <sub>0</sub> | Sign & magnitude | 1's complement | 2's complement |
|-------------------------------------------------------------|------------------|----------------|----------------|
| 0000                                                        | +0               | +0             | +0             |
| 0001                                                        | +1               | +1             | +1             |
| 0010                                                        | +2               | +2             | +2             |
| 0011                                                        | +3               | +3             | +3             |
| 0100                                                        | +4               | +4             | +4             |
| 0101                                                        | +5               | +5             | +5             |
| 0110                                                        | +6               | +6             | +6             |
| 0111                                                        | +7               | +7             | +7             |
| 1000                                                        | -0               | -7             | -8             |
| 1001                                                        | -1               | -6             | -7             |
| 1010                                                        | -2               | -5             | -6             |
| 1011                                                        | -3               | -4             | -5             |
| 1100                                                        | -4               | -3             | -4             |
| 1101                                                        | -5               | -2             | -3             |
| 1110                                                        | -6               | -1             | -2             |
| 1111                                                        | -7               | -0             | -1             |

**Addition in 2's complement :-**

$$\begin{array}{r}
 (4) \quad 0100 \\
 + (3) \quad 0011 \\
 \hline
 (-7) \quad 0111
 \end{array}$$

**Subtraction in 2's complement:-**

Step 1:- Take 2's complement of B

Step 2:- Result  $\Rightarrow A +$  Step 1 Answer

Step 3:- If carry generated, result  $\Rightarrow +ve$

(Ignore carry)

Step 4:- If no carry, result  $\Rightarrow -ve$

**Ex:-**

$$(28)_{10} - (15)_{10}$$

$$(28)_{10} \quad 011100$$

$$\begin{array}{r} \text{2's complement} \\ \text{of } (15)_{10} \end{array} \quad \begin{array}{r} 110001 \\ \hline 001101 \end{array} \quad (13)_{10}$$

**Adders:**

overflow:  $(+) + (+) = -ve$   
 condition  $(-) + (-) = +ve$

(i) Half adder :-

| Inputs |   | Outputs |     |
|--------|---|---------|-----|
| A      | B | Carry   | Sum |
| 0      | 0 | 0       | 0   |
| 0      | 1 | 0       | 1   |
| 1      | 0 | 0       | 1   |
| 1      | 1 | 1       | 0   |



(ii) Full adder

| Inputs |   | Outputs         |       |     |
|--------|---|-----------------|-------|-----|
| A      | B | C <sub>in</sub> | Carry | Sum |
| 0      | 0 | 0               | 0     | 0   |
| 0      | 0 | 1               | 0     | 1   |
| 0      | 1 | 0               | 0     | 1   |
| 0      | 1 | 1               | 1     | 0   |
| 1      | 0 | 0               | 0     | 1   |
| 1      | 0 | 1               | 1     | 0   |
| 1      | 1 | 0               | 1     | 0   |
| 1      | 1 | 1               | 1     | 1   |

$$\text{Sum} = A \oplus B \oplus C_{in}$$

$$\text{Carry} = AB + BC + CA$$



**Carry Look Ahead Adder (CLA):**

\* CLA improves Speed by reducing the amount of time required to determine carry bits.

**CLA concept:-**

\* Create two Signals:

1.) P (Carry Propagation)

2.) G<sub>i</sub> (Carry generator)

\* Carry propagator (P) is propagated to the next level

\* Carry generator (G<sub>i</sub>) is used to generate the output carry regardless of input carry.

$$P_i = A_i \oplus B_i \quad \text{--- (1)}$$

$$G_i = A_i \cdot B_i \quad \text{--- (2)}$$

$$\text{Sum} = P_i \oplus C_i \quad \text{--- (3)}$$

$$C_{i+1} = G_i + (P_i \cdot C_i) \quad \text{--- (4)}$$

\* For designing 4-bit CLA, we need i = 0, 1, 2, 3

Substitute i = 0, 1, 2, 3 in

$$C_1 = G_0 + P_0 \cdot C_0$$

$$C_2 = G_1 + P_1 \cdot C_1 = G_1 + P_1 \cdot G_0 + P_1 \cdot P_0 \cdot C_0$$

$$C_3 = G_2 + P_2 \cdot C_2$$

$$= G_2 + P_2 \cdot G_1 + P_2 \cdot P_1 \cdot G_0 + P_2 \cdot P_1 \cdot P_0 \cdot C_0$$

$$C_4 = G_3 + P_3 \cdot C_3$$

$$= G_3 + P_3 \cdot G_2 + P_3 \cdot P_2 \cdot G_1 + P_3 \cdot P_2 \cdot P_1 \cdot G_0$$

$$A_3 \ B_3 \quad A_2 \ B_2 \quad A_1 \ B_1 \quad A_0 \ B_0 \quad C_0$$



Figure: 4 bit CLA using Full Adder (FA)

Similarly,

$$C_{i+1} = G_i + P_i \cdot G_{i-1} + P_i \cdot P_{i-1} \cdot G_{i-2} + \dots + P_i \cdot P_{i-1} \dots P_1 \cdot G_0 + P_i \cdot P_{i-1} \dots P_1 \cdot P_0 \cdot C_0$$



## Multiplication of unsigned numbers:-

\* Unsigned multiplication can be viewed as addition of shifted versions of the multiplicand.

\* Product of 2 n-bit numbers is at most a  $2^n$ -bit number.

Ex:-

$$\begin{array}{r}
 1101 \text{ (13 - multiplicand)} \\
 \times 1011 \text{ (11 - multiplier } M) \\
 \hline
 1101 \\
 1101 \\
 0000 \\
 1101 \\
 \hline
 10001111 \text{ (143 - Product)}
 \end{array}$$

## Signed multiplication:-

\* Take 2's complement if number is negative

\* padding of 5 one's if msb of first multiplication is one, otherwise zero's should be padded.

Ex:-  $(-13) \times 6^{11}$

$$\begin{array}{r}
 10011 \text{ (-13)} \\
 \times 01011 \text{ (+11)} \\
 \hline
 1111110011 \\
 1111110011 \\
 00000000 \\
 1110011 \\
 000000 \\
 \hline
 1101110001 \text{ (-143) - Product}
 \end{array}$$

## Multiplications & Divisions

### Booth Algorithm :-

\* In general, Booth Algorithm has 3 schemes

- (i) 0 to 1  $\Rightarrow$  add "-1" time Shifted 'M'
- (ii) 1 to 0  $\Rightarrow$  add "+1" time Shifted 'M'
- (iii) 0 to 0 or 1 to 1  $\Rightarrow$  add 0

Ex:-  $(-13) \times (-6)$

$$\begin{array}{r}
 01101 \text{ (+13)} \\
 \times 11010 \text{ (-6)} \\
 \hline
 \end{array}$$



$$\begin{array}{r}
 01101 \text{ (n)} \\
 +1-1+1-10 \\
 \hline
 0000000000 \\
 111110011 \\
 00001101 \\
 1110011 \\
 000000 \\
 \hline
 1110110010 \text{ (-78) - Product}
 \end{array}$$

### Booth Multiplier table:-

| multiplier | version of multiplicand |               |
|------------|-------------------------|---------------|
|            | i                       | i-1           |
| 0          | 0                       | $0 \times M$  |
| 0          | 1                       | $+1 \times M$ |
| 1          | 0                       | $-1 \times M$ |
| 1          | 1                       | $0 \times M$  |

Good multiplier :-

ordinary multiplier :-

worst case multiplier :-

01010101010101

### Division :-

#### Decimal :-

$$\begin{array}{r}
 21 \\
 13 \overline{)274} \\
 26 \\
 \hline
 14 \\
 13 \\
 \hline
 1
 \end{array}$$

$$\begin{array}{r}
 10101 \\
 1101 \quad \downarrow \\
 10000 \\
 1101 \quad \downarrow \\
 1110 \\
 1101 \\
 \hline
 1
 \end{array}$$

### Circuit Arrangement for binary division:-



### (i) Restoring Division:-

Step 1:- Shift A and Q left one binary position

Step 2:- Subtract M from A, & place result back in A

Step 3:- If the sign of A is 1, Set  $q_0$  to 0 and M back to A (restore A); otherwise Set  $q_0$  to 1

Step 4:- Repeat above steps n times.

Ex:-

8/3

$$\begin{array}{r}
 10 \\
 11 \overline{)1000} \\
 11 \\
 \hline
 10
 \end{array}$$

8:- 1000

3:- 0011

#### Binary :-

Initially 00000

00011

00001

11101

Set  $q_0$  11110

Restore 11

00001

00010

11101

Set  $q_0$  11111

Restore 11

00010

00100

11101

Set  $q_0$  00001

Shift 00010

Subtract 11101

Set  $q_0$  11111

Restore 11

00010

00010

001

11111

Set  $q_0$  11111

Restore 11

00010

00010

001

11111

Set  $q_0$  00001

Shift 11100

Add 00011

Set  $q_0$  11111

Shift 11110

Add 00011

Set  $q_0$  00001

Shift 00010

Subtract 11101

Set  $q_0$  11111

Restore 00011

Quotient 0010

1000

000

000

000

000

000

000

000

000

000

000

000

000

000

000

000

000

000

000

000

000

000

000

000

000

000

000

000

000

000

000

000

000

000

000

000

000

000

000

Quotient 0010

first cycle.

Second cycle

Third cycle

Fourth cycle.

(ii) Non Restoring Division!

Step 1:- (Repeat N times)

$\rightarrow$  If sign of A is '0', Shift A & left one bit position & Subtract M from A; otherwise Shift A & left one bit position & Add M to A.

If sign of A is a  $\Rightarrow$  Set  $q_0$  to 1; otherwise '0'

Step 2:- If sign of A is '1', add M to A

Ex:- Initially 00000 1000 ] 1st cycle

Shift 00011 000 ] 2nd cycle

Subtract 00001 000 ] 3rd cycle

Set  $q_0$  00001 000 ] 4th cycle

Shift 00010 000 ] 5th cycle

Subtract 11101 000 ] 6th cycle

Set  $q_0$  00011 000 ] 7th cycle

Quotient 0010

Remainder 00010



## Floating Point Operations:-

It consists of 4 types

Addition, Sub, multiplication, Div

## Floating Point Arithmetic operation

### Floating point Addition:

Add the following two decimal numbers in scientific notation:  $8.70 \times 10^1$  with  $9.95 \times 10^1$

Step:1 Rewrite the smaller number such that its exponent matches with the exponent of larger number

$$8.70 \times 10^1 = 0.87 \times 10^2$$

Step:2 Add the mantissas

$$9.95 + 0.87 = 10.82$$

$$\text{Sum} = 10.82 \times 10^1$$

Start

Compare the exponents of two numbers shift the smaller number to the right until its exponent would match the exponent.

Add the significants

normalize the sum either shifting right and incrementing the exponent  
shifting left decrementing the exponent

Overflow or underflow

Yes

No

Round the significant to the number of bits

No

Set normalized

Done

4cs

Step:3 Put the result in normalized form  
 $10.82 \times 10^1 = 1.082 \times 10^2$  [shift mantissa adj. exponent]

Check for overflow/underflow of the exponent after normalization

Step:4 Round the result

If the mantissa does not fit in the space reserved for it, it has to be rounded off.

Eg: Only 4 digits are allowed for mantissa

$$1.082 \times 10^2 \rightarrow 1.004 \times 10^2$$

Example 1: Addition in binary:

$$\text{Perform } 0.5 + (-0.4375) \quad 0.5 = 0.1 \times 2^0$$

$$1.000 \times 2^{-1} \text{ [normalised]}$$

### Floating point addition / subtraction hardware:

Hardware:



**FLOATING POINT OPERATION**

-0.4375 + 0.0110  $\times 2^0 = -1.110 \times 2^{-2}$  [normalized]

Step:1 Subtract the following two decimal numbers in scientific notation  
 $8.70 \times 10^1$  with  $9.95 \times 10^1$

Step:1 Rewrite the smaller number such that its exponent matches with the exponent of large number

$$-1.110 \times 2^{-2} = 0.110 \times 2^{-1} = 0.001 \times 2^{-1}$$

Step:2 Add the mantissas:

$$1.000 \times 2^{-1} + 0.110 \times 2^{-1} = 0.001 \times 2^{-1}$$

Step:3 Normalise the sum, checking for overflow/underflow:  $0.001 \times 2^{-1} = 1.000 \times 2^{-4}$

$$-126 < -4 < 127 \Rightarrow \text{no overflow or underflow.}$$

Step:4 Round the sum, The sum fits in 4 bits so rounding i.e. not required

$$\text{Check: } 1.000 \times 2^{-4} = 0.0625 \text{ which is}$$

equal to  $0.5 - 0.4375$  Correct!

## Floating Point Subtraction:



Step:1 Subtract the following two decimal numbers in scientific notation  
 $8.70 \times 10^1$  with  $9.95 \times 10^1$

Step:2 Rewrite the smaller number such that its exponent matches with the exponent of larger number

$$-1.110 \times 2^{-2} = -0.1110 \times 2^{-1}$$

Step:3 Normalise the sum, checking for overflow/underflow

$$0.001 \times 2^{-1} = 1.000 \times 2^{-4}$$

Step:4 Sub the mantissas

$$9.95 - 0.087 = 9.863 \times 10^1$$

$$= 9.863 \times 10^1$$

Step:5 Round the result

$$9.863 \times 10^1$$

## Floating point multiplication

Step:1 Add exponents & subtract bias

Step:2 Multiply the significants

Step:3 Normalize the product

Step:4 Overflow if yes, raise exception

Step:5 Round the significant to appropriate # of bits

Step:6 If not still normalized go back to step 3

Step:7 Set the sign of the result

### Example-1

Multiply the following two numbers in scientific notation

$$1.110 \times 10^0 \times 9.200 \times 10^{-5}$$

## FLOATING POINT MULTIPLICATION

- Add the exponents to find new exponent =  $10 + (-5) = 5$
- If we add biased exponents, bias will be added twice. Therefore we need to subtract it once to compensate:  $(10+127) + (-5+127) = 259$   
 $259 - 127 = 132$  which is  $(5+127)$ . bias new exponent

- multiply the mantissas:

$$1.110 \times 9.200 = 10.212000$$

Can only keep three digits to the right of the decimal point so the result is  $10.212 \times 10^5$ .

- Normalise the result

$$1.0212 \times 10^6$$

- Round it  $1.021 \times 10^6$



## Example: Multiplication in binary

$$1.000 \times 2^{-1} \times -1.110 \times 2^{-2}$$

- Add the exponents

$$(-1) + (-2) = -3$$

$-3 + 127 = 124$  which is within the limit

- Multiply the mantissas

$$\begin{array}{r} 1.000 \\ \times 1.110 \\ \hline 0000 \\ 1000 \\ 1000 \\ \hline 1110000 \rightarrow 1.110000 \end{array}$$

The product is  $1.110000 \times 2^{-3}$  need to keep it to 4 bits  $1.110 \times 2^{-3}$

- Normalise (already normalised)

At this step check for overflow/underflow by making sure that  $-126 \leq \text{exponent} \leq 127$

$1 \leq \text{biased exponent} \leq 254$

- Round the result  $1.110 \times 2^{-3}$

- Adjust the sign.

Since the original signs are different,

the result will be negative

$$\rightarrow 1.110 \times 2^{-3}$$

## Floating Point Division

- Subtract the exponents and add bias

- Divide the significants

- Normalise the quotient

- If there is a overflow,

, raise exception

- Round the significant to the appropriate number of bits.

- If not still normalised go back to step 3

- Set the sign of the result.

Start

subtract the biased exponent of the two numbers adding the bias from the sum to get the new bias exponent

Divide the significants

normalize the Quotient if necessary, shifting it right & incrementing the exponent



Round the significant to the appropriate no. of bits

Still normalised?

Set the sign of the product

Done

1017

## Example:

Divide the following two numbers in scientific notation

$$1.110 \times 10^{10} \div 9.200 \times 10^{-5}$$

- Subtract the exponent to find the new exponent

$$10 - (-5) = 10 + 5 = 15$$

- Divide the significants

$$1.110 \div 9.200$$

$$= 0.120652173$$

- Normalise the result

$$0.120652173 \times 10^{15}$$

$$\Rightarrow 1.20652173 \times 10^{14}$$

- Round it

$$1.2065 \times 10^{14}$$

## Fundamental Concepts of Processing

\* A Unit which executes Machine instructions and Coordinates the activities of other Units is Called Instruction Set Processor (ISP)

### Steps Involved in Instruction Execution:

1. Fetch the Contents of Memory location Pointed to by the PC.

$$IR \leftarrow [PC]$$

2. Increment the Value of PC by 4.

$$PC \leftarrow [PC] + 4$$

3. Carryout the action according to IR.

### Single Bus Organization:



### 1. Register Transfers

- \* For each register 2 control signals are used.
- \* Ex:  $R_1 \rightarrow R_{1in}, R_{1out}$ .  
 $MAR \rightarrow MAR_{in}, MAR_{out}$ .

- \* Example for Register Transfer: 1 - open  
Move  $R_1, R_4$  0 - close

- \* Equivalent Control signals are  
 $R_{1out} = 1, R_{1in} = 1$

### 2. Performing an ALU operation



### 3. Fetching a Word from Memory:



Example: Move (R1), R2

- Control signals:
1. R1out, MARin, Read
  2. MDRinE, WMFC
  3. MDRout, R2in

### Multiple Bus Organization

18

\* One bus — one data transfer.

\* Multiple bus — multiple data transfer at same time.

- \* Register File — All general-purpose registers are combined into a single block.



### Execution of a Complete Instruction:

Example: Add (R3), R1

Steps involved:

1. Fetch the instruction
2. Fetch the first operand
3. Perform the addition
4. Load the result into R1.

### Control Sequence for instruction Add (R3), R1

| Step | Action                                |
|------|---------------------------------------|
| 1.   | Pcout, MARin, Read, SelectA, Add, Zin |
| 2.   | Zout, Pcin, Yin, WMFC                 |
| 3.   | MDRout, IRin                          |
| 4.   | R3out, MARin, Read,                   |
| 5.   | R1out, Yin, WMFC                      |
| 6.   | MDRout, SelectY, Add, Zin             |
| 7.   | Zout, R1in, End.                      |

### Control Sequence for Instruction - Add R4, R5, R6

| Step | Action                                   |
|------|------------------------------------------|
| 1    | Pcout, R=B, MARin, Read, Inc Pc          |
| 2    | WMFC                                     |
| 3    | MDRoutB, R=B, IRin                       |
| 4    | R4outA, R5outB, selectA, Add, R6in, End. |

## PROCESSOR ARCHITECTURE

- \* It has instruction unit which fetches instructions from main memory.
- \* Has separate processing units to deal with integer data & floating-point data.
- \* Data Cache & instruction Cache is inserted between these units & Main Memory.
- \* Processor is connected to the system bus.
- \* Can include several units to increase the potential for concurrent operations.



- \* Most of the Modern Processors use separate caches for instruction and data.
- \* Bus interface used to connect the processor with external environment.

## Basic Concepts of Pipelining:

- \* One way to improve performance is to use faster circuit technology to build processor.
- \* Another way - to change the hardware so that more than one operation can be performed.
- \* Pipelining - Effective way of organizing concurrent activity in a computer system.

### 2-Stage Pipelining:



### (a) Sequential Execution



### (b) Hardware Organization



### (c) Pipelined Execution

## 4-Stage Pipelining



### (a) Hardware Organization



### (b) 4-Stage Pipelined Execution

## Pipeline Performance:

- \* Pipeline Performance is directly proportional to the number of stages.

## Hazard:

- \* For a variety of reasons, one of the pipeline stages may not be able to complete its processing tasks in the time allotted.

## Types of Hazards:

1. Data Hazard
2. Instruction Hazard
3. Structural Hazard.

## Structural Hazard:

- \* It will occur when two instructions require the use of a given hardware resource at the same time.

### Example - 1. Access to Memory

2. Printing multiple documents using a shared printer.

## Data Hazard:

- \* It is a situation in which the pipeline is stalled because the data to be operated on are delayed for some reason.

### Example : 1

$$A \leftarrow 3 + A$$

$$B \leftarrow 4 \times A$$

### Example : 2

$$A \leftarrow 5 \times C$$

$$B \leftarrow 20 + C$$

### Example : 3

$$\text{Mul } R_2, R_3, R_4$$

$$\text{Add } R_5, R_6, R_7$$

## Operand Forwarding Technique

- \* Delay can be reduced, or possibly eliminated if we arrange for the result of one instruction to be forwarded directly for use in step E2 of 2nd instruction.



## Handling Hazards using Software

- \* Compiler automatically introduces NOP (No operation) when it detects a dependency between instructions.

## Side Effects:

- \* When a location other than one explicitly named in an instruction as a destination operand is affected, the instruction is said to have a side effect.

## INSTRUCTION HAZARDS

- \* Occurs if instruction stream is interrupted.
- \* Branch instruction may also cause the pipeline to stall.

### Unconditional Branches:

- \* Time lost as a result of branch instruction is known as branch Penalty.
- \* Reducing the branch Penalty requires the branch address to be computed earlier.
- \* IF Unit has dedicated hardware to identify branch instruction & compute target address.



Branch address computed in Execute stage

### Instruction Queue and Prefetching Technique:

- \* Instruction queue can store several instructions.
- \* Dispatch unit takes instructions from the front of the queue and sends them to the execution unit.
- \* IF unit keeps the instruction queue filled at all times to reduce delay.



\* Branch Folding: IF will execute the branch instruction concurrently with the execution of other instructions.

\* Instruction Queue helps in dealing with Cache misses.

### Conditional Branches & Prediction:

\* Introduces added hazard caused by the branch condition & evaluation.

\* Branch instructions occur frequently (20%), which increases branch Penalty.

### Delayed Branching Technique:

\* Location following a branch instruction is called a branch delay slot.

\* Delayed branching can minimize the penalty occurs due to conditional branches.

\* Instructions in the delay slots are always fetched. Allow them to be fully executed whether or not the branch is taken.

|      |            |                                 |
|------|------------|---------------------------------|
| Loop | shift-left | R <sub>1</sub>                  |
|      | Decrement  | R <sub>2</sub>                  |
|      | Branch = 0 | Loop                            |
| Next | Add        | R <sub>1</sub> , R <sub>3</sub> |

(a) Original Program Loop

### INSTRUCTION HAZARDS (b) Reordered Instructions

|      |            |                                 |
|------|------------|---------------------------------|
| Loop | Decrement  | R <sub>2</sub>                  |
|      | Branch = 0 | Loop                            |
|      | Shift-left | R <sub>1</sub>                  |
| Next | Add        | R <sub>1</sub> , R <sub>3</sub> |

|             |   |   |   |   |   |   |   |
|-------------|---|---|---|---|---|---|---|
| clock cycle | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| Instruction | F | E |   |   |   |   |   |

→ Time



\* Effectiveness of delayed branching depends on how instructions are reordered.

### 2. Branch Prediction Technique:

\* Predict whether or not a particular branch will be taken.

\* Simplest form of branch prediction is to assume that branch will not take place & continue to fetch instructions in order.

\* Speculative Execution - Instructions are executed before the processor is certain that they are in correct execution sequence.

### A. Static Branch Prediction:

\* Branch prediction decision is always the same every time a given instruction is executed.

### B. Dynamic Branch Prediction: 20

\* Prediction decision may change depending on execution history.

\* Objective: To reduce the probability of making a wrong decision.

### B1 - 2 State Algorithm

LT - Branch is likely to be taken

LNT - Branch is likely not to be taken.  
Branch taken (BT)



### B2 - 4 State Algorithm

ST - Strongly likely to be taken

LT - Likely to be taken

LNT - Likely not to be taken

SNT - Strongly likely not to be taken.



### Influence on Instruction Sets:

\* Two key aspects of machine instructions.

#### 1. Addressing Modes:

\* Must consider the effect of addressing modes on instruction flow in the pipeline.

#### 2. Condition Codes:

\* Either set or cleared by many instructions.  
\* Provides flexibility in reordering instructions.  
\* Effectively utilized by compilers.

## SUPERSCALAR PROCESSING

- \* Equip the Processor with multiple Processing Units to execute in parallel.
- \* Multiple Issue: Several Instructions Start Execution in the same clock cycle.
- \* Superscalar Processor: Instruction execution throughput more than one instruction per cycle.
- \* Many modern high-performance processors use this approach.
- \* Instruction queue should be filled always.
- \* Processor should be able to fetch more than one instruction at a time.



A Processor with Two execution Units

## SUPERSCALAR PROCESSING

- Instruction Fetch Unit - Capable of reading 2 instructions & store in IQ.
  - Dispatch Unit - Retrieves and decodes upto 2 instructions from the front of the Queue.
  - \* No Hazards if more processing units are there.
- |                       |                |                |                 |                 |                 |                |   |      |
|-----------------------|----------------|----------------|-----------------|-----------------|-----------------|----------------|---|------|
| clock cycle           | 1              | 2              | 3               | 4               | 5               | 6              | 7 | Time |
| I <sub>1</sub> (Fadd) | F <sub>1</sub> | D <sub>1</sub> | E <sub>1A</sub> | E <sub>1B</sub> | E <sub>1C</sub> | W <sub>1</sub> |   |      |
| I <sub>2</sub> (Add)  | F <sub>2</sub> | D <sub>2</sub> | E <sub>2</sub>  | W <sub>2</sub>  |                 |                |   |      |
| I <sub>3</sub> (Fsub) | F <sub>1</sub> | D <sub>1</sub> | E <sub>3A</sub> | E <sub>3B</sub> | E <sub>3C</sub> | W <sub>3</sub> |   |      |
| I <sub>4</sub> (Sub)  | F <sub>4</sub> | D <sub>4</sub> | E <sub>4</sub>  | W <sub>4</sub>  |                 |                |   |      |

- \* Instruction Execution Flow in the Processor
- \* Compiler can avoid many Hazards through judicious selection & ordering of instructions.

## PROCESSING

### out-of-order Execution:

- \* Instruction execution is done out of order.
- \* Problems / Side effects?
  - \* Independent instructions - No Problem.
  - \* Dependent instructions - Creates Hazards.

\* Exceptions will arise

clock cycle 1 2 3 4 5 6 7 Time

I<sub>1</sub> (Fadd) F<sub>1</sub> D<sub>1</sub> E<sub>1A</sub> E<sub>1B</sub> E<sub>1C</sub> W<sub>1</sub>

I<sub>2</sub> (Add) F<sub>2</sub> D<sub>2</sub> E<sub>2</sub> W<sub>2</sub>

I<sub>3</sub> (Fsub) F<sub>3</sub> D<sub>3</sub> E<sub>3A</sub> E<sub>3B</sub> E<sub>3C</sub> W<sub>3</sub>

I<sub>4</sub> (Sub) F<sub>4</sub> D<sub>4</sub> E<sub>4</sub> W<sub>4</sub>

(a) Delayed Write

clock cycle 1 2 3 4 5 6 7 Time

I<sub>1</sub> (Fadd) F<sub>1</sub> D<sub>1</sub> E<sub>1A</sub> E<sub>1B</sub> E<sub>1C</sub> W<sub>1</sub>

I<sub>2</sub> (Add) F<sub>2</sub> D<sub>2</sub> E<sub>2</sub> T<sub>W2</sub> W<sub>2</sub>

I<sub>3</sub> (Fsub) F<sub>3</sub> D<sub>3</sub> E<sub>3A</sub> E<sub>3B</sub> E<sub>3C</sub> W<sub>3</sub>

I<sub>4</sub> (Sub) F<sub>4</sub> D<sub>4</sub> E<sub>4</sub> T<sub>W4</sub> W<sub>4</sub>

(b) Using Temporary Registers

- \* To guarantee consistency - Results of instructions must be written in program order.

- \* By using Delayed write technique (or) by using temporary registers, we can execute out-of-order & update results in-order.

## Execution Completion:

- \* Instructions are executed out-of-order to make effective use of processing units.
- \* But instructions must be completed in program order to allow precise exception.
- \* Commitment Step - Usage of Tw.

### Commitment Unit:

- \* Special control unit to guarantee in-order commitment.

### Reorder Buffer:

- \* It is a queue to determine which instructions should be committed next.

### Dispatch Operations:

- \* Will ensure whether all the resources needed for the execution of an instruction are available.

- \* Should instructions be dispatched out-of-order?

- \* If allowed - deadlock will arise.

- \* Passing instructions out-of-order → increases the complexity of dispatch unit.

- \* Most processors use only in-order-dispatching

- \* Conclusion - Superscalar processing increases the throughput of the execution of instructions which in turn increases the speed of the computer.

## RANDOM ACCESS MEMORY:

- \* Accessed for a Read or Write operation.
- \* Consists of memory cells.
- \* Semi-Conductor Materials (VLSI)
- \* More - reliable.
- \* Works faster.
- \* More Cost.



## Internal Organization of Memory chips:

- \* Memory cells organized as arrays.
- \* Each cell stores one-bit information.
- \* Row - Word Line
- \* Column - bit line
- \* Sense/write circuit - Sense /read data
- \* Control signals - R/W, CS.

## Types Of RAM

### 1. Static RAM (SRAM)

- \* Retains their state as long as power is applied.

\* Two inverters are cross-connected - Latch.



- \* Read operations.
- \* Write operations.

### Advantages of SRAM:

- \* Consumes very low power.
- \* Refreshing Not required.
- \* Can be accessed very quickly.
- \* 8-16 times faster and costlier than DRAM.

### 2. Dynamic RAM (DRAM)

- \* Do not retain their state indefinitely.
- \* Less Expensive - Simpler cells are used.
- \* Information stored - charge on capacitor.
- \* Periodic refreshing is required.
- \* Problem - Capacitor charge Leakage.
- \* Read operation.
- \* Write operation.



## Types of DRAMS:

### 1. Asynchronous DRAM:

- \* Operations not Synchronized with clock.
- \* Special memory controller - RAS, CAS.
- \* Need refresh circuit.
- \* High density and low cost.

### 2. Synchronous DRAM:

- \* operations directly synchronized with clock.
- \* Has built-in refresh circuit.
- \* Different modes of operation.



## READ - ONLY MEMORY (ROM)

- \* Retain information even it is switched off.
- \* Special writing process is needed.
- \* Used to "boot" the system.



## Types of ROM

### 1. ROM:

- \* WORM device - Write Once Read Many.
- \* Data is Written by Manufacturer.
- \* Not reversible - break the fuse once.

### 2. PROM:

- \* Data can be loaded by the User.
- \* Initially empty - All cells contains '0'
- \* Break the fuse as per requirement
- \* Requires high - Current pulse

### 3. EPROM:

- \* Stored data can be erased.
- \* New data can be loaded/written.
- \* For erasing - expose to UV light.
- \* Physically removed from the circuit.

### 4. EEPROM:

- \* Electrically Erasable
- \* Need different voltages for erasing, writing and reading stored data.

- \* No need to remove physically.

- \* Selective contents can be erased

- \* More flexible and costly.

# CACHE MEMORY

## CACHE MEMORY :-

- Speed of processors are high when compared with speed of main memory.
- The processor speed is the good performance parameter, so reduce waiting time by main memory.
- Cache memory is the efficient solution.

## EFFECTIVENESS OF CACHE MECHANISM

### \* Locality of reference (LOR)

Property of computer programs.  
Many instructions in localized areas of program executed repeatedly during time period.

Remainder of the program accessed infrequently.

### \* Types of LOR

1. Temporal LOR
2. Spatial LOR.

+ Temporal - Recently executed instruction is likely to be executed again soon.

+ Spatial - Instructions in close proximity to recently executed instruction will be executed soon.

\* By placing active segments of program in fast cache memory, then total execution time is reduced.

## OPERATION OF A CACHE MEMORY

- \* The memory control circuit is designed to use LOR property.
- \* Temporal LOR - When instruction/data needed, bring from Cache + Hopefully remain.
- \* Spatial LOR - Fetching several items reside at adjacent addresses (Blocks of data).
- \* Cache block / Cache line.



- Read request made by processor
- Contents of a memory block with a specified location transferred to cache one word at a time.
- Program references this block's location, then it is read from cache.
- Cache memory stores less number of blocks.
- Correspondence between main memory blocks and cache - mapping function.
- Cache is full, memory word not in cache referenced, cache control hardware decide which block to be removed from cache (replacement algorithm) → Least Recently Used (LRU).

- Processor issues Read/write requests → If it is available in cache read hit / write hit.

## READ / WRITE IN CACHE

- \* Main Memory not involved in Read.
- \* Two methods in Write

### ① Write-Through Protocol

- Cache and Main Memory updated simultaneously.

### ② Write-back / copy-back protocol:

- update only in cache
- Mark as updated using dirty or modified bit.
- Later updated in Main memory

## READ MISS / WRITE MISSES

- \* request word not in cache (read)

### (Read Miss)

- \* Block of words in requested word copied to cache.

- ① Particular word sent to processor.

- ② Particular word sent to processor immediately read from Main memory (Load through / Early restart)

- \* If requested word not in cache during write - Write Miss
- \* Writeback / Write-Through is used

## MAPPING FUNCTIONS

### ① Direct Mapping:-



- \* Consider cache has 128 blocks
- main memory 4096 blocks
- cache memory value = block  $j \% 128$

- \* 0, 128, 256 - block 0 of cache
- \* 1, 129, 257 - block 1 of cache

### ② Associative Mapping:-



- \* Tag bits
- \* cost is higher than direct mapped
- \* Associative Search method used.

### ③ Set-Associative Mapping:-



- \* Combination of Direct + Associative
- \* Simple method
- \* Valid bit.

## PERFORMANCE MEASURES

24

### Performance Measures:

- \* memory performance captured by two parameters, latency and bandwidth.
- \* latency is the time from the issue of a memory request to the time the data is available at the processor.
- \* Bandwidth is the rate at which data can be pumped to the processor by the memory system.



### Magnetic Surface Recording

**Magnetic Hard disk [S.M]** used to store various programs & files  
 • Data can be continuously read & written

**DVD RAM**

**DVD**  
 Digital video Disc store high capacity data.  
**CDRW**

• Read/ write speed is slower compared to hard disk  
**CDR**  
 • Metal disc into a plastic protective housing

**Floppy disk:** Magnetic storage medium for computer systems.  
 Composed of a thin, flexible magnetic disk sealed in a square plastic carrier

**RAID Level:** Redundant array of independent disks used to protect data in the case of a drive failure RAID 1



**Optical Disks:** A storage medium from which data is read and to which it is written by lasers.

### MULTI LEVEL MEMORY (or) MEMORY HIERARCHY



### MEMORY Hierarchy Design

- \* Memory with different speeds & sizes
- \* Faster memories are expensive
- \* Distances from Processor Increases
- \* Size and access time also increases

### Memory Allocation

A simple way to allocate memory

**First fit:** The process is allocated the first available memory space.

**Best fit:** The process is allocated the smallest memory block that is large enough for its purposes.

**Worst fit:** The process is allocated the largest memory block that is available.

#### • Replacement policy

FIFO : first in, first out

LRU : least recently used.

### External fragmentation

\* Dynamic memory allocation to allocate some memory

\* leave a small amount of unusable memory

\* memory available is reduced substantially

### Internal fragmentation

• memory is distributed into fixed sized blocks



### Internal fragmentation

#### Types:

##### Non pre-emptive allocation

\* memory allocation is necessary to find or create an available region.

Two algorithms are first fit & Best fit



b) Allocation of K4 & K5 by First fit



c) Allocation of K4 & K5 by bestfit



### Pre-emptive allocation

\* Re-allocation of the blocks by Compaction

\* memory are compressed into a single contiguous group

\* creates an available region of maximum size



b) memory allocation before compaction

b) memory allocation after compaction

# VIRTUAL MEMORY

25

## VIRTUAL MEMORIES :-

- \* Main Memory is small compared to processor.
- \* When program does not completely fit into main memory, not executed parts stored in secondary storage devices.
- \* When new segment of program coming inside main memory, need to replace existing content.
- \* Technique to move program & data into physical main memory is virtual Memory.
- \* Binary addresses generated by processor is virtual/logical address.



\* If virtual address refers to program, currently available in physical memory, accessed immediately or it should be brought into main memory.



- \* Logical address is converted to Physical address by combination of software & hardware.
- \* While accessing main memory, if data available, it is hit.
- \* A special hardware Memory Management Unit (MMU) translates virtual to Physical address.
- \* If data available, MMU tell OS to bring data into memory from disk. This is performed by Direct Memory Access (DMA).

## ADDRESS TRANSLATION:-

- \* All programs and data are composed of fixed length units → pages (Blocks of words) placed continuously in main memory.
- \* Pages should not be too small or too large.
- \* cache bridges gap of speed between the processor and main memory by hardware.
- \* Virtual Memory bridges the speed gap between main memory & secondary memory by Software.
- \* Virtual address generated consists of virtual Page Number, Offset.
- \* Information about main memory location of each page is in Page Table.
- \* Page Table information are used by MMU. So it should be in MMU only. MMU is implemented as processor chip along with primary cache so it is kept in main memory.



## CONTROL BITS:

- \* Describes the status of the page in main memory.

- \* Describes validity of a page.

## MODIFICATION BIT:-

- \* Convey whether the page is modified or not.

## RESTRICTION BITS:

- \* Describes the access permission whether full read / write.

- \* copy of small version of page table placed in MMU.

- \* This access the most recently accessed pages.

- \* It is implemented by a small cache called Translation Look Aside buffer (TLB)



## ADDRESS TRANSLATION PROCESS :

- ① Given the virtual address, MMU looks TLB for referenced page. If it is available, physical address got immediately.
- ② If miss in TLB, get content from page table + update TLB.
- ③ When program request a page not in main memory, page fault occurs, whole page has to be brought from disk to main memory.
- ④ MMU asks OS to raise exception, OS copies requested page and place in main memory.
- ⑤ To bring new page, if main memory is full, one of the resident page has to be replaced by LRU (Least Recently Used).
- ⑥ Modified pages has to be updated before removing it from main memory by write-through protocol.

Accessing I/O devices.

- Connect I/O devices
- Three sets of lines
  - a) Camy address,
  - b) Data,
  - c) Control signals
- Requested read or a write operation
- Requested data are transferred over the data lines.

I/O Interface.

Required to coordinate I/O transfers.



Two methods are used to address the device:

1. Memory-mapped I/O
2. I/O Mapped I/O

Memory Mapped I/O.

- Portions of address space are assigned to I/O devices -- reads and writes to those addresses are interpreted as commands to the I/O devices.
- I/O devices and the memory share the same address space, - any machine instruction that can access memory.
- To transfer data to or from an I/O device.
- Memory load and store instructions are used
- Transfer data words to or from I/O ports.

I/O Mapped I/O.Programmed I/O

- Processor repeatedly checks a status flag
- To achieve the required synchronization.
- Between the processor and an I/O device.
- Dedicated instruction is used
- To give a command to an I/O device
- Specifies both the device number and the command word.

Interrupts

- Signal generated by an external device requiring some urgent action to be taken by the CPU.
- Enabling and Disabling Interrupts
- Handling Multiple Devices
- Vectored Interrupt
- PCI Interrupt
- Pipeline Interrupt
- Interrupt Nesting
- Interrupt Service Routine
- Controlling Device Request
- Use of Interrupt in OS.

Enabling & Disabling Interrupts

- Programmer complete control over the events.

Handling Multiple Devices.

- Capable of initiating interrupts are connected to the processor.

Polling Scheme.

- Processor poll all the I/O devices
- Periodically checking status bits.

Vectored Interrupt.

- To reduce the time involved in the polling process
- Identify itself directly to the processor.

PCI Interrupt.

- Four interrupt pins defined for PCI
- PCI devices uses the PCI-Bus.

Pipeline Interrupt.

- Interrupt pipelining is a lightweight approach.
- High priority execution stage
- Running out-of-band interrupt handlers

Interrupt Nesting.

- Long delay in responding
- Interrupt request may lead to error in operation of Computer.

Interrupt Service Routine.

- Multiple-level priority organization.
- Depending upon the device's priority.
- Assign a priority level to the processor.

Controlling Device Request.

- Interrupt - enable a bit
- Control register determines.
- Allowed to generate an interrupt request
- Interrupt enable bit in the PS register.

Use of Interrupts in Operating System.

- Coordinating all activities within a computer.
- Execution of user programs.
- Implementing security and protection features.

Exception.

- Event causes the execution of one program to be suspended.

Recovery from Errors.

- Faults, Traps and Aborts etc.
- Computers include error checking code.
- Processor by raising an interrupt.

Debugging.

- Trace and Breakpoints etc.
- System software includes a program called Debugger.

Privilege Exception.

- To protect the operating system.
- Being corrupted by user programs.
- Processor is in the supervisor mode.
- Instruction will produce a privilege exception.

## Direct Memory Addressing

- \* Instruction to transfer I/O devices ready
- \* Transfer a block of data
- \* Without continuous intervention by the processor.

## DMA Controller

- \* Processor when accessing the main memory, keeps track of num of transfers

## Diagram



## DMA Registers

Address Register: To specify the desired location in memory  
CPU writes data from MDR to mem location

## Modes of DMA Transfer

### Single Transfer mode



### Block Transfer Mode



### Demand Transfer mode



Adv : very fast data transfer

DisAdv : CPU not idle for long

### Burst mode

- \* exclusive access to the main memory.
- \* Transfer a block of data without interruption

### Cycle Stealing mode

- \* Interweaving Technique Called cycle stealing

### Interleaving mode

- \* Design Compensates slow speed of DRAM

- \* Technique divides memory into a number of modules.

### Transparent mode

- \* Transfer of block data
- \* Most efficient mode
- \* DMA Controller Transfers data
- CPU is performing operations



### Distributed Arbitration

- \* All devices waiting to use the bus have equal responsibility
- \* Carrying out the arbitration process without using a central arbiter

- \* All devices participate in the selection of the next bus.

- \* Each device on the bus is assigned a 4-bit identification number



# CPU PERFORMANCE

i) If a computer A runs a program in 10 sec and computer B runs same pgm in 2sec. Who is faster and how much.

$$\frac{\text{Performance A}}{\text{Performance B}} = \frac{\text{Execution time B}}{\text{Execution time A}} = \frac{2}{10} = 0.2$$

Performance of computer A is 0.5 times greater than computer B

A is 2.5 times faster than B

ii) consider 3 processors P<sub>1</sub>, P<sub>2</sub>, P<sub>3</sub> executing same instruction set

| Processor      | Clockrate | CPI |
|----------------|-----------|-----|
| P <sub>1</sub> | 3GHz      | 1.5 |
| P <sub>2</sub> | 2.5GHz    | 1   |
| P <sub>3</sub> | 4GHz      | 2.2 |

(i) Which processor has highest performance?

(ii) All the processor executes a program in 10sec. Find the no. of cycles required and no. of instruction required for same program.

$$\text{SOL CPU Time} = \frac{N \times CPI}{R}$$

$$P_1 = \text{CPU Time} = \frac{N \times 1.5}{3\text{GHz}} = 500 \text{ picosec} \times N$$

$$P_2 = \text{CPU Time} = \frac{N \times 1}{2.5\text{GHz}} = 400 \text{ picosec} \times N$$

$$P_3 = \text{CPU Time} = \frac{N \times 2.2}{4\text{GHz}} = 550 \text{ picosec} \times N$$

∴ Processor P<sub>2</sub> has highest performance since it has lowest CPU execution time 400ps

## Addressing modes:

The memory location 1000, 1001 and 1020 have data value 18, 1 and 16 respectively before program is executed.

|       |              |                  |
|-------|--------------|------------------|
| MOV I | RS, 1        | move immediate   |
| Load  | Rd, 1000(RS) | load from memory |
| ADD I | Rd, 1000     | Add immediate    |
| Store | 0(Rd, 20)    | store immediate  |

Find memory location and its value

Before execution memory is

|    |      |
|----|------|
| 18 | 1000 |
| 1  | 1001 |
| 16 | 1020 |

## Lets execute:

i) MOV I, RS, 1

(\*) Immediate mode

(\*) RS ← 1

(\*) value = 1, moved to RS

## Load Rd, 1000 (RS)

\* User displacement addressing

\* Rd ← 1000 + [RS]

$$\begin{aligned} * \text{ value} &= [1000 + [RS]] = [1000 + 1] \\ &= [1001] = 1 \end{aligned}$$

\* ∴ value = 1, moved to Rd

## ADD I Rd, 1000

(\*) Immediate Addressing

(\*) Rd ← [Rd] + 1000

$$\begin{aligned} * \text{ value} &= [Rd] + 1000 = 1 + 1000 \\ &= 1001 \end{aligned}$$

(\*) ∴ value = 1001 moved to Rd

## STORE I 0(Rd, 20)

(\*) Displacement Addressing

(\*) 0 + [Rd] ← 20

$$\begin{aligned} * \text{ value of destination address} \\ &= 0 + [Rd] = 0 + 1001 = 1001 \end{aligned}$$

(\*) value = 20 moved to 1001

∴ program execution is completed  
memory location is 1001 has value 20

## Instruction format

$$x = (A+B) * (C+D)$$

## ZERO ADDRESS FORMAT

$$\text{PUSH A TOP} = A$$

$$\text{PUSH B TOP} = B$$

$$\text{ADD TOP} = A + B$$

$$\text{PUSH C TOP} = C$$

$$\text{PUSH D TOP} = D$$

$$\text{ADD TOP} = C + D$$

$$\text{MVE TOP} = [C+D] * [A+B]$$

$$\text{POP X M}[X] = TOP$$

## ONE ADDRESS FORMAT

$$\text{LOAD A AC} = M[A]$$

$$\text{ADD B AC} = AC + M[A]$$

$$\text{STORE T M}[T] = AC$$

$$\text{LOAD C AC} = M[C]$$

$$\text{ADD D AC} = AC + M[D]$$

$$\text{MVL T AC} = AC * M[T]$$

$$\text{STORE X M}[X] = AC$$

## TWO ADDRESS FORMAT

$$\text{MOV R}_1, A \ R_1 = M[A]$$

$$\text{ADD R}_1, B \ R_1 = R_1 + M[B]$$

$$\text{MOV R}_2, C \ R_2 = C$$

$$\text{ADD R}_2, D \ R_2 = R_2 + R_2 + D$$

$$\text{MVL R}_1, R_2 \ R_1 = R_1 * R_2$$

$$\text{MOV X, R}_1 \Rightarrow M[X] = R_1$$

## THREE ADDRESS FORMAT

$$\text{ADD R}_1, A, B \ R_1 = M[A] + M[B]$$

$$\text{ADD R}_2, C, D \ R_2 = M[C] + M[D]$$

$$\text{MUL X, R}_1, R_2 \ M[X] = R_1 * R_2$$

## NUMBER SYSTEM

1) Convert number  $5062_{10}$  to Binary System.

$$\begin{array}{r} 5062 \\ \hline 2 | 2531 \quad -0 \\ 2 | 1265 \quad -1 \\ 2 | 632 \quad -1 \\ 2 | 316 \quad -0 \\ 2 | 158 \quad -0 \\ 2 | 79 \quad -0 \\ 2 | 39 \quad -1 \\ 2 | 19 \quad -1 \\ 2 | 9 \quad -1 \\ 2 | 4 \quad -1 \\ 2 | 2 \quad -0 \\ 1 \quad -0 \end{array}$$

**ANS =  $(1001111000110)_2$**

2) Convert  $159_{10}$  to Octal Number

159 to octal

$$\begin{array}{r} 159 \\ \hline 8 | 19 \quad -7 \\ 8 | 2 \quad -3 \end{array}$$

**ANS =  $(237)_8$**

3) Convert  $380_{10}$  to Hexadecimal  
 $380_{10}$  to Hexadecimal

$$\begin{array}{r} 380 \\ \hline 16 | 23 \quad -12 \\ 16 | 1 \quad -7 \end{array}$$

**(ANS =  $17C$ )\_{16}**

4) Convert  $11001011_2$  to decimal number system.

$$\begin{array}{r} 11001011 \\ \hline 1 \times 2^0 = 1 \\ 1 \times 2^1 = 2 \\ 0 \times 2^2 = 0 \\ 1 \times 2^3 = 8 \\ 0 \times 2^4 = 0 \\ 0 \times 2^5 = 0 \\ 1 \times 2^6 = 64 \\ 1 \times 2^7 = 128 \\ \hline 203 \end{array}$$

**ANS =  $(11001011)_2 = (203)_{10}$**

5) Convert octal Number  $714_8$  to decimal Number

$$\begin{array}{r} 7 \ 1 \ 4 \\ \hline 4 \times 8^0 = 4 \\ 1 \times 8^1 = 8 \\ 7 \times 8^2 = 448 \\ \hline 460 \end{array}$$

**ANS =  $(714)_8 = (460)_{10}$**

6) Convert Hexadecimal Number  $2C4$  to decimal number system.

$$\begin{array}{r} 2 \ C \ 4 \\ \hline 4 \times 16^0 = 4 \\ C \times 16^1 = 192 \\ 2 \times 16^2 = 512 \\ \hline 708 \end{array}$$

**ANS =  $2C4_{16} = 708_{10}$**

7) Convert  $1001_2$  to octal Number System.

$$(1001)_2 = (?)_8$$

Step 1: Convert Binary to decimal.

$$\begin{array}{r} 1 \ 0 \ 0 \ 1 \\ \hline 1 \times 2^0 = 1 \\ 0 \times 2^1 = 0 \\ 0 \times 2^2 = 0 \\ 1 \times 2^3 = 8 \\ \hline (9)_{10} \end{array}$$

Step 2: Convert decimal to Octal Number.

$$\begin{array}{r} 8 | 9 \\ \hline 1 \quad -1 \uparrow \end{array}$$

**ANS =  $(11)_8$**

8) Convert  $(672)_8$  to Hexadecimal Number

Step 1: Octal to decimal.

$$\begin{array}{r} 6 \ 7 \ 2 \\ \hline 2 \times 8^0 = 2 \\ 7 \times 8^1 = 56 \\ 6 \times 8^2 = 384 \\ \hline 442 \end{array}$$

Step 2: Convert Decimal to Hexadecimal.

$$\begin{array}{r} 442 \\ \hline 16 | 27 \rightarrow 10(A) \\ 16 | 1 \quad -11(B) \end{array}$$

**ANS =  $(1BA)_{16}$**

9) Convert  $(2020.65625)_{10}$  to ~~Octal~~ Binary.

$$\begin{array}{r} 2 | 2020 \\ 2 | 1010 \quad -0 \\ 2 | 505 \quad -0 \\ 2 | 252 \quad -1 \\ 2 | 126 \quad -0 \\ 2 | 63 \quad -0 \\ 2 | 31 \quad -1 \\ 2 | 15 \quad -1 \\ 2 | 7 \quad -1 \\ 2 | 3 \quad -1 \\ 1 \quad -1 \end{array}$$

$= (1111100100)$

Take decimal part.

$$\begin{aligned} & 65625 \times 2 = 13125 \\ & 3125 \times 2 = 625 \\ & 625 \times 2 = 125 \\ & 125 \times 2 = 25 \\ & 25 \times 2 = 5 \\ & 5 \times 2 = 0 \end{aligned}$$

$\therefore 0.65625 = 0.10101$

$\therefore 2020.65625 = 1111100100.10101$

## Floating Point Operations:

1. Represent  $1259_{10}$ ,  $125_{10}$  in single precision and double precision formats.

$$(i) \text{ Integer} = 1259_{10}$$

$$\text{Fraction} = 0.125_{10}$$

|   |      |
|---|------|
| 2 | 1259 |
| 2 | 629  |
| 2 | 314  |
| 2 | 157  |
| 2 | 78   |
| 2 | 39   |
| 2 | 19   |
| 2 | 9    |
| 2 | 4    |
| 2 | 2    |
| 1 | 0    |

$$0.125 \times 2 = 0.25 \Rightarrow 0$$

$$0.25 \times 2 = 0.50 \Rightarrow 0$$

$$0.50 \times 2 = 1.00 \Rightarrow 1$$

$$1259_{10} = 10011101011_2$$

$$0.125_{10} = 0.001_2$$

$$\text{Binary} = 10011101011.001$$

(ii) Normalize the number

$$10011101011.001 = 1.0011101011001 \times 2^{10}$$

(iii) Single Precision

$$S=0 \quad E=10 \quad M=0011101011001$$

$$\text{Bias} = 127$$

$$E' = E + 127 = 10 + 127 = 137_{10}$$

$$= 10001001_2$$

Number in single precision format

|      |          |          |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |   |   |   |   |   |   |   |   |   |   |   |  |
|------|----------|----------|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|---|--|
| 31   | 30       | 29       | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |   |  |
| 0    | 1        | 0        | 0  | 0  | 1  | 0  | 0  | 1  | 0  | 0  | 1  | 0  | 0  | 1  | 0  | 0  | 1  | 0  | 0  | 1  | 0  | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |  |
| Sign | Exponent | Mantissa |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |   |   |   |   |   |   |   |   |   |   |   |  |

(iv) Double Precision

$$S=0, E=10, M=0011101011001$$

$$\text{Bias} = 1023$$

$$E' = E + 1023 = 10 + 1023 = 1033_{10}$$

$$= 10000001001_2$$



### Booth ALGORITHM

① Multiply  $(+11) \times (-13)$  by binary multiplication method:

$(-13) \Rightarrow$  convert to binary  $1101$   
for -ve number find 2's complement

$$1101 \Rightarrow \begin{array}{r} 0010 \\ 0011 \end{array}$$

$$\begin{array}{r} 1 & 0 & 0 & 1 & 1 & (-13) \\ \times & 1 & 0 & 1 & 1 & (+11) \\ \hline & \text{sign bit} \end{array} \Rightarrow Q$$

$$\begin{array}{r} 1 & 0 & 0 & 1 & 1 & (-13) \\ \times & 0 & 1 & 0 & 1 & (+11) \\ \hline 1 & 1 & 1 & 1 & 1 & 0 & 0 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 1 & 1 & 0 & 0 & 0 & 1 & 0 \\ 1 & 1 & 1 & 0 & 1 & 1 & 0 & 0 & 1 & 0 \end{array}$$

2's complement of answer

$$0010001110+$$

$$\begin{array}{r} 0010001110 \\ 0010001110 \end{array}$$

$$= 128 + 8 + 4 + 2 + 1 = [-143]$$

② Multiply  $(+13) \times (-6)$  using Booth algorithm

$$M=+13 (01101)$$

$$Q=-6$$

$6 \Rightarrow 0110 \Rightarrow$  2's complement of  $Q$

$$\begin{array}{r} 1001+ \\ 1 \\ \hline 1010 \end{array}$$

$$Q = 11010 \boxed{0} \text{ append } 0 \text{ at end.}$$

### Recoding Table:

| Bit i | Bit $i-1$ | version of multiplicand selected by bit i |      |      |     |
|-------|-----------|-------------------------------------------|------|------|-----|
|       |           | 0xM                                       | +1xM | -1xM | 0xM |
| 0     | 0         |                                           |      |      |     |
| 0     | 1         |                                           |      |      |     |
| 1     | 0         |                                           |      |      |     |
| 1     | 1         |                                           |      |      |     |

$$Q = 11010 \boxed{0}$$

$$\begin{array}{r} 01101 \times \\ 0-1+1-10 \\ \hline 00000 \end{array}$$

$$\begin{array}{r} 00000 \end{array}$$

$$\begin{array}{r} 1111100011 \\ 0000110011 \\ 0000000000 \\ 1110110010 \end{array}$$

Take 2's complement as

$$\begin{array}{r} 0001001101+ \\ 1 \\ \hline 0001001110 \end{array} \quad (-78)$$

### RESTORING DIVISION:-

$$A=00000 \quad M=00011$$

$$Q \Rightarrow 1000$$

$$A \quad Q$$

$$\text{Initially} \quad 00000 \quad 1000$$

$$\text{shift} \quad 00001 \quad 0000$$

$$A-M \quad 11101 \quad 0000$$

$$\text{Restore} \quad 11110 \quad 0000$$

$$A+M \quad 11111 \quad 0000$$

$$\text{shift} \quad 00010 \quad 0000$$

$$A-M \quad 11101 \quad 0000$$

$$\text{Restore} \quad 11111 \quad 0000$$

$$A+M \quad 00010 \quad 0000$$

$$\text{shift} \quad 00100 \quad 0000$$

$$A-M \quad 11101 \quad 0000$$

$$\text{Restore} \quad 00001 \quad 0000$$

$$A+M \quad 00010 \quad 0000$$

$$\text{shift} \quad 00100 \quad 0000$$

$$A-M \quad 11101 \quad 0000$$

$$\text{Restore} \quad 00001 \quad 0000$$

$$A+M \quad 00010 \quad 0000$$

$$\text{shift} \quad 00100 \quad 0000$$

$$A-M \quad 11101 \quad 0000$$

$$\text{Restore} \quad 00001 \quad 0000$$

$$A+M \quad 00010 \quad 0000$$

$$\text{shift} \quad 00100 \quad 0000$$

$$A-M \quad 11101 \quad 0000$$

$$\text{Restore} \quad 00001 \quad 0000$$

$$A+M \quad 00010 \quad 0000$$

$$\text{shift} \quad 00100 \quad 0000$$

$$A-M \quad 11101 \quad 0000$$

$$\text{Restore} \quad 00001 \quad 0000$$

$$A+M \quad 00010 \quad 0000$$

$$\text{shift} \quad 00100 \quad 0000$$

$$A-M \quad 11101 \quad 0000$$

$$\text{Restore} \quad 00001 \quad 0000$$

$$A+M \quad 00010 \quad 0000$$

$$\text{shift} \quad 00100 \quad 0000$$

$$A-M \quad 11101 \quad 0000$$

$$\text{Restore} \quad 00001 \quad 0000$$

$$A+M \quad 00010 \quad 0000$$

$$\text{shift} \quad 00100 \quad 0000$$

$$A-M \quad 11101 \quad 0000$$

$$\text{Restore} \quad 00001 \quad 0000$$

$$A+M \quad 00010 \quad 0000$$

$$\text{shift} \quad 00100 \quad 0000$$

## PIPELINING

1. Consider a pipeline having 4 phases with duration 60, 50, 90 and 80 ns Given Latch delay is 10ns Calculate

- (i) pipeline Cycle time
- (ii) Non pipeline execution time
- (iii) Speed up ratio
- (iv) pipeline time for 1000 tasks
- (v) Sequential time for 1000 tasks
- (vi) throughput

### Solution:

Given : stages = 4  
Delay = 60, 50, 90, 80 ns  
Latch delay = 10ns

### (i) Pipeline Cycle Time

$$\begin{aligned} &= \text{Max. delay due to any stage} \\ &\quad + \text{Delay due to its register} \\ &= \text{Max}(60, 50, 90, 80) + 10\text{ns} \\ &= 90\text{ns} + 10\text{ns} \\ &= 100\text{ ns} \end{aligned}$$

### (ii) Non-pipeline Execution Time

$$\begin{aligned} &= \text{Sum of delay of 4 phases} \\ &= 60\text{ns} + 50\text{ns} + 90\text{ns} + 80\text{ns} \\ &= 280\text{ns} \end{aligned}$$

### (iii) Speed up ratio

$$= \frac{\text{Non pipeline execution time}}{\text{Pipeline Execution time}}$$

$$= \frac{280\text{ns}}{100\text{ns}} = 2.8$$

### (iv) Pipeline time for 1000 tasks

$$\begin{aligned} &= \text{Time taken for 1st task} \\ &\quad + \text{Time taken for remaining 999 tasks} \\ &= 1 \times 4 \text{ Clockcycles} + 999 \times 1 \text{ Clockcycle} \\ &= 4 \times 100\text{ns} + 999 \times 100\text{ns} \\ &= 400\text{ns} + 99900\text{ns} \\ &= 100300\text{ns} \end{aligned}$$

### (v) Sequential time for 1000 tasks

$$\begin{aligned} &= \text{Number of clock cycles / instruction} \\ &= 4 \text{ clock cycles} \\ &= 4 \times 0.4\text{ns} = 1.6\text{ns} \end{aligned}$$

### (vi) Throughput

$$\begin{aligned} &= \frac{\text{Number of instructions}}{\text{executed per unit time}} \\ &= \frac{1000 \text{ tasks}}{100300 \text{ ns}} \end{aligned}$$

2. A four stage pipeline has the stage delays. Consider a non-pipelined processor with a clock rate of 2.5 giga-hertz and average cycles per instruction of 4. The same processor is upgraded to a pipelined processor with 5 stages but due to the internal pipeline delay, the clock speed is reduced to 2 gigahertz. Assume there are no stalls in the pipeline.

The speed up achieved in the pipelined processor is

- ① 3.2 ② 3.0 ③ 2.8 ④ 2.0

Cycle time in Non-pipelined processor =  $\frac{1}{\text{Frequency}}$

Frequency of the clock = 2.5 GHz

Cycle Time =  $1/2.5 \text{ GHz}$

$$= \frac{1}{2.5} \times 10^{-9} \text{ Hertz} \\ = 0.4\text{ns}$$

Non-pipeline Execution time

$$\begin{aligned} &= \text{Number of clock cycles / instruction} \\ &= 4 \text{ clock cycles} \end{aligned}$$

$$= 4 \times 0.4\text{ns} = 1.6\text{ns}$$

Cycle time in pipelined processor

$$\begin{aligned} &= \frac{1}{\text{Frequency}} = 1/2 \text{ GHz} \\ &= \frac{1}{2 \times 10^9 \text{ hertz}} = 0.5\text{ns} \end{aligned}$$

Pipeline execution time

$$\begin{aligned} &= \text{No stalls in pipeline, so one instruction / clock cycle} \\ &= 1 \text{ clock cycle} = 0.5\text{ns} \end{aligned}$$

Speed up

$$\begin{aligned} &= \frac{\text{Non-pipeline Execution time}}{\text{Pipeline Execution time}} \\ &= \frac{1.6\text{ns}}{0.5\text{ns}} = 3.2 \end{aligned}$$

Option ① is the correct answer

③ In SuperScalar processor, find the in-order execution, out-of-order execution, In-order 2-way SuperScalar processor for the given instruction

- (1)  $r_1 \leftarrow r_4/r_7$
- (2)  $r_8 \leftarrow r_1+r_2$
- (3)  $r_5 \leftarrow r_5+1$
- (4)  $r_6 \leftarrow r_6-r_3$
- (5)  $r_4 \leftarrow r_5+r_6$
- (6)  $r_7 \leftarrow r_8 * r_4$

### Solution:

Data flow graph for the given instruction



Assume that division takes 20 cycles

### In-order execution



### In-order (2-way Superscalar)



### Out-of-order Execution



## VIRTUAL MEMORY

1. A Computer with Virtual Memory has an access time to Main Memory 50 ns, the time to transfer a block from the virtual into main memory is 10 ns. The probability for the page fault is  $10^{-6}$ . What is the average access time, if the page-table is in the main memory?

**Solution :**

$$t_{av} = 10 \text{ ns}$$

$$t_B = 10 \text{ ms}$$

$$(1 - H) = 10^{-6}$$

$$t_a = ?$$

$$t_a = t_{av} + t_B + (1 - H) \times t_B.$$

Access to the Page-table.

Access to Page-table  
in Main Memory

$$t_a = (50 \times 10^{-9}) + (50 \times 10^{-9}) + (10)^6 \times (10 \times 10^{-3})$$

$$= 100 \times 10^{-9} + 10 \times 10^{-9}$$

$$= 110 \times 10^{-9}$$

$$t_a = 110 \text{ ns}$$

The average access time = 110 ns.

2. Computer with virtual memory has the following features.

- \* length of virtual address is 38-bits.
- \* page size is 16 KB.
- \* length of physical address is 32-bits

(a) How many bits is the length of page descriptor, if in addition to the frame number (FN), additional parameters occupy another 6 bits?

**Solution:**  $n = 38, f = 32$

$$\text{Page Size} = 16 \text{ KB} = 2^P = 2^{14} \text{ B}$$

\* No. of Pages in Virtual Memory:

$$2^{n-p} = 2^n \div 2^P = 2^{38} \div 2^{14} = 2^{24}$$

\* Number of Page frames in Main Memory:

$$2^{f-p} = 2^f \div 2^P = 2^{32} \div 2^{14} = 2^{18} \cdot (\text{FN})$$

Page descriptor = Frame number (FN) + 6

$$= 18 + 6 = 24 \text{ bits} = 3 \text{ B}$$

|                       |               |
|-----------------------|---------------|
| 6 bits                | 18 bits       |
| Additional Parameters | Frame Number. |

(b) What is the maximum size of the page table in bytes?

Size of page table = No. of Pages X Page descriptor

$$= 2^{24} \times 3 \text{ B}$$

$$= 16777216 \times 3 \text{ B}$$

$$= 50331648 \text{ B}$$

$$\therefore \text{Size of Page Table} = 49152 \text{ KB} = 48 \text{ MB}$$

3. If a processor has 32-bit virtual address, 28-bit physical address, 2 KB pages. How many bits are required for the virtual, physical page number?

$$(A). 17, 21$$

$$(B). 21, 17$$

$$(C). 6, 10$$

$$(D). 10, 15$$

**Solution :**

Virtual Address = 32-bit

Virtual Address Space =  $2^{32} \text{ B}$

Physical Address = 28-bit

Physical Address Space =  $2^{28} \text{ bit}$

Page size = 2 KB =  $2^{11}$

No. of entries for Virtual Page Number

$$= 2^{\text{Virtual Address}} / \text{Page Size}$$

$$= 2^{32} / 2^{11} = 2^{21}$$

Number of entries for Physical Page Number

$$= 2^{\text{Physical Address}} / \text{Page Size}$$

$$= 2^{28} / 2^{11} = 2^{17}$$

4. In a Paging Scheme, Virtual **32** address space is 4 KB and Page table entry size is 8 bytes. what should be the optimal Page size?

**Solution :**

Given,

\* Virtual address space = 4 KB. (Processor size)

\* Physical Page Table entry = 8 Bytes

Optimal Page Size

$$= (2 \times \text{processor size} \times \text{Page table entry size})^{1/2}$$

$$= (2 \times 4 \text{ KB} \times 8 \text{ bytes})^{1/2}$$

$$= (2 \times 4096 \times 8 \text{ bytes})^{1/2}$$

$$= (8192 \times 8 \text{ bytes})^{1/2}$$

$$= 256 \text{ bytes}$$

Thus,

Optimal Page size = 256 Bytes.

$\therefore$  No. of bits for Virtual Page Number = 21,

$\therefore$  No. of bits for Physical Page Number = 17.

# CACHE MEMORY

## VIRTUAL MEMORY:

① A 32 bit address system uses a paged virtual memory, the page size is 1 KByte. What is the virtual page and the offset in the page for the virtual address  $0x00030f40$ ?

For a page size of  $N$  Bytes, the number of bits in the offset field is  $\log_2 N$ . In case of a 2KBytes page,  $\log_2 2^11 = 11$  bits.

Therefore, number of bits for the page number is

$$32 - 11 = 21, \text{ which means a total of } 2^{21} = 2 \text{ Mpages.}$$

Binary representation of the address is



Given virtual Address identifies the virtual page number

$$0x61 = 97_{10}$$

The offset inside the page is

$$0x740 = 1856_{10}$$

② A 32 bit address system has a 4 Mbytes main memory. The page size is 1 KByte. What is the size of the page Table?

The page table addressed with page number field in virtual address. Number of lines in page table equals the number of virtual pages.

$$\text{Virtual Pages} = \frac{\text{Address Space [Bytes]}}{\text{Page Size [Bytes]}}$$

$$= \frac{2^{32}}{2^{10}} = 2^{22} = 4 \text{ M Pages}$$

\* Width of each line in the page table equals width of the page number field in virtual address.

\* If no. of virtual pages is  $2^{22}$ , then line width is 22 bits.

Page table size = Number of Entries \* line size

$$= 2^{22} * 22 \text{ bits}$$

$$= 11.5 \text{ M byte}$$

## CACHE Memory

1. A block-set associate Cache memory consists of 128 blocks divided into four block sets. The main memory consists of 16384 blocks & each block contains 256 eight bit words.

1. How many bits are required for addressing the main memory?
  2. How many bits are needed to represent the TAG SET & word fields?
- Solution:

Given

No. of blocks in Cache memory = 128

No. of blocks in each set of Cache = 4

Main memory size = 16384 blocks

Block size = 256 bytes

1 word = 8 bits = 1 byte

## Main Memory Size

Size of main memory = 16384 blocks

$$= 16384 \times 256 \text{ bytes}$$

$$= 4194304 \text{ bytes}$$

$$= 2^{22} \text{ bytes}$$

The no. of bits required to address main memory = 22

Number of bits in block offset we have

$$\text{Blocksize} = 256 \text{ bytes} = 2^8 \text{ bytes}$$

Thus, number of bits in block } = 8 bits  
offset or word }

## Number of bits in set number:

$$\text{no. of sets in Cache} = \frac{\text{no. of lines in Cache}}{\text{Set size}}$$

$$= 128 \text{ blocks} / 4 \text{ blocks}$$

$$= 32 \text{ sets}$$

$$= 5 \text{ sets}$$

Thus, no. of bits in set number = 5 bits

## Number of bits in Tag number

$$\text{no. of bits in tag} = \text{no. of bits in physical address} - \left( \begin{array}{l} \text{no. of bits in set no.} \\ + \text{no. of bits in word} \end{array} \right)$$

$$= 22 \text{ bits} - (5 \text{ bits} + 8 \text{ bits})$$

$$= 22 \text{ bits} - 13 \text{ bits}$$

$$= 9 \text{ bits}$$

Thus, no. of bits in Tag = 9 bits

Therefore, the physical address is



1 01 0 1



Engineer to Excel

# SIMATS

SCHOOL OF ENGINEERING

Approved by AICTE | IET-UK Accreditation



Saveetha Nagar, Thandalam, Chennai - 602 105, TamilNadu, India