

## INTEL M-PROCESSORS

Processors :-

Bus :- (Bus can be defined as bunch of wires)

(i) Address Bus :-

By knowing the size of address bus, we can define the how much memory locations it provides.

- for Serial comb<sup>n</sup> (far distance transmission)
- By Parallel comb<sup>n</sup> (short distance transmission)

(ii) Data Bus :-

- func of data bus is to carry the data.

- Size of data bus (n-bit) = n bit processor

- data bus size always is in  $2^N$  (because data is in binary 0 & 1 so)



(iii) Control bus :-

its access to other Processor or memory.

as 4, 10, 16, 32... bit

### Classification of Processor :-



## ① Von Neumann:



- its having only one single memory

## ② Harvard:



- its having two different memory for data & program.

## ③ CISC:

Complex Instruction Set Computer  $\rightarrow$  CISC

- Different instruction of diff. length & format.

- limited general purpose register.

## ④ RISC:

Reduced Instruction Set Computer  $\rightarrow$  RISC

- large general purpose register.

- each instruction of fixed length & format.

- 64 bit RISC  $\rightarrow$  64 registers and each register having 64 bits.

## SC25 Processor:

$\hookrightarrow$  16-bit data bus

$\hookrightarrow$  40-bit Address bus

$\hookrightarrow$  16-bit control bus

### Architecture

#### ① Execution Unit (EU)

$\Rightarrow$  ALU (addition, subtraction, mul.)

$\hookrightarrow$  Conditional [CarryFlag, Parity, Overflow, Sign, Zero, NE]

$\Rightarrow$  Flag Register

(16 bit)

$\hookrightarrow$  Control flag [Interrupt, DirectionF, TrapF]

1011

↓

$\hookrightarrow$  Carry

#### ② Bus Interface Unit (BIU)

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

↓

1011

- data reg
- extra reg.
- stack reg.

⇒ Instruction pointer (IP) : is referred to as an offset for the code seg.



- If data seg is full only then extra seg can be used to store.
- stack seg. is a memory store aside to store a data (Backup) (Temporary Backup)
- 16 bit → 20 bit
- Code seg.      code segment Register

By adding 0000 → code seg. address

only starting code segment address is ended with 0000  
 CS    3 4 8 A 0 ← hardware zero  
 IP +    4 2 1 4  
 3 8 A B 4 ← new address of code segment

- A stack is a section of memory set aside to store address and data while a subprogram executes.
- BIU handles all transfer of data and addresses on the buses for the execution unit.

#### Register organizations

| Accumulator | AH | AL | AX |
|-------------|----|----|----|
| Base        | BH | BL | BX |
| counter     | CH | CL | CX |
| Data        | DH | DL | DX |

|              |    |
|--------------|----|
| code segment | CS |
| data         | DS |
| extra stack  | SS |
| extra        | ES |

|               |    |
|---------------|----|
| Instruction   | IP |
| source        | SI |
| destination   | DI |
| base pointer  | BP |
| stack pointer | SP |

- max. code segment memory = 64 Kbyte
- minimum code segment memory = 1 byte = 8 bit
- By the help of instruction point we can access the code segment memory.
- 16-bit processor → 16-bit word  
 $db \rightarrow$  data byte 05  
 $dw \rightarrow$  data word (16bit)
- $mov ax, data$  ? supply correct data to the code  
 $mov dx, ax$

Accumulator register : all the data is stored here

Counter register CX : gives counter values

Data register DX : used in multiplication & division

## Pin Diagram of 8086 :

In Digital IC's we supply only upto 5V. & In analog IC's we supply upto 12-15V.

- to Pin IC's [Digital IC]
- Clock (CLK) : 22<sup>1/2</sup> ON or OFF time
- Vcc & power supply 5V ±10%
- RESET : registers, segment & flag  
CS: FFFFH ; IP: 0000H  
when Reset = 1 ; all pins (all values) are reset.
- AD1-AD15 : Address & data Bus  
Address (20-bit)  
data (16-bit)  
which is used, this decided by the ALE (address latch enable)
- ALE & address latch enable  
when ALE is high that it provides to work on address.
- A16 to A19 : Address status Bus  
By S4(A17) & S5(A18) we can decide which segment is used.  
S4(A17) S5(A18)
 

|   |   |         |
|---|---|---------|
| D | 0 | → extra |
| D | 1 | → stack |
| I | 0 | → code  |
| I | 1 | → data  |
- S6(A19) : logic zero
- S5(A18) : indicate condition of (interrupt) IF flag bits
- 2 ground Pins - interrupt (8086 has 2 interrupt pins NMI, INTR)
- NMI & non maskable interrupt (cannot be disabled unless the instruction is completed.)
- INTR : interrupt request (other processor connected to 8086 wants to use it)

Date 21-july  
Page

- INTA : interrupt acknowledgement

- Direct Memory Access :  
HOLD : If memory 2 data is sent to memory 1 then if processor is involved then new & HOLD Acknowledge (HLDA) : other operations gets stopped.



- So, to avoid this DMA will send HOLD signal, then HLDA signal is sent to DMA ⇒ Data bus and address bus can be used by DMA ⇒ memory can be sent simultaneously, processor can do its internal tasks.

- BHE/S7 : bus high enable [enable most significant data bits during reading / write operations: work with accordance with A01 pin]

| A01 | BHE |                                 |
|-----|-----|---------------------------------|
| 0   | 0   | whole word                      |
| 0   | 1   | High byte to / from odd address |
| 1   | 0   | Low byte to / from even address |
| 1   | 1   | no selection                    |

- \* 1-clock cycle - 16 bits can be fetched if even address
- \* 2-clock cycle - 16 bits = 8-bit + 8-bit [1-for high byte (8-bit) 1-for low byte (8-bit)]
- \* even address - 00 → whole word (16-bit)
- \* odd address - 01 → [8-bit + 8-bit] - 16-bit
- RD : read operation ? - for peripherals
- WR : /lock : write operation
- MN/MX : [min & max mode]  
in minimum mode microprocessor do not associate with any co-processor
- S2 : status signal, SI, SO in max mode microprocessor associate with any co-processor

- LOCK % Used to lock the peripheral devices [analogous to sleep mode] internal processing is possible, peripheral can't use the processor
- RQ / GT0 / HOLD % request - DMA
- RQ / GT1 % Grant - DMA
- Queue status
 

|           |                                                  |
|-----------|--------------------------------------------------|
| CS0 / DS1 |                                                  |
| 0 0       | Queue is idle [some data is there but we're not] |
| 0 1       | first (instruction) byte of opcode               |
| 1 0       | queue is empty                                   |
| 1 1       | subsequent byte of opcode                        |

|           |    |    |               |
|-----------|----|----|---------------|
| VSS (GND) | 1  | 40 | VCC           |
| AD14      | 2  | 39 | AD15          |
|           | 3  | 38 | A16/53        |
|           | 4  | 37 | A19/54        |
|           | 5  | 36 | A1B/55        |
|           | 6  | 35 | A19/58        |
|           | 7  | 34 | BHE/S7        |
|           | 8  | 33 | MNIMTX        |
|           | 9  | 32 | RD            |
|           | 10 | 31 | RQ / GT0 HOLD |
|           | 11 | 30 | RQ / GT1 HLDA |
|           | 12 | 29 | LOCK WR       |
|           | 13 | 28 | S2            |
|           | 14 | 27 | ST            |
|           | 15 | 26 | SO            |
| ADD       | 16 | 25 | DS0 ALE       |
| NMI       | 17 | 24 | DS0 INTA      |
| INTR      | 18 | 23 | TEST          |
| CLK       | 19 | 22 | READY         |
| VSS (GND) | 20 | 21 | RESET         |

### (3.7) Instruction set of 8086 %

#### Classification %

- ① Data transfer instruction
- ② Bit manipulation instruction
- ③ Arithmetic
- ④ program execution transfer
- ⑤ String
- ⑥ Processor control

Avoid  
Mistakes

- By :
- (i) Source and destination operands must be of same size as example move AL, 08H
  - (ii) Can't transfer data between memory location with the help of single instruction.
  - (iii) Destination must be register or memory location.

#### ① Data transfer instruction %

Eg. MOV BX, [0301 H]  
 ↑ memory location → offset address  
 address:

base address + (0301) this value = gives the new location

and at new location value, which is stored in BX  
 Which is 16-bit.

- PUSH Operand: / POP Operand (it pops the operand from top it pushes the operand into top of stack last in first out)

E.g. PUSH BX  
 SS = 1230



It decrements the stack pointer by 2  
 PUSH BX = 1234  
 Push DS = 0324  
 POP DX (increment by 2 SP)  
 POP BX



(iii) SAL is same as SHL

(iv) SAR

MSB  $\rightarrow$  LSB  $\rightarrow$  carry  
SAR AL, 1 [In this msb bit represent the sign]  
AL =   
AL = 

\* Shift left by  $n \approx$  multiply by  $2^n$   
Shift right by  $n \approx$  divide by  $2^n$

Rotate instruction:

ROL rotate over left

ROR rotate over right

RCL rotate over left with carry flag

RCR rotate over right with carry flag

- ROL des, count

ROL AL, 1 

- ROR

MSB  $\rightarrow$  LSB  $\rightarrow$  c

- RCL

C  $\leftarrow$  MSB  $\leftarrow$  LSB  
CF = 0  
RCL AL, 1 AL = 

- RCR 

### ③ Arithmetic instructions:

Add MUL

Adc DIV

Sub INC

Subb DEC

CBW CWD

- ADD des, src 

carry is not considered

- Adc des, src 

carry is considered

- Subb.  $\rightarrow$  considers borrow

- MUL, DIV  $\rightarrow$  multiplication & division always performed with AL.

MOV AL, 03H  
MOV BL, 03H AL = 03  
MUL BL BL = x 03

Ax  $\rightarrow$  16-bit

If Ax = 1103  $\rightarrow$  16-bit

Bx = 3003

x  $\rightarrow$  32-bit

then Ax will have LEB(16-bit) + Dx will have MSB(16-bit) = 32-bit





- Near call } → IP changes  
Near jump } CS does not change.
- Far call } → IP changes  
Far jump } CS also changes



↳ condition jump instructions  
CMP

### ⑤ String Instructions:

MOVSB, MOVSBL, MOVSBL F → compare

SCAS

MHRS / MHRSBL / MHRSBL F

- SRC → data seg.
- DS → extra seg.
- offset word is stored in

REP → repeat

### ⑥ Processor Control Instructions:

- repetition happens till CX is 0.
- STC → sets CF = 1
- CLC → clear CF  $\Rightarrow$  CF = 0
- CMC → complement the state of the carry flag CF
- STD → sets direct flag = 1
- CLD → sets DF = 1, i.e., clears it

→ AAA → ASCII adjust after addition

Packed data / Unpacked data

### (2.19) ADDRESSING MODES:

① Implied Addressing mode:  
instruction does not have any operand.  
for ex - CLC, STD, CLD, STC

② Immediate Addressing mode:  
one of the operand is the immediate data.  
for ex: MOV AL, 05H

③ Register Addressing Mode:  
destination & source both are registers.  
for ex: MOV AX, BX

④ Direct Addressing Mode:  
direct address is available as part of instruction.  
for ex: MOV DL, 00H  
- memory add or offset value for memory add is given in the  
instruction directly.

⑤ Reg. Indirect Add.:  
if offset or memory add is coming through a register.  
for ex: MOV DX, [BX + 0003H]  
MOV AL, [BX]

⑥ Base add mode:

15BCE178

offset value is passing through the BX register.  
for exs.  $\text{mov AL, [BX]}$  &  $\text{mov AL, [BP]}$

⑦ Index Add. modes  
offset value is passing through the SI & DI.  
 $\text{mov AL, [SI]}$  &  $\text{mov AL, [DI]}$

⑧ Base Index Add. modes

$$\begin{array}{ll} \text{BX} = 0002 & \text{BX} + \text{SI} = 0102 \\ \text{SI} = 0100 & \end{array}$$

$\text{mov AL, [BX][SI]}$   
[0102]

⑨ Base Index Add. + displacement add. modes

$$\begin{array}{ll} \text{mov AL, [BX][SI] + 24} & 0102 \\ \quad [0126] & + 24 \\ & 0126 \end{array}$$

- 1K byte memory is covered by the only ISP address



To store one ISP address we required 4 bytes [ 16 bits  $\rightarrow$  2 bytes ]  
 $\text{CS} \rightarrow$  2 bytes  
So 8086 processor can handle 1000 interrupt at a time  
 $= 256$

The table which is stored the ISP address, that's called as IVT (interrupt vector table)

IVT (interrupt vector table) :-

|                                        |            |             |
|----------------------------------------|------------|-------------|
| Hardware<br>softwares<br>interruptions | Type - 255 | 3FF         |
|                                        | T          | 3CF         |
| Type - 32                              | 8          |             |
| Type - 31                              | 2          |             |
| Type - 5                               | 7          | Reserved    |
| Type - 4                               | 10         | Overflow    |
| Type - 3                               | OC         | Break Point |
| Type - 2                               | OB         | NMI         |
| Type - 1                               | 0+         | single step |
| Type - 0                               | 00         | divide by 0 |

- Type - 0 to Type - 4 are pre-defined for 8086 processor  
Type - 0 : divide by 0  $\text{CS} = 0002\text{h}$   $\text{IP} = 0000$   
Type - 1 : single step (-t)  $\text{CS} =$  ,  $\text{IP} =$   
Type - 2 : NMI (hardware interrupt)

### Interrupt (chapter - 8) :-

Hardcore  
 $\rightarrow$  NMI, INTR

software  
 $\rightarrow$  by writing  
program we  
utilize some  
error

Execution / Prog. errors  
 $\rightarrow$  divide by 0  
 $\rightarrow$  single step instruction  
 $\rightarrow$  break point  
 $\rightarrow$  overflow occur  
 $\rightarrow$  NMI, INTR

When interrupt will occurs:





Type-3 : Break point ( $-t_5$ )

Type-4, 5 : overflow

- If 2 interrupts occurs at a same time then a priority list is there for 8086 -  
INTO / INT / divide error  
NMI  
INTR  
Single step interrupt
- the same priority list is considered by other IC's
- reset is also an interrupt and has highest priority in advance IC's.

(Chapter-4 PARMAM)

### Performance

$$\text{Performance} = \frac{1}{\text{Execution time}}$$

$$\begin{aligned} \text{Execution time consist of} &= \text{instructions} \times \text{clock cycle per inst} \\ &\quad \times \text{second per clock cycle} \\ &= \text{Inst's} \times \text{CPI} \\ &\quad \text{Clock rate} \end{aligned}$$

To increase performance of processor

- Instruction Set Architecture (ISA)  
No. of cycles per instruction can be decreased.
- Designer of processor does not have any control over no. of instr'n written in a code

Date: 2-Aug-04  
Page:

Date:  
Page:

- second per clock cycle depends on technology or freq x power. So if oscillator freq is fixed power consumed also increases  
⇒ We have a limitation of (2-4 GHz) in this

- To increase no. of cycles per instruction, superscalar is used ⇒ no. of instructions in single cycle is increased

$$\begin{aligned} \text{Performance} &= \frac{\text{Clock rate}}{\text{CPI}} \\ \text{CPI (cycles per instruction)} &\approx 2 \text{ (ideally)} \end{aligned}$$

IPC → instruction per cycle

$$\text{Performance} = \text{IPC} \times \text{Clock rate}$$

### AMDHAL'S LAW (Performance enhancement)

$$\text{Speed up} = \frac{1}{\text{sequential part} + \text{parallel part}}$$

No. of Processor/functional unit

Instruction depends  
on others

Independent of all  
Instructions

- as code can not be 100% parallel part ⇒ no. of cores on a particular chip is limited.

Example A processor chip used for 1 application in which 30% of execution time is spent on floating point addition, 25% of time on floating point multiplication, 10% of time on floating point division. For enhancement of performance design team comes up with certain suitable solution. Find out which enhancement soln is best.

- (i) Re-design a floating point adder to make it twice as fast
- (ii) Re-design the multiplier which becomes 3 times faster
- (iii) Make your divider 10 times faster.

Soln

$$f_A = 0.3$$

$$f_M = 0.25$$

$$f_D = 0.1$$

$$f_A = \frac{1}{0.7 + 0.3/2} = 1.176$$

$$f_M = \frac{1}{0.75 + 0.25/2} = 1.2 \quad (\text{best})$$

$$f_D = \frac{1}{0.9 + 0.1/10} = 1.098$$

Performance =  $\frac{\text{Clock rate}}{\text{Avg. CPI}}$

LIPS → million instructions per second

Date: 3-Aug-09  
Page:

| Class         | M1<br>CPI | M2<br>CPI |
|---------------|-----------|-----------|
| (Floating)    | 5         | 4         |
| I             | 2         | 3.8       |
| (Numerical) N | 2.4       | 2         |

- (i) What is the Peak Performance for M1 & M2?

Peak Performance

$$= \frac{\text{Clock rate}}{\text{Avg. CPI}} = \frac{600}{\frac{5+2+4}{3}} = \frac{600}{3.13} = 192 \text{ (Performance for M1)}$$

$$M_1 \rightarrow \text{Peak Performance} = \frac{600}{2} = 300 \text{ MIPS (million instructions per second)}$$

(assume all of class I)

$$M_2 \rightarrow \frac{500}{2} = 250 \text{ MIPS (all of class N)}$$

- (ii) If 50% of all instructions executed in certain applications belonging to class N and rest are divided equally among classes F and I. Find out which machine is faster by what factor?

Soln

$$\text{Avg. CPI } M_1 = \frac{5}{4} + \frac{2}{4} + \frac{2.4}{2} = 2.95$$

$$\text{Avg. CPI } M_2 = \frac{4}{4} + \frac{3.8}{4} + \frac{2}{2} = 2.95$$

$$M_1 \text{ Performance} = \frac{600}{2.95} = 1.2$$

$$M_2 \text{ Performance} = \frac{500}{2.95}$$

$$M_1 \text{ Performance} = 1.2 \text{ M}_2 \text{ machines}$$

M1 is faster than M2 by 1.2

ISBCE178

(iii) M1 Someone want to plant redesigned for better performance with assumption of Part-2, which of the following redesign greater effect impact on performance.

(a) faster floating point unit which will double speed.

$$\text{Avg. CPI} = \frac{2.5}{4} + \frac{2}{4} + \frac{2.4}{8} = 2.325$$

$$\text{Performance} = \frac{600}{2.325} = 258$$

(b) add second ALU to reduce class I; CPI to 1.2

$$\text{Avg. CPI} = \frac{5}{4} + \frac{1.2}{4} + \frac{2.4}{2} = 2.75$$

$$\text{Performance} = \frac{600}{2.75} = 218$$

(c) used faster logic block that allows to increase the clock rate up to same CPI.

$$\text{Performance} = \frac{350}{2.95} = 254$$

till modification incosic highly impact on machine M1.

(d) the current CPI given can included the effect of cache misses at avg. rate of 5% : each cache miss will impose penalty of 10 clock cycle. Redesigning of M1 with larger cache that can accommodate large no. of instruction and ultimately reduce miss rate by 5%. to 3%. Compare performance with any above of the modif.

Sol 8

$$5\% \times 10 = 0.5$$

$$3Y \times 10 = 0.3$$

$$\text{Improvement} = 0.2$$

$$\text{Avg. CPI} = \frac{(5-0.2) + (2-0.2) + (2.4-0.2)}{3} = 2.75$$

$$\text{Performance} = \frac{600}{2.75} = 218$$

its modification is equal to (Modif) CPI to 1.2.



CACHE 8



- if data is not available in cache  $\rightarrow$  cache miss  
- cache size is fixed, miss rate fixed

- 0.2% improvement  $\Rightarrow$  performance is good  $\Rightarrow$  curr. CPI is decreased  $\Rightarrow$  0.2 is subtracted to CPI

Q. 9 (a) device and application that would run faster on M1 than M2

|   | M1          | M2         |
|---|-------------|------------|
| F | $5x$        | $4x$       |
| I | $2y$        | $3.8y$     |
| N | $24(1-x-y)$ | $8(1-x-y)$ |

$M1 > M2$

$$\begin{aligned} & \frac{600}{5x+3y+2.4(1-x-y)} > \frac{600}{4x+3.8y+2(1-x-y)} \\ & 1.2[4x+3.8y+2(1-x-y)] > 5x+3y+2.4(1-x-y) \\ & 4.8x+4.56y+2.4-2.4x-3.8y > 5x+3y+2.4-2.4x-2.4 \\ & 2.56x+0.76y > 0.2x \\ & 12.8y > x \\ & x < 12.8y \end{aligned}$$

(d)

The current CPI given has included the effect of cache misses at average rate of 5%.

ISRC E178