

## UNIT 2

Instructions

are binary bit pattern with specific meaning to microprocessor

Instruction size

size of (opcode + operand).

Control Unit Design.

Processor Unit

↓  
data processing unit

Functional units which  
can perform operation  
on Data.

- \* ALU
- \* Registers
- \* accumulator
- \* Flags .

↓  
control unit

Timing and sequencing of  
operations performed on  
data by various functional  
units

↓  
It decides which and  
when micro-operation is  
to be performed.





## Addressing Modes.

→ The various ways of specifying operand in an instruction,



### 1. Immediate Addressing Mode.



In this mode the value of the operand is specified in the instruction itself.

### 2. Register Addressing Mode



When the operand is present in a CPU register then the name of that register is given in the instruction.

### 3. Direct Addressing Mode.



Operand is present in some memory location whose address is directly specified in the instruction.

### 4. Register Indirect Addressing Mode.



The operand is present in memory whose address is stored in the CPU register, the name of this CPU register is given in the instruction.

### 5. Indirect Addressing Mode.

Opcode | memory address is given which contains memory address of operand



operand is stored in Memory whose address is also stored in some memory location. The address of this memory location is given in the instruction.

and location



## Single Bus Organization

Internal Bus Control Signals.



## Memory Read Operation

Internal Bus



for Operation MOVE R<sub>1</sub>, [R<sub>2</sub>] R<sub>2</sub> contains  
address of  
memory location

Pseudo code.

|                                  |
|----------------------------------|
| MAR $\leftarrow$ R <sub>2</sub>  |
| Read Memory                      |
| Wait for MFC                     |
| operand $\rightarrow$ MDR        |
| MDR $\rightarrow$ R <sub>1</sub> |

Signals:

- ① R<sub>2</sub>\_out, MAR\_in, Read
- ② Wait for MFC
- ③ MDR\_out, R<sub>1</sub>\_in

Timing Diagram:

OE  
→ Output  
in Enable

Que MOVE [R<sub>2</sub>], R<sub>1</sub>

MAR ← R<sub>1</sub>

MDR ← R<sub>2</sub>

Write

Wait for MFC

① R<sub>1</sub>-out, MAR-in

② R<sub>2</sub>-out, MDR-in

write

③ WMFC

- Complete Instruction for ADD R<sub>1</sub>, [R<sub>2</sub>]

(PC-out, MAR-in, Read, Select C, Add 12-in)

(Z-out, PC-in), WMFC.

(MDR-out, IR-in) - I Decode

(R<sub>2</sub>-out, MAR-in, Read)

R<sub>1</sub>-out, Y-in, WMFC

MDR-out, Select Y, Add, Z-in.

(Z-out, RI-in, END) Execute

## Control Unit Design.

**Hardwired Control unit**  
(complete set of Hardware)

It uses fixed logic circuit to interpret instructions and generate control signals.

**Micro - Programmed control unit**

control signal selection and sequencing information is stored in a memory called control memory. Hence the control signal to be activated at any time are specified by micro-instruction



Decoder.

Encoder.



| I <sub>1</sub> | I <sub>0</sub> | Q <sub>3</sub> | Q <sub>2</sub> | Q <sub>1</sub> | Q <sub>0</sub> |
|----------------|----------------|----------------|----------------|----------------|----------------|
| 0              | 0              | 0              | 0              | 0              | 1              |
| 0              | 1              | 0              | 0              | 1              | 0              |
| 1              | 0              | 0              | 1              | 0              | 0              |
| 1              | 1              | 1              | 0              | 0              | 0              |

$$Q_3 = I_1 \cdot I_0$$

$$Q_2 = I_1 \cdot \overline{I_0}$$

$$Q_1 = \overline{I_1} \cdot I_0$$

$$Q_0 = \overline{I_1} \cdot \overline{I_0}$$

Made up using AND operations

Encoder



| I <sub>3</sub> | I <sub>2</sub> | I <sub>1</sub> | I <sub>0</sub> | Q <sub>1</sub> | Q <sub>0</sub> |
|----------------|----------------|----------------|----------------|----------------|----------------|
| 0              | 0              | 0              | 1              | 0              | 0              |
| 0              | 0              | 1              | 0              | 0              | 1              |
| 0              | 1              | 0              | 0              | 1              | 0              |
| 1              | 0              | 0              | 0              | 1              | 1              |

$$Q_1 = I_2 + I_3$$

$$Q_0 = I_1 + I_3$$

Made up using OR operations



Block Diagram of Hardwired Control Unit.

- 1 PC OUT, MAR in, Select C, Add, Z in, Read
- 2 WMFC, Z out, PC in
- 3 MDROUT, IR in
- 4 RIOUT, R2 in.

$\text{MOVE R}_2, \text{R}_1$  → One byte  
 operand was in processor register.  
 No memory Read/Write operation is required for execution.

micro-operation.



Machine cycle → Opcode fetch  
 → memory read  
 → Memory write  
 → I/O Read  
 → I/O write.

ADD B .  
1. Opcode fetch - 4T states

ADD M .

1. Opcode fetch
2. Memory Read

MOVE R<sub>2</sub>, [R<sub>1</sub>] ]

(4) R<sub>1out</sub>, MAR<sub>in</sub>, Read

(5) WMFC

(6) MDR<sub>out</sub>, R<sub>2in</sub>.

LDA 2000H .



④ Memory Read Execution cycle.

Size of fetch cycle depends on the size of instruction.

PC<sub>out</sub> depends on the timing / clock cycle .

PC<sub>out</sub> depends on T<sub>i</sub> when T<sub>i</sub> is ON PC<sub>out</sub> is ON  
MAR<sub>in</sub> depends on PC<sub>out</sub> + depends on IR .

Faster speed in Handwired control unit.  
 Disadvantage:  
 complexity of the circuit  
 Size becomes large.

logic design circuit  
 No programming req.  
 collection of instr.  
 microprogram → which control  
 signals need to be  
 set.

### micro instruction



sequencer generates the address of microprogram's in the control memory. Also known as micro program counter.



info of various control signals to execute a particular cycle of a instruction cycle.

# Performance Assessment Of Computer

## 1. SYSTEM CLOCK OR PROCESSOR CLOCK.



or  
one clock pulse.

ADD B

$\downarrow$   
4 T-states

1 T-state = 1 clock cycle.

Time req. to execute a program

$$\begin{array}{l}
 \text{Program} \\
 \left. \begin{array}{ll}
 I_1 - 4 & \text{CPI} \\
 I_2 - 4 & \\
 I_3 - 3 & \\
 I_4 - 3 & \\
 I_5 - 5 &
 \end{array} \right\} T = \text{TOTAL NO. of cycles required to execute complete program} \\
 \times \text{Time for 1 clock cycle.}
 \end{array}$$

$$\begin{aligned}
 &= 19 \times 1 \mu\text{sec} \\
 &= 19 \mu\text{sec.}
 \end{aligned}$$

$$f = 1 \text{ MHz}$$

$$T = 1 \mu\text{sec.}$$

$$\text{CPI} = \underline{\underline{19}}$$

We take average CPI so that we don't have to add all CPI's.

$$\text{CPI}_{\text{average}} = \frac{19}{5} \approx 4.$$

$$\begin{aligned}
 T &= \frac{\text{CPI}_{\text{average}}}{\text{No. of instruction}} \times \text{Cycle time} \\
 &= 5 \times 4 \times 1
 \end{aligned}$$

manufacturer directly provides the value of average CPI

|          |  |
|----------|--|
| PAGE No. |  |
| DATE     |  |

$$T = \frac{I_c \times CPI \times \tau}{f}$$

↓      ↓      ↓

No. of instructions per program    average CPI per instruction ,

basic Performance equation .

$$T = I_c \times CPI \quad \text{where } f = \frac{1}{\tau}$$

↓

frequency of system clock

$$T = I_c \times CPI \times \tau$$

↓

$$p + (m \times k)$$

cycles required by processor      cycles required by memory.

ratio of memory cycle time & processor cycle time .

memory slower than processor .

memory cycle time  $>$  processor cycle time .

$$\text{MIPS RATE} = \frac{I_c}{T \times 10^6} = \frac{\tau}{CPI \times \tau \times 10^6}$$

million Instructions  
per second .

$$= \frac{1}{CPI \times \tau \times 10^6}$$

$$= \frac{f}{CPI \times 10^6}$$

to add

A program has 2 million instruction which is to be executed on a processor with  $f = 400 \text{ MHz}$ . The program consists of 4 types of instruction whose

| CPI            | Instruction mix |
|----------------|-----------------|
| I <sub>1</sub> | 1      60%      |
| I <sub>2</sub> | 2      18%      |
| I <sub>3</sub> | 4      12%      |
| I <sub>4</sub> | 8      10%      |

Calculate MIPS rate

$$0.6 + 0.36 + 0.48 + 0.8 = 2.84 = \text{CPI}$$

$$\begin{aligned} &= 400 \times 10^6 \\ &2.84 \times 10^6 \\ &= 178 \end{aligned}$$

$$\text{MFLOPS rate} = \frac{f}{\text{CPI} \times 10^6} \quad (\text{same as MIPS})$$

CPU

Processing unit      control unit.  
OR  
data path

|          |  |
|----------|--|
| PAGE No. |  |
| DATE     |  |

## Parallel Processing.

### Pipelining

Data path pipelining  
OR

Arithmetic pipelining  
OR

Processing unit pipelining.

Instruction pipeline.



$R_i \rightarrow$  Registers or Buffers to store incomplete result

$C_i \rightarrow$  Computation unit stage.

### Instruction Pipeline.

No Pipeline

I<sub>1</sub>   F   D   E

I<sub>2</sub>

F   D   E } 6T States

I<sub>1</sub>   F   D   E

I<sub>2</sub>                F   D   E }

4T.

- Pipeline is a technique for increasing processor throughput.
- It can be applied in processor in 2 ways :
  - (i) Hardware pipeline (also processing pipeline or arithmetic or data path pipeline)

### (ii) Instruction pipeline.

In hardware pipeline the design of complex processing unit such as multipliers, floating point adders are divided into n number of intermediate stages. These stages can work simultaneously on a number of operands with each stage is made independent of other stages using buffers / registers as a result once the pipeline is full the result can be obtained after every cycle.

In case of instruction pipeline fetching of next instruction can be overlapped with decoding of first instruction which means when 1<sup>st</sup> instruction is being decoded next instruction can be fetched.

## Floating Point Addition

1. Exponent comparison
2. Mantissa shift
3. Mantissa addl
4. Exponent addl/sub, Mantissa shift.



Exponent compare    Mantissa shift    Mantissa addl

$$T \rightarrow A, B$$

$$2T \rightarrow C, D \quad A, B$$

$$3T \rightarrow E, F \quad C, D \quad A, B$$

$$4T \rightarrow G, H \quad E, F \quad C, D \quad A, B$$

Assume that the instruction cycle of a processor requires four steps i.e., instruction fetch, execute, store result. Draw the diagram to implement four stage pipelining. Also, indicate how pipelining increase throughput in this case.

I<sub>1</sub> F D E S

I<sub>2</sub>

I<sub>3</sub>

12 T states

F D E S

F I D E S

I<sub>1</sub> F D E S

I<sub>2</sub> F D E S

I<sub>3</sub> F D E S

6 T states



In branching achieving this type of pipelining is problematic

MOV A, B

ADD B

JNC AGAIN

STA 3000

AGAIN HLT

Branch Prediction

Whenever we encounter conditional statement there are two probabilities.