

## Instruction Set

Computer Organization

- Information representation
- Arithmetic op.
- Instruction formats.

1) Decimal  $\rightarrow$  Binary.

2) Signed Number representation.

$$+5 = 0101 \quad -5 = 1101$$

3) 1's Complement.

4) 2's Complement : 1's complement + 1

5) Addition

6) +0 and -0 have different value.

7) 2's complement conversion to Decimal  $\overset{\text{add}}{\text{add}} + \text{ve}$

8) Addition of 2's complement Number

$$\text{range: } -2^{n-1} \text{ to } 2^{n-1} - 1$$

9) Subtraction of 2's complement Numbers

10) Overflow in int Arithmetic

$$\downarrow \quad -2^{n-1} \leq V \leq 2^{n-1} - 1$$

occur when two numbers have same sign and  
different from sum.

Subtraction  
2's complement

2's complement of  $V + X$ .

11) Advantage's, Disadvantage. of each.

→ Char Representation

1) BCD Binary coded Decimal

2) Hexadecimal Number & Advanges.

→ Memory (store 1 bit number @ cell).

space of another  
techniques to achieve

## Context Switch

- One leaves the Running state

## Concurrent Process

- Two processes are concurrent if their execution overlap in time.
- In a uniprocessor environment, multiprogramming provides concurrency.
- In a multiprocessor, true parallel execution can occur.

## Forms of concurrency

- 1) Multiprogramming
- 2) Time sharing : An extension of multiprogramming
- 3) Multiprocessing

- 1) Creates logical parallelism by running several processes / threads in a time. The OS keeps several jobs in memory simultaneously.  
It selects a job from the ready state and starts executing it.

## Multiplexing

## Protection

When multiple processes exist at the same time, and execute concurrently, the OS must protect them from mutual interference.

## Processes and threads

- Traditional processes could only do one thing at a time - they are single-threaded.
- A thread is an "execution context" or "separately schedulable" entity.
- Multithreaded processes can do several things at once - they have multiple threads.

## Threads

- several threads can share the address space of a single process, along the resources such as files.
- Each thread has its own stack, PC, and TCB (thread control block)
  - i) Each thread ~~has its own stack~~ executes a separate section of the code and has private

## Process (Thread) scheduling

### Scheduling algorithms:

- 1) FCFS (First-come, first serve): non-preemptive
- 2) RR (round robin): preemptive FCFS, based on time slice
  - Time slice = length of time a process can run before being preempted
  - Return to ready state when preempted

### Scheduling Goals

- Optimize turnaround time and/or response time
- Optimize throughput
- Avoid starvation (be "fair")

→ Respect priorities

- static

- Dynamic

## Interprocess communication (IPC)

→ Processes (or threads) that cooperate to solve problems must exchange information.

→ Two approaches:

1) Shared memory

2) Message passing (copying information from one process' address space to another).

→ Shared memory is more efficient (no copying), but isn't always possible.

## Process/thread synchronization

→ Concurrent processes are asynchronous: the relative order of events within the two processes cannot

## Instruction streams

Process A:  $A_1, A_2, A_3, \dots, A_m$

Process B:  $B_1, B_2, B_3, \dots, B_n$

Sequential:

## Semaphore.

- Process Synchronization - 2 Types.
- Mutual Exclusion
  - A ~~critical~~ s
- Synchronization.
  - Semaphore.
- Deadlock
- Deadlock Vs Starvation
- Causes of Deadlock.
  - Mutual exclusion (exclusive access)
  - Wait while hold (hold and wait)
  - No preemption
  - Circular wait

## The Structure of Computer



- 1) Encoded Information
- 2) Memory
- 3) Memory Units
- 4) Addresses Assignment
- 5) Machine Instructions.
- 6) Data Transfers : Possible Location.
- 7) " " : Register Transfer Notation
- 8) " " : Assembly Language Notation
- 9) MOV instruction
- 10) Basic instruction type
- 11) 3 Address Instruction

Advantage & less number of instruction

- 12) Two Address instruction
- 13) One - Address Instruction.
- 14) Processors Registers.
- 15) Addressing Modes.
- 16) Program sequencing and control.
- 17) specific Machine Levels

why OS?

- 1) Act as interface b/w software and hardware
- 2) OS is resource manager

(8) Translating Languages

(9) Assembly Language

1)                   = \* =

2) 3-addresses

4) 2-addresses.

5) 1-addresses

6) 0-addresses

7) Addressing modes

8) Implied addressing

9) Immediate addressing

10) Direct       "

11) Indirect      "

12) Register     "

Register Indirect addressing.

Pros: large address space

Cons: extra memory reference.

Displacement addressing.

Pros: flexible

Cons: complex

Stack addressing:

Pros: short and fast fetch

Cons: limited by FLD order.

13) index addressing

14) Based Index Addressing

15) The instruction cycle: intermediate re

18) Translating languages

19) Assembly language

1)

= \* =

2) 3-addresses

4) 2-addresses.

5) 1-addresses

6) 0-addresses

7) Addressing modes

8) Implied addressing

9) Immediate addressing

10) Direct "

11) Indirect "

12) Register "

Register indirect address

Pros: large address space

Cons: extra memory reference

Displacement addressing

Pros: flexible

Cons: complex

Stack addressing

Pros: short and fast fetch

Cons: limited by FLD order

13) Index addressing

14) Based Index Addressing

15) The instruction cycle: Intermediate

## Control Design

- 1) Data Processing Unit (DPU)
- 2) Control Unit (CU): issues control signals on instructions to DPU.  
• DPU is logically reconfigured by the CU to perform certain set of micro-signals.

### Functions of CU

- 1) Instruction Sequencing: Methods by which instruction are selected for execution or control of the processor i.e transferred from one instruction to another.
- 2) Instruction Interception:

### Instruction sequencing

- Each instruction can explicitly specify the address of its successor.
- It may increase the instruction length, which in turn increases the cost of memory where it is stored.
- If I2 is stored in location A, and I2 is its successor.
- Branch instruction: specifies implicitly/explicitly an instruction address X.
- Unconditional branch always alters the flow of control by causing  $PC \leftarrow X$

### Program control transfer

Often it is required to transfer of control from P1 to P2 due to subroutine call or interrupt.

## Control stack

- Stack is used to store return address
- Also used to store pass variables and variables that are local to P2.

## Stack implementation

- Smaller stack : Shift Register, Larger stack : RAM
- Push : memory write, and pop = memory read
- Stack pointer (SP) : this register contains address of the memory location currently acting as top of the stack.
- Limit register (L) : contains highest stack address
- Base register (B) : contains lowest stack address
- SP > L (SP < B) : overflow (underflow)

## Instruction interpretation

control  $\rightarrow$  signals  $\rightarrow$  via control lines  $\rightarrow$  outside



- Control and data are relative, rather than absolute
- Same physical lines, either data or control at diff. levels.

$C_{out}$  : Control the operation of D.P. unit (main function of C.U.)

$C_{in}$  : enables the data being processed to influence the C.U., allowing data-dependent decision to be made, e.g.: errors, overflow.

$C^{out}$ : transmitted to other CU, indicating the status such as busy or operation completed.

$C^{in}$ : received from the other CU. e.g. supervisory controller - start/stop / timing information.

$C^{in}$  &  $C^{out}$  to synchronize with other CU.

Behaviour of CU :-

Flowchart / Description language or both.

Implementation methods :-

1) Hardwired CU: sequential logical circuit, which generates specific sequences of control signals.

- minimizes the no. of ~~tempate~~ components
- maximizes the speed of operation.

Once constructed, changes in behaviour can be implemented by re-designing and physically ...

2) Microprogramming:

- Maurice V. Wilkes (1950)
- flexible & systematic
- A (micro) instruction stored in a special addressable memory called a Control Memory (CM)

Microprogram: sequence of micro instructions (MI) needed to execute a particular operation.

- fetch the MI from CM one at a time, and use them to activate the control lines directly.

- Micro-programmed control unit.

- firmware.

- design changes can easily be made by altering the contents of the CM.

Emulation: A micro-programmed CPU can execute programs written in the M/C language of several different computers.

Limitation

- (1) Costly than HCU due to C.M. & its access circuitry.  
slower due to fetch from C.M.

### Hardwired Control

Trade-Off: Amount of h/w, speed and cost. Design methods in practice are ad hoc, heuristic, cannot easily be formalized (due to large no. of instruction, and interdependency).

Three main methods (suitable for small C.U like RISC)

- 1) State-table method  
2) Delay-Element Method  
3) Sequence counter method

#### 1) State-table Method

- like any finite-state sequential m/c

C<sub>in</sub> : Input

C<sub>out</sub> : Output

Row : Set of internal states {S<sub>i</sub>}

Internal state : Information stored at discrete point of time

Column : Set of external signals to C.U

Entry in row S<sub>i</sub> and column l<sub>j</sub> : S<sub>i,j</sub> i, Z<sub>i,j</sub>

S<sub>i,j</sub> = next state of C.U

Z<sub>i,j</sub> = set of output signals Z<sub>i,j</sub> from C<sub>out</sub> that are activated by the application of l<sub>j</sub> to C.U. when it is in state S<sub>i</sub>.

Pros : systematic technique for minimizing no. of gates,  
glitches-flops:

Cons :-

## 2) Delay element Method

Consider the problem of generating the following sequence of computer signals at time  $t_1, t_2, \dots, t_n$  using a hardwired C.U.

$t_1$  : Activate  $\{C_1, j\}$ ;

$t_2$  : Activate  $\{C_2, j\}$ ;

:

:

:

$t_n$  = Activate  $\{C_n, j\}$ ;

Cons ~~and Pros~~

## 3) Sequence counter Method.

— X —

Relationship . Seq. counter and Delay element  
~~Multiplexor~~

## Microprogrammed Control

- Every instruction in a CPU is implemented by a sequence of one / more set of concurrent micro operations.
- Each micro operation is associated with information stored in ROM / RAM, called control memory (CM).
- Each microinstruction specifies the next

- ⇒ A micro programmed computer  $C_1$  used to execute programs written in  $nc$  language  $L_2$  (other mpc  $C_1$ ) by placing an emulator for  $L_2$  in the C.M. of  $C_1$ ,  $\rightarrow C_1$  is then said to be capable of emulating  $C_2$ .
- ⇒ Comparable with assembly language, but requires more detail knowledge of the processor h/w.
- ⇒ Symbolic language  $\rightarrow$  micro assembly language
- ⇒ Micro assembler

### \* Wilkes' Design

#### Microinstruction :

1. Control field : indicates the control lines to be activated,
2. Address field : indicates the address in the CM of the next micro instruction to be executed  
 $K_i, C_i : 16, K_i = 1$

CM : Row  $\rightarrow$  microinstruction  
 $COL \rightarrow$  Control / Address lines  
 CMAR  $\rightarrow$  control memory address register  
 which stores the address of the current instruction.

IR : instruction register

- Microinstruction format IBM sys. / 370 Model 115
- Parallelism in Microinstruction.
- Parity bits : error can be checked
- 21 different control fields: according to purpose
- Un-encoded format :
- microinstruction Register (MIR)

### Horizontal Microinstruction

- 1) Long formats
- 2) Ability to express a high degree of parallelism
- 3) Little encoding of the control information

 n.

- 1) Short formats
- 2) Limited ability to express parallel micro operations
- 3) Considerable encoding of the control information  
 $\log(n+1)$

### Microinstruction Addressing:

- 1) Each microinstruction contains the CM address of the next microinstruction to be executed
- 2) For branching, two possible next address are included.

Advantage : No time lost in address generation

Limitation : Wasteful of CM space.

Solution : microprogram counter (MPC) for branching.

Analogous to PC at instruction level

### conditional Branching:

- The condition to be tested is condition variable or flag generated by the DPU (Data Processing Unit).
- If several such conditions exist, a condition select subfield is included in the microinstruction.
- The branch address may be contained in the microinstruction itself, in which case it is loaded into CMAR, when a branch condition is satisfied (to save space CM can load only low-order bits)

Another approach:  
Skip on overflow

### Micro operation Timing:

- Each MI (Microinstruction) generates a series of control signals which are active for duration of the signal. MI's execution cycle.
- A single clock pulse synchronizes all control signals. It is called Monophase.
- Polyphase
- Each control signal is active during one of the phases.
- Added complexity

### Example : Register Transfer Operation

#### Micro operation Timing.

- Fetch & execute steps can be overlapped.

#### Residual Control

- Storing the control field in a register which continues to exercise control unit if it is modified by a subsequent MI.
- Application of residual control

Cons

- Two level
- Typically format small.

Advantage

- It can reduce time.
- Disadic Reduction
- nCM can be used.

One level

- In TWO level
- N = bit combination
- In place [6082 H]  
in nCM.

Hence size

$\frac{1}{8} N = 70$

| S <sub>0</sub> | S <sub>1</sub> | Meaning                      |
|----------------|----------------|------------------------------|
| 0              | 0              | No branching                 |
| 0              | 1              | Branch if V <sub>1</sub> = 1 |
| 1              | 0              | Branch if V <sub>2</sub> = 1 |
| 1              | 1              | Unconditional branch         |

### NaNoprogrammed Computer

- In some machines, the micro instructions do not directly issue the signals that control the hardware.
- Instead they are used to access 2<sup>nd</sup> control memory, called nanocontrol memory (nCM), that directly control the hardware.

There are 2 levels of control memories :

- 1) High level : Microcontrol memory MCM (u)
- 2) Low level : Nanocontrol memory nCM  
(microinstruction)

consequ

## Advantages.

- It can reduce the total size of CMs needed, this translates to smaller chip area.

## Disadvantage.

Reduction in speed due to extra memory access for nCM and a more complex CU organization.

$$\text{One level } S_1 = H_m (N + [\log_2 H_m]) \quad \dots \quad (1)$$

In TWO level design, μCM again stores  $H_m$  MIs, but  $N$ -bit control fields are transferred to nCM.

- In place of latter, each MI in μCM contains  $[\log_2 H_n]$  bit address to specify any NI location in nCM.

Hence size becomes,

$$S_1 = \{ H_m \cdot (N + [\log_2 H_m]) \}$$

$$= 650 (70 + [\log_2 650])$$

$$\textcircled{2} 515730 = 52,000$$

$$S_2 = H_m (2[\log_2 H_m] + [\log_2 n] + \alpha N)$$

$$= 650 (2[\log_2 650] + [\log_2 0.4] + 0.4 \times 70)$$

$$= 650 (2[10] + \cancel{8-1} + 28)$$

$$= 650 (20 + 28 - 1)$$

$$= 650 (47)$$

$$= 30550$$

## Processor Design -

CPU  $\rightarrow$  executes sequence of instructions stored in a memory which is external to CPU.

- The seq. of operations constitute instruction cycle.
- It has two parts.

i) Fetch cycle :

ii) Instruction cycles

- Sequence of micro operations, involves a register transfer operation.

CPU cycle time : ( $t_{CPU}$ ) time required for shortest well-defined CPU micro operation.

Clock rate :

CPU designs have following two premises :

- It should be as fast as technology permits ; number of components in the CPU must be kept relatively small.
- Main memory (MN) of relatively large capacity is needed to store programs and data.

Memory cycle time ( $t_M$ ) : time elapses between two successive read or write operations.

$t_M / t_{CPU} : 1-10$

- All I/O data transfers are implemented by memory referencing instructions.
- Memory locations and I/O ports share the same address space.

I/O Mapped or port-addressed I/O :-

- have I/O instruction that are distinct from memory-referencing instructions.

Accumulator : CPU register, plays a central role, being used to store an input or output operand in the execution of most instructions.

- A simple Accumulator based CPU. (Fig.)
- AR : Memory Address Register : operation of the CPU.
  - i) Fetch cycle
  - ii) Execution cycles.

3. Special register can be included to facilitate the transfer of control between instructions within a program.

e.g.: status register is used to indicate conditions resulting from the execution of the previous program.

4. Transfer of control b/w the diff. Subprograms due to interrupts or subroutine calls and return is also facilitated by special registers.

- control is transferred by saving the current PSW (Program Status Word) in a designed location in MM and loading a new PSW into the CPU.

- uses stack

- Advantage of stack: handles repeated program transfers.

5. Facilities can be provided for the simultaneous processing of two or more distinct instructions.

- can be fetched simultaneously by extending the memory addressing circuit & adding sufficient buffer storage to CPU.

→ Typical CPU with general Register operation (fig)

- ALU stores/obtains most of its results/operands into / from this register set

- Special logic i's

→ Instruction Sets

~~The~~ or

→ Addressing Modes

- Operand associates with data X

## Orthogonality :

The ability to use all relevant addressing modes in a uniform and consistent way for all opcodes of an instruction set.

- Simplifies programming both by reducing the number of opcodes needed by simplifying the rules for operand address specification.
- Most computers have little orthogonality.

→ Aut

limita

Absolute Addressing : Simplest mode of (direct) address formation :

- Requires complex operand address to appear in the instruction operand field
- Real address is used without further modification to access the desired data item.

extra

requi

- Index

Numb

Relative Addressing :

- Partial addressing information is included in the instruction.
- Complete address must be constructed by CPU.
- Instruction also identifies storage locations

- Few

- Few

and

task

$R_1, R_2, \dots$

- effective address = 1's + (D, R<sub>1</sub>, R<sub>2</sub>, ..., R<sub>e</sub>)



- R acts as a base register (constant is called base address)

3. Facilities the process of indexed data.

- R is called the INDEX REGISTER.

- Indexed terms  $x(0), x(1), \dots, x(n)$  are stored in the consecutive addresses of Main Memory.

→ Auto-Increment and Auto-decrement.

Limitation:

- Extra logic OR and extra processing time required for address computation.
- Indexed items are frequently accessed sequentially.

Number of ADDRESSES:

- Fewer the address, the shorter the instruction.
- Fewer ~~less~~ more addresses means more primitive inst.
- and longer program required to perform any given task.

Polish Notation (Polish logician Jan Łukasiewicz)

Postfix :  $x \circ y \Rightarrow xy\circ$

Prefix :  $x \circ y \Rightarrow \circ xy$

Advantage: No requirement of parentheses.

$A(B+C) \Rightarrow ABC + *$

### OPCODE

- A fixed length field of  $k$  bits within each instruction permitting up to  $2^k$  distinct operations to be specified
- Number of registers are small, so only a few bits are required.

1. Destination Register
2. Source Register address

Example;

ADD R,

i.e  $A \leftarrow A+R$ ,  $R$  is any 8 bit register

ADI D,  $D$  is ~~be~~ a 8 bit word, 2 bytes long

JMP address,

$PC \leftarrow \text{Address}$ , 3 byte branch instruction

→ Some programs occupy a ...

## Hazard of calling function ??

- Assign short format to the most frequently used instruction, and the longest format to the least frequently used.
- The frequency with which specific instruction types occur can be determined.
- One can therefore attempt to base opcode lengths on them.

### Instruction Types :

- what type of instructions should be included in a general purpose processor's instruction set?
- A typical machine instruction defines one or two register transfer (micro) operations, and a sequence of such instructions is needed to implement a statement in a high level programming language.
- Due to complexity of operation, data type and syntax HLPL few successful attempts have been made to construct computers whose machine language directly corresponds to a H.L.P.L.  $\xrightarrow{\text{High Level Program}}$ .
- Thus there is a semantic gap b/w HLPL and (M.I.) that implements it, a gap that compiler must bridge.

### Requirements :

- 1) Complete :- One should be able to construct a machine language program to evaluate any func. that is computable using a reasonable amount of memory space.
- 2) Efficiency :- Frequently required functions can be performed rapidly using relatively few instructions.
- 3) Regular :- Instruction set should contain expected opcodes. and addressing modes (e.g left shift or right shift). The instruction set should

also be reasonably orthogonal with respect to the addressing modes.

- 4) Compatible : To reduce both n/w and s/w design costs, the instructions may be required to be compatible with those of existing machines.

completeness :

A function  $f(x)$  is defined to be computable if it can be evaluated in a finite number of steps by a Turing machine.

But real computers have finite amount of memory, and can be used to any computable function, at least to a reasonable degree of approximation.

- Turing machine has only four basic operations
- while simple instruction sets require simple, therefore inexpensive, logic circuits to implement them, they can lead to excessively complex programs.

Instructions are divided into five major types:

- 1) Logical Instruction : Include Boolean and other non-numerical operation.
- 2) Program Control Instructions : Such as branch instructions, which change the sequence in which program executes.
- 3) Arithmetic Instruction
- 4) Data Transfer Instruction
- 5) Input-Output (I/O) Instruction

These are not always mutually exclusive.

## RISC VS CISC

- No general agreement about what constitutes the appropriate size of a general purpose instruction sets.
- Early computers had small & simple instruction sets, forced by the need to minimize the amount of h/w.
- As h/w became cheaper, instruction set tends to increase both in number and complexity.
- Reduce the semantic gap.  
→ multi.

one  $I_F$  or  $P_F$  ?? A (slide)  $\rightarrow$   
complex instruction

→ Execution of  $P_F$  will generally be slower than that of  $I_F$  due to the fact that more time must be spent fetching instruction to  $P_F$ .

→ It occupies more memory space than  $I_F$

→ But the disadvantage of  $I_F$  is that it adds complexity to the processor.

(slide)  $\rightarrow$

(slide)  $\rightarrow$

## Reduced Instructions Set Computer (RISC)

Contains small and relatively simple instruction sets.

- 1) Few instruction type & addressing modes
- 2) Fixed and easily decoded instruction
- 3) Fast - single cycled
- 4) Hardwired rather than micro programmed control.
- 5) Memory access limited mainly to load & store options.
- 6) Use of compilers to optimize object code performance.

## Implementation (Design processor).

The design of circuits to implement an instruction set is an exercise in Register-Level Logic design.

- Necessary to devise an algorithm for appropriate register transfers or other micro operations.
- Also should balance ckt cost & execution speed.
- Common to choose algo that permit ok's used by instructions to be stored.

### Ans

- \* Implementation of two conditional branch instructions  
SZA (skip on zero accumulator)  
SNA (skip on non-zero accumulator),



### ALU design

Various ckt used in the execution of data processing instructions by a processor are usually merged into a single unit called ALU.

- fixed-point / floating point or both
- can have auxiliary unit called arithmetic co-processor

Structure of a basic fixed point ALU.

Flags = Underflow or overflow activated when

## For fixed point



- Multiplication and Division are done by sequential digit-by-digit shift-and-add / subtract algo.
- Three one-word registers: AC, MQ and DR
- MQ: capable of left and right shifting

- Parallel adder stores the input from AC and DR and places the results in AC
- DR also stores the multiplicand for division.

Usage of the Registers:

$$ADD: AC \leftarrow AC + DR$$

$$SUB: AC \leftarrow AC - DR$$

$$AND: AC \leftarrow AC \wedge DR$$

$$X-OR: AC \leftarrow AC \oplus DR$$

$$MULT: AC \cdot MQ \leftarrow DR \times MQ$$

$$DIV: AC \cdot CMQ \leftarrow MQ \div DR$$

$$OR: AC \leftarrow AC \vee DR$$

$$NOT: AC \leftarrow \bar{AC}$$

- Sometimes DR serves as a memory buffer register to store data, addressed by an instruction address field ADR.

Then DR  
one - add  
Bit - slice

- Feasible
- A single
- can be copied
- single
- Result  
becoz  
'slice'

Advantage  
Any  
selection

\* 16

- Rate

Flight

L

Then DR can be replaced by M(ADR), results in a one-address-memoy-referencing format.

Bit-Sliced ALU : (4 bit to 16 bit)

- Feasible to design an entire fixed-points ALU on a single IC chip. (4/8 bits)
- can be designed to be expandable. In that  $K$  copies of the ALU chip can be connected to form a single ALU of  $K m$ -bit words directly
- Resulting array-like circuit is called bit-sliced, bcoz each component chip processes an independent 'slice' of  $m$ -bits for each  $K m$ -bit operands.

Advantage.

Any desired word size can be handled efficiently by selecting the appropriate number of components (bit-slice).

\* 16 bit bit-sliced ALU composed of 4-4-bit slices

- ~~Data buses~~

Floating Point Arithmetic

$$x(x_M, x_E) \text{ i.e } x = x_M \times B^{x_E}$$

$$\text{LW } x_M = n_M - b^{\text{int}}$$

$$x * y = (x_M * y_M) \times 2^{x_E + y_E}$$

$$x / y = (x_M / y_M) \times 2^{x_E - y_E}$$

$$x + y = (x_M 2^{x_E - y_E} + y_M) \times 2^{y_E}$$

$$x - y = (x_M 2^{x_E - y_E} - y_M) \times 2^{y_E}$$

| in - magnitude  
 | 2's comp  
 | 1's comp

- \* and / are relatively simple, since the mantissas and exponents can be processed independently.

Floating point



Steps :

- Compute  $(Y_E - X_E)$
- Shift  $X_M (Y_E - X_E)$  places to form  $X_M 2^{X_E - Y_E}$
- Compute  $(X_M 2^{X_E - Y_E} \pm Y_M)$   
(a fixed point + or -)

- \* Data processing part of a single floating point arithmetic unit



Drawback: It can be used for  $n$ -bit floating-point operations only, even though it has most of the facilities required for  $n$ -bit fixed-point operations.

Since all computers with floating-point instruction also have fixed-point instructions, it is often desirable to design a single unit for both types of instruction (a bit difficult (Ck+)).

### Arithmetic Processor

- CPU devotes a considerable amount of control and data processing h/w for non-arithmetic functions.
- Such CPU must rely on slower s/w routines.
- $\therefore$ , h/w costs due to IC count & chip area for implementing complex arithmetic operations often prevent their inclusion in the CPU's instruction set.

- Alternatively, a processor is devoted exclusively to arithmetic functions, so a full range of numerical operations can be implemented in h/w at relatively low cost e.g. in a single chip as aux

Two ways to implement it.

1. Peripheral processor: It can be treated as a peripheral or I/O device to which the CPU sends data and instructions & from which it receives results.
2. Co-processor: It is closely coupled to the CPU so that its instructions and register set are extensions to those of the CPU. The CPU's instruction set has a special subset of opcodes reserved for the auxiliary processors.