

①



\* For a 4 bit number

\*  $\text{SUB} = 1$  to subtract,  $\text{SUB} = 0$  to add

Full Adder with Subtraction @ Gate Level:



Runtime:  $O(n)$



Partial Full Adder @ Gate Level:

$G = A \cdot B$  if  $C_0$  is generated

$P = A + B$  if  $C_i$  is passed to  $C_0$



\* forgot to add subtractor, just swap  $B_i \rightarrow \text{SUB}$  - input  
and make  $C_0 = \text{SUB}$

AND  $C_i = G_{i-1} + P_{i-1} C_{\text{out}, i-1}$

②



cache holds recently accessed information from RAM/HD to reduce fetching time and speed up processes

Three Methods:  
① Direct mapping: maps to a fixed position in the cache

② Full association: maps to any position in the cache

③ Set association: can possibly exist in a range of locations in cache



Organization of a  $2M \times 32$  memory module using  $512 \text{ k} \times 8$  static memory chip



$A_{19}, A_{20}$  determine which rows are selected  
R/W common connection

(3)



### I/O Processor

- has direct memory access (DMA)
- combines I/O devices with memory
- replaces regular processor so it doesn't get interrupted

### Interrupts

- Daisy chaining priority method is used to determine the source of the interrupt
- Each device has an address and responds only if the data is specifically addressed to it
- Priority is established by including a serial connection of all devices the request interrupt
- Half an interrupt process if the new interrupt has an equal or higher priority



If  $P_1 = 1$  then  $P_0 = 0$  blocks the next device from acknowledge signal

(4) LOAD num0, R0

LOAD num1, R1

CLEAR R2

CLEAR R3

BRA([R0] < 2<sup>n-1</sup> - 1)

NOT R0

ADD R0, R0, #1

LOOP2

LOOP2 BRA([R1] < 2<sup>n-1</sup> - 1)

ADD R3, R3, #1

NOT R1

ADD R1, R1, #1

LOOP1

ADD R3, R3, #1

BRA([R0] ≥ [R2])

STORE R2, DIVIDE

LOOP1

SUB R0, R2, R0

ADD R2, R2, #1

BRA([R0] ≥ [R1])

BRA([R4] ≠ 1) LOOP3

NOT R3

ADD R3, R3, #1

LOOP3

STORE R3, DIVIDE

### MULTIPLY

LOAD NUM0, R0

LOAD NUM1, R1

BRA([R1] < 2<sup>n-1</sup> - 1) LOOP1

BRA([R0] < 2<sup>n-1</sup> - 1) LOOP2

NOT R0

ADD R0, R0, #1

NOT R1

ADD R1, R1, #1

BRA LOOP1

LOOP1

ADD R0, R0, R0

SUB R1, R1, #1

BRA([R1] > 0)

LOOP2

SWAP R0, R1

BRA LOOP1

### (5) INSTRUCTION PIPELINE

a) instruction fetch: instruction is fetched from RAM

b) instruction decode: instruction is decoded

c) operand fetch: operands are fetched

d) execute: decoded instruction is executed

e) write back: result is written to processor register or memory

\* Pipelining increases the number of instructions  
that can be sent at once

| Instruction Number | PIPELINE STAGE |    |    |    |    |    |     |
|--------------------|----------------|----|----|----|----|----|-----|
|                    | 1              | IF | ID | OF | EX | WB |     |
| 2                  |                | IF | ID | OF | EX | WB |     |
| 3                  |                |    | IF | ID | OF | EX | WB  |
| 4                  |                |    |    | IF | ID | OF | EX  |
| 5                  |                |    |    |    | IF | ID | OF  |
| Clock cycle        |                | 1  | 2  | 3  | 4  | 5  | 6 7 |

## ⑥ OUT-OF-ORDER EXECUTION

- processor can execute instruction in an order governed by availability of data, rather than their original order in a program
- if an operand is still unavailable during the clock cycle, the processor does not stall

Ex:

- 1) Instruction fetch
- 2) Instruction dispatch to instruction queue
- 3) Instruction waits in queue until its input operands are available. Instruction is then allowed to leave the queue before older instructions
- 4) Instructions are issued to appropriate functional unit and executed
- 5) Results are queued
- 6) Only after all older instructions have their results written back to the register file then is this result written back to the register file