

# Computer Hardware Cheat sheet

Multi-input XOR8  $\eta(1) \in 2^{N+1} \rightarrow$  output is 1.

## Flip Flops:

### SRFF

| $Q^t$ | S | R | $Q^{t+1}$ |
|-------|---|---|-----------|
| 0     | 0 | 0 | 0         |
| 0     | 0 | 1 | 0         |
| 0     | 1 | 0 | 1         |
| 0     | 1 | 1 | X         |
| 1     | 0 | 0 | 1         |
| 1     | 0 | 1 | 0         |
| 1     | 1 | 0 | 1         |
| 1     | 1 | 1 | K         |



$S=R=0 \rightarrow$  Keep  $Q^t$

$$Q(t+1) = Q(t) \cdot \overline{R(t)} + S(t)$$

| $Q^t$ | $Q^{t+1}$ | 00 | 01 | 10 | 11 |
|-------|-----------|----|----|----|----|
| S     | R         | 00 | 01 | 10 | 11 |
| 0     | 0         | 00 | 01 | 10 | 11 |
| 0     | 1         | 00 | 01 | 10 | 11 |
| 1     | 0         | 00 | 01 | 10 | 11 |
| 1     | 1         | 00 | 01 | 10 | 11 |

$$Q^{t+1} = S + \overline{R} \cdot Q^t$$

### JKFF

| $Q^t$ | J | K | $Q^{t+1}$ |
|-------|---|---|-----------|
| 0     | 0 | 0 | 0         |
| 0     | 0 | 1 | 0         |
| 0     | 1 | 0 | 1         |
| 0     | 1 | 1 | 1         |
| 1     | 0 | 0 | 1         |
| 1     | 0 | 1 | 0         |
| 1     | 1 | 0 | 1         |
| 1     | 1 | 1 | 0         |

$$Q^{t+1} = Q^t \cdot \overline{K(t)} + Q^t \cdot J(t)$$

$J=K=0 \rightarrow$  Keep  $Q^t$

$J=K=1 \rightarrow \neg Q^t$

| $Q^t$ | $Q^{t+1}$ | 00 | 01 | 11 | 10 |
|-------|-----------|----|----|----|----|
| S     | R         | 00 | 01 | 11 | 10 |
| 0     | 0         | 00 | 01 | 11 | 10 |
| 0     | 1         | 00 | 01 | 11 | 10 |
| 1     | 0         | 00 | 01 | 11 | 10 |
| 1     | 1         | 00 | 01 | 11 | 10 |

$$Q^{t+1} = J \cdot Q^t + \overline{K} \cdot \overline{Q^t}$$

### DFF

| $Q^t$ | D | $Q^{t+1}$ |
|-------|---|-----------|
| 0     | 0 | 0         |
| 0     | 1 | 1         |
| 1     | 0 | 0         |
| 1     | 1 | 1         |

$$Q^{t+1} = D(t)$$

| $Q^t$ | $Q^{t+1}$ | 00 | 01 | 11 | 10 |
|-------|-----------|----|----|----|----|
| D     | D         | 00 | 01 | 11 | 10 |

| $Q^t$ | D | $Q^{t+1}$ |
|-------|---|-----------|
| 0     | 0 | 0         |
| 1     | 0 | 1         |

$T=0 \rightarrow$  Keep  $Q^t$

$T=1 \rightarrow \overline{Q^t}$

| $Q^t$ | T | $Q^{t+1}$ | 00 | 01 | 11 | 10 |
|-------|---|-----------|----|----|----|----|
| T     | T | 0         | 01 | 10 | 11 | 10 |

| $Q^t$ | T | $Q^{t+1}$ |
|-------|---|-----------|
| 0     | 0 | 0         |
| 1     | 0 | 1         |

$$Q^{t+1} = Q^t \oplus T$$

## Counters:

Asynchronous: Output is free from the clock signal.



This example can count 15 to 0 to 0.

If input were  $\overline{Q}$ , example counts 0 to 15.

## Synchronous:



## Registers:

Register stores a vector of binary values.



## Shift Register with Parallel Load:

| Shift | Load | operation          |
|-------|------|--------------------|
| 0     | 0    | No change          |
| 0     | 1    | Load Parallel Data |
| 1     | X    | Shift Left         |



## Bidirectional Shift Register:

| $S_1$ | $S_0$ | operation     |
|-------|-------|---------------|
| 0     | 0     | No change     |
| 0     | 1     | Shift Left    |
| 1     | 0     | Shift Right   |
| 1     | 1     | Parallel Load |



## Register Transfer Operations

The movement and processing of data stored in registers.

There are three basic components

- set of registers
- operations
- control of operations

And the system partitioned into 2 types of modules.

→ Datapath performs data processing opo

→ Control Unit determines sequence of opo

Datapaths are defined by their registers and the operations performed on binary data stored in the registers.



## Conditional Transfer



## Register Transfer Structures

### Dedicated-Multiplexer-Based



### Select

$$R_0 \leftarrow R_2 \iff \begin{matrix} S_0 & S_1 & S_2 \\ 0 & X & X \end{matrix}$$

$$R_2 \leftarrow R_1, R_1 \leftarrow R_0 \iff \begin{matrix} X & 0 & 1 \end{matrix}$$

### Load

$$\begin{matrix} L_0 & L_1 & L_2 \\ 1 & 0 & 0 \end{matrix}$$

$$0 \ 1 \ 1$$

### MUX-Based



## Multiplexer Bus



single bus

| Select      | Load              |
|-------------|-------------------|
| $S_1 \ S_0$ | $L_2 \ L_1 \ L_0$ |
| $1 \ 0$     | $0 \ 0 \ 1$       |
| $0 \ 1$     | $1 \ 0 \ 1$       |

$R_0 \leftarrow R_2 \iff 1 \ 0$

$R_0 \leftarrow R_1, R_2 \leftarrow R_1 \iff 0 \ 1$

$R_0 \leftarrow R_1, R_1 \leftarrow R_0 \iff \text{impossible}$

## Three State Bus

The 3-input MUX can be replaced by a 3-state bus and 3-state buffers.



| EN <sub>0</sub> | EN <sub>1</sub> | IN <sub>1</sub> | IN <sub>0</sub> | OUT            |
|-----------------|-----------------|-----------------|-----------------|----------------|
| 0               | 0               | X               | X               | H <sub>z</sub> |
| 0               | 1               | X               | 0               | 0              |
| 1               | 0               | X               | 1               | 1              |
| 1               | 0               | 0               | X               | 0              |
| 1               | 1               | 0               | X               | 1              |
| 1               | 1               | 0               | 0               | 0              |
| 1               | 1               | 0               | 1               | 1              |
| 1               | 1               | 1               | 1               | 1              |

→ cost is further reduced, but transfers are limited.

→ Characterized the simultaneous transfers possible with this structure

→ Characterized the cost savings and comparison

Registers with  
Bidirectional I/O Lines



## A BCD Counter Example



Counts [9, 14]

## Memory Basics



## RAM Example 1

- Design 256Kx8 RAM by using 64Kx8 RAM



## RAM Example 2

- How many 128Kx16 RAM chips are needed to provide a memory capacity of 1MB?

$$1\text{MB} = n \cdot 16\text{bit} \rightarrow n = 512\text{ K}$$

$$\frac{512\text{K} \times 16}{128\text{K} \times 16} = 4 \text{ piece of } 128\text{K} \times 16 \text{ RAM}$$



## Design with decoder

(A<sub>16</sub>, ..., A<sub>0</sub>) 17 address lines



## Programmable Logic Devices (PLD)

- PLD's are fabricated with structures that
- implement logic functions
  - used to control connections to store information specifying the actual logic functions
- ROM
  - PAL
  - PLA
  - CPLD/FPGA

## Prom (Programmable Read Only Memory)



## RDM Example



## Rom Example

- implementing a combinational Logic Circuit using ROM. The circuit accepts a 3 bit number and generates an output binary number equal to the squares of input numbers



| $A_2$ | $A_1$ | $A_0$ | $S_5$ | $S_4$ | $S_3$ | $S_2$ | $S_1$ | $S_0$ |
|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| 0     | 0     | 0     | 0     | 0     | 0     | 0     | 0     | 0     |
| 0     | 0     | 1     | 0     | 0     | 0     | 0     | 0     | 1     |
| 0     | 1     | 0     | 0     | 0     | 0     | 1     | 0     | 0     |
| 0     | 1     | 1     | 0     | 0     | 1     | 0     | 0     | 1     |
| 1     | 0     | 0     | 0     | 1     | 0     | 0     | 0     | 0     |
| 1     | 0     | 1     | 0     | 1     | 1     | 0     | 0     | 1     |
| 1     | 1     | 0     | 1     | 0     | 0     | 1     | 0     | 0     |
| 1     | 1     | 1     | 1     | 1     | 0     | 0     | 0     | 1     |



## PLA (Programmable Logic Array):



Compared to ROM, PLI is the most flexible having a programmable set of add's combined with a programmable set of ORs.

## Advantages :

- PLA can have large K and M permitting implementation of equations that are impractical for a ROM
  - PLA has all of its product terms connectable to all outputs, overcoming the problem of the limited inputs to the PAL ORs.
  - Some PLAs have outputs that can be complemented.

## Disadvantages:

- Often the product terms count limits the application of PLA.
  - Two level multiple output optimization is required to reduce the number of product terms in an implementation.

## PLA Example

- Implement the following two boolean functions with a PLA.

$$F_1 = \sum_m (0, 1, 2, 4), \quad F_2 = \sum_m (0, 5, 6, 7) \text{ for } (A, B, C)$$



$$F_1 = \bar{A}\bar{B} + \bar{B}\bar{C} + \bar{A}\bar{C}$$

$$\bar{F}_1 = BC + \underline{AC} + \underline{AB}$$

| $F_2$ | BC | 00 | 01 | 11 | 10 |
|-------|----|----|----|----|----|
| A     | 0  | ①  | 0  | 0  | 0  |
| 1     | 0  | ①  | ①  | ①  | 1  |

$$F_2 = \underline{AC} + \underline{AB} + \bar{A} \bar{B} \bar{C}$$

$$\overline{F_2} = \overline{A}C + \overline{A}B + A\overline{B}\overline{C}$$

→ Most common elements

## PLA table

| <u>Product Term</u> | <u>Inputs</u> |   |   | <u>Outputs</u> |         |
|---------------------|---------------|---|---|----------------|---------|
|                     | A             | B | C | $F_1^C$        | $F_2^T$ |
| BC                  | -             | 1 | 1 | 1              | -       |
| AC                  | 1             | - | 1 | 1              | 1       |
| AB                  | 1             | 1 | - | 1              | 1       |
| $\bar{ABC}$         | 0             | 0 | 0 | -              | 1       |





## PAL (Programmable Array Logic):



PAL is opposite of ROM.

### Advantages:

- For given internal complexity, PAL can have larger K and no
- Some PAL's have outputs that can be complemented

### Disadvantages:

- ROM guaranteed to implement any  $n$  functions of  $K$  inputs, PAL may have too few inputs to the OR gates.

## PAL Example

$$W(A, B, C, D) = \sum_m^1 (2, 12, 13)$$

$$X(A, B, C, D) = \sum_m^1 (7, 8, 9, 10, 11, 12, 13, 14, 15)$$

$$Y(A, B, C, D) = \sum_m^1 (0, 2, 3, 4, 5, 6, 7, 8, 10, 11, 15)$$

$$Z(A, B, C, D) = \sum_m^1 (1, 2, 8, 12, 13)$$

Simplifying after Karnaugh Mapoooo

$$W = A_1 B_1 \bar{C} + \bar{A}_1 \bar{B}_1 C \bar{D}$$

$$X = \bar{A}_1 + B_1 C \bar{D}$$

$$Y = \bar{A}_1 B_1 + C \bar{D} + \bar{B}_1 \bar{D}$$

$$Z = A_1 B_1 \bar{C} + \bar{A}_1 \bar{B}_1 C \bar{D} + A_1 \bar{C} \bar{D} + \bar{A}_1 \bar{B}_1 \bar{C} \bar{D} = W + A_1 \bar{C} \bar{D} + \bar{A}_1 \bar{B}_1 \bar{C} \bar{D}$$



## Computer Design Basics



## Arithmetic Logic Unit (ALU):

Decompose the ALU into:

- An arithmetic circuit
- A logic circuit
- A selector to pick between two circuits

### Arithmetic Circuit Design:

- An  $n$ -bit parallel adder
- A block of logic that selects four choices for the B input to the adder.

## Arithmetic Circuit Design:

| Select      | Input     |
|-------------|-----------|
| $S_1 \ S_0$ | $y$       |
| 0 0         | all 0's   |
| 0 1         | B         |
| 1 0         | $\bar{B}$ |
| 1 1         | all 1's   |



B input logic

| $S_1 \ S_0$ | $B_C$ | $y_B$ |
|-------------|-------|-------|
| 0 0         | 0     | 0     |
| 0 0         | 1     | 0     |
| 0 1         | 0     | 0     |
| 0 1         | 1     | 1     |
| 1 0         | 0     | 1     |
| 1 0         | 1     | 0     |
| 1 1         | 0     | 0     |
| 1 1         | 1     | 1     |

| $S_1 \ S_0 \ B_C$ | $y_B$ |
|-------------------|-------|
| 0 0 0             | 0     |
| 0 0 1             | 0     |
| 0 1 0             | 0     |
| 0 1 1             | 1     |
| 1 0 0             | 1     |
| 1 0 1             | 0     |
| 1 1 0             | 1     |
| 1 1 1             | 1     |

$y_B = S_0 \cdot B_C + S_1 \cdot \bar{B}_C$

## Arithmetic Circuit Design Example

| $S_1 \ S_0$ | $C_{in}=0$    | $C_{in}=1$      |
|-------------|---------------|-----------------|
| 0 0         | $G=A+B$       | $G=A+\bar{B}+1$ |
| 0 1         | $G=\bar{A}+B$ | $G=\bar{A}+B+1$ |
| 1 0         | $G=A-1$       | $G=\bar{A}+1$   |
| 1 1         | $G=\bar{A}$   | $G=\bar{A}+1$   |



| $S_1 \ S_0 \ A_C$ | $X_C$ |
|-------------------|-------|
| 0 0 0             | 0     |
| 0 0 1             | 1     |
| 0 1 0             | 1     |
| 0 1 1             | 0     |
| 1 0 0             | 0     |
| 1 0 1             | 1     |
| 1 1 0             | 1     |
| 1 1 1             | 0     |

$$\rightarrow X = S_0 \oplus A$$

| $C_{in}$ | $S_1$ | $S_0$ | $B_C$ | $y_B$ |
|----------|-------|-------|-------|-------|
| 0        | 0     | 0     | 0     | 0     |
| 0        | 0     | 0     | 1     | 1     |
| 0        | 0     | 1     | 0     | 0     |
| 0        | 0     | 1     | 1     | 1     |
| 0        | 1     | 0     | 0     | 1     |
| 0        | 1     | 0     | 1     | 0     |
| 0        | 1     | 1     | 0     | 0     |
| 1        | 0     | 0     | 0     | 1     |
| 1        | 0     | 0     | 1     | 0     |
| 1        | 0     | 1     | 0     | 0     |
| 1        | 0     | 1     | 1     | 1     |
| 1        | 1     | 0     | 0     | 0     |
| 1        | 1     | 0     | 1     | 0     |
| 1        | 1     | 1     | 0     | 0     |
| 1        | 1     | 1     | 1     | 1     |



## Arithmetic Circuit Design

| $S_1 \ S_0$ | $C_{in}=0$    | $C_{in}=1$      |
|-------------|---------------|-----------------|
| 0 0         | $G=A+B$       | $G=A+\bar{B}+1$ |
| 0 1         | $G=\bar{A}$   | $G=\bar{A}+1$   |
| 1 0         | $G=\bar{B}$   | $G=\bar{B}$     |
| 1 1         | $G=A+\bar{B}$ | $G=A+\bar{B}+1$ |

| $S_1 \ S_0$ | $C_{in}=0$ | $C_{in}=1$ |
|-------------|------------|------------|
| 0 0         | A B        | A B        |
| 0 1         | A 0        | A 0        |
| 1 0         | 0 B        | A all 1's  |
| 1 1         | A B        | A B        |

| $C_{in}$ | $S_1$ | $S_0$ | $X$ |
|----------|-------|-------|-----|
| 0        | 0     | 0     | A   |
| 0        | 0     | 1     | A   |
| 0        | 1     | 0     | 0   |
| 0        | 1     | 1     | A   |
| 1        | 0     | 0     | A   |
| 1        | 0     | 1     | A   |
| 1        | 1     | 0     | A   |
| 1        | 1     | 1     | A   |

$$\rightarrow \text{Karnaugh} \rightarrow X = S_1 A + S_0 A + C_{in} A$$

## Instruction Example



### Control Word Encoding

$M=3$



### Function Select

|                            |      |                           |      |
|----------------------------|------|---------------------------|------|
| $F \leftarrow A$           | 0000 | $F \leftarrow A \wedge B$ | 1000 |
| $F \leftarrow A+1$         | 0001 | $F \leftarrow A \vee B$   | 1001 |
| $F \leftarrow A+B$         | 0010 | $F \leftarrow A \oplus B$ | 1010 |
| $F \leftarrow A+B+1$       | 0011 | $F \leftarrow \bar{A}$    | 1011 |
| $F \leftarrow A \cdot B+1$ | 0101 | $F \leftarrow B$          | 1100 |
| $F \leftarrow A \cdot B+1$ | 0110 | $F \leftarrow sR_B$       | 1101 |
| $F \leftarrow A$           | 0111 | $F \leftarrow s1B$        | 1110 |

### Microoperations

|                            | DA        | RA        | BA        | MB         | FS            | MD      | RW       |
|----------------------------|-----------|-----------|-----------|------------|---------------|---------|----------|
| $R_1 \leftarrow R_2 + R_3$ | R1<br>001 | R2<br>010 | R3<br>011 | Reg<br>0   | A+B+1<br>0101 | F<br>0  | WR<br>1  |
| $R_4 \leftarrow SCLR_6$    | R4<br>100 | -         | R6<br>110 | Reg<br>0   | SE<br>1110    | F<br>0  | WR<br>1  |
| $R_7 \leftarrow R_7 + 1$   | R7<br>111 | R7<br>111 | -         | -          | INC<br>0001   | F<br>0  | WR<br>1  |
| $R_1 \leftarrow R_0 + 2$   | R1<br>001 | R0<br>000 | -         | const<br>1 | A+B<br>0010   | F<br>0  | WR<br>0  |
| $Data.out \leftarrow R_3$  | -         | -         | R3<br>011 | Reg<br>0   | -             | -       | NWR<br>0 |
| $R_4 \leftarrow Data.in$   | R4<br>100 | -         | -         | Reg<br>0   | -             | D1<br>1 | WR<br>1  |
| $R_5 \leftarrow \emptyset$ | RS<br>101 | RS<br>101 | RS<br>101 | Reg<br>0   | A⊕B<br>1010   | F<br>0  | WR<br>1  |

## Instructor Set Architecture (ISA) for Simple Computer (SC)

- A programmable system uses a sequence of instructions to control its operation.
- An typical instruction specifies
  - Operation to be performed
  - Operands to use, and
  - Where to place the result, or
  - Which instruction to execute next.
- Instructions are stored in RAM or ROM.
- The addresses for instructions in a computer are provided by a program counter (PC) that can:
  - Count up
  - Load a new address based on an instruction and, optionally status information.
- The PC and associated control unit are part of the Control Unit.



- A instruction consists of a bit vector.
- The fields of an instruction are subvectors representing specific functions.
- The format of an instruction defines the subvector and their functions.
- The SC ISA contains three formats:

|        |                     |          |               |
|--------|---------------------|----------|---------------|
| Opcode | Destination Reg     | Source A | Source B      |
| DR     | (1) Register        | SA       | SB            |
| opcode | Destination Reg     | source A | Operand       |
| DR     | (2) Immediate       | SA       | OP            |
| opcode | Address Left        | Source A | Address Right |
| AD     | (3) Jump and Branch | SA       | AD            |

Example for 1 :  $R_1 \leftarrow R_2 + R_3$ ,  $R_1 \leftarrow S1 R_2$

Example for 2 :  $R_1 \leftarrow R_2 + 3$

Example for 3 : if  $R_6 == 0$

$PC \leftarrow PC - 20$

## Control Unit

- The data memory has been attached to the Address Out and Data Out and Data In lines of the Datapath.
- The MW input to the Data memory is the Memory Write signal from the Control Unit.
- The Instruction Memory address input is provided by the PC and its instruction output feeds the Instruction Decoder.
- The PC is controlled by Branch Control logic.

## Instruction Example

JB | 1 1 1 1 0  
PL | 0 0 0 0 1

MW | 0 0 0 0 0

RW | 1 1 1 1 0

MD | 1 1 0 0 1

FS | 1010 0000 0010 1110 1

MS | 0 0 0 0 1

BA | 011 100 1 10 100

RA | 111 1 101 1 1

DA | 000 001 010 011 1

$R[0] = R[2] \oplus R[3]$   
 $R[1] \leftarrow M[R[4]]$   
 $R[2] \leftarrow R[5] + 2$   
 $R[3] \leftarrow S[R[6]]$   
 if ( $R[2] = 0$ ) :  
 $PC \leftarrow PC + 8$   
 else:  
 $PC \leftarrow PC + 1$

| PC Operation       | PL | JB | BC |
|--------------------|----|----|----|
| Count UP           | 0  | X  | X  |
| JUMP               | 1  | 1  | X  |
| Branch on Negative | 1  | 0  | 1  |
| Branch on Zero     | 1  | 0  | 0  |

## Single-Cycle Computer Issues

- Complexity of instructions executable in a single cycle is limited.
  - Accessing both an instruction and data from a single memory is impossible.
  - A long worst case delay path limits clock frequency and the rate of performing instructions.
- How to handle?
- The first two issues can be handled by the multiple-cycle computer.
  - The third issue is dealt with by pipelining.

## Multiple-Cycle Computers

- data  
path  
modifi-  
cations
- Uses a single memory for both instructions and data.
  - Requires new MUX M1 with control signal M11 to select the instruction address from the PC.
  - Register file becomes 16x16
  - Addresses to Register File increase from 3 to 4.
  - Three new control fields for register address source selection and temporary storage addressing: AX, BX, DX

- control  
unit  
modifi-  
cations
- Requires an Instruction Register (IR) to hold the instruction
  - Requires the addition of a "hold" operation to the PC since it only counts up to obtain a new instruction.

## Encoding For Datapath Control

|    |               |               |
|----|---------------|---------------|
| R9 | 1100          | 1100          |
| M0 | 0             | 0             |
| F2 | 00            | 000           |
| M3 | 0             | 0             |
| D1 | 111           | 1             |
| AX | 1000          | 1100          |
| BX | 1001          | 1011          |
|    | R9 ← R8 + R15 | R11 ← R12 + 1 |

# Instruction Set Architecture

## Example ISAs:

- RISC (Reduced Instruction Set Computer)
  - IBM PowerPC
  - Sun Sparc etc
- CISC (Complex Instruction Set Computer)
  - Intel x86
  - Motorola 68000
- VLIW (Very Large Instruction Word)
  - Intel Itanium

- A processor is specified completely by its instruction set architecture.
- Each ISA will have a variety of instructions and instruction formats, which will be interpreted by the processor's control unit and executed in the processor's datapath.



## Register Set:

- Programmer accessible registers (R0 to R7 in multi-cycle computer)
- Registers in the register file accessible only to microprograms (R8 to R17)
- Instruction register (IR)
- Program Counter (PC)
- Pipeline register /
- Stack pointer (SP)

## Operand Addressing:

Operand = register value, memory context, or immediate

**Explicit Address:** address field in the instruction

**Implied Address:** The location of operand is specified by the opcode or other operand address.

## Three Address Instructions

Example:  $X = (A+B)(C+D)$

$$\begin{array}{ll} \text{ADD } T_1, A, B & M[T_1] \leftarrow M[A] + M[B] \\ \text{ADD } T_2, C, D & M[T_2] \leftarrow M[C] + M[D] \\ \text{MUX } X, T_1, T_2 & M[X] \leftarrow M[T_1] \times M[T_2] \end{array}$$

OR

$$\begin{array}{ll} \text{ADD } R_1, A, B & R_1 \leftarrow M[A] + M[B] \\ \text{ADD } R_2, C, D & R_2 \leftarrow M[C] + M[D] \\ \text{MUX } X, R_1, R_2 & X \leftarrow R_1 \times R_2 \end{array}$$

+ : short program, 3 instructions

- : binary coded instruction requires more bits to specify three address.

## Two Address Instruction

Example:  $X = (A+B)(C+D)$

$$\begin{array}{l} \text{MOVE } T_1, A \\ \text{ADD } T_1, B \\ \text{MOVE } X, C \\ \text{ADD } X, D \\ \text{MUL } X, T_1 \end{array}$$

5 instructions

## One Address Instructions

Implied address = a register called an accumulator ACC for one operand and the result, single accumulator architecture

$$\begin{array}{ll} \text{LD } A & ACC \leftarrow M[A] \\ \text{ADD } B & ACC \leftarrow ACC + M[B] \\ \text{ST } X & M[X] \leftarrow ACC \\ \text{LD } C & ACC \leftarrow M[C] \\ \text{ADD } D & ACC \leftarrow ACC + M[D] \\ \text{MUX } X & ACC \leftarrow ACC \times M[X] \\ \text{ST } X & M[X] \leftarrow ACC \end{array}$$

7 instructions

All operations between the ACC and memory operands

## zero address Instructions

- use stack

PUSH A  
PUSH B  
ADD  
PUSH C  
PUSH D  
MUL  
POP X

- Data manipulation operations : between the stack elements
- Transfer operations : between the stack and memory

## Addressing Architecture :

Two kinds of addressing architectures :

- memory-to-memory
  - only one register - PC
  - All operands from memory, and results to memory
- register-to-register
  - restrict only one memory address to load/store types, all operations are between registers

## Addressing Modes :

- Implied mode : implied in opcode, such as stack and ACC.
- Immediate mode (operands) :  $a = 0x0801234$
- Register mode :  $a = R[b]$
- Register-Indirect mode :  $a = M[R[b]]$
- Direct addressing mode :  $a = M[0x0018df8]$
- Indirect addressing mode :  $a = M[M[0x0018df8]]$
- PC-relative addressing : offset + PC
- Indexed addressing :  $a = b[-1]$

| CISC               | RISC                                                                                               | RISC                   | RISC                 | RISC                                         |
|--------------------|----------------------------------------------------------------------------------------------------|------------------------|----------------------|----------------------------------------------|
| Memory Access      | restricted to load/store instructions, and data manipulation instructions are register-to-register | all of the same length | of different lengths | both clarity and compact                     |
| Addressing mode    | implied in number                                                                                  | instruction format     | instructions         | hardwired "by" throughput and fast execution |
| Instruction format | instruction format                                                                                 | instructions           | operations           | microprogrammed, facilitate compact          |
| Control            | control unit                                                                                       | control                | control              | programs and concise                         |

## Conditional Branching :

| Branch Condition         | Mnemonic | Test       |
|--------------------------|----------|------------|
| Branch if zero           | BZ       | Z=1        |
| Branch if not zero       | BNZ      | Z=0        |
| Branch if carry          | BC       | C=1        |
| Branch if no carry       | BNC      | C=0        |
| Branch if minus          | BN       | N=1        |
| Branch if plus           | BNN      | N=0        |
| Branch if overflow       | BV       | V=1        |
| Branch if no overflow    | BNV      | V=0        |
| • FOR Unsigned Numbers   |          |            |
| Branch Condition         | Mnemonic | Condition  |
| Branch if above          | BA       | A > B      |
| Branch if above or equal | BAE      | A ≥ B      |
| Branch if below          | BB       | A < B      |
| Branch if below or equal | BBE      | A ≤ B      |
| Branch if equal          | BE       | A = B      |
| Branch if not equal      | BNE      | A ≠ B      |
| C is borrow bit          |          |            |
| Branch Condition         | Mnemonic | Status Bit |
| Branch if above          | BA       | A > B      |
| Branch if above or equal | BAE      | A ≥ B      |
| Branch if below          | BB       | A < B      |
| Branch if below or equal | BBE      | A ≤ B      |
| Branch if equal          | BE       | A = B      |
| Branch if not equal      | BNE      | A ≠ B      |

## • For Signed Numbers

| <u>Branch Condition</u>    | <u>Mnemonic</u> | <u>Condition</u> | <u>Status Bits</u>     |
|----------------------------|-----------------|------------------|------------------------|
| Branch if greater          | BG              | $A > B$          | $(N \oplus V) + Z = 0$ |
| Branch if greater or equal | BGE             | $A \geq B$       | $(N \oplus V) = 0$     |
| Branch if less             | BL              | $A < B$          | $(N \oplus V) = 1$     |
| Branch if less or equal    | BLE             | $A \leq B$       | $(N \oplus V) + Z = 1$ |
| Branch if equal            | BE              | $A = B$          | $Z = 1$                |
| Branch if not equal        | BNE             | $A \neq B$       | $Z = 0$                |

## Interrupts:

External: Hard drive, mouse, keyboard, modem, printer

Internal: Overflow, divide by zero, invalid opcode, memory stack overflow, protection violation

Software: External or internal interrupts by inserting an instruction into the code.

## Floating Point Computation:

Unsigned  $\rightarrow 0$  to  $2^{n-1}$   
 2's complement  $\rightarrow -2^{n-1} + 0$  to  $2^{n-1} - 1$   
 1's complement  $\rightarrow -2^{n-1} + 1$  to  $2^{n-1} - 1$   
 BCD  $\rightarrow 0$  to  $10^{n/4} - 1$

## Real Scientific Notation

IEEE FpP  $\pm 1.m \times 2^{e-127}$



$$N = (-1)^S \cdot 2^{E-127} (1.m)$$

## Example 1

Convert the following decimal numbers to signed binary. Perform the arithmetic shift left and right operations and indicate the overflow bit.

| $(+82)$              | $(-75)$                                                              |
|----------------------|----------------------------------------------------------------------|
| $(+82)_2 = 00111110$ | $(-75)_2 = 01001011$<br>↓<br>$10110100$<br>+<br>$(-75)_2 = 10110101$ |

Shlq  $\rightarrow 01111100$ , Shrq  $\rightarrow 01101010$   $V=0$

Shra  $\rightarrow 00011111$ , Shr  $\rightarrow 11011010$  ↴ no overflow

$\rightarrow V=0$   
 ↴ No overflow



|                |                              |
|----------------|------------------------------|
| $A = 01010010$ | $V = 1, Z = 0, N = 1$        |
| $A = 11010001$ | True $(1 \oplus 1) + 0 = 0$  |
| $A = 11011110$ | False $(1 \oplus 0) + 0 = 1$ |
| $A = 01011110$ | True $(1 \oplus 0) = 1$      |
| $A = 10000001$ | False $(1 \oplus 1) = 0$     |

## Example 2

The program in a computer compares two signed 2's complement numbers A and B by performing subtraction A-B and updating status bits.

- Determine  $N, Z, V$

| <u>Mnemonic</u> | <u>Condition</u> | $A - B$                | $N, Z, V$          |
|-----------------|------------------|------------------------|--------------------|
| BL              | $A > B$          | $(N \oplus V) + Z = 0$ | $(N \oplus V) = 1$ |
| BGE             | $A \geq B$       | $(N \oplus V) = 0$     | $(N \oplus V) = 0$ |
| BL              | $A < B$          | $(N \oplus V) = 1$     | $(N \oplus V) = 1$ |
| BGE             | $A \leq B$       | $(N \oplus V) + Z = 1$ | $(N \oplus V) = 0$ |

## Processing External Interrupts:

- $SP \leftarrow SP - 1$  Decrement stack pointer
- $M[SP] \leftarrow PC$  Store return address on stack
- $SP \leftarrow SP - 1$  Decrement stack pointer
- $M[SP] \leftarrow PSR$  Store processor status word on stack
- $EI \leftarrow 0$  Reset enable interrupt flip flop
- $INTACK \leftarrow 1$  Enable interrupt acknowledgement
- $PC \leftarrow IVAD$  Transfer interrupt vector address to PC and GO to fetch.



## Input - Output

- keyboards
- displays
- printers
- magnetic devices
- compact disc read only memory
- modems, scanners, speakers, mice etc

## Sample - Peripherals:

- Devices that CPU controls directly are said to be connected online
- Input or outputs devices attached to the computer online are called peripherals.
  - Keyboard
  - Harddrive
  - Liquid Crystal Display Screen

## Keyboard:

- Scan matrix lies beneath the keys
- Whether a key depressed and released, the control code at the time of the event is sensed and is translated by the microcontroller into k-scan codes
- When a key depressed, a make code is produced, when a key is released a break code is produced



## Hard Drive:

- The hard drive is the primary intermediate speed nonvolatile, writable storage medium.
- Revolutions per Minutes (RPM): rotational speed of a disk
- Disk access time and transfer time

## LCD Screen:

- The display screen is defined in terms of picture element called pixels.
- The color display has three subpixels corresponding to the primary colors; red, green and blue (RGB)



## I/O Configurations

- Memory-mapped: Use common data, address, and control busses for both memory and I/O.
- Isolated: Share a common address and data bus but use different control lines (eg intel 8086)
  - IN, OUT instructions for Access I/O ports
  - MOVE instruction to access memory

### Example of I/O Interface



The address assigned to the four registers of the I/O interface on previous figure are equal to the binary equivalent of 240, 241, 242 and 243. Show the external circuit that must be connected between an 8-bit I/O address from the CPU and CS, RSO, RSI inputs of the interface.



## Example of I/O Interface

---

- | **Example:** Six interface units of the type shown in previous slide are connected to a CPU that uses an I/O address of eight bits. Each one of the six chip select (CS) inputs is connected to a different address line. Specifically, address line 0 is connected to the CS input of the first interface unit, and address line 5 is connected to the chip select input of the sixth interface unit.
  - | Address lines 7 and 6 are connected to the RS1 and RS0 inputs, respectively, of all six interface units. Determine the 8-bit address of each register in each interface ( a total of 24 addresses)
- 
- | Solution ???

6 interface  $\rightarrow$  A<sub>0</sub> to CS<sub>1</sub> A<sub>5</sub> to CS<sub>6</sub>

8 bit  $\rightarrow$  A<sub>7</sub> = R<sub>S1</sub>  
A<sub>8</sub> = R<sub>S0</sub>

(1) I/O Add

| <u>Dec</u> |
|------------|
| 1          |
| 65         |
| 129        |
| 193        |

I/O Add

| <u>Bin</u> |
|------------|
| 00 00000 1 |
| 01 00000 1 |
| 10 00000 1 |
| 11 00000 1 |

Reg

5

and

6

is sume

(2) I/O add

| <u>Dec</u> |
|------------|
| 2          |
| 66         |
| 130        |
| 194        |

I/O Add

| <u>Bin</u>  |
|-------------|
| 00 00000 10 |
| 01 00000 10 |
| 10 00000 10 |
| 11 00000 10 |

Reg

(3) I/O add

| <u>Dec</u> |
|------------|
| 4          |
| 68         |
| 132        |
| 196        |

I/O Add

| <u>Bin</u> |
|------------|
| 00 000 100 |
| 01 000 100 |
| 10 000 100 |
| 11 000 100 |

Reg

(4) I/O Add

| <u>Dec</u> |
|------------|
| 8          |
| 72         |
| 136        |
| 200        |

I/O Add

| <u>Bin</u>  |
|-------------|
| 00 000 1000 |
| 01 000 1000 |
| 10 000 1000 |
| 11 000 1000 |

Reg

## Strobings

The databus between two units is assumed to be made bidirectional by the use of three state buffers.



(a) destination-initiated transfer



(b) Source-initiated transfer

## Handshakings



## Serial Communications

### - Parallel Communication

- transfers more than one bit of data at a given time
- $N$ -bits transmitted at the same time through  $N$  wires
- faster but require many wires

### - Serial Communication

- cannot handle more than one bit
- slower but less wires

## Asynchronous Serial Communication

- interacts with devices outside of the computer
- Ex: modems, printers

- Transmits individual bytes instead of large blocks
- Do not share a common clock.
- (start bit)

## Synchronous Serial Communication

- Transmits block of data in frames

- Frames are had head in front of the data and a tail at the end of the data.
- The head and tail contain information that allows the two computers to synchronise their clocks.



## Asynchronous Serial Communication Example

- How many characters per second can be transmitted over a 57.600-baud line in each of the following modes? (Assume a character code of 8 bits)
  - Asynchronous serial transmission with odd-parity bit and two stop bits.
  - Draw a time diagram for sending character 'A'.

$$a') 1 \text{ start bit} + 8 \text{ data bits} + 1 \text{ parity bit} + 2 \text{ stop bits} = 12 \text{ bits}$$

$$57.600 \text{ bps}/12 \text{ bits} = 4.800 \text{ char per second.}$$

$$b') \text{ ASCII of 'A' is } (65)_{10} = (01000001)_2$$

$\begin{array}{c} 0 \\ \downarrow \\ \text{start bit} \end{array} \quad \begin{array}{c} 10000010 \\ \downarrow \quad \downarrow \\ \text{Data code} \quad \text{odd parity} \end{array} \quad \begin{array}{c} 11 \\ \downarrow \\ \text{stop bits} \end{array}$



## Universal Asynchronous Receiver/Transmitters (UARTS)

It is used to convert serial communication when receiving and convert parallel communication to serial when sending.

slow

## Modes of Transfer :

- Data transfer under program control.
- Interrupt-initiated data transfer
- Direct memory access (DMA) transfer  $\xrightarrow{\text{HDD?}}$
- Transfer through an I/O processor  $\xrightarrow{\text{mainframe}}$

## 1. Data Transfer Under Program Control

### I/O to CPU

- The device transfers bytes of data one at a time as they are available. When available, the device places it on the I/O bus and sets Ready.
- The interface accepts the byte into its register and sets Acknowledge.
- The interface sets a bit in the status register, which we will refer to as a flag. Device can now reset Ready, but it won't transfer another byte until Acknowledge is reset by the interface, according to the handshaking.



## 2-Interrupt Initiated Data Transfer

a) Daisy chain priority  
process priority  
(device)



if working speed ↑  
then priority ↑  
else priority ↓  
for

→ if request at same time

| PI | RF | PO | enable |
|----|----|----|--------|
| 0  | 0  | 0  | 0      |
| 0  | 1  | 0  | 0      |
| 1  | 0  | 1  | 0      |
| 1  | 1  | 0  | 1      |

## b) Parallel Priority Hardware



## Priority Encoder %

low priority (e.g. Keyboard)



4 different  
I/O device

| W <sub>3</sub> | W <sub>2</sub> | W <sub>1</sub> | W <sub>0</sub> | Y <sub>1</sub> | Y <sub>0</sub> | Z |
|----------------|----------------|----------------|----------------|----------------|----------------|---|
| 0              | 0              | 0              | 0              | —              | —              | 0 |
| 0              | 0              | 0              | 1              | 0              | 0              | 1 |
| 0              | 0              | 1              | —              | 0              | 1              | 1 |
| 0              | 1              | —              | —              | 1              | 0              | 1 |
| 1              | —              | —              | —              | 1              | 1              | 1 |

### 3-Direct Memory Access

- DMA improves system performance by speeding up data transfer between memory and I/O System.
- Bypass CPU, allow CPU to be used in another process.
- DMA controllers must manipulate each data transfer from I/O devices, and only can reads



### DMA Controllers



1- the starting address of the memory block in which data is available (for reading) or data is to be stored (for writing).

2- The word count, which is the number of words in the memory block.

3- A control bit to specify the mode of transfer, such as read or write.

4- A control bit to start the DMA transfer.

CPU is out of order when this transfer happens.



needs start control bit to transfer .

### Example for DMA Controller

- Why are the read and write control lines in a DMA controller bidirectional?

DMA controller needs to be configured before it is let loose to do the memory transfer. Registers in DMA controller need to be written to and read from, and hence for that purpose they are inputs, however when they are used by the DMA controller to do the DMA transfer, they are outputs.

# The Memory System



## Direct Cache Mapping:



Maps each block of main memory into only one possible cache line. If a line is previously taken up by a memory block when a new block needs to be loaded, the old block trashed.

2 Matching logic

## Fully Associative Mappings



randomly mapping - placed

cache hit → data found in cache

cache miss → data not found in cache

comparing what is needed at CPU and tag then matching.  
In this case we need 8 comparing circuits.

parallel comparing  
 $\hookrightarrow n$  (row) tag  
So it is computationally expensive.



## Set Associative Mapping:



common indexing. Capacity is 8 like fully associative but has two sets. No need to overwrite.

Matching logic and  $\frac{6}{n}$  XOR gate  
 $n$  (set)

## Principal of Locality:

- Cache work on a principle known as locality of reference. This principle states asserts that programs continually use and re-use the same locations.
  - Instructions: Loops, common subroutines
  - Data: look-up tables, arrays
- Temporal Locality is same location will be referenced again soon.
- Spatial Locality is nearby locations will be referenced soon.

## Direct Mapped Cache:

- in a direct-mapped cache, each memory address is associated with one possible block within the cache due to spatial locality.
  - We only need to look single location in cache for the data if exists.

## Example for Direct-Map Cache:

- Suppose we have a 16KB of data in a direct mapped cache with 4 word blocks. Determine size of the tag, index and offset fields if we are using a 32-bit architecture.
  - ↳ 1 word

- offset needs to specify correct byte within a block
  - block contains 4 words = 16 bytes =  $2^4$  bytes
  - needs 4 bits to specify correct byte
  - ↳ 2 bit byte offset
  - ↳ 2 bit word offset

- index needs to specify correct row in cache
  - cache contains 16KB =  $2^{14}$  bytes
  - block contains 24 bytes (4 words)
  - $2^{14}/2^4 = 2^{10}$
  - needs 10 bits to specify this many rows
- Tag:  $32 - 4 - 10 = 18$  bits

| address (hex) | value of word |
|---------------|---------------|
| 000000010     | a             |
| 000000014     | b             |
| 000000018     | c             |
| 00000001C     | d             |
| ---           |               |

|           |   |
|-----------|---|
| 000000080 | e |
| 000000084 | f |
| 000000088 | g |
| 00000008C | h |
| ---       |   |

|          |   |
|----------|---|
| 00008010 | i |
| 00008014 | j |
| 00008018 | k |
| 0000801C | l |
| ---      |   |

read 4 addresses

- 0x000000014
- 0x00000001C
- 0x000000034
- 0x00008014



00000000000000000000000000000000    0000000001 0100  
 00000000000000000000000000000000    0000000001 1100  
 00000000000000000000000000000000    0000000001 0100  
 0000000000000000000000010 0000000001 0100



## Average Access Time:

Average access time,  $t_a$ , given by:

$$t_a = t_c + (1-h)t_m$$

### Example

- if hit ratio is 0.85, main memory access time is 50 ns and cache access time is 5ns, what will be average access time?

$$5 + (1-0.85) * 50 = 12.5 \text{ ns}$$

### Example

- hit time = 1 cycle → cache  
 Miss rate = 5%  
 miss penalty = 20 cycles → main memory

Average memory access time is

$$1 + 0.05 \times 20 = 2 \text{ cycle}$$

### Example

- L1 Hit time = 1 cycle

L1 Miss rate = 5%

L2 Hit time = 5 cycle

L2 Miss rate = 15%

L2 miss penalty = 100 cycles

$$\rightarrow L_1 \text{ miss penalty} = 5 + 0.15 \times 100 = 20$$

$$\rightarrow \text{Average memory access} = 1 + 0.05 \times 20 = 2 \text{ cycle}$$