

classmate

Date 30/6/14

Page \_\_\_\_\_

CLASS NOTES

## Overview of Subject

## CO: Computer Organization

Books : ① Computer Architecture and Organization

- John P. Hayes  
3rd edn

## ② Computer System Architecture

- Morris Mano

③ William Stallings - Computer System Architecture.

## Chapters

1. History
  2. Design methodology — DDC concepts automatically included.

- ### 3. Process Basics



- |                                                                   |                                                     |
|-------------------------------------------------------------------|-----------------------------------------------------|
| 2nd session<br>{<br>4. Design of Datapath unit<br>5. Control Unit | Registers, ALU<br>Hardwired C.U.<br>Programmed C.U. |
|-------------------------------------------------------------------|-----------------------------------------------------|

*3rd Session*

|                        |                        |
|------------------------|------------------------|
| 6. Memory Organization | 7. System Organization |
|------------------------|------------------------|

# Chapter 1

## History

IR - Instruction Register

PC → Program Counter

DPU →



## Memory

### Various Terminologies

PCU → Program Control Unit.

DPU → Data Processing Unit.

The function of PC is  $\rightarrow$  fetch the instruction  
and decode that instruction.

AR → Address Register → stores the address of the current instruction.

MQ → Multiplier - quotient Register.

IR → Instruction Register

PC → Program Counter

DR → Data Register. ( it stores data temporarily ).

AC → Accumulator.

IBR → next-instruction buffer register.

\* PC is must because it stores the next instruction address.

\* opt code is stored in IR.

## System Bus

1. Address Bus [Unidirectional]
  2. Data bus [Bidirectional]
  3. Control Bus

IAS → Institute of Advanced Studies.



The IAS Computer →

The basic unit information in an IAS computer is a 40 bit word. (it is the standard packet information stored in memory location or transferred in one step between the CPU and the main memory M.)

- ✓ Each location in M is either a 40 bit single memory or else a pair of 20 bit instructions.

- ✓ Number format is fixed-point, contains implicit binary point in some fixed position.

Numbers are usually treated as signed binary fractions lying between -1 and +1, but they can also be interpreted as integers.

01101000.000 0000000000 0000000000  
0000000000 = +8.125

is a scaled number format.  
( no need to remember )

- ✓ 1 AS instruction consists of an 8 bit opcode (operation code) followed by a 12 bit address A that identifies one of up to  $2^{12} = 4\text{ k}$  40 bit words stored in M.

It thus has one-address instruction set format that we represent as

OP A.

\* EDVAC has 4-address instruction format.

- So IAS format saves memory space

Two key aspects of IAS's design that have been incorporated into all computers are:

- ① The CPU contains a small set of high-speed storage devices called registers, that serve as implicit storage spaces for operands and results.

- ② A program's instructions are stored in memory in approximately the same sequence in which they are executed.

- The address of the next instruction is usually that of the current instruction plus one.



OP stores the operation to be performed on the operands stored in A1 and A2. The result is stored in A3. A4 stores the address of the next instruction.

Depending on the variants, the inst. address  
is either in  $A_3$  or  $A_2$ .

Should give opp in a reasonable amount of time



Abstract computer what is a reasonable

- ✓ infinite memory comp?
- ✓ has sol<sup>n</sup> to only those for which algo. is given
- ✓ should process in a fixed count of time & should give opp. all possible sol<sup>n</sup> are not stored.
- infinite memory

Addition of 3 and 5.



$s_n \rightarrow$  present state of

$t_i \rightarrow$  input

$\theta_j \rightarrow$  output

$s_k \rightarrow$

- Start from left most 1 and go to the blank space.
- replace that blank space with a 1.
- has more left towards the first '1'.
- Remove that 1.
- sol<sup>n</sup> obtained.

Unsolvable problems → that can't be solved by any reasonable computer.

ex. Goldbach's conjecture → sum of any two prime numbers is always an even number.

→ It's proved to be true for each and every test case.

→ However, a generalized sol<sup>n</sup> (mathematical) has not been found or constructed.

Clock → SRAM  
Main memory → DRAM

classmate

Date \_\_\_\_\_

Page \_\_\_\_\_

Undecidable problems → Result is ambiguous.

Some inputs give output Yes and

some inputs give output No.

↳ Intractable problems → Problems that can't be solved using reasonable memory and and reasonable time i.e. intractable problems.

7/7/14

## Chapter 2

Computer → A system consisting of various components that are connected together, work in unison and give some op. (input-process-output)

every system can be presented using a graph.

graph → nodes/vertices (represent 'data')

graph → edges (represent data transfer)

Every system has some

✓ behaviour  
(for some specific I/P some spec. O/P is obtained)  
✓ structure (graphs)  
✓ truth table

Structure represented using graph ⇒



⇒ Individually looking either at the structure or behaviour, we can't figure out what the machine has been designed for.

So, Block diagram is used → in which we not only represent simple circles, but also give the proper names of nodes.

No design a Ckt →

1. Behaviour should be known.
2. Components available should be known & using which system is to be made].

✓ Thus, desirable system would be obtained.  
If not obtained, then we need to redesign our chrt. until successful.

✓ Simulators should be used instead of physically checking the circuitry each and every time.

Levels of design → There are 3 different levels of design.

- (1) Gate level
- (2) Register level
- (3) Processor level.

Processor level → - CPU, ALU, memory, I/O devices  
- is the lowest level.

(blocks of words) → - processing is carried from blocks of data instead of bit-by-bit  
- Also called 'System level'.

Register level → - registers, counter, multiplexers, decoders, encoders, PLA  
- processing is done word by word.

(words)

Gate level → - most primitive form.  
- done on individual gates.  
- flip-flops, logic gates.  
(bit) ← - processing done bit by bit  
- lowest level.

How actual designing process is carried out?

1. Designing should start at the processor level.



- Depending upon the behaviour, we need to design the (ppu).

Ex. for 4 bit processor,  
we need to go to register level.

next we need 4 bit registers, so we need 4 bit devices, (made at gate-level).

So, we need to go from

[ processor → register → gate level ]

| Adder →     |  | Behaviour |   |   | Adding two bits (A, B) |      |                        |
|-------------|--|-----------|---|---|------------------------|------|------------------------|
| Truth Table |  | A         | B | S | C                      | Half | - generate a sum bit   |
|             |  | 0         | 0 | 0 | 0                      |      |                        |
|             |  | 0         | 1 | 1 | 0                      |      |                        |
|             |  | 1         | 0 | 1 | 0                      |      |                        |
|             |  | 1         | 1 | 0 | 1                      | Full | { - generate carry bit |

Adding three bits (A, B, carry)  
{ - generate sum bit  
- generate carry bit

Ex



$\Rightarrow$   $a, b, c, d, e$  are stored in memory. ~~Load & Store~~ architecture is used. Only registers are used for computation. Intermediary results can not be stored in the memory. Only reg. are to be used for computation. Find the minimum no. of reg. reqd. to perform the above operations.

$$(c - a - b) + (e - (c + d))$$

$$\begin{array}{ccc} 1 & & \\ R_1 & R_2 & \\ \downarrow & & \downarrow \\ R_2 & & R_1 \\ | & & | \\ R_2 & & R_1 \end{array}$$

$\rightarrow$  3 minimum registers reqd. to perform the above operation.

Ex.

$$\begin{aligned} R_1 &\rightarrow M[2050] \\ M[2050] &\rightarrow R_2 \\ M[2050] &\rightarrow R_3 \end{aligned}$$

WAP

that gives the same op. however requires less time / clock cycles.

$\Rightarrow$

$$\begin{aligned} R_1 &\rightarrow M[2050] \\ \text{MOV } R_2, R_1 \\ \text{MOV } R_3, R_1 \end{aligned}$$

$$R_1 \rightarrow M[2050]$$

$$R_1 \rightarrow R_2$$

$$R_1 \rightarrow R_3 / R_2 \rightarrow R_3$$

faster (reg.) operations

(Ex).

There are 2 level cache memory. Access time of level 1 cache is 1 nanosecond. Access time of level 2 cache is  $10 \Delta$  that of main memory is 500 ns. Hit rate of level 1 and level 2 are 0.3 and 0.9 respectively. Find out average access time.

$\Rightarrow$

$$t_A = H t_{A1} + (1-H) (t_{A2} + t_{B2})$$

$$H = \frac{N_1}{N_1 + N_2}$$

$$(0.3 \times 1) + (0.02 \times 500) + (0.18 \times 10)$$

1

20% level 2 failure,  
90% + 2 success

### Truth Table for Full Adder

| A | B | Cin | Sum | Cout |
|---|---|-----|-----|------|
| 0 | 0 | 0   | 0   | 0    |
| 0 | 0 | 1   | 1   | 0    |
| 0 | 1 | 0   | 1   | 0    |
| 0 | 1 | 1   | 0   | 1    |
| 1 | 0 | 0   | 1   | 0    |
| 1 | 0 | 1   | 0   | 1    |
| 1 | 1 | 0   | 0   | 1    |
| 1 | 1 | 1   | 1   | 1    |

$$\text{Sum} \rightarrow A \oplus B \oplus \text{Cin}$$

$$\text{Carry} \rightarrow A \cdot B + (A \oplus B) \cdot \text{Cin}$$

Full adder circuit using decoder + multiplexers.



classmate  
Date \_\_\_\_\_  
Page \_\_\_\_\_

S

C

classmate  
Date \_\_\_\_\_  
Page \_\_\_\_\_



Carry



12/7/14

Characteristic table  
- for analysis

Excitation table  
- for designing

Q Design a counter that counts in odd sequence. 1, 3, 5, 7. using D-flip flop.

State diagram →



ABC  
001  
111  
101  
011

D<sub>A</sub> D<sub>B</sub> D<sub>C</sub>.  
0 1 1  
1 0 2  
1 1 1  
0 0 1



$$D_A = AB + \bar{A}B \\ = A \oplus B$$



$$y = 2^n - 1 - \bar{y}$$

Eg:  $y = 0110$   
 $\bar{y} = 1001$

$$2^n = 2^4 = 16 - 2 = 15$$

$$\begin{array}{r} 1111 \\ - 1001 \\ \hline 0110 \end{array}$$

Use adder ch. for comparison  $\Rightarrow$

(1)  $x > y$

$$x > y \Rightarrow x - y > 0$$

$$x - (2^n - 1 - \bar{y}) > 0$$

$$x + \bar{y} > 2^n - 1$$

$$(x > y) \Rightarrow [x + \bar{y} > 1111]$$



2.  $x \leq y$ ?

$$x \leq y$$

$$y - x > 0$$

$$y - (2^n - 1 - \bar{y}) > 0$$

$$y + \bar{y} > 2^n - 1$$

$$\bar{x} + y > 1111$$

$$\bar{x} \quad y$$



If  $\text{cout} \leq 1$ ,

then  $x \leq y$

$\text{cout}$

$$x_0 \quad x_1 \quad x_2 \quad x_3$$



3.  $x = y$ ?

AOB for each bit

AND each output bit

and check if equal

to 1.



Bus  $\rightarrow$  unidirectional (address bus)

Bus  $\rightarrow$  bidirectional (data bus)

- many shared components

can be used  
this bus.

4 registers  $\rightarrow$



Some synchronization is present between the reg.  
so as to transfer data to bus. (from them/to them.)



- Select lines common for all multiplexers
- only one reg gets selected at a time.

Tri-state buffer  $\Rightarrow$



If  $C = 1$ , then

| A | Y |
|---|---|
| 0 | 0 |
| 1 | 1 |

If  $C = 0$ , then output would be in high impedance state.

Open circuit

$\rightarrow$  Y would be disconnected from the ckt. It can be pulled 'low' or 'high'.



Encoder



priority

|         | $x_0$ | $x_1$ | $x_2$ | $x_3$ | $x_4$ | $x_5$ | $x_6$ | $x_7$ | A | B | C |
|---------|-------|-------|-------|-------|-------|-------|-------|-------|---|---|---|
| encoder | 1     | 0     | 0     | 0     | 0     | 0     | 0     | 0     | 0 | 0 | 0 |
|         | x     | 1     | 0     | 0     | 0     | 0     | 0     | 0     | 0 | 0 | 1 |
|         | x     | x     | 1     | 0     | 0     | 0     | 0     | 0     | 0 | 1 | 0 |
|         | x     | x     | x     | 1     | 0     | 0     | 0     | 0     | 0 | 1 | 1 |
|         | x     | x     | x     | x     | 1     | 0     | 0     | 0     | 1 | 0 | 0 |
|         | x     | x     | x     | x     | x     | 1     | 0     | 0     | 1 | 1 | 0 |
|         | x     | x     | x     | x     | x     | x     | 1     | 1     | 1 | 1 | 1 |

14/7/14

## Register Level Design

$$\text{Condition } z = f(x_1, x_2)$$



Multifunction ALU that can perform various ops.



CU - Reg to ALU  
CU - MUX to Select I/P

ALU to consists of adder, subtractor

→ All connected to a decoder

CU - ALU to Reg

Check  $x_m \& s_b$ .  
if it's 0, then  
do nothing,  
if it's 1, then  
add multiplicand  
to AL

1st { 0000  
0000  
0000  
0201  
0010  
0010  
0010  
0010

0000 1010  
0000 1010  
0101 0101  
0201 0101  
0010 1010  
0010 1010  
0010 0101  
0010 0101

0101

0101  
A  
Mux  
Multiplicand

Partial products

$$P_i = P_i + x_{n-i} y_M$$

$$P_{i+1} = 2^{-1} \cdot P_i \text{ (shift right)}$$

Algo. for mult

2 ' $n$ ' bits added  $\Rightarrow$  answer would be max. ( $n+1$ ) bits.

2 ' $n$ ' bits are multiplied  $\Rightarrow$  answer would be max.  $2n$  bits.

To obtain sign of answer, we can XOR the sign bits of the multiplier & multiplicand.

|     |     |     |
|-----|-----|-----|
| 0 0 | 0 - | +ve |
| 0 1 | + - |     |
| 1 0 | - - |     |
| 1 1 | 0 - |     |



- To design any processor
- first design the Algorithm
  - next based on the algorithm, list out the components reqd.
  - Once the components have been decided, ~~make~~ <sup>design</sup> the DPU.
  - Next step would be to design the CD.

Processor Level ⇒

- ✓ CPU
- ✓ IOP (processor)
- deal with Memory
- I/O devices
- ✓ Interconnection
- do as not to interrupt the CPU.



- Usually processor level components are asynchronous
- ⇒ Components are highly interdependent
- If all of them are synchronized, then there would be disparity between their operating speeds.

## Data Representation

19/7/14

Memory → stores the opcodes, not the instruction.

Instruction

MOV A, B

Opcode

06 (say)



183

$\rightarrow 1.83 \times 10^2$

$\rightarrow 18.3 \times 10^1$

(18.3, 1)

represented as  $(M, E)$

↑

Exponent  
Non-fractional part of the significand

Instruction

Instruction size

MOV A, B

1 byte

MVI ~~A~~ A, 06

2 bytes

# LDA 20 30

3 bytes

CLASSMATE  
Date \_\_\_\_\_  
Page \_\_\_\_\_

CLASSMATE  
Date \_\_\_\_\_  
Page \_\_\_\_\_

Q How to store data in memory?

→ Can be stored in 2 ways →

little endian



MSB → higher add.

LSB → smaller add.

big endian



MSB → smaller add.

LSB → higher add.

Q Two to detect overflow and underflow while performing addition of numbers?

$x + y$

$$x_{n-1} \cdot c_{n-2} + x_{n-1} \cdot y_{n-1} \cdot c_{n-2}$$

1001

+ 1101

010110

Carry generated  $CY = 1$

Q Design a circuit to set/reset sign, zero, carry, aux. carry and overflow flag.

## Data Representation

19/7/14

Memory → stores the opcode, not the instruction.

Instruction

MOV A, B

Opcode

0B (say)



Instruction

Instruction size

MOV A, B

1 byte

MVI ~~A~~, D6

2 bytes

LDA ~~x030~~

3 bytes

classmate

Date \_\_\_\_\_

Page \_\_\_\_\_

classmate

Date \_\_\_\_\_

Page \_\_\_\_\_

Q How to store data in memory?

→ Can be stored in 2 ways →

little endian



MSB → higher add.

LSB → smaller add.

big endian



MSB → smaller add.

LSB → higher add.

Q How to detect overflow and underflow while performing addition of numbers? ↳ signed

$$x + y = x_n \cdot y_{n-1} \cdot c_{n-2} + x_{n-1} \cdot y_{n-1} \cdot c_{n-2}$$

$$\begin{array}{r} 1001 \\ + 1101 \\ \hline 0110 \end{array}$$

Carry generated  $C_4 = 1$

Q Design a circuit to subtract sign, zero, carry, aux. carry and overflow flag.



Q How is the size of any instruction det?

→ It is determined using Huffman code.  
Inst. that are frequently used are smaller in size than those which are (less) rarely used or less frequently used.



21/7

Mantissa → precision  
Exponent → range

$\frac{3\text{ bits}}{M}$        $\frac{3\text{ bits}}{E}$

0 11, 1011

$$3 \times 2^3 = \underline{\underline{24}} \text{ (highest)}$$

1 11,

$$-3 \times 2^3 = -24$$

1 00, 100

$$-4 \times 2^{-4}$$

| Dec | Unsigned | 1's comp. | 2's comp. | sign. mag. |
|-----|----------|-----------|-----------|------------|
| 4   | 011      | 011       | 011       | 011        |
| 3   | 010      | 010       | 010       | 010        |
| 2   | 001      | 001       | 001       | 001        |
| 1   | 000      | 000       | 000       | 000        |
| 0   | 000      | 111       | (1)000    | 000        |
| -1  | 001      | 110       | 111       | 101        |
| -2  | 010      | 101       | 110       | 110        |
| -3  | 011      | 100       | 101       | 111        |

Tag field → (3 bits) special bits used for recognizing if the word is simple data, inst., its precision, etc.

The disadvantage is → memory usage increases

However, now they are not used.

- the very 1st word is considered as inst. as processor.

### Check bits

parity → fails for even no. of errors

second → single Error Correction  
Double Error Detection

$$2^c \geq n + c + 1$$

↓  
clock bit  
data bits

for  $n=8$  the above eqn is satisfied for

$$c=4$$

|                |                |                |                |                |                |                |                |
|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
| D <sub>7</sub> | D <sub>6</sub> | D <sub>5</sub> | D <sub>4</sub> | D <sub>3</sub> | D <sub>2</sub> | D <sub>1</sub> | D <sub>0</sub> |
| 0110           | 0101           | 0100           | 0010           | 0010           | 0001           | 0000           | 0000           |

|                |                |                |                |                |
|----------------|----------------|----------------|----------------|----------------|
| D <sub>7</sub> | D <sub>6</sub> | D <sub>5</sub> | D <sub>4</sub> | D <sub>3</sub> |
| 1000           | 1010           | 1010           | 1001           | 1000           |

$$\begin{aligned} C_0 &= D_0 \oplus D_1 \oplus D_2 \oplus D_3 \oplus D_4 \oplus D_5 \oplus D_6 \oplus D_7 \quad (\text{LSB}) \\ C_1 &= D_1 \oplus D_2 \oplus D_3 \oplus D_4 \oplus D_5 \oplus D_6 \oplus D_7 \quad (\text{2nd bits}) \\ C_2 &= D_2 \oplus D_3 \oplus D_4 \oplus D_5 \oplus D_6 \oplus D_7 \quad (\text{3rd bits}) \\ C_3 &= D_5 \oplus D_6 \oplus D_7 \end{aligned}$$

### 2nd Sessional

4/8

#### Logic Unit

|   |     |   |
|---|-----|---|
| A | AND | B |
| A | OR  | B |
| A | XOR | B |

COMPLEMENT / NOT A

1st bit and so on for n-bit.



disadvantage  
 $n \times (\text{delay})$

## Carry Look Ahead Adder

$$G_i = x_i \cdot y_i \quad \text{only when } (x_i = y_i = 1)$$

Carry gen. signal  $\leftarrow G_i$

Carry propagate signal  $\leftarrow P_i = (x_i + y_i) \cdot$

$$\text{Sum} = x_i \oplus y_i \oplus C_{in}$$

$$\text{Carry} = x_i y_i + x_i C_{in} + y_i C_{in}$$

$$(G_i) + (P_i \cdot C_{in})$$

$$C_0 = C_{in}$$

$$C_1$$

$$C_2$$

$$C_3$$

$$C_4 = x_1 y_1 + x_1 C_{in} + y_1 C_{in}$$

$$S =$$

$$C_{i+1} = G_i + C_i \cdot P_i$$

$$G_0 = x_0 y_0$$

$$C_0 = G_0 + C_0 \cdot P_0$$

$$C_1 = G_1 + C_1 \cdot P_1$$

$$C_2 = G_2 + C_2 \cdot P_2$$

$$C_3 = G_3 + C_3 \cdot P_3$$

$$= G_3 + (G_2 + C_2 P_2) \cdot P_3$$

$$= G_3 + P_3 C_2 + (G_1 + P_1 C_1) \cdot P_3 \cdot P_2$$

$$= G_3 + P_3 C_2 + P_3 P_2 G_1 +$$

$$+ P_3 P_2 P_1 C_1$$

$$= G_3 + P_3 C_2 + P_3 P_2 G_1 +$$

$$P_3 \cdot P_2 \cdot P_1 \cdot (G_0 + C_0 P_0)$$

$$C_{out} = G_3 + P_3 C_2 + P_3 P_2 G_1 + P_3 P_2 P_1 C_0 + P_3 P_2 P_1 P_0 C_0$$

$$+ P_3 P_2 P_1 P_0 C_0 \leq C_{in}$$



=> implementation of  $(x+y) \oplus (x-y)$  in the same circuit.



| $x$ | $y$ | $x+y$ | $x-y$ |
|-----|-----|-------|-------|
| 0   | 0   | 0     | 0     |
| 0   | 1   | 1     | 1     |
| 1   | 0   | 1     | 1     |
| 1   | 1   | 0     | 0     |

\* B AD B  
 $\begin{cases} 0 & 0 \\ 0 & 1 \\ 1 & 0 \\ 1 & 1 \end{cases}$

mode (supplied by  $(U)$ )

mode A  
 $\begin{cases} 0 \\ 1 \end{cases}$

mode B  
 $\begin{cases} 0 \\ 1 \end{cases}$



mode A ( $\oplus$ )  
 $\begin{cases} 0 \\ 1 \end{cases}$

mode B  
 $\begin{cases} 0 \\ 1 \end{cases}$

⇒ The CU provides the select inputs.

8/8/14



| $S_1$ | $S_0$ | $Cin$ | opera <sup>n</sup> | $(B_0   B_1)$             | $A + B$                                             | ADD     |
|-------|-------|-------|--------------------|---------------------------|-----------------------------------------------------|---------|
| 0     | 0     | 0     | "                  | "                         | $A + B + 1$                                         | ADC     |
| 0     | 0     | 1     | "                  | $(\bar{B}_0   \bar{B}_1)$ | $A + \bar{B} - 1$                                   | $A - B$ |
| 0     | 1     | 0     | "                  | "                         | $A + \bar{B} + 1$                                   | $A - B$ |
| 0     | 1     | 1     | "                  | "                         | $A + \bar{B} + 1$                                   | SUB     |
| 1     | 0     | 0     | A                  | A                         | transf <sub>s</sub> A                               |         |
| 1     | 0     | 1     | A                  | A                         | $A + 0 + 1$                                         | 1NK     |
| 1     | 1     | 0     |                    |                           | $A_{12}'$ complement<br>of 1<br>$\Rightarrow A - 1$ | DCK     |
| 1     | 1     | 1     | A                  | A                         | $A - 1 + 1 = A$                                     |         |

Shift right and shift left operation:



For rotate operation,  
 $(ROL) \rightarrow A_3$  to be connected to 'shift right' input.  
 $(ROL) \rightarrow A_0$  must be connected to 'shift left' input.

n-bit arithmetic input →



transfer  
of A.



- $S_3, S_2 \rightarrow$  choose that particular arithmetic operation.
- $S_1, S_0$  (depends on these select lines)
- $00 \rightarrow$  performs arithmetic operation.
- $01 \rightarrow$  performs logical operation (depends on  $S_1, S_0$ . which one to perform)
- $10 \rightarrow$  shift right operation.
- $11 \rightarrow$  shift left operation.

\* Perform multiplication operation  $\rightarrow$  shifting and adding.

$$\begin{array}{r}
 1010 \quad \text{Multiplicand} \\
 \times 0110 \quad \text{Multiplier} \\
 \hline
 0000 \\
 1010x \\
 1010xx \\
 0000xxx \\
 \hline
 0111100
 \end{array}$$

- unsigned  
for multiplication of  $2^n$  bit numbers  $\rightarrow$
- $\rightarrow$  use a register of size  $2n$  bits.
  - $\rightarrow$  we can use the same thing for magnitude part for signed multiplication.
  - $\rightarrow$  for sign of the resultant multiplication would be 'XOR' of the signs of the two numbers.
  - $\rightarrow$  Can the same steps be performed for 2's complement represented numbers?
  - $\rightarrow$  For 2's complement rep., if the number is +ve, then it can be applied. However, if the number is -ve, then this method cannot be used.
  - We need a common rep. of both +ve & -ve of the 4-bit numbers.

$$\begin{array}{r}
 +5 \quad 0101 \\
 -(1010) = 1011 \quad (2\text{'s comp. of } +5)
 \end{array}$$

$$-5 \rightarrow (0011)$$

$$\begin{array}{r}
 1101 \\
 +1 \\
 \hline
 1000
 \end{array}$$

$$\begin{aligned}
 X &= x_{n-1} \cdot x_{n-2} \cdots x_0 \\
 2\text{'s comp. } X &= \overline{x_{n-1} x_{n-2} \cdots x_0} + 1
 \end{aligned}$$

$$\begin{aligned}
 -X &= 1 - X \\
 &= (1 - x_{n-1})(1 - x_{n-2}) \cdots (1 - x_0) + 1 \\
 &= 111 \dots 1 - x_{n-1} x_{n-2} \cdots x_0 + 1
 \end{aligned}$$

$$\underbrace{111 \dots 1}_{n \text{ times}} + 1 = 2^n$$

$$2^3 = 8$$

$$\begin{array}{r} 111 \\ + 1 \\ \hline 1000 \end{array}$$

$$= 2^n - x_{n-1} x_{n-2} \dots x_0$$

$$-x = 2^n - x$$

$$\Rightarrow x = -2^n + x$$

$$= 2^n - [0x_{n-2} \dots x_0 + \overbrace{1000 \dots 0}^{(x_{n-1})}]$$

if  $x$  is  $(-ve)$ .

$$= 2^n - 2^{n-1} - x_{n-2} \dots x_0$$

$$-x = 2^{n-1} - \sum_{i=0}^{n-2} 2^{n-2-i} x_i - x$$

$$x = -2^{n-1} + \sum_{i=0}^{n-2} 2^i x_i$$

$$\text{Common eqn: } x = -2^{n-1} x_{n-1} + \sum_{i=0}^{n-2} 2^i x_i$$

(for +ve numbers).

(Same for -ve numbers.)

### Booth's Algorithm

$A, Q_0, Q_1, M$

Consider the example of multiplying  $\underline{\underline{6 \times 2}}$

$$M = 0110$$

$$Q = 0010$$

$$\begin{matrix} Q_0 & Q_1 \\ 0 & 0 \\ -1 & 1 \end{matrix} \quad \left. \begin{matrix} \\ \\ \end{matrix} \right\} \text{shift right}$$

$$\begin{matrix} 0 & 1 \\ 1 & 0 \end{matrix} \quad \left. \begin{matrix} A = A + M \\ A = A - M \end{matrix} \right\} \text{shift right}$$

$$\begin{array}{r} 1010 \\ - 0110 \\ \hline 1001 \\ - 0110 \\ \hline 1011 \\ - 0110 \\ \hline 0011 \end{array}$$

$$M = 0110$$

$$A \quad Q \quad Q_1 \quad Q_0$$

$$\begin{array}{r} 0000 \\ 0000 \\ \hline 1010 \end{array} \quad \begin{array}{r} 0010 \\ 0001 \\ \hline 0001 \end{array} \quad \begin{array}{r} 0 \\ \text{shift} \\ 0 \\ \text{subtract} \\ 0 \\ \hline 1 \end{array}$$

$$\begin{array}{r} 1101 \\ 1101 \\ \hline 0000 \end{array} \quad \begin{array}{r} 0 \\ \text{shift} \\ 1 \end{array}$$

$$\begin{array}{r} 0011 \\ 0001 \\ \hline 1000 \end{array} \quad \begin{array}{r} 0 \\ \text{add} \\ 1 \end{array}$$

$$\begin{array}{r} 0011 \\ 0001 \\ \hline 1000 \end{array} \quad \begin{array}{r} 0000 \\ 1000 \\ \hline 0 \end{array} \quad \begin{array}{r} 0 \\ \text{shift} \\ 1 \\ \text{shift} \\ 0 \\ \hline 0 \end{array}$$

$$\begin{array}{r} 0000 \\ 1100 \\ \hline 1100 \end{array} \quad \begin{array}{r} 0 \\ \text{shift} \end{array}$$

$$6 \times 2 = 12$$



25/8

## Floating-point (exponents) ALU.

$$x+y = x_m \times 2^{e_x} + y_m \times 2^{e_y}$$

$$x-y = x_m \times 2^{e_x} - y_m \times 2^{e_y}$$

$$x \times y = x_m \times y_m \times 2^{e_x+e_y}$$

$$x/y = x_m / y_m \times 2^{e_x-e_y}$$



normalized  $\rightarrow$  number represented in 2's complement form.



CLASSMATE  
Date \_\_\_\_\_  
Page \_\_\_\_\_

Ch 9 ✓

CLASSMATE  
Date \_\_\_\_\_  
Page \_\_\_\_\_

## Pipelining

$x+y \Rightarrow$

|                 |   |
|-----------------|---|
| shift the exp   | ] |
| shift the mant. |   |
| Add mant.       |   |

Normalized

Chapter 4

Datapath Design

instruction processor

DPU

CU

Fixed-Point arithmetic

+  
-  
\*, /

## Addition & Subtraction

Full adder  $\rightarrow z_i = x_i \oplus y_i \oplus c_{i-1}$

$$G_i = x_i y_i + x_i c_{i-1} + y_i c_{i-1}$$

Full subtractor  $\rightarrow$

full sub  
 $b_i = x_i \oplus y_i$

$$z_i = x_i \oplus y_i \oplus b_{i-1}$$

$$b_i = x_i \bar{y}_i + x_i b_{i-1} + \bar{y}_i b_{i-1}$$

$b_i = x_i \oplus y_i$   
 $b_i = x_i \oplus y_i \oplus b_{i-1}$   
 $b_i = x_i \oplus y_i \oplus b_{i-1} \oplus b_{i-2}$   
 $b_i = x_i \oplus y_i \oplus b_{i-1} \oplus b_{i-2} \oplus b_{i-3}$   
 $b_i = x_i \oplus y_i \oplus b_{i-1} \oplus b_{i-2} \oplus b_{i-3} \oplus b_{i-4}$

30/8

$-9 \rightarrow 1011$  requires 5 bits for representation

A Q M

1111 1011 00101  
SL

1111 01110 00101, restore A  
SL

11110 11100 00101, restore A  
SL

11101 11000 00101, restore A  
SL

11101 11000 00101, restore A  
SL

11011 10000 00101, restore A  
SL

10011 10000 00101, restore A  
SL

10111 00000

11100 00001

Sign bit remains zero  $\rightarrow q_0 = 1 + A \wedge A + M$

2's complement of  $q - 1$  00001

$$\begin{array}{r} +1 \\ 11110 \\ \hline 11111 = (-1) \end{array}$$

CLASSMATE Date Page

$$\begin{array}{c} 11111 \\ + 00101 \\ \hline 100100 \end{array} \quad \begin{array}{c} 11110 \\ + 00101 \\ \hline 00011 \end{array} \quad \begin{array}{c} 11101 \\ + 00101 \\ \hline 00010 \end{array} \quad \begin{array}{c} 11011 \\ + 00101 \\ \hline 100000 \end{array} \quad \begin{array}{c} 101113 \\ + 00101 \\ \hline 11100 \end{array}$$

CLASSMATE Date Page

9/5

A Q M

000000 01001 00101  
SL

000000 10010 00101  
SL

10111 / 00101 000101  
10010 00101, restore A  
SL

10111 00101

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

10111 00100

classmate  
Date \_\_\_\_\_  
Page \_\_\_\_\_

Design 2's complement adder / subtractor so that it can complete the operations,

$x+y$ ,  $x-y$ ,  $y-x$  as specified by a control input.



$$x+y = x + \bar{y} + 1$$

$$y-x = y + \bar{x} - 1$$

classmate  
Date \_\_\_\_\_  
Page \_\_\_\_\_

$$x \oplus s = x \quad \text{if } s=0$$

$$x \oplus s = \bar{x} \quad \text{if } s=1$$

$$\bar{x} \cdot 0 + x \cdot 1 = x$$

$$\bar{x} \cdot 1 + x \cdot 0 = \bar{x}$$

$$y-x = y + (-x)$$

$(1+\bar{x})$  in 2's complement exp.

Sub<sup>n</sup> using adder circ.  $\rightarrow$



$(x \neq y)$   
overflow  $\rightarrow$  add 2<sup>n+2</sup> of 2 the nos

CLASSMATE  
Date \_\_\_\_\_  
Page \_\_\_\_\_

$-6 = 1010$

-ve Multiplication using Booth's algo.

$$-6 \times 2$$

$$(-6) \rightarrow$$

$$A \quad Q \quad Q_1 \quad M$$

$$\begin{array}{r} 0000 \\ - 1010 \\ \hline 0 \end{array}$$

$$Q_0 - Q_1 = 00$$

$$\begin{array}{r} 0000 \\ - 0101 \\ \hline 0 \end{array} \rightarrow$$

$$Q_0 - Q_1 < 10 \rightarrow \text{Sub.}$$

$$\begin{array}{r} 1110 \\ - 0101 \\ \hline 0 \end{array} \rightarrow$$

$$Q_0 - Q_1 = 01 \rightarrow \text{Add}$$

$$\begin{array}{r} 0111 \\ - 0010 \\ \hline 1001 \end{array} \rightarrow$$

$$Q_0 \rightarrow 1, Q_1 = 1 \rightarrow \text{Add}$$

$$\begin{array}{r} 0110 \\ - 1001 \\ \hline 0001 \end{array} \rightarrow$$

$$Q_0 \rightarrow 1, Q_1 = 0 \rightarrow \text{Add}$$

$$\begin{array}{r} 0011 \\ - 0100 \\ \hline 0 \end{array} \rightarrow$$

now take 2's complement

$$\Rightarrow 1011 + 1 = \underline{\underline{1100}} \Rightarrow \underline{\underline{-(12)}}$$

$$\underline{-3 \times 11} \Rightarrow (11 \times -3)$$

$$+11 \rightarrow$$

$$01011$$

$$-3 \rightarrow$$

$$\underline{\underline{11101}}$$

$$00010$$

$$\underline{\underline{11100}}$$

$$+1$$

$$\underline{\underline{11101}}$$

$$1$$

CLASSMATE  
Date \_\_\_\_\_  
Page \_\_\_\_\_

$010000$

$$\begin{array}{r} 10101 \\ - 10101 \\ \hline 00010 \end{array}$$

$$\begin{array}{r} 010000 \\ - 11101 \\ \hline 00010 \end{array}$$

$$\begin{array}{r} 00010 \\ - 11101 \\ \hline 00101 \end{array}$$

$$\begin{array}{r} 00101 \\ - 00010 \\ \hline 00101 \end{array} \rightarrow$$

$$01111$$

$$\begin{array}{r} 10000 \\ - 1 \\ \hline 10001 \end{array}$$

$$-4 \times 3$$

$$\begin{array}{r} 0100 \\ - 1011 \\ + 1 \\ \hline 1100 \end{array}$$

$$\begin{array}{r} Q_0 Q_1 \\ 00 \\ 11 \end{array} \rightarrow$$

$$\begin{array}{r} 01 \\ 10 \end{array} \rightarrow$$

$$A \quad Q \quad Q_1 \quad M$$

$$\begin{array}{r} 0000 \\ \cancel{1111} \\ 0000 \end{array}$$

$$1111 \quad 1100 \quad 0 \quad 0011$$

$$1111 \quad 1110 \quad 0 \quad \cancel{0}$$

$$1100 \quad 1110 \quad 0 \quad \cancel{0}$$

$$1110 \quad 0111 \quad 1 \quad \cancel{0}$$

$$1111 \quad \boxed{0011} \quad \cancel{0} \quad \cancel{1}$$

$$1100 \quad \cancel{11} \quad \boxed{1101} \quad \cancel{1}$$

Sub

Add

classmate

Date \_\_\_\_\_  
Page \_\_\_\_\_

## classmate

Date \_\_\_\_\_  
Page \_\_\_\_\_

| A              | B           | C        | M        |
|----------------|-------------|----------|----------|
| 000            | 1100        | 0        | 0011     |
| <del>000</del> | <u>0110</u> | <u>0</u> | <u>→</u> |
| 000            | <u>0011</u> | <u>0</u> | <u>→</u> |
| 101            | 0011        | 0        | <u>→</u> |
| 110            | <u>1001</u> | <u>1</u> | <u>→</u> |
| <u>111</u>     | <u>0100</u> |          |          |
|                |             |          |          |
|                | 1011        |          |          |
|                | + 1         |          |          |
| (-1)           | <u>1900</u> |          |          |

|               |   |       |
|---------------|---|-------|
| <u>-8 × 3</u> |   |       |
| 0             | 0 | M     |
| 11000         | 0 | 00001 |
| 01100         | 0 | →     |
| 00110         | 0 | →     |
| 00011         | 0 | →     |
| 00011         | 0 | Sub.  |
| 10001         | 0 | →     |
| 14847         | 1 | →     |
| 010000        |   |       |
| 1             |   |       |
| 8             |   |       |

$$\begin{array}{r} 60010 \\ - 01011 \\ \hline 10111 \end{array}$$

$$-3 \times 1$$

| A    | Q            | Q <sub>0</sub> | M    |
|------|--------------|----------------|------|
| 0000 | 110 <u>1</u> | 0              | 1011 |
| 0101 | 1101         | 0              |      |
| 0010 | 1110         | 1              |      |
| 1101 | 1110         | 1              |      |
| 0110 | 1111         | 0              |      |
| 0011 | 1111         | 0              |      |
| 0001 | 1111         | 1              |      |
| 0000 | 1111         |                |      |

Sub → 1

Add → 2

Sub → 3

→ 4

$$-3 \times 11 \rightarrow u \Rightarrow 0101$$

$$-3 \Rightarrow 11(0)$$

| A     | Q     | Q <sub>o</sub> | M     |
|-------|-------|----------------|-------|
| 00000 | 11101 | 0              | 01011 |

|                                         |        |        |   |          |   |
|-----------------------------------------|--------|--------|---|----------|---|
| <del>000000<br/>010111<br/>101011</del> | 10101  | 11101  | 0 | Sub<br>→ | 1 |
| <del>010101<br/>010111<br/>101011</del> | 101010 | 11110  | 1 | Add<br>→ | 2 |
| <del>010101<br/>010111<br/>101011</del> | 00101  | 11110  | 1 |          |   |
| <del>001010<br/>001011<br/>10111</del>  | 00010  | 11111  | 0 |          |   |
| <del>001010<br/>001011<br/>10111</del>  | 10111  | 11111  | 0 | Sub      | 3 |
| <del>001010<br/>001011<br/>10111</del>  | 01110  | 11111  | 0 | →        | 3 |
| <del>001010<br/>001011<br/>10111</del>  | 11011  | 11111  | 1 |          |   |
| <del>001010<br/>001011<br/>10111</del>  | 00110  | 10111  | 1 | →        | 4 |
| <del>001010<br/>001011<br/>10111</del>  | 00011  | 10111  | 1 |          |   |
| <del>001010<br/>001011<br/>10111</del>  | 11110  | 11111  | 1 | →        | 4 |
| <del>000000<br/>000011<br/>101111</del> | 000011 | 101111 | 1 |          |   |
| <del>000000<br/>000011<br/>101111</del> | 000011 | 101111 | 1 | →        | 5 |

$$\begin{array}{r} -9 \times -12 \\ 12 \rightarrow 01100 \\ 00011 \\ +1 \\ \hline 00100 \end{array}$$

classmate  
Date \_\_\_\_\_  
Page \_\_\_\_\_

$$9 \rightarrow 11001$$

$$\begin{array}{r} 00110 \\ +1 \\ \hline 00111 \end{array}$$

$$\begin{array}{cccc} A & Q & Q_o & M \\ 00000 & 00100 & 0 & 00111 \end{array}$$

$$\begin{array}{cccc} 00000 & 00010 & 0 & \xrightarrow{1} \\ 00000 & 00001 & 0 & \xrightarrow{2} \\ 00000 & 00100 & 1 & \xrightarrow{3} \\ 00100 & 01011 & 10000 & \xrightarrow{4} \\ 00111 & 00101 & 11000 & \xrightarrow{5} \end{array}$$

$$\begin{array}{cccc} 00000 & 00001 & 0 & \xrightarrow{Sub} \\ 00001 & 00100 & 10000 & \xrightarrow{Add.} \\ 00000 & 00001 & 0 & \xrightarrow{2} \\ 00000 & 00001 & 0 & \xrightarrow{3} \\ 00000 & 00001 & 0 & \xrightarrow{4} \end{array}$$

$$\begin{array}{cccc} 00100 & 01011 & 10000 & \xrightarrow{1} \\ 00111 & 00101 & 11000 & \xrightarrow{2} \\ 00010 & 11100 & 0 & \xrightarrow{3} \\ 00010 & 11100 & 0 & \xrightarrow{4} \end{array}$$

$-2 \times -12$

$$\begin{array}{cccc} A & Q & Q_o & M \\ 00000 & 00100 & 0 & 00100 \\ 11110 & 0 & 0 & 00100 \\ \hline 00000 & 00000 & 0 & \xrightarrow{1} \\ 11100 & 11110 & 0 & \xrightarrow{2} \\ 11100 & 01111 & 0 & \xrightarrow{Sub} \\ 00000 & 00000 & 0 & \xrightarrow{3} \\ 00000 & 00000 & 0 & \xrightarrow{4} \end{array}$$

$$\begin{array}{cccc} 00000 & 00000 & 0 & \xrightarrow{1} \\ 11100 & 11110 & 0 & \xrightarrow{2} \\ 11100 & 00111 & 1 & \xrightarrow{3} \\ 11111 & 00011 & 1 & \xrightarrow{4} \end{array}$$

$$\begin{array}{cccc} 11111 & 00011 & 1 & \xrightarrow{5} \\ 11111 & 10001 & 1 & \xrightarrow{6} \\ 11111 & 11000 & 0 & \xrightarrow{7} \end{array}$$

$$\begin{array}{cccc} 11111 & 11000 & 0 & \xrightarrow{8} \\ 11111 & 11000 & 0 & \xrightarrow{9} \end{array}$$

$$\begin{array}{r|rr} 01100 & 00010 \\ \hline 11101 & +1 \\ \hline 11110 & +1 \\ \hline 11110 & \end{array}$$

$$\begin{array}{r|rr} 1111 & 00111 \\ \hline 10010 & +1 \\ \hline 11000 & \end{array}$$

First left shift

(111)  
Add

Enter m  $\rightarrow$   
Enter division  $\rightarrow$

$$\begin{array}{r} 00010 \\ 11101 \\ +1 \\ \hline 11110 \end{array}$$

$$\begin{array}{r} 00010 \\ 11101 \\ +1 \\ \hline 10100 \end{array}$$

$$\begin{array}{r} 11110 \\ 10100 \\ -12 \\ \hline 10100 \end{array}$$

$$\begin{array}{r} 01100 \\ 10011 \\ +1 \\ \hline 10100 \end{array}$$

$$\begin{array}{r} 00000 \\ 11110 \\ -00010 \\ \hline 00010 \end{array}$$

$$\begin{array}{r} 00001 \\ f11110 \\ 11111 \\ \hline \end{array}$$

$$\begin{array}{r} 11111 \\ 11110 \\ -00001 \\ \hline 00001 \end{array}$$

$$0111 \\ 1001$$

$$\begin{array}{r} A \quad M \quad D \\ 1111 \quad 1001 \quad 0011 \\ \hline -7/3 \end{array}$$

$$0010$$

Add.  
 $Q_o \leftrightarrow$ , Restore A

←

if m is (-ve) then Ac  $\Rightarrow$  [111..]

13/9/14

classmate  
Date \_\_\_\_\_  
Page \_\_\_\_\_



Serial Access  
Random Access  
Semirandom Access.

RAM



Between the tracks → Random Access memory.

In / On the tracks → Serial Access memory

DRO / NDRO → Destructive / Non-destructive

(EEPROM) read out memory.

Dynamic / Static → memory.

Volatile / Non-volatile → memory.

Data transfer rate → no. of bits transferred

time taken.



22/9/19 Available list

1000

- Occupied list
- Directory



Non-preemptive

- ✓ First fit
- ✓ Best fit

Preemptive allocation →



✓ demand swapping

$$t_A = t_B + t_M$$

CLASSMATE  
Date \_\_\_\_\_  
Page \_\_\_\_\_

gcc -std=c99 x.c

(helpercode/mw)

CLASSMATE  
Date \_\_\_\_\_  
Page \_\_\_\_\_

std::scanf("%d",  
--gcd);  
-m = Euclid

22/



< Accumulator based CPU >

OpCode  $\leftarrow$  IR  
Data  $\leftarrow$  DR/AC  
Result  $\leftarrow$  AC



M: External Memory  
MM: Main M [Also RAM]

CM: Cache memory [for faster memory oper.  
often mounted on the same die as  
the processor.]

Memory-mapped-IO → same instruction used  
for (IO-mapped-IO) → (memory and IO)

### Q Difference between a CPU and a processor.

CPU → Central Processing Unit → It is the main element in a computer which controls all processes of the computer from the basic arithmetic, IO, logical and many other processes.

Processor → It is a thing that is meant to process computer operation.

In large computers, the CPU requires one or more printed circuit boards. However, in personal computers and small workstations, the ones which most of us use, the CPU is housed in a single silicon chip called a microprocessor.

CPU's follow a 4 step process while performing any operation:

- ① fetch
- ② decode
- ③ execute
- ④ writeback



A microprocessor incorporates all of the function of a computer's CPU on a single integrated IC.

→ The internal working of the microprocessor

depends on the design and purpose of the microprocessor.  
It is limited by the number of transistors that can be placed on the chip, the terminals that can connect to other parts of the computer.

Due to the very similarity in usage, it is easy to understand why both of these words have become synonymous.

(CPU → processor is)  
acceptable

now-a-days

Procedure execution is therefore carried out as follows:

1. The CPU transfers instructions, their input data from main memory to registers in the CPU.
2. The CPU executes these instructions in their stored sequence (except) if altered by a branch instruction.
3. CPU transfers the results from its registers to the main memory thereafter.



## Difference between cache and main memory

### Cache Memory

- 1) High speed
- 2) less amount of memory
- 3) does work in one instruction cycle.
- 4) integrated in the processor to reduce the average time to access the main memory.
- 5) can work without cache.
- 6) L1, L2 and L3 cache  
(inside processor)
- 7) different sizes available.

### Main memory

- 1) Relatively lower speed
- 2) more amount of memory
- 3) takes ~~many~~<sup>more</sup> cycles to perform the same work.
- 4) can be placed independently.
- 5) cannot work without main memory.

→ A particular memory address is referred to as  $M(\text{adr})$  or simply adr.

→ The cache is designed to be transparent to CPU's instructions.

The CPU communicates with I/O devices in the same way as it communicates with external memory.

The I/O devices are associated with addressable registers called I/O ports to which the CPU can store a word (in operation) or from which it can load a word (from I/O operation).

→ All I/O data transfers are implemented by memory referencing instructions, and approach called memory-mapped I/O.

↳ (memory referencing)

The one-give-one condition is that the I/O devices and memory locations share the same set of addresses.

In memory mapped I/O, the peripheral devices are treated as memory locations. This means that memory addresses are assigned to I/O ports. The CPU just reads and writes data to a memory location, and the I/O controller automatically maps the updating part, (i.e.) transferring data from port to memory and from memory to port.

Advantage is Thus mapping allows the use of full instruction set of CISC computer directly on peripherals, because to the CPU (or program), they are just simple memory locations.

Such systems do not have I/O specific instructions.



Is interrupt part of instruction cycle?  
 → Yes, because before the instruction has been completed check for interrupt is performed in machine cycles  
 instruction cycle)  $\Rightarrow$  machine cycle (instruction executes completed  
 (blockcycles)  
 The second method to map I/O system in a computer system is  
 $\Rightarrow$  I/O mapped I/O

In I/O mapped I/O, the peripheral devices are addressed directly by the CPU using the port addresses. The length of port addresses is generally less than the length of a memory address. There are specific instructions for carrying out input and output tasks at the ports.

→ This method of mapping does not facilitate the use of memory oriented complex instruction set directly on port addresses.



User and Supervisor modes  $\Rightarrow$

programs executed by general-purpose computers fall in 2 broad categories  
 ↗ user programs  
 ↗ supervisor programs

User/ Application program handles a specific application such as a word processing program, that are of interest to computer users.

On the other hand, supervisor programs manage the routine functions of the computer system on behalf of its users. They are OS.

Instruction cycle  $\rightarrow$  sequence of operations performed to finish classmate  
 and instruction.

$$T_{clock} = \left( \frac{1}{f} \right) \quad \boxed{1} \quad \boxed{1}$$

Date \_\_\_\_\_  
Page \_\_\_\_\_

further divided into clock cycles.

Examples of supervisory functions  $\rightarrow$  controlling graphics interface

- ✓ transferring data between secondary and main memory.

Interrupts  $\rightarrow$  It is generally useful to design a CPU so that it can receive requests for supervisor services directly from secondary memory units and other I/O devices. Such a request is called 'interrupt'

$\uparrow$   
 (These are high-priority requests)

In the event of an interrupt;

→ CPU suspends execution of the program that it is currently executing and transfers to an interrupt-handling program.

→ As interrupts from I/O devices require rapid response from the CPU, it checks frequently for the presence of interrupt requests.

CPU Operation  $\rightarrow$  The sequence of operations performed by the CPU in processing an instruction constitutes an instruction cycle. While the details of the instructions vary with the type of instruction, all instructions require 2 major steps: ① fetch step during which new inst. is read from external memory M

② execute step

→ To check for pending (interrupt requests) is usually included in the instruction cycle.



⇒ Overview of CPU operations

Accumulator-based CPU → CPU design continues to be based on the premise that the CPU should be as fast as the available technology and overall design req allow.

Since cost generally increases with design complexity, the number of CPU components is kept relatively small.

IAS computer proposed by Von Neumann & his colleagues follows

basis for subsequent designs.

→ It comprises a small set of registers and the circuits needed to execute a functionally complete set of instructions.

→ In many early designs, the accumulator, a CPU register

- ✓ stored input or output operand
- ✓ stored result
- in many instruction executions.

Registers level basic CPU (early)

- ✓ used by first generation and typical was 16 bit microcontrollers
- ✓ word size is fixed in bit (assumption)
- ✓ instructions can be adequately specified by means of register-transfer operations in our HDL
- ✓ Instructions fetched by PC, which has ~~it~~ main register PC or Program counter.
- ✓ instructions are executed in DPU.

✓ DPU consists of 1-bit 'ALU' and 2 data registers DR and AC.

✓ Most instructions executed are of the form

$$x_1 := f_i(x_1, x_2)$$

$x_1, x_2$  denote a CPU register (AC, PC or DR), or an external memory  $M(\text{addr})$ .

$f_i \rightarrow$  operation performed by the ALU are limited to fixed-point (integer) add and subtraction, shifting, logical word-gate operations.



A small accumulator based CPU.

### Program Control Unit (PCU)

AR : Address Register

IR : Instruction Register

PC : Program Counter

### Data Processing Unit (DPU)

AC : Accumulator Register

DR : Data Register

Memory addresses are stored in 2 addr. registers in PCU

- ① the program counter PC (for instruction addr.)
- ② general purpose data register. (AR) (address)

I  $\rightarrow$  that requests data word in  $M$ 's external memory location  $\text{addr}$ .

has 2 parts

✓ opode 'op'

✓ memory address 'adr'

$$I = (op \cdot adr)$$

Each instruction cycle begins with the instruction fetch operation.

$$IR, AR := M(PC);$$

$\rightarrow$  transfers the int. word I from  $M$  to the CPU.

The 'op' is loaded into PLU's instruction register IR, and address 'adr' is loaded into address register AR.

$$IR := op, AR := adr$$

Instructions that don't reference M don't use DR, their opcode part specifies CPU register to use, as well as f1 to be carried out.

Once, opcode of I has been placed in the FR, the CPU proceeds to decode and execute it.

Note: CPU can increment PC in order to obtain the address of the next instruction.

Two imp. memory addressing instructions are

• Load      LD adr      AC := M(adr)  
 Vstore      M(adr) := AC;  
 ST adr

| HDL            | Assembly lang. | Narrative                 |
|----------------|----------------|---------------------------|
| AC := M(X);    | LD X           | Load X from M into ac. AC |
| DR := AC;      | MOV DR, AC     | Move contents of AC to DR |
| AC := M(Y);    | LD Y           | Load Y into ac. AC        |
| AC := AC + DR; | ADD            | Add DR to AC              |
| M(Z) := AC     | ST Z           | Store contents of AC in M |

The above program is using the load/store feature and is hence called  
 [load/store architecture]

classmate

Date \_\_\_\_\_

Page \_\_\_\_\_

classmate

Date \_\_\_\_\_

Page \_\_\_\_\_

The CPU design in the previous diagram can be designed to implement

$$AC := f_1(AT, M(addr))$$

Thus, we can optimize the prev. code fragment.

HDL

$$AC := M(X)$$

$$AC := AC + M(Y)$$

$$M(Z) := AC$$

Assembly lang.

LD X

ADD Y

ST Z

Narrative

Load X from M in AC

Load Y in DR & add AC

Store content of Y in M.

→ The ADD Y instruction can be expected to take longer to execute than the original ADD instr. that refers only to CPU registers. (Memory ref. complicate instr.-decoding task in CPU)

→ However, overall exec. time is reduced as we have reduced LD ~~^ MV~~ (eliminated).

{ RISC → Reduced Instruction Set Computing follows (Load & Store) Architecture  
 CISC → Complex Instruction Set Computing.  
 (minimal + easy)  
 + reduced

| instr. may complete the whole task  
 ✓ interval decoding of inst. to pipeline.

8085 microprocessor

→ 8 bit processor

(each reg. is of 8 bits)

→ address bus is of 16 bits

→ instruction size can vary.

8/7/14

SP → stack pointer → stores stack memory  
 (of 16 bit as add. bus is 16 bit) classmate  
 Date \_\_\_\_\_  
 Page \_\_\_\_\_

Registers available are



8085

Flag → tells some conditions after execution of instruction.  
 C, S, Z, P, AC → sign, carry, auxiliary carry, parity, zero  
 ↓  
 used in decimal operations

Q What is Program Status Word in reference to the  
 8085 microprocessor? (PSW)  
 (Is it same as flag register?)

⇒ A register containing status flag  
 values → AL register content

## Arithmetic and Data Transfer Instructions

### Commands

↓ dest. reg

MOV Rd, Rs

↓ source reg

MOV A, B

→ In reality data is being copied from B to A.

Size of this instruction in opcode would be → 1 byte.

⇒ This entire thing would be represented by a  
 single opcode.

(Instruction)

MOV A, B

(Opcode)

↓  
opcode

⇒ 1 byte only (78)

(MOV B, A) ⇒ 1 byte only (47) ↴  
 opcode

(MOV A, C) ⇒ 1 byte only (7D) ↴  
 opcode

MVI B, 47 H ↴ more immediately data  
 (more immediately to the  
 specified address) ↴  
 (47) specified by '47' opcode into  
 register B.

Say so is the data, then  
 it would be stored (copied)  
 to register B.

⇒ Again its a 2 byte instruction.

MVI B 57H ↴ 42 = 2 byte instruction.  
 06 opcode  
 57

opcode → gives a shorter instruction to the  
 processor.

MVI D 47H (opcode)  
 (47) → 2 byte instruction

Add Register

ADD R

ADD B

$A = A + B \rightarrow$  Add content of B to content of A and store the result in A.

$\Rightarrow$  2 byte instruction set

Simply

ADD C  $\Rightarrow AC := AC + C$

$\rightarrow$  Add content of register C to the content of the accumulator and store the result in the accumulator.

$\Rightarrow$  2 byte instruction.

Instruction

Opcode

ADD A

87

ADD B

80

ADD C

81

ADI C6

58  $\rightarrow$  C6

Add immediately to  
and store in AC.

$\rightarrow$  Add content of C6 to AC & store in AC (immediately)

No ambiguity  $\Rightarrow$  ADI C6  
on which C6 or C0 to be selected?  
C0  $\leftarrow$  2 byte instr  
(+)  $\leftarrow$  2 byte instr

Reason { Opcode knows the size of instruction  
so it would know that 'C0' [and one] is  
to be used not the first one

classmate

Date \_\_\_\_\_

Page \_\_\_\_\_

classmate

Date \_\_\_\_\_

Page \_\_\_\_\_

SUB A  $\rightarrow$  Subtract content of A from content of AC and then store in AC.

$AC := AC - A$ .

1 byte instruction.

SUB A, B  $\rightarrow$  A := A - B

2 byte instruction

$\rightarrow$  Subtract content of B from the content of A and store it in A.

~~INR~~ INR Rg  $\rightarrow$  Increment register by 1  
INR B  $\rightarrow$  " B "  
INR C  $\rightarrow$  increment  $\rightarrow (C := C + 1)$

ADI A, 01  $\rightarrow$  2 byte instruction.

$\Rightarrow$  INR is preferable as it is 1 byte instruction.

Logical Instructions  $\Rightarrow$

ANA A  $\Rightarrow$

A  $\rightarrow$  0000 0100 AND  
B  $\rightarrow$  0011 1100  

---

 $00\ 00\ 0100$   $\rightarrow$  A

ANA B  $\Rightarrow$

And content of A with content of B.  
and store in A.

ANI B  $\Rightarrow$  And content of AC with content of B.  
and store in AC.

ORA       $Rg \Rightarrow$  'Or' content of Reg with AC and store in AC.  
 XRA       $Rg \Rightarrow$  'XOR' content of Reg with that of AC and store in AC.  
 CMA       $\Rightarrow$  Complement ~~content~~ contents of AC and store in AC.

Branch Instructions  $\Rightarrow$   
 JMP  $\Rightarrow$  16-bit memory address.  
 $\hookrightarrow$  jump to following memory add.

|      |     |                          |                         |
|------|-----|--------------------------|-------------------------|
| 1000 | 05L | MVI B, 06 (2 byte inst.) |                         |
| 1001 | 02L |                          |                         |
| 1002 | C3  |                          |                         |
| 1003 | 10  |                          | JMP 2010                |
| 1004 | 20  |                          | ↓<br>3 byte instruction |

JC  $\Rightarrow$  16-bit memory add.  
 $\hookrightarrow$  jump on carry (3 byte instruction)

JNC  $\Rightarrow$  16-bit memory add.

$\hookrightarrow$  jump on no carry (3 byte instructions)

similarly for JZ, JNZ, JPE, JPO  $\rightarrow$  All have  
 16-bit memory add + 3 byte instr.

classmate  
Date \_\_\_\_\_  
Page \_\_\_\_\_

reg. B is always with reg. C.  
immediately

classmate  
Date \_\_\_\_\_  
Page \_\_\_\_\_

LXI B, 5060  $\Rightarrow$   
 T  
 Register pair       $B \leftarrow 50$  }  $\rightarrow$  contents of '50' &  
 $C \leftarrow 60$  }

11/11/14

NOP  $\rightarrow$  No operation.  
 HLT  $\rightarrow$  Halt

ADD A  $\rightarrow$  Add content of register A to accumulator(AC) and store in AC.  
(1 byte inst.)

MVI A, 20  $\rightarrow$  2 byte inst.  
 ↓  
 Inst.  $\rightarrow$  1 byte  
 and storing memory add.  $\rightarrow$  +1 byte

For storing 05 in registers A  $\rightarrow$

✓ MVI A, 05 <sup>8 bit</sup>  $\Rightarrow$  correct. (used for storing 8-bit data directly in the specified register)

~~XLOA A, 05~~ <sup>16 bit</sup> LDA 3050  $\rightarrow$  means used data stored at location 3050 and store in AC.  
(load accumulator) (say its 01) then (AC  $\leftarrow$  01)

LXI B, 3050  $\rightarrow$  store immediately 50 in B  
A and its pair i.e. C should store 60.

LXI B-C

LXI D-E

LXI H-L

(register pairs)

STA 1111  $\rightarrow$  store the contents of accumulator at the specified memory location.

⇒ All these instructions are specific to the ~~Intel 8085~~  
microprocessor.

LXI R<sub>p</sub>, 16-bit memory address

e.g.: LXI B, 3010 (3 byte instruction)

MOV R, M

↑  
memory add. from  
'HL' pair

Eg: MOV D, M

indirectly take memory add. from  
register → HL 1050 ← 16 bit add.  
1050 [30] ← 8 bit value.

D ← 30 ← 8 bit reg. D

Q Store 50 in register B. using the instruction  
MOV B, M.

Assume all memory addresses are available.

STA 3050/V

STA M(X)

MOV R<sub>p</sub>, M(V)

MOV A, 3050(X)

MVI A, 50

STA 1050

LXI H, 1050

MVI B, M

MOV M, A → whatever is the content of accumulator

↳ 50 should be stored at the memory add. pointed by HL. In this case 1050.

internally  
retrieved  
from HL  
1050 [50]

At memory locn 1050 ↴

50 would  
be stored

MVI M, 8bit add → 2 byte inst.

retrieves add from HL reg

address → [20]

⇒ retrieves data from 8-bit add. and stores it in the memory add. referenced by the 'HL' register pair.

classmate

INX B → 1 byte

B C  
00 01  
00 02

INR A → 1 byte inst

00 FF  
01 00

increment value of register pair by 1.

DCX B → 1 byte ⇒ decrement value of reg. pair by 1.

B C  
00 03  
00 02  
01 00  
00 FF

Load ⇒ Accumulator receives data from memory

✓ LDAX B } → store the content of the memory add. pointed by the register pair 'B-C' and store it in the accumulator.

✓ LDAX D } similarly consider the reg. pair D-E, and store in the accumulator.

Store ⇒ Accumulator → data transferred to memory

STAX B } 0607 → 05 stored from A  
STAX D } 0909 → 05 stored from A

A → 05  
B → 06  
C → 07  
D → 08  
E → 09

0607 → 03 (stored initially)

0909 → 07 ("")

LDA

1676 → load content from this memory add. in A  
 ↑  
 memory add.

INT

SP

L (stack pointer)

Rotate functions

PLC

→ work on the left shifting operation.

RAL

rotate left

rotate right

RRRC

→ work on right shifting operation.

RAR

rotate right

RCC

RAL → Rotate Accumulator left through carry

RAL

carry value is written in D0

C [ ]



RAL → 9-bit data.

RRRC → rotate right by shifting carry.

RAR → rotate right thru carry.

15/7/14

LHLD → directly stores data from the specified 16-bit memory address in HL registers

Ex: LHLD 16 bit add.



15/7/14

## RISC Architecture

LD X

ST X

MOV DR, AC

MOV AC, DR

ADD

SUB

AND

NOT

BRA code

BRBZ code

→ Negate the value stored in Accumulator.

MOV DR, AC

{ MOV DR, AC

SUB

SUB

→ Perform the multiplication  $AC = AC \times N$   
Multiply Y with N. L

ST 1234

AC +  
AC +  
AC +  
AC } N times

MOV AC, 0000H

LD 0000H

MOV DR, AC

LD 0001H

MOV DR, AC

LD 0002H

MOV DR, AC

LD 0003H

MOV DR, AC

ADD

ST 0001H

LD 0002H

MOV DR, AC

ADD

B2 end

LD 0000H

MOV DR, AC

LD 0001H

MOV DR, AC

LD 0002H

MOV DR, AC

ADD

ST 0003H

BRA multi

B2 end:

0000 J

0001 N

0002 Y

0003 Product

18/7/14

Institute of Advanced Sciences  
 CLASSMATE  
 Date \_\_\_\_\_  
 Page \_\_\_\_\_

### The IAS Computer

Address bus → 12 bits  
 Data bus = 12 bits  
 Address of 12 bits  
 40 bits × 4k  
 ↑ = memory = no of addresses  
 Add. Bus  
 $= 4k \left( 2^{12} \right) \left( 2^{10} = 2048 - 1k \right)$   
 $2^2 = 4$



### Add'n features

#### → Architectural Features Extensions

1. Multipurpose register set in storing data + add.
2. Additional data, instruction & address types.
3. Register to indicate computation status.
4. Program control stack
5. Speed up techniques — Caching — Pipelining

Q Which are the diff. address types?  
 — for some instructions, operand is add. & for some add. of operand is spec. in the program per.

Q A machine has a 32-bit architecture with 1 word long instruction. It has 64 registers each of which 32 bits long. It needs to support 45 instr. which have an immediate operand in addition to two register operands. How many bits can be used for immediate operand? → 14 bits



→ PSW and AC are treated as 10 bit units for stack operations.

(8 bits) on each memory add.

| 8085        | 8086 | IAS |
|-------------|------|-----|
| Data bus    | 8    | 16  |
| Address bus | 16   | 22  |

### BASIC computer (Mano machine)

- one accumulator
- 12-bit address bus
- 16-bit data bus
- 8-bit I/O bus
- 28 instructions
- PE; AR → 12 bits
- DR, AC, IR, TR → 16 bits
- INPR, OUTR - 8 bits
- Interrupt flags →
- Single bit reg. → carry flag & halt

[450]

classmate

Date

Page

16bit.

All instructions are 1 word long.



I = 0 → direct addressing mode [LDA 0450 / STA 0450]

I = 1 → indirect addressing mode [MOV A, M]



1. Memory reference Insts.

2. Register "

3. I/O "

0h | 000

0h | 111

2 | 111

Add.

Add.

Add.



PC → 8 bits

Accumulator → 16 bits

## Description

AND - anding AC with content from direct add.

ADD - add "

LDA - load AC "

BON - unconditional branch to address following

BSA - Store PC to direct memory and branch to following add.

1000 BSA 1450



IS2 - increment value in direct memory & skip to next inst. if sum is 0.

IS2 012

TNC.

$M[AR] \leftarrow M[AR] + 1$

if  $M[AR] = 0$

$PC = PC + 1$

1000 IS2 4000

if 1000 IS2 FFFF

(4000) → 50

$\frac{7}{1}$   
S1 to

if 4000 → FFFF

$\frac{1}{2}$   
0

INC PC

15

14-13-12

11 → 0

|   |   |     |                                  |
|---|---|-----|----------------------------------|
| 0 | 7 | 200 | C LA                             |
| 0 | 7 | 400 | C LE                             |
| 0 | 7 | 200 | C MA                             |
| 0 | 7 | 100 | C ME                             |
| 0 | 7 | 080 | C IR → R AR                      |
| 0 | 7 | 040 | C IL → R AL                      |
| 0 | 7 | 020 | INC → thru carry                 |
| 0 | 7 | 010 | S PA - skip next inst. if AC > 0 |
| 0 | 7 | 009 | S NA - skip next inst. if AC < 0 |
| 0 | 7 | 009 | S ZA - skip next inst. if AC = 0 |
| 0 | 7 | 002 | S ZE - skip if carry is 0        |
| 0 | 7 | 001 | HLT Halt                         |

|   |   |     |                                        |
|---|---|-----|----------------------------------------|
| 1 | 7 | 800 | INP → Input to AC from data bus        |
| 1 | 7 | 400 | OUT → Put data from AC to data bus     |
| 1 | 7 | 200 | SKI → skip next inst. if INP flag is 1 |
| 1 | 7 | 100 | SKO → skip next inst. if OUT flag is 1 |
| 1 | 7 | 080 | ION → enable interrupt                 |
| 1 | 7 | 040 | TOF → disable interrupt                |

25/7/14

⇒ Addressing modes

{ Fetch → opcode fetched  
Decode → operation, operation determined  
Execute → execute.

→ implied mode ⇒ nothing is specified.

Sq: CMA (8085)

RAR/RAL/RRC/...

HLT

Immediate Mode  $\Rightarrow$  The operand is present.  
Nothing else is being specified.

Ex. MVI 08H  
ADI 2040H

Register mode  $\Rightarrow$  Registers are used

MOV A, B  
MOV A, M  
L DAX B

Autoincrement  $\begin{matrix} (+, \text{use}) \\ (-, \text{use}) \end{matrix}$  Autodecrement  $\begin{matrix} (-, \text{use}) \\ (+, \text{use}) \end{matrix}$  Mode  $\Rightarrow$  used  
in register indirect mode (not like INR & DCR)  
- Modifies content of the registers.  
- (use the content)

Effective Address  $\Rightarrow$  is defined to be the memory add. obtained from the computation dictated by the given addressing mode.

3000  $\leftarrow$  ADD 2000

for direct  $\rightarrow$  Effective add.

for indirect  $\Rightarrow$  [4000]

Effective add.

Relative Address mode  $\Rightarrow$   
Here offset is specified in the instruction.

Effective Add. = PC + (Offset specified in the instruction)  
offset

- used in branch inst.

Advantage  $\rightarrow$  Whenever branch inst. is used,  
- memory space reqd. is less  
- relocatability is achieved.

Index Addressing Mode  $\Rightarrow$

Index + Add. spec. in instruction = Effective add.

(If processor supports  
Index register)

Base Register Addressing Mode  $\Rightarrow$

Effective = Base add. + Add. specif. in Inst.

Example  $\Rightarrow$  PC = 200  
R1 = 400  
Processor reg. AC  
 $\begin{matrix} X \\ R = 100 \\ \uparrow \\ \text{Index reg.} \end{matrix}$

AC

$$\begin{aligned} PC &= 200 \\ RI &= 400 \rightarrow \text{reg.} \\ X_R &= 100 \rightarrow \text{index reg.} \end{aligned}$$

~~PC is 202~~

Add.

Memory

|  |     |                 |          |
|--|-----|-----------------|----------|
|  | 200 | LOAD to AC Mode | } 2 byte |
|  | 201 | Add. = 500      |          |
|  | 202 | Next inst.      |          |
|  |     | :               |          |
|  | 399 | 450             |          |
|  | 400 | 700             |          |
|  | 401 | 702             |          |
|  |     | :               |          |
|  | 500 | 800             |          |
|  |     | :               |          |
|  | 600 | 900             |          |
|  |     | :               |          |
|  | 700 | 300             |          |
|  | 702 | 325             |          |
|  | 800 | 300             |          |

classmate

Date \_\_\_\_\_  
Page \_\_\_\_\_

Content of reg. is  
to be transformed.

RA

AC

Registers Add.

Reg. Indirect

450 -

400

700 400

700

Auto Increment

400

202

(700)

700

202 (300)

Auto Decrement

399

450

Assignment

Indexed Reg. Addressing mode → There is a index reg. in some processor that stores the address of an address.

(IR f address specified in the instruction)

- looping
- jumping
- counters

EA

AC

Direct  
Add.

500

800

Indirect  
Add.

800

300

Immediate  
operand

200

500

Relative  
add.

(200 + 100)

700 + 100

= 700

325

Indexed Add.

600

900

## II<sup>nd</sup> Phase

5/8/14

$$K - k_b \rightarrow k \text{ bits}$$

KB - k bytes.

$$\frac{2^4}{8} = 3 \text{ bytes}$$

$$\frac{600}{3} = 200$$

$$2^{32} \times 8 \text{ bytes} = 2^{32} \text{ bytes}$$

$$2^{32} \times 8 \times 2 = 2^{32} \times 2 \text{ bytes}$$

## Chapter 5 Control Logic



|   | S <sub>0</sub> | S <sub>1</sub> | S(x,y) |
|---|----------------|----------------|--------|
| 0 | 0              | 0              | 1      |
| 0 | 0              | 1              | 1      |
| 1 | 0              | 0              | 1      |
| 1 | 1              | 0              | 0      |

NAND gate

flip-flop - A bit of data.



Read  
write } → from/to register  
↓ depends on type of triggering

edge → both  
level → one



| Q <sub>t</sub> | S | R | Q <sub>t+1</sub> |      |
|----------------|---|---|------------------|------|
| 0              | 0 | 0 | 0                | Same |
| 1              | 0 | 1 | 0                | 0    |
| 0              | 1 | 0 | 1                | 1    |
| 1              | 1 | 1 | No               |      |



Ans.

Can level triggered & edge triggered opr.  
have read/write in the same cycle.

→ used in master/slave flip-flop.

- edge triggered flip-flop can have read/write in the same cycle.

12/8/14

## Chapter 5 Control Design



Control signals generated upon: →  
 ① which function is to be used.  
 ② which registers are to be used.



Cycle

1

Write A }  
 Read A }  
 Read B }

select pt A }  
 select u.w }  
 select v.x }

Add

For 16-bit addition

$$\begin{array}{r}
 AX \\
 BX
 \end{array}
 \quad
 \begin{array}{r}
 AH \\
 BH \\
 + BL \\
 \hline
 AC
 \end{array}
 \quad
 \begin{array}{r}
 AL \\
 \downarrow \\
 AH \\
 \hline
 AL
 \end{array}$$

Cycle

1

Write A }  
 Read A }  
 Read B }

select pt }  
 select u.w }  
 select v.x }

Add

2

Control unit can be of two types

→ hardware  
 → microprogrammed

Loop: if cond1 = true then add  
 else sub

if cond2 = true then output  
 else loop

output: -----

Current State      op      next state      function

Add 1      cond1 = true      add2  
, cond1 = false      add2

Add 2      cond2 = true      add4  
, cond2 = false      add 1

i/p      o/p  
gcd(x, y)  
gcd(12, 18)

Pseudo Code  $\rightarrow \left\{ \begin{array}{l} \text{if } y > x \\ \quad \text{gcd}(x, y) \\ \quad \quad x = x - y \\ \quad \quad y = y - x \\ \quad \quad \text{else} \\ \quad \quad \quad \text{op } y \leftarrow 1 \\ \end{array} \right.$   
13. 15



gcd(x, y)  
{ while(x != 0)  
  { if(y > x || x > y)  
      {  
        z = y % x  
        if(z == 0 or  
          exit  
      }  
      }  
   }

while (x != 0)  
{ if(x > y)  
  x = x - y  
else  
  y = y - x  
OP z = x  
if(x == 0 || y == 0)  
OP 0!

Control Signals

c0  
c1  
c2  
c3  
c4  
c5  
c6  
c7  
c8  
c9  
c10  
c11  
c12

Operation

AR := PC  
DR := M(AR)  
PC := PC + 1  
PC := ~~PC~~ DR(ADR)  
IR := DR(OP)  
AR := DR(ADR)  
DR := AC  
AC := DR  
M(AR) := DR  
AC := AC + DR  
AC := AC - DR  
AC := AC and DR  
AC := not AC





$$\begin{aligned} D_2^+ &= D_1 \cdot \text{Begin} \\ D_3^+ &= D_2 \\ D_4^+ &= D_3 \end{aligned}$$

$$\begin{aligned} D_5^+ &= \neg D_4 \cdot (LD + ST) \\ D_6^+ &= D_5 \cdot LD \end{aligned}$$

$$D_1^+ = D_6 \cdot CS1 + X003 + D_4 \cdot (ST + LD + ADD + SUB + AND + NOT + BRA + (B2 \otimes AC=0) + (B2 \otimes AC=0))$$

+ D<sub>1</sub> · (Begin)

$$C_0 = D_1$$

$$C_2 = C_4 = D_3$$

~~$$C_3 = D_4 \quad \text{PARITY CHECK}$$~~

$$C_5 = D_4$$

$$C_3 = D_4 \cdot (BRA + BZ \cdot (AC=0))$$

$$C_7 = D_4 \cdot MOV2 + D_6 \cdot LD$$

$$C_5 = D_4 \cdot (LD + ST)$$

$$C_7 = D_6 + D_4$$

CMAR - Control Memory Address Register  
Address field - which instrn. is to be executed next.

9/9/14

3<sup>rd</sup> Sessional

Classical

faster

↔

slower

(why?)

if ( $x \cdot z = 0$ )

$x \leftarrow x + 1$

else

$x \leftarrow x - 1$

if ( $x$  and  $z$ )

$x \leftarrow x - 1$

else

$x \leftarrow x + 1$

Microprogrammed Control Unit

- ROM

- if RAM then WCM → Writable Control Memory.

→ Microprograms

contains microinstructions.

→ Microinstruction

contains microoperations(s).



## Format of microinstructions:

- Vertical format
- horizontal format
- shorter format
- less parallelism
- ~~less~~ more encoding
- longer format
- more parallelism
- less encoding



address  
↓  
control signals



decoder (no decoder)  
↓  
control signals

## Moderate format $\Rightarrow$



control signals.

decoder  
↓  
control signals



$\mu\text{PC} \rightarrow$  micro program counter.

CM  $\rightarrow$  control memory.

IR  $\rightarrow$  microInst<sup>n</sup> Register.

A accumulator based CPU's      Control Unit  $\rightarrow$

Basic Simulator :

FETCH:

$AR := PC$   
 $DR := M[AR]$

$PC := PC + 1$ ,  $IR := DR(COP)$

goto IR

LD:

$AR := DR(Addr)$   
 $DR := M[AR]$

$AC := DR$ , goto FETCH

SUB:  $AC := AC - DR$ , goto FETCH

IR: Based on the opcode, decide which address is to be jumped to.

$$\text{CLEAR: } \rightarrow \left\{ \begin{array}{l} DR := AC \\ AC := AC - DR \end{array} \right\}$$

$$\text{CLEAR: } \rightarrow \left\{ \begin{array}{l} \text{or} \\ DR := AC \\ AC := AC \text{ not } AC \\ AC := AC \text{ and } DR \end{array} \right\}$$

### 10 basic instn

- |           |             |
|-----------|-------------|
| 1. LD     | 7.8. AND    |
| 2. ST     | 8. NOT      |
| 3. Branch | 9.8 AC < DR |
| 4. Bz     | 10. DR < AC |
| 5. SUB    |             |
| 6. ADD    |             |

Nanoprogramming: first we go to a CM, from there, we again go to a CM.



\* => microprogram is to actual program what nanoprogram is to microprogram.

Advantage of nanoprogramming: →

- 1) parallelism can be achieved.
- 2) Loosening of bonds between them. → size of control memory reduced.
- 3) Disadvantages: →
  - 1) access time increases as 2 levels of CM.
  - 2) complex organization.

### Pipeline Control:

- 1) Instruction pipeline.
- 2) Arithmetic pipelining.

CPI : (Clock # cycles per instruction).

The optimal CPI that can be achieved is 1.

→ Pipelining helps us achieve lower CPI.



8085  
 MOV B, A;  
 SUB C  
 JMP 2030

$\left. \begin{array}{l} I_1 \\ I_2 \\ I_3 \end{array} \right\} I_1$

Date \_\_\_\_\_  
Page \_\_\_\_\_

without pipelining

$I_1 \rightarrow I_2 \rightarrow \dots$

$I_1 \rightarrow I_2 \rightarrow \dots$

1000 instrn  $\rightarrow$  2000 clock cycles.

$I_1 \Rightarrow 2$  clock cycles.



16/9/14

MIPS  $\rightarrow$  Millions of instructions per second.  
CPI  $\rightarrow$  Cycles per instruction.

$$\text{CPI} = \frac{\text{Clock cycles}}{\text{Instructions}} \times \frac{\text{Seconds}}{\text{Instructions}} = \frac{\text{Clock cycles}}{\text{Seconds}}$$

$(10^{-6}) \text{ MIPS} \rightarrow 1 \text{ instrn/sec.}$

(CPI)

Date \_\_\_\_\_  
Page \_\_\_\_\_

classmate

$I_1, I_2, I_3, I_4 \dots$

$I_1, I_2, I_3 \dots$

$2 I \Rightarrow 3$  clock cycles

$MI = (n^m)$   
 $1000^2 = 1000$  block cycles.

Total ~~time~~ clock cycles/sec

$\downarrow$

(CPI  $\times$  MIPS  $\times 10^{-6}$ )

frequency of microprocessor = (MIPS  $\times$  CPI)

|                 |    |                               |
|-----------------|----|-------------------------------|
| (Instn Fetch)   | IF | <del>External</del> (I-cache) |
| (Operand Load)  | OL | CD-cache                      |
| (Execution)     | Ex | memory access used.           |
| (Operand store) | OS | CD-cache                      |

4 stages  
for each  
instrn

without  
pipelining



|    | I              | 1              | X              | 2 | X              | 3 | X              | 4              | X | 5 |  |
|----|----------------|----------------|----------------|---|----------------|---|----------------|----------------|---|---|--|
| IF | I <sub>1</sub> |                | X              |   | X              |   | I <sub>2</sub> |                |   |   |  |
| OL | X              | I <sub>2</sub> |                | X |                | X |                | I <sub>2</sub> |   |   |  |
| Ex | X              | X              | I <sub>3</sub> |   | X              |   |                |                |   |   |  |
| DS | X              | X              | X              |   | I <sub>4</sub> |   |                |                |   |   |  |

would be seq'd. to  
read & write data  
simultaneously

|    | 1              | 2              | 3              | 4              | 5 |  |
|----|----------------|----------------|----------------|----------------|---|--|
| IF | I <sub>1</sub> |                |                |                |   |  |
| OL |                | I <sub>2</sub> |                |                |   |  |
| Ex |                | X              | I <sub>3</sub> |                |   |  |
| DS |                | X              | X              | I <sub>4</sub> |   |  |

(4 clock cycles)

perform supercalar processing  
(CPI  $< 1$ ). greater than  
one ins per clock cycle.

Superscalar m/c  $\rightarrow$  The (fetching + execution)  
units are increased. Hence, multiple instructions  
can be executed at the same time.

Ex:

with pipelining

|                      | 1  | 2              | 3              | 4              | 5              | 6                               | 7                               | 8                                                |
|----------------------|----|----------------|----------------|----------------|----------------|---------------------------------|---------------------------------|--------------------------------------------------|
| (Instn Fetch)        | IF | I <sub>1</sub> | I <sub>2</sub> |                |                |                                 |                                 |                                                  |
| (Read from Reg)      | RD | X              |                | I <sub>1</sub> | I <sub>2</sub> |                                 |                                 |                                                  |
| (Exec <sup>n</sup> ) | Ex | X              |                | X              |                | I <sub>1</sub> , D <sub>1</sub> | I <sub>2</sub>                  |                                                  |
| (Memory Access)      | MA | X              |                | X              |                | X                               | I <sub>1</sub> , D <sub>1</sub> | I <sub>2</sub>                                   |
| (Write Back)         | WB | X              |                | X              |                | X                               | X                               | I <sub>1</sub> , D <sub>1</sub> , I <sub>2</sub> |

0  $\rightarrow$  stall clock cycle.

Suppose an instrn loads a data word  $x$  into a register  
and is immediately followed by an instrn that uses  $x$  in its execution stage.

8085

|                |     | 3050 | 1           | A                       | 2 |
|----------------|-----|------|-------------|-------------------------|---|
|                |     |      |             | B                       | 3 |
|                |     |      |             | C                       | 4 |
|                |     |      |             | D                       | 5 |
| I <sub>1</sub> | LDA | 3050 | (5)         |                         |   |
| I <sub>2</sub> | ADD | B    | (7)         |                         |   |
| I <sub>3</sub> | MOV | C, D | (8) / (6)   |                         |   |
|                |     |      | (using NOP) | (using I <sub>3</sub> ) |   |

possible solutions

② Use NOP (No operation) instead of ~~D~~.

1) stall clock cycle.

2) use NOP 3) insert another instrn whenever this case arises such that no other it doesn't depend on the present op.

load

ADD

sub (if operands used for these are not same)

| S <sub>1</sub> | I <sub>1</sub> | I <sub>2</sub> | x              | x              | x              | x              |
|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
| S <sub>2</sub> | x              | I <sub>1</sub> | x              | I <sub>2</sub> | x              | x              |
| S <sub>3</sub> | x              | x              | I <sub>1</sub> | x              | I <sub>2</sub> | x              |
| S <sub>4</sub> | x              | x              | x              | I <sub>1</sub> | x              | I <sub>2</sub> |

Calculate the efficiency / utilization.

I → utilized (8)

x → non-utilized.  $(24) - 8 = 16$

$$\left(\frac{8}{24}\right) \times 100\% = \frac{25}{67}\% \quad [0.33]$$

$$\left(\frac{8}{24}\right) \times 100 = 33.33\% \text{ in (1)}$$

\* Efficiency = 1 can never be achieved.

non-pipelined  
pipelined

$$\text{Speed up} = \frac{S(m)}{T(m)} = \frac{T(1)}{T(m)} \rightarrow \text{no pipelining}$$

larger than m stages of pipelining

$$\text{Efficiency} = \frac{E(m)}{E(1)} = \left(\frac{8}{24}\right)$$

If we are having 4 stages the

$$S(4) = \frac{T(1)}{T(4)}$$

Thus,  $T(1) \leq m T(m)$ .

relation betw<sup>n</sup> S(m) & E(m) =>

$$\frac{8}{6} = 4 \times \frac{8}{24}$$

$$E(m) \propto S(m)$$

$$S(m) = m \cdot E(m) ?$$

PCR (Performance Cost Ratio)

$$\text{PCR} = \frac{f}{K} \rightarrow (\text{MHz}) \text{ clock frequency}$$

cost consumption for H/W

a → delay of non-pipelined -

a ⇒ delay of each stage.

b ⇒ delay due to buffer reg.

$$\text{pipelined clock period} \leftarrow T_C = \frac{a}{m} + b$$

pro analysing  
 $R = cm + d \rightarrow$  cost of data logic reg.  
→ cost of buffer reg./stage

$$PCR^+ = T_c \cdot K$$

$$= \left( \frac{a}{m} + b \right) (cm + d)$$

$\Leftarrow ac \rightarrow a$

$$= \left( \frac{a + bm}{m} \right) (cm + d)$$

$$PCR^- = \left( \frac{acm + bcm^2 + bdm + ad}{m} \right)$$

$$\Rightarrow PCR = \left( \frac{m}{\underbrace{acm + bcm^2 + bdm + ad}_x} \right)$$

Differentiating PCR wrt m;

$$PCR' = \frac{(1)(x) - m(ac + 2bcm + bd)}{(acm + bcm^2 + bdm + ad)^2}$$

$$= \left( \frac{x - m(ac + bd + 2bcm)}{x^2} \right)$$

$$= \frac{adm + bcm^2 + bdm + bd}{x^2} - \frac{acm + bdm + 2bcm^2}{x^2}$$

$$\frac{d}{dm}(PCR) = \left( \frac{ad - bcm^2}{x^2} \right)$$

reservation table  $\rightarrow$  shows stage usage in every clock cycle while I is being executed.

For optimum value,

$$PCR' = 0$$

$$\Rightarrow ad - bcm^2 = 0$$

$$\Rightarrow ad = bcm^2 \Rightarrow m = \sqrt{\frac{ad}{bc}}$$

Optimal value of m to get by

$$m_{opt} = \sqrt{\frac{ad}{bc}}$$

19/9/14 Collision

### Reservation Table

|                |   |   |   |
|----------------|---|---|---|
|                | x |   |   |
| S <sub>1</sub> |   | x |   |
| S <sub>2</sub> |   |   | x |
| S <sub>3</sub> | 1 | 2 | 3 |

Tells when we can suppose to start our instn.

forbidden list of Reservation Table.

| 1 instn        |   |   |   | 2nd instn. |   |   |     | 3rd instn. |     |    |     | 4th instn. |     |     |                                      |
|----------------|---|---|---|------------|---|---|-----|------------|-----|----|-----|------------|-----|-----|--------------------------------------|
| S <sub>1</sub> | x | x | v | v          | v | v | x   | (f)        | (f) | x  | (f) | (f)        | (f) | (f) | S <sub>1</sub> $\rightarrow$ 1, 3, 7 |
| S <sub>2</sub> | x | x | - | v          | v | x | (f) | (f)        | (f) | x  | (f) | (f)        | (f) | (f) | S <sub>2</sub> $\rightarrow$ 2, 4, 8 |
| S <sub>3</sub> | x | x | v | v          | v | x | (f) | (f)        | (f) | x  | (f) | (f)        | (f) | (f) | S <sub>3</sub> $\rightarrow$ 3, 4, 8 |
| S <sub>4</sub> | x | x | v | v          | v | x | (f) | (f)        | (f) | x  | (f) | (f)        | (f) | (f) | S <sub>4</sub> $\rightarrow$ 4, 5, 9 |
|                | 1 | 2 | 3 | 4          | 5 | 6 | 7   | 8          | 9   | 10 | 11  | 12         | 13  | 14  |                                      |

Reservation Table for addition of 2 nos.

$$I_1 \Rightarrow x$$

$$I_2 \Rightarrow$$

$\rightarrow \{ \text{next instn cannot be started just in the next clock cycle.} \}$

$$R \Rightarrow \{ 2, 5, 6 \}$$

We can start new instn from 3 & 4.

If we start from 5, then collision would occur.

forbidden list  $\Rightarrow \{1, 4, 5\}$

$\xrightarrow{\text{refers to}}$  the clock cycle.

Upon generalizing, we neglect;

$\left( \begin{matrix} 2 \\ n \\ 7 \\ n+1 \end{matrix} \right) \xrightarrow{\text{and}} \left( \begin{matrix} 5 \\ 6 \\ n+4 \\ n+5 \end{matrix} \right)$

hence; (1, 4, 5).

### Initiation Latency / latency

→ If an  $inst^n$  is started at  $x + 7$   
the next  $inst^n$  is at  $(x+n)$ ; Then the  
latency is ' $n$ '.

→ To increase efficiency, we need to decrease  
the latency.

### Schedule the pipeline:

If started at  $(n+12)$

|       |       |       |       |       |       |       |       |       |       |          |          |          |          |          |          |          |          |
|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|----------|----------|----------|----------|----------|----------|----------|----------|
| $S_1$ | $I_1$ | $I_2$ | $I_3$ | $I_4$ | $I_5$ | $I_6$ | $I_7$ | $I_8$ | $I_9$ | $I_{10}$ | $I_{11}$ | $I_{12}$ | $I_{13}$ | $I_{14}$ | $I_{15}$ | $I_{16}$ | $I_{17}$ |
| $S_2$ | $I_1$ | $I_1$ | $I_2$ | $I_2$ | $I_1$ | $I_2$ | $I_3$ | $I_3$ | $I_4$ | $I_4$    | $I_3$    | $I_3$    | $I_4$    | $I_4$    | $I_3$    | $I_5$    | $I_4$    |
| $S_3$ | $I_1$ | $I_1$ | $I_1$ | $I_2$ | $I_2$ | $I_1$ | $I_2$ | $I_3$ | $I_3$ | $I_4$    | $I_4$    | $I_3$    | $I_3$    | $I_4$    | $I_4$    | $I_3$    | $I_5$    |
| $S_4$ | $I_1$ | $I_1$ | $I_1$ | $I_2$ | $I_2$ | $I_1$ | $I_2$ | $I_3$ | $I_3$ | $I_4$    | $I_4$    | $I_3$    | $I_3$    | $I_4$    | $I_4$    | $I_3$    | $I_5$    |
| 1     | 2     | 3     | 4     | 5     | 6     | 7     | 8     | 9     | 10    | 11       | 12       | 13       | 14       | 15       | 16       | 17       |          |
| 7     | 8     | 9     | x     | x     | x     | x     | x     | x     | x     | x        | x        | x        | x        | x        | x        | x        |          |
|       |       |       | $I_2$ |       |       | $I_3$ |       |       | $I_4$ |          |          | $I_5$    |          |          |          |          |          |

If we start at  $(n+3)$  then.

|       |       |       |       |       |       |       |       |       |       |          |          |          |          |          |          |          |          |
|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|----------|----------|----------|----------|----------|----------|----------|----------|
| $S_1$ | $I_1$ | $I_2$ | $I_3$ | $I_4$ | $I_5$ | $I_6$ | $I_7$ | $I_8$ | $I_9$ | $I_{10}$ | $I_{11}$ | $I_{12}$ | $I_{13}$ | $I_{14}$ | $I_{15}$ | $I_{16}$ | $I_{17}$ |
| $S_2$ | $I_1$ | $I_1$ | $I_2$ | $I_2$ | $I_1$ | $I_2$ | $I_3$ | $I_3$ | $I_4$ | $I_4$    | $I_3$    | $I_3$    | $I_4$    | $I_4$    | $I_3$    | $I_5$    | $I_4$    |
| $S_3$ | $I_1$ | $I_1$ | $I_1$ | $I_2$ | $I_2$ | $I_1$ | $I_2$ | $I_3$ | $I_3$ | $I_4$    | $I_4$    | $I_3$    | $I_3$    | $I_4$    | $I_4$    | $I_3$    | $I_5$    |
| $S_4$ | $I_1$ | $I_1$ | $I_1$ | $I_2$ | $I_2$ | $I_1$ | $I_2$ | $I_3$ | $I_3$ | $I_4$    | $I_4$    | $I_3$    | $I_3$    | $I_4$    | $I_4$    | $I_3$    | $I_5$    |
| 1     | 2     | 3     | 4     | 5     | 6     | 7     | 8     | 9     | 10    | 11       | 12       | 13       | 14       | 15       | 16       | 17       |          |
| x     | x     | x     | x     | x     | x     | x     | x     | x     | x     | x        | x        | x        | x        | x        | x        | x        |          |
|       |       |       | $I_2$ |       |       | $I_3$ |       |       | $I_4$ |          |          | $I_5$    |          |          |          |          |          |

↑ optimal scheduling method

steady state efficiency = 1.

Hazards → not achieving

- Structural - (less resources)
- Control - (2 ~~inst~~ trying to access the same resource)
- Data

PI → Fetch instr

DI → Decode Inst

CO → Compute operand add.

FO → Fetch operand

EI → Execute inst

WO → Write operand.

|                | MVN            | $R_1, X \leftarrow$ from memory | 1              | 2              | 3              | 4              | S | 6 | 7 | 8 |
|----------------|----------------|---------------------------------|----------------|----------------|----------------|----------------|---|---|---|---|
| I <sub>1</sub> | I <sub>1</sub> | I <sub>2</sub>                  | I <sub>3</sub> | X              |                |                |   |   |   |   |
| D <sub>1</sub> | I <sub>1</sub> | I <sub>1</sub>                  | I <sub>3</sub> |                |                |                |   |   |   |   |
| C <sub>0</sub> | I <sub>1</sub> | I <sub>1</sub>                  | I <sub>2</sub> | I <sub>3</sub> |                |                |   |   |   |   |
| P <sub>0</sub> |                |                                 | I <sub>1</sub> | I <sub>2</sub> | I <sub>3</sub> |                |   |   |   |   |
| E <sub>1</sub> |                |                                 | I <sub>1</sub> | I <sub>1</sub> | I <sub>3</sub> |                |   |   |   |   |
| W <sub>0</sub> |                |                                 |                | I <sub>1</sub> | I <sub>2</sub> | I <sub>3</sub> |   |   |   |   |

in 4<sup>th</sup> clock cycle, I<sub>1</sub> is in 'P0' phase

We can't start I<sub>2</sub> in 4<sup>th</sup> clock cycle as its already in 'Fetch Operand' from memory phase.

Control Hazards  $\rightarrow$  I<sub>1</sub> (because of Branch Instn);

BR ADD

|                |                |                |                |     |  |  |
|----------------|----------------|----------------|----------------|-----|--|--|
| I <sub>1</sub> | I <sub>1</sub> | X              |                |     |  |  |
| D <sub>1</sub> |                | I <sub>1</sub> | X              |     |  |  |
| C <sub>0</sub> |                | I <sub>1</sub> | X              |     |  |  |
| P <sub>0</sub> |                | I <sub>1</sub> | X              |     |  |  |
| E <sub>1</sub> |                | I <sub>1</sub> | X              | ADD |  |  |
| W <sub>0</sub> |                |                | I <sub>1</sub> | X   |  |  |

unless the whole 'ADD' procedure has been completely executed, we can't start new instn.

Data Hazards  $\rightarrow$  because of data dependency.

|                |                |                |                |                |                |                |                |  |  |                |
|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|--|--|----------------|
| ADD B          | $\Rightarrow$  | $A = A + B$    |                |                |                |                |                |  |  |                |
| SUB B          | $\Rightarrow$  | $A = A - B$    |                |                |                |                |                |  |  |                |
|                |                |                |                |                |                |                |                |  |  |                |
| E <sub>1</sub> | I <sub>1</sub> | I <sub>2</sub> | I <sub>3</sub> |                |                |                |                |  |  |                |
| D <sub>1</sub> |                | I <sub>1</sub> | I <sub>2</sub> |                |                |                |                |  |  |                |
| C <sub>0</sub> |                |                | I <sub>1</sub> | I <sub>2</sub> |                |                |                |  |  |                |
| P <sub>0</sub> |                |                |                | I <sub>1</sub> | I <sub>2</sub> | X              |                |  |  |                |
| E <sub>1</sub> |                |                |                | I <sub>1</sub> | X              | I <sub>2</sub> |                |  |  |                |
| W <sub>0</sub> |                |                |                |                | I <sub>1</sub> | X              | I <sub>2</sub> |  |  | I <sub>2</sub> |

can be wntd. from 7<sup>th</sup> clock cycle because ~~op~~ operand is update ~~in~~ in 6<sup>th</sup> one.

|                | f | D | E | W |
|----------------|---|---|---|---|
| I <sub>1</sub> | 1 | 2 | 2 | 1 |
| I <sub>2</sub> | 2 | 3 | 3 | 2 |
| I <sub>3</sub> | 3 | 1 | 1 | 1 |
| I <sub>4</sub> | 1 | 1 | 1 | 1 |

refers to the  
clock cycle no.

I<sub>1</sub>  $\rightarrow$  1 times fetched.  
 $\rightarrow$  2 " decoded.  
 $\rightarrow$  2 times executed.  
 $\rightarrow$  1 times written.

1 2 3 4 5 6 7

F  
D  
E  
W

su in mobile gallery

F  $T_1 T_2 T_3 T_4 T_5 T_6$   
 D  $T_1 T_2 T_3 T_4 T_5 T_6$   
 E  $T_1 T_2 T_3 T_4 T_5 T_6$   
 N  $T_1 T_2 T_3 T_4 T_5 T_6$

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

23/9/14

Non optimal scheduling  
(ES)

Optimal scheduling  
(ES)



Collision vector

$CV_0 = 10011$

Collision register

$CR = 00000$   
(initially 0)

Whenever a new clock cycle is being used, then we need to shift CR left by one bit (whenever beginning a new task)

forbidden list {1,4,8,3}

CLASSMATE

Date \_\_\_\_\_  
Page \_\_\_\_\_

### Task initiation diagram X

→ Whenever initializing a new task in a clock cycle, we (OR) CR with CV0 along with a left shift

$CR = 00000 \quad CV_0 = 10011$

$CR = 10011 \quad \leftarrow \text{left shift} + \text{OR}$

If no new task is initialized then only left shift needs to be performed.

$CR = \underline{\underline{\underline{\underline{\underline{0}}}}} 00110$

In next clock cycle, we are starting a new task

$\therefore CR \leftarrow \text{left shift} + \text{OR with } CV_0$ .

and so on...

$\begin{pmatrix} 10011 \\ 11111 \end{pmatrix}$

These 2 states are 6 clock cycles apart.

$\begin{array}{ll} 1 & 11111 \quad (3) \\ \ll 1 & 11110 \quad (4) \\ \ll 1 & 11100 \quad (5) \\ \ll 1 & 11000 \quad (6) \\ \ll 1 & 10000 \quad (7) \\ \ll 1 & 00000 \quad (8) \\ \ll 1 & 000000 \quad (9) \\ + \text{OR} & 10011 \\ \hline & 10011 \quad (9) \end{array}$

$\lll$  00110 ⑩

$\lll$  01100

OR 10011

11111 ⑪

$\lll$  11110 ⑫

$\lll$  11100 ⑬

$\lll$  11000 ⑭

$\lll$  10000 ⑮

$\lll$  00000 ⑯

$\lll$  00000 ⑰

OR 10011

10011 → ⑱



starting from 9<sup>th</sup> clock cycle above state diagram is obtained.

for optimal scheduling mechanism:

$$CN_0 = 10011$$

$$CR = 00000$$

$\lll$  00000  
OR 10011

10011 ① }

~~$\lll$~~  00110 ② }

$\lll$  01100 ③ }

$\lll$  01100 ④ }

OR 10011

11011 ⑤ }

~~$\lll$~~  10110 ⑥ }

$\lll$  11000 ⑦ }

OR 10011

11011 ⑧ }

$\lll$  10110 ⑨ }

11011

11011

11011

$\lll$  01100 ⑨ }  
 $\lll$  11000 ⑩ }  
OR 10011 } 11011

$\lll$  11011 ⑪ }  
 $\lll$  10110 ⑫ }  
 $\lll$  01100 ⑬ } 11011

$\lll$  11000 ⑭ }  
OR 10011 } 11011

$\lll$  11011 ⑮ }  
 $\lll$  10110 ⑯ }  
 $\lll$  01100 ⑰ } 11011

$\lll$  11000 ⑱ }  
OR 10011 } 11011

$\lll$  11011 ⑲ }  
 $\lll$  10110 ⑳ }  
 $\lll$  01100 ㉑ } 11011

$\lll$  11000 ㉒ }  
OR 10011 } 11011

$\lll$  11011 ㉓ }  
 $\lll$  10110 ㉔ }  
 $\lll$  01100 ㉕ } 11011

$\lll$  11000 ㉖ }  
OR 10011 } 11011

$\lll$  11011 ㉗ }  
 $\lll$  10110 ㉘ }  
 $\lll$  01100 ㉙ } 11011

$\lll$  11000 ㉚ }  
OR 10011 } 11011

$\lll$  11011 ㉛ }  
 $\lll$  10110 ㉜ }  
 $\lll$  01100 ㉝ } 11011

Thus after 3 clock cycles each, states repeat themselves.

11011  
01100

10011  
11111

11011  
10110

11011  
01100

Task initiation diagram.

→ Optimal one is where latency is minimized.

$p \rightarrow$  probability of encountering branch instrn.  
 $q_V \rightarrow$  probability of taking branch instrn (actually).

$s \rightarrow$  instrn stream. (for each instrn)

$m \rightarrow$  no. of stages simultaneously being executed

clock cycle = if taking a branch + if not taking a branch.

$$CPI = p q m s + (1 - p q) s$$

$$\frac{\text{Clock cycles}}{m} = 1 + p q (m - 1)$$

$$\text{Ex: } m = 6 \quad p = 0.2 \quad q_V = 0.4$$

$$CPI = 1 + (0.2)(0.4)(5) = 1 + 0.08 \times 5 = 1.4$$

→ In execution of branch inst<sup>n</sup> using pipelining, if delaying the jump would increase the efficiency then its called delayed jumps.

RAW (Read After Write)  
 $I_1 \xrightarrow{R R \text{ } W W} R \quad R$  ] no problem created.

$I_2 \xrightarrow{R W \text{ } R W}$   
(RAW) Read After Write Hazard. → If reading occurs after 'write' inst<sup>n</sup>, then incorrect data would be read. This hazard would be averted if  $I_2$  could be stalled when  $I_1$  is reading.  
 [True dependency]

(WAR) Write after Read Hazard → If  $I_2$  will get incorrect data after  $I_1$  has executed 'write' instruction.  
 [anti-dependency]  
 (Why?)

(WAW) Write After Write Hazard  
 [output dependency]

forwarding path → output of intermediate stage would be given to the 'next' stage(s).  
 Ex: branch instr<sup>n</sup>, 3 clock cycles per instruction wasted. Speed up factor is 8 and each clock cycle is of  $10\mu s$ . How many stages are present in the pipeline system?

Ans → Let no. of stages =  $n$

$$8 = n \times 10\mu s$$

$$\frac{8}{(1+3 \times 0.3) \times 10\mu s} = \left( \frac{n}{1+0.9} \right) \Rightarrow n = 8 \times 1.02$$

$(-5) \rightarrow 1019$

classmate  
Date \_\_\_\_\_  
Page \_\_\_\_\_

### Non-Restoring Division Algo

$$\begin{array}{r} 0000 \\ 1101 \\ \hline 0011 \end{array} \quad AC \quad MQ$$

$$\begin{array}{r} 0001 \\ 0011 \\ \hline 1110 \end{array} \quad k \text{ bits} \quad k \text{ bits}$$

$M_{Q,0} \rightarrow \text{LSB of } MQ$  gives the quotient saved after a sub<sup>n</sup> or add<sup>n</sup>.

Total no. of add<sup>n</sup> / sub<sup>n</sup> are  $k^2$  & total no. of shifts =  $k+2$  (14th Nov)

4 bits for divisor & 7 bits dividend.

Sign bits  $S \rightarrow$  shows the sign.

dividend of 2 words + divisor of 1 word.

- ① check sign of divisor.
- ② check sign of dividend  
 If (1) then take 2's complement

$$P_{in} = S_{div} \odot S_{ALU}$$

After left shift  $\begin{cases} \text{MSB} \\ \text{dividend} \end{cases} \oplus S_{div} \rightarrow \begin{cases} 0 & \text{sub} \\ 1 & \text{Add.} \end{cases}$

101  
100

0



a[100]

classmate

Date

Page

for (i=0; i<10; i++)

{ for (j=0; j<10; j++)

{ if (a[i]>a[j])

{ t = a[i];

a[i] = a[j];

a[j] = t;

b[i] = 1;

mn

for (i=0; i<strlen(str); i++)

{ if (str[i] == '1')

{ for (j=i+1; j<strlen(str); j++)

temp = str[i];

str[i] = str[j];

str[j] = temp;

3

C100 → 002F

C101 → 002F

C102 → 0020

C103 → 001B

C104 → 0016

C105 → 000F

C106 → 000C

C107 → 0007

C108 → 0002

C109 → 0001

LAB 2

If the no. is larger than 50, 1 in A100,

less than 50, 2

= 50, 3

Sorting of memory locn C100, C109

Sort

Ascending order

C100 - C109

119/10/15 /7/8/4/12/11/34



13/9/14

Ch-6 Memory Organization

sss

Access Type:

- 1> Serial Access
- 2> Random Access
- 3> Semirandom Access.



→ In serial access, we have only one r/w head per plate

→ In random access, we have r/w head per track.

Random Access + Serial Access = Semi Random Access.

Different types of erasable memory:

- DRAM (Destructive read out memory).
- Dynamic
- Volatile

Non-erasable memory:

- NDROM
- Static
- Non-volatile

In Dynamic RAM is composed of capacitors: If they are charged, represents 1; else - represents 0.

- Refreshing in dynamic RAM is hence time consuming.

- Static RAM consists of flip-flops ~~cache memory~~ e.g., Cache memory.

- Static RAM is costlier and less denser than Dynamic RAM.

Non-volatile  $\rightarrow$  contents of RAM don't get erased if power failure occurs.

Volatile  $\rightarrow$  contents of RAM would get erased if power failure occurs.

Flash memory: EEPROM or E<sup>2</sup>PROM

Access time: time interval between the request & the 'request' getting served is called access time.

Cycle time: Time elapsed between 2 consecutive memory read operations.

In DRAM,



2nd request gets served after restoring the 1st request.  
Hence, restoring time is included in cycle time.

$$t_{\text{access}} < t_{\text{cycle}}$$

$$t_{\text{access}} = t_{\text{cycle}}$$

data transfer rate = no. of bits transferred per unit of time.

$$b_M = \left( \frac{N}{T} \right)$$



\*  $4K \times 8$  = Available chip size =  $2^{12}$   
 $16K \times 8$  = memory size =  $2^{14}$

Every RAM has

- CS (Chip select) = for selecting IC
  - WE (Write enable)
  - OE (Output enable)
- every IC has these 2 pins  
 ↓  
 [for reading]



⇒ For achieving  $16K \times 32$ , we have to attach 4 ( $4K \times 8$ ) chips in each line.

5/9/14  
Memory Interleaving:

Virtual memory:



→ The whole program may not be stored in the main memory, only part of the program is stored.

→ To user, it seems as if the whole program is stored in the main memory. (called virtual memory).

→ Functions of memory management unit:

- 1> transfer blocks of data from S.M. to M.M.
- 2> deallocate & reallocate blocks.
- 3> free the blocks allocated previously.

\* Any program stores all the instructions at continuous memory locations so, processor knows that the next instru would probably be at the next memory location. This is called 'Locality of Reference'.

Same is for C.M. processor. It would transfer whole block instead of one byte from C.M. to M.M.

Performance of memory system:

- 1> Size of block
- 2> Size of memory itself (size of C.M. or M.M.)
- 3> Replacement algorithm (decides which block should be removed if enough space is not available)

Spatial Locality: next consecutive locations are referred by the processor and placed into main memory.

Temporal Locality: segment in which loop is present, processor has to process the whole loop repeatedly hence, whole loop is placed inside the main memory.

Average cost of memory:  $C = \left( \frac{C_1 S_1 + C_2 S_2}{S_1 + S_2} \right)$

$C_1$  = cost of main memory.

$S_1$  = storage capacity of main memory.

$C_2$  = cost of secondary memory.

$S_2$  = storage capacity of secondary memory.

If  $C_3$  is also present;

(Cache memory)  $C = \left( \frac{C_1 S_1 + C_2 S_2 + C_3 S_3}{S_1 + S_2 + S_3} \right)$

If particular word referenced by processor is present in main memory; then it is called 'hit' and if not then 'Miss'.

'Hit' refers to the probability of particular word referenced by processor, present in Main memory.

$$H = \left( \frac{N_1}{N_1 + N_2} \right)$$

$N_1$  = no. of address references made to  $M_1$  memory by the processor.

$N_2$  = " " " " "  $M_2$  " "

Miss =  $1 - H$ .

Average Access Time  $\Rightarrow t_A = H \cdot t_{A_1} + (1-H) t_{A_2}$ .

$t_{A_1}$  = access time for main memory.

$t_{A_2} = t_B + t_{A_1}$  = Access time of S.M.  $\rightarrow$  (Secondary memory)

$t_B$  = block transfer time from SM to M.M.

$$\therefore t_A = H t_{A_1} + (1-H)(t_B + t_{A_1}) \\ = t_B - H t_B + t_{A_1} \Rightarrow t_A = t_{A_1} + (1-H)t_B$$

① Access efficiency  $= \left( \frac{t_{A_1}}{t_A} \right)$

② Access ration  $= \left( \frac{t_{A_2}}{t_A} \right)$

$$\left( \frac{t_{A_2}}{t_A} \right) = \left( \frac{H t_{A_1} + (1-H)t_B}{t_{A_1}} \right)$$

$$\therefore \left( \frac{1}{e} \right) = H + (1-H)r$$

$$\therefore e = \left( \frac{1}{H + (1-H)r} \right)$$

$$e = \left( \frac{1}{((1-r)H + r)} \right)$$

Space Utilization  $\Rightarrow$

$u = \left( \frac{su}{s} \right)$   $su$  = space frequently used by processor.  
 (Space Utilization)  $s$  = total space.

1> If the block transferred from S.M. to M.M. is not used then it will be unutilized.

2> If the transferred block is smaller than the space present in M.M., so the remaining space is wasted. [Called 'empty region'].

→ To solve the problem of 'empty region'; we use the technique of 'paging'.

- Pages are blocks of equal size.

- 'Page Frame' is a structure where we are fitting our page.

### 19/9/14 Address Translations :

- Address should be mapped . Virtual address to physical address.

- can be done by programmers.
- can be done by compiler.
- can be done by loader.

↓  
object code that decides when to load.

- can be done by Memory Management Unit (MMU). at runtime.

Dynamic Memory Allocation → Stack, Queue.  
Static      "      "      ?

$$\text{Actual address} = \text{Base Add.} + \text{Displacement (offset)}$$



= Base address : offset D

Limit address.

While executing a procedure  $P_1$ , the processor may require more memory, hence it may overlap the memory allocated to program  $P_2$ . For preventing this, a 'limit address' is used to avoid overlapping.



Advantage of using Base Address : 1) size of memory reduced

$$BA = 00010H$$

$$D = 00003H$$

- relocation of blocks is performed for 'defragmentation'.

### Translation Look Aside Buffer :

- special buffer used for mapping virtual address to base address of real address.



TLB → address cache  
( static RAM )

## Segments and Pages →

- 1) Pages are the blocks of equal size.  
Segments' boundary is decided by the program.  
(variable length).
- 2) Permission (read/write) can be given by segments.  
While pages are not at consecutive memory loc.  
It's difficult but possible.
- 3) We may remove one segment, if another  
segment is reqd, its size must be  $\leq$  removed seg.  
This doesn't create a problem in case of pages as  
they are of equal sizes.
- 4) Internal fragmentation is present in segments.

215 KB data

50 KB of page size.

Then also ~~sub~~ blocks reqd.

(35 KB of space unavail.)



e.g.: Fragmentation using paging is called external fragmentation.

Segment is divided into pages, called paged segments.

Segment table:

| AV | AR | (Permissions) |
|----|----|---------------|
|----|----|---------------|

|                |         |  |  |  |
|----------------|---------|--|--|--|
| S <sub>1</sub> | X Y Z   |  |  |  |
| S <sub>2</sub> | X Y W   |  |  |  |
| S <sub>3</sub> | X Y Z Z |  |  |  |



If change bit is 1 then write operation is done from MM to SM.

- Smaller the size of page, smaller the wastage.
- However, the no. of pages would be increased.

## Optimum Page Size :

$$S = \frac{Sp}{2} + \frac{Ss}{Sp} \quad \text{--- (1)}$$

Sp = no. of entries.

Ss = segment's size.

$\frac{Sp}{2}$  = no. of entries per segment.

$\frac{Sp}{2}$  = space which is not utilized.

S → space overhead.

$$u = \left( \frac{Ss}{Ss + S} \right) = \left( \frac{Ss}{Ss + \left[ \frac{Sp}{2} + \frac{Ss}{Sp} \right]} \right)$$

$$= \left( \frac{2Sp \cdot Ss}{2Sp \cdot Ss + Sp^2 + 2Ss} \right)$$

To maximize, differentiate & equate to 0.

$$\Rightarrow \frac{dSs}{dSp} = \frac{1}{2} - \frac{Ss}{Sp^2} \Rightarrow$$

$$\Rightarrow 2Ss = Sp^2 \Rightarrow Sp = \sqrt{2Ss}$$

$$\Rightarrow U_{opt} = \left( \frac{2ss\sqrt{2ss}}{2ss\sqrt{2ss} + 2ss + 2ss} \right)$$

$$U_{opt} = \frac{\sqrt{2ss}}{\sqrt{2ss} + 2}$$

$$\Rightarrow U_{opt} = \frac{1}{1 + \frac{\sqrt{2}}{\sqrt{ss}}}$$

### Functions of Memory Management Unit:

1) Allocate & deallocate the block.

- \* Available list: Contains the - address
  - free space available (at that address)



- \* Occupied list: Contains the address at which the space is occupied (includes size).

- \* Directory list: Directory means programs and segments related to it. It also contains the address of blocks if not present in M.M.

Preciptive Memory Allocation: Existing blocks are removed and replaced with new ones.

Non-promptive memory allocation: Existing blocks are not disturbed but fit into free space.

1) First fit: In M.M., M.M.U will scan the available list and the free space.



- M.M.U would allocate 70 KB to the available place, whichever it finds first.
- fragmentation is much more.
- 'processor time' is less.

2) Best fit: 70 KB is placed in the ~~75 KB~~ block. Only 5 KB wastage. Hence, less fragmentation and higher processing time.

### Preciptive Memory Allocation:

1) Compaction Allocation method →

- Here smaller blocks are combined to make it into a bigger block.

2) Demand Swapping: Processor demands for certain blocks. If not present, then those blocks are brought in from SM to MM. 'Miss' occurs.

$$t_A = t_B + t_{A_1}$$

$t_A$  = total access time.

$t_B$  = block transfer time from SM to MM.

$t_{A_1}$  = Access time from main memory.

If 'hit' then  $t_A = t_{A_1}$

3) Anticipatory Swapping: By prediction, some blocks/subsequent pages (may be reqd.) by the processor. These are placed from SM to MM. As size of MM is limited; the unutilized blocks are removed from MM.

4) Replacement policies:

FIFO (First In First Out): The oldest block is removed.

LRU (Least Recently Used): Some of the older blocks may be useful to the processor. So the least used blocks are removed first.

Time: 1 2 3 4 5 6 7 8

Address Space: 2 3 2 1 5 2 4 5



LRU  $\rightarrow$  3 hits ; FIFO  $\rightarrow$  2 hits.

$$\text{Hit ratio} = \frac{3}{8} ; = \frac{2}{8} = (0.25)$$

$$= 0.375$$

$\rightarrow$  Implementation of LRU is more complex than FIFO.

Chapter 7

BMG



MIMO → Multiple instrm streams, multiple data streams.

## Communication Methods →

- IntraSystem Communication. [within browser]
  - InterSystem communication. (internet)
    - FM
    - Carriers.

## Computer Networks

- Store & forward (entire msg sent at once if resources are available)
  - Packet Switching  
  - LAN
  - WAN

## OSI (Open System Interconnection)

↓  
guidelines for connection & transfer data

CSMA → (Carrier Sense Multiple Access)  
(Collision Detection)

CD → (Collision Detection)  
    ( in )  
→ ( Ethernet )

Ethernet → is a standard over which various computers can be interconnected.  
(follows 'Bus' topology)

- Topologies → 1> Ring 2> Mesh 3> Star  
4> Fully connected 5> Line 6> Tree 7> Bus

If multiple computers connected in 'Bus' topology are sending data simultaneously then collision occurs.

Internet → www... [has TCP/IP]

internet → not TCP/IP ; (not www...)

### IO and System Control

- 1> IO-mapped IO
- 2> Memory-mapped IO

### DMA (Direct Memory Access)

- increase speed of IO transfers by reducing the role of CPU.

### Operating System



FIFO  
LIFO

### Fault Tolerance

- Hardware redundancy (no articulation point)
- Software redundancy
- Information redundancy
- Time redundancy

make multiple copies of the same data.  
(backups of the server maintained)

### Reliability

H/N : minimum no. of times the HW is failing.

S/I/W / network : whatever data is being sent should be received in the exact same state.

### Data Path

- Path forwarding → whenever data is available in the intermediary stage, the next instn should run only after all the previous instruction have run (refer to pipelining).

1> (Read A, B)  
2> (ADD A, B)  
3> Read Accumulator

Step 3 should be run only after step 2 has been executed.

- Register renaming → Renaming of registers is done in such a way that the efficiency is improved by eliminating the redundancy.

Ex: 1> B := B - C  
2> C := 50  
3> D := D - C

data  
> no dependency between ① & ②  
(which hazard)

$$\text{II} \quad \begin{aligned} 1 &> B = B - C \\ 2 &> A := A + B \end{aligned}$$

- Here both forwarding would be used.
- Content can be fetched from Accumulator (where  $B-C$  would be calculated)
- thus renaming of  $C$  to ' $+B$ ' would be helpful.

III

$$B := B - C$$

$$C := S50$$

$$D := D - C$$

$$A := 1501$$

$$A := A - C$$

Q Can the number of registers used be minimized?

→ In here if the results are not to be lost then the number of registers can't be reduced.

30/9/14

Interrupts → Temporarily suspend the mainline program to serve another subroutine. comes back to the main program after completing the subroutine.

Q When do interrupts occur?



Q When do we switch over to the interrupt subroutine?  
→ After every instruction cycle, we check the presence of interrupts.

\* In DMA, we check for interrupts after every machine cycle.

$M_1 | M_2 | M_3$

\* In pipelining, if we have interrupts, then; no instrn would be over until the program is over.

1 2 3 4 5 6



⇒ Whenever we check for interrupt;

- we store all the registers in stack.
- we store PC
- then we switch to ISR.

If we have nested ISR's then again the whole procedure is followed for the 2nd interrupt.

If there is some problem in the flow of program to ISR and vice versa, then they are called imprecise interrupt.

If the flow is proper then its called precise interrupt.

Eg: Consider Direct Mapped Cache of size 32 kbytes with clock size of 32 bytes. The CPU generates 32 bit address. Find out no. of bits needed for cache indexing and no. of bits for tagbits).

