

- \* Computer Architecture  $\Rightarrow$  It refers to those attributes of a system visible to a programmer.  
 $\Rightarrow$  Attributes that have a direct impact on logical execution of a program.
  - Designing of Computer
  - Designer's point of view
- \* Eg. of Architectural attributes  $\Rightarrow$  instruction set, no. of bits used to represent various datatypes, I/O mechanism, techniques for addressing memory etc.
- \* Computer Organisation  $\Rightarrow$  It refers to operational units and their interconnections that realize architectural specs.
  - Logic and circuits
  - Programmers point of view
- \* Eg. of Organizational attributes  $\Rightarrow$  Hardware details that are transparent to programmers, interfaces b/w peripherals and computer, memory technology used.
- # COMPUTER  $\Rightarrow$  Complex system, contemporary computers contain millions of elementary electronic components.
- \* Due to complexity  $\Rightarrow$  we need to know hierarchical nature of most complex systems.

\* Designers is concerned with 2 aspects =>

- (A) Structure => the way in which components are interconnected.
- (B) Function => operation of each individual component as a part of structure.

# Functional Description =>

\* These are 4 basic functions of computers =>

- (1) Data processing
- (2) Data storage
- (3) Data movement
- (4) Control

\* functions are =>  
Bidirectional



# Structure Description =>

\* 4 main structural components =>

- (1) CPU => ○ controls the operation of computer  
○ performs data processing through processors.
- (2) Main memory => stores data
- (3) I/O => moves data b/w computer & external environment.
- (4) system interconnection => communication path (bus) b/w components of computer.

\* Structural components of CPU =>

- (1) Control unit => controls operations of CPU / computer.
- (2) ALU => performs data processing functions.
- (3) Registers => provides storage internal to CPU.
- (4) CPU interconnection => communication among components of CPU.

\* Control Unit => ○ sequencing logic

- Control unit registers & decoders
- control memory.

# 3 principles characteristic of Computers =>

- (A) It responds to a specific set of instructions in a well-defined manner.
- (B) It can execute a pre-recorded list of instructions.
- (C) It can quickly store & retrieve large amounts of data.

\* First use of word "computer" => 1613

\* History of calculating techniques =>

=> Tally sticks → Abacus → Napier's Bones → Slide Rule  
(1614) (1622)

Arithmometer ← Jacquard Loom ← Stepped Reckoner ← Pascaline  
(1620) (1661) (1672) (1642)

↳ Analytical Engine → etc....  
(1622-34)

### \* Computer Generations $\Rightarrow$ (5)

- (A) First Generation (1946-1958)  $\Rightarrow$  Vacuum tubes
- (B) Second Generation (1959-1964)  $\Rightarrow$  Transistors (40 Vac. tubes)
- (C) Third Generation (1965-1970)  $\Rightarrow$  Integrated circuits (ICs)
- (D) Fourth Generation (1971-today)  $\Rightarrow$  Microprocessors
- (E) Fifth Generation (today-future)  $\Rightarrow$  Artificial Intelligence

### # Mosse's Law $\Rightarrow$

\* Gordon Moore (cofounder of Intel) in 1965 observed that the no. of transistors that could be put on a single chip was doubling every year.

\* This pace slowed to 18 months, but has sustained it ever since.

#### ○ Consequences of Mosse's law $\Rightarrow$

- (1) Cost of chip has remained virtually unchanged during period of rapid growth in density  $\Rightarrow$  It means cost of computer logic and memory circuitry has fallen at a dramatic rate.
- (2) As logic and memory elements are placed closer together on more densely packed chips  $\Rightarrow$  Electrical path length is shortened  $\Rightarrow$  operating speed increases.
- (3) Reduction in power and cooling requirements.

(4) Computer becomes smaller  $\Rightarrow$  more convenient to place in a variety of environments.

(5) Interconnections on IC are much more reliable than before.  
 $\Rightarrow$  With more circuitry on single chip, interchip connections are less.

### # Von Neumann $\Rightarrow$

\* John Mauchly and J. Presper Eckert conceived of an easier way to change the behaviour of their calculating system.

$\Rightarrow$  They reckoned that memory devices, in the form of MERCURY DELAY LINES, could provide a way to store program instructions.

$\Rightarrow$  They documented their idea proposing it as foundation of their next computer  $\Rightarrow$  EDVAC.

○ After reading their proposal, a Hungarian mathematician John von Neumann published the idea so effectively that all stored program computers have come to be known as von Neumann systems.

$\Rightarrow$  History credited him.

\* Today's version of stored computers satisfies atleast these following characteristics =>

(1) Consists of these hardware systems ]

=> CPU, ALU, Registers, main memory & an I/O system.

(2) Capacity to carry out sequential instruction processing.

(3) Contains a single path b/w main memory and CU of CPU,

forcing alteration of instruction and execution cycles.

=> this single path known as => von Neumann Bottleneck.

\* von Neumann Architecture & working =>

=> running of program is called

(von Neuman execution cycle / fetch-decode execution cycle).

(1) CU fetches next program instruction from memory, using program counter where instruction is located.

(2) Instruction is decoded into a language that ALU can understand.

(3) Any data operands required to execute instruction is fetched from memory and placed into registers in CPU.

(4) ALU executes the instructions and places the results in memory or registers.



○ Drawback of earlier von Neumann Architecture =>

=> Only a single path (bus) was available. } slow working of

\* To overcome this ] system

Modified von Neumann Architecture =>

○ That architecture has been streamlined => System Bus Model

\* Working =>

- (1) Data bus moves data from main-memory to CPU registers (vice versa)
- (2) Address bus holds address of data that data bus is currently accessing.
- (3) Control bus carries necessary control signals that signifies how info transfer is to take place.



# Non - Von Neumann Architecture =>

\* Uses the concept of Parallel Computing.

Concept => If computer is not powerful enough ]

○ Instead of trying to develop a faster / powerful computer,

use MULTIPLE COMPUTERS.

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

.....

## # Evolution of Intel x86 Architecture =

- 8080 ⇒ ○ World's first general purpose microprocessor.
  - 8-bit machine, 8 bit data path to memory.
- 8086 ⇒ ○ 16-bit machine
  - supported an instruction cache (that prefetches the instructions before their execution).
  - 1st appearance of x86 arch.
- 80286 ⇒ ○ Extension of 8086 enabled addressing a 16-Mbyte memory instead of 1-Mbyte.
- 80386 ⇒ ○ Intel's first 32-bit machine ⇒ rivaled complexity
  - 1st processor to support multitasking, i.e. running multiple programs at same time.
- 80486 ⇒ ○ introduced more powerful cache technology.
  - offered in-built math coprocessor.
- Pentium ⇒ ○ Intel introduced use of superscalar techniques that allow multiple instructions running in parallel.
- Pentium Pro ⇒ ○ Aggressive use of registers naming, branch prediction, data flow analysis.
- Pentium II ⇒ ○ Incorporated MMX technology, designed to process video, audio and graphics data efficiently.

- Pentium III ⇒ ○ additional floating point instructions to support 3-D graphics.
- Pentium 4 ⇒ ○ more additional floating point.
- Core ⇒ ○ first Intel up with dual core, 2 processors on single chip.
- Core 2 ⇒ ○ 64 bits
  - Core 2 Quad provides 4 processors on single chip.
- \* In 1978, 8086 ⇒ ○ clock speed of 5MHz
  - 29000 transistors
- In 2008, Core 2 Quad ⇒ ○ clock speed of 3GHz (speedup factor by 600)
  - 820 million transistors (2800 times of 8086)

## # Data Representation in Computer Systems $\Rightarrow$

\* System can only understand 0/1.

\* Signed Number Representation  $\Rightarrow$

① n-bit vector,  $B = b_{n-1} \dots b_2 b_1 b_0 \Rightarrow$  Eg.  $\Rightarrow$  4 bit no.  
 $\Rightarrow$  value of  $b_i$  can be 0/1.

\* We can represent both +ve and -ve nos.

$\Rightarrow$  for this we have three representations  $\Rightarrow$

- (1) Sign-and magnitude
- (2) 1's compliment
- (3) 2's compliment.

\* In sign magnitude  $\Rightarrow$  ① if MSB is 0  $\Rightarrow$  +ve no.

$\downarrow$   
② if MSB is 1  $\Rightarrow$  -ve no.  
(Pproblem)

$\hookrightarrow$  it represents 0 as +0(0000) & -0(1000)  
which is not appropriate.

$\Rightarrow$  we find this same problem in 1's compliment.

\* But in 2's comp.  $\Rightarrow$  no such problem

\*  $\therefore$  2's comp. is used efficiently in our systems for calculations.

$\Rightarrow$  In signed  $\Rightarrow$  0, +ve, -ve nos. } everything same for

$\Rightarrow$  In unsigned  $\Rightarrow$  0 & +ve nos. } +ve numbers.

$$\text{Cases} \Rightarrow (1) +65 \quad \begin{array}{r} 01000001 \\ +5 \\ \hline +70 \end{array}$$

$$+ \underline{\underline{00000101}} \\ \underline{\underline{01000110}} \Rightarrow (70)_0 \Rightarrow \underline{\underline{\text{Ans.}}}$$

$$(2) +65 \quad \begin{array}{r} 01000001 \\ -5 \\ \hline +60 \end{array}$$

$$+ \underline{\underline{111111011}} \\ \underline{\underline{001111100}}$$

$$\Rightarrow \underline{\underline{\text{Ans.}}} \quad \left. \begin{array}{l} * 2\text{'s comp. of } -5 \\ 11111011 \end{array} \right\}$$

(discarded)

$$(3) -65 \quad \begin{array}{r} 10111111 \\ +5 \\ \hline -60 \end{array}$$

$$+ \underline{\underline{000000101}} \\ \underline{\underline{11000100}}$$

$$\downarrow \quad \begin{array}{r} 00111011 \\ +1 \\ \hline 00111100 \end{array}$$

$$\Rightarrow \underline{\underline{\text{Ans.}}} \quad \left. \begin{array}{l} * 2\text{'s comp. of } -65 \\ 10111111 \end{array} \right\}$$

$$(4) -65 \quad \begin{array}{r} 10111111 \\ -5 \\ \hline -70 \end{array}$$

$$+ \underline{\underline{11111010}} \\ \underline{\underline{10111010}}$$

$$\downarrow \quad \begin{array}{r} 01000101 \\ +1 \\ \hline 01000110 \end{array}$$

$$\Rightarrow \underline{\underline{\text{Ans.}}} \quad \left. \begin{array}{l} * 2\text{'s comp. of } -70 \\ 10111111 \end{array} \right\}$$

(discard)

# ADDER  $\Rightarrow$  \* Basic unit of ALU.

\* Half Adder  $\Rightarrow$



\* Full Adder  $\Rightarrow$



(A) Ripple Carry Adder  $\Rightarrow$



\* Here C<sub>output</sub> of FA0 will become C<sub>input</sub> of each full adder.

∴ FA1 has to wait for the output of FA0  $\Rightarrow$  lons

\* it will take for full sum  $\Rightarrow$  40ns = 4 bit no.

$\Rightarrow$  Processing speed is more

∴ cannot use this in system.  $\times$

(B) Carry Look Ahead Adder / Carry Propagation Adder  $\Rightarrow$

$\Rightarrow$  propagates the carry

\* 2 half adders  $\Rightarrow$  1 full adder



$$\Rightarrow C_{i+1} = \bar{x}_i y_i + C_i \bar{x}_i + C_i y_i$$

$$= \underbrace{\bar{x}_i y_i}_{G_i} + C_i \underbrace{(\bar{x}_i + y_i)}_{P_i}$$

$$* G_i = \bar{x}_i y_i$$

$$P_i = \bar{x}_i + y_i$$

\* for i=0  $\Rightarrow$  C<sub>1</sub> = G<sub>0</sub> + C<sub>0</sub>P<sub>0</sub>  $\Rightarrow$  depending on first carry (initial)

\* for i=1  $\Rightarrow$  C<sub>2</sub> = G<sub>1</sub> + C<sub>1</sub>P<sub>1</sub>

$$= G_1 + (G_0 + C_0 P_0) P_1$$

$$= G_1 + G_0 P_1 + P_0 P_1 C_0 \Rightarrow \text{still depending on } C_0$$

$\Rightarrow$  Carry Look Ahead adder is better

$\Rightarrow$  can be used in systems.

① Understanding  $\Rightarrow \bar{x}_i y_i + C_i (\bar{x}_i \oplus y_i)$

$$\text{of } C_{i+1} \text{ eqn. } \Rightarrow \bar{x}_i y_i + C_i (\bar{x}_i \bar{y}_i + \bar{x}_i y_i)$$

$$= \underbrace{\bar{x}_i y_i}_{(A+\bar{A}B=A+B)} + \underbrace{C_i \bar{x}_i \bar{y}_i}_{\bar{x}_i \bar{y}_i} + \underbrace{C_i \bar{x}_i y_i}_{C_i \bar{x}_i y_i}$$

$$(A+\bar{A}B=A+B)$$

$$= \bar{x}_i (y_i + \bar{y}_i c_i) + C_i \bar{x}_i y_i$$

$$= \bar{x}_i (y_i + c_i) + C_i \bar{x}_i y_i$$

$$= \bar{x}_i y_i + \underbrace{C_i c_i + C_i \bar{x}_i y_i}_{C_i c_i + C_i \bar{x}_i y_i}$$

$$= \bar{x}_i y_i + C_i (\bar{x}_i + \bar{x}_i y_i) \Rightarrow (A+\bar{A}B=A+B)$$

$$= \bar{x}_i y_i + C_i (\bar{x}_i + y_i)$$

$$= \boxed{\bar{x}_i y_i + C_i \bar{x}_i + C_i y_i}$$

# Arithmetic Operations =

\* 2 things needed to perform operation =

(1) Hardware

$$* A - B \Rightarrow A \leftarrow A + \bar{B} + 1$$

(2) Algorithm

|             | Magnitude<br>Addition | Magnitude Subtraction |         |         |
|-------------|-----------------------|-----------------------|---------|---------|
|             | A > B                 | A < B                 | A = B   |         |
| (+A) + (+B) | + (A+B)               |                       |         |         |
| (+A) + (-B) |                       | + (A-B)               | - (B-A) | + (A-B) |
| (-A) + (+B) |                       | - (A-B)               | + (B-A) | ± (A-B) |
| (-A) + (-B) | - (A+B)               |                       |         |         |
| (+A) - (+B) | + (A-B)               | - (B-A)               | + (A-B) |         |
| (+A) - (-B) | + (A+B)               |                       |         |         |
| (-A) - (+B) | - (A+B)               |                       |         |         |
| (-A) - (-B) | - (A-B)               | + (B-A)               | + (A-B) |         |

\* Hardware =



\* Algorithm = (Addition & Subtraction)



\* Multiplication  $\Rightarrow$  (multiplicand  $\times$  multiplier) = product

$$\begin{array}{r}
 23 \Rightarrow 10111 \quad (\text{multiplicand}) \\
 \times 19 \Rightarrow 10011 \quad (\text{multiplier}) \\
 \hline
 10111 \\
 10111 \\
 10111 \times \\
 00000 \\
 00000 \\
 10111 \times \times \\
 \hline
 110110101 \Rightarrow 437 = \underline{\underline{\text{Ans.}}}
 \end{array}$$

\* Same as simple multiplication.

\* Hardware  $\Rightarrow$



\* we have taken 2 registers A and Q.

as the product is very high i.e. more bits.

Final value of product = multiplicand  $\times$  multiplier  $\Leftrightarrow$  product

Product =  $(\text{multiplicand}) \times (\text{multiplier})$

(Multiplication)  $\Rightarrow$  Binary multiplication is similar to simple multiplication on paper.

④ Eg  $\rightarrow$

$$\begin{array}{r} 10111 \text{ (multiplicand)} \\ \times 10011 \text{ (multiplier)} \\ \hline 110110101 \text{ (product)} \end{array}$$

\* Two things required are again same  $\Rightarrow$  Hardware and Algorithm.

\* Hardware  $\rightarrow$



\* Sequence counter (SC) is the no. of bits ( $n$ ).

| Multiplicand B = 10011               | E | A     | Q     | SC     |
|--------------------------------------|---|-------|-------|--------|
| * multipliers in Q<br>$Q_n=1$ , AddB | 0 | 00000 | 10011 | 101(5) |
| * first partial product              | 0 | 10111 |       |        |
| * shift right EAQ                    | 0 | 01011 | 11001 | 100(4) |
| $Q_n=1$ , AddB                       | 1 | 10111 |       |        |
| * second partial product             | 0 | 00010 |       |        |
| * shift right EAQ                    | 0 | 10001 | 01100 | 011(3) |
| * $Q_n=0$ , shift right EAQ          | 0 | 01000 | 10110 | 010(2) |
| * $Q_n=0$ , shift right EAQ          | 0 | 00100 | 01011 | 001(1) |
| * $Q_n=1$ , addB                     | 1 | 10111 |       |        |
| * fifth partial product              | 0 | 11011 |       |        |
| * shift right EAQ                    | 0 | 01101 | 10101 | 000(0) |

\* Final product in AQ = 0110110101

④ multiply Algorithm

multiplicand in B  
multipliers in Q

$$\begin{aligned} A_S &\leftarrow Q_S + B_S \\ Q_S &\leftarrow Q_S + B_S \\ A &\leftarrow 0, E \leftarrow 0 \\ SC &= n \end{aligned}$$



\* Another method  $\Rightarrow$

# Booth Multiplication Algo.  $\downarrow$

④ Hardware  $\downarrow$



\* Example Table on next Page

Page ---

\* Difference b/w Normal & Arithmetic Shift right  $\Rightarrow$

④ Eg.  $\Rightarrow$  1001

$\Rightarrow$  normal shr  $\Rightarrow$  0100

$\Rightarrow$  arithmetic shr  $\Rightarrow$  1100

| $Q_n$ | $Q_{n+1}$ | $BR = 10111$ | $\overline{BR} + 1 = 01001$ | AC    | QR    | $Q_{n+1}$ | SC  |
|-------|-----------|--------------|-----------------------------|-------|-------|-----------|-----|
| 1     | 0         | Initial      |                             | 00000 | 10011 | 0         | 101 |
|       |           | Subtract BR  |                             | 01001 |       |           |     |
| 1     | 1         |              |                             | 01001 | 11001 | 1         | 100 |
| 0     | 1         |              | Add BR                      | 00100 | 01100 | 1         | 011 |
| 0     | 0         |              |                             | 10111 |       |           |     |
| 1     | 0         |              |                             | 11001 |       |           |     |
|       |           |              |                             | 11100 | 10110 | 0         | 010 |
|       |           |              |                             | 11110 | 01011 | 0         | 001 |
|       |           |              | Subtract BR                 | 01001 |       |           |     |
|       |           |              |                             | 00111 |       |           |     |
|       |           |              |                             | 00011 | 10101 | 1         | 000 |

\* Hardware =



# [UNIT 2]  $\Rightarrow$  Memory Location and Addresses  $\Rightarrow$

\* Smallest unit  $\Rightarrow$  0/1 (bit)

4 bits = nibble

8 bits = byte

16 bits or more = word

$$2^{10} = 1K$$

$$2^{20} = 1M$$

$$2^{30} = 1G$$



\* For n-bit Address bus  $\Rightarrow 2^n$  memory locations can be addressed.

The range of addresses is from 0 to  $2^n - 1$ .

④ Eg.  $\Rightarrow$  for 16 bit address  $\Rightarrow 2^{16} = 2^{10} \times 2^6 = 64K$  locations

and range will be 0 to  $(2^{16} - 1)$ .



\* 2 formats to store Data =>



\* Eg. => 10011 will be stored as =>

① 10011 in Big Endian

② 11001 in Little Endian.

\* Memory Operation =>

① Read operation

② Write operation

\* Instruction and Instruction Sequencing =>

=> Computer must have instructions performing 4 types of operations =>

- (1) Data transfer b/w memory & processor registers
- (2) Arithmetic & logic operations on data
- (3) Program sequencing & control
- (4) I/O transfers.



\* Complex Instruction Set Computer (CISC) =>

=> CISC is a computer that can execute several low-level instructions.

=> Capable of multioperations and addressing modes within single instructions. Eg. => PDP-11, VAX, Motorola 68K, PCs on intel x86.

\* Reduced Instruction Set Computer (RISC) =>

=> RISC is a computer that uses only a single instruction that can be divided into multiple instructions that perform low level operations within single clock.

=> Eg. => AMD 29K, ARC, intel i860, 960 etc.

# Types of Instructions on basis of Address Used =>

(1) Three Address Instructions => (OpCode, Destination, Source)

① Eg. => Program to evaluate =>  $(A+B)*(C+D)$

ADD R<sub>1</sub>, A, B ( $R_1 \leftarrow A+B$ )

ADD R<sub>2</sub>, C, D ( $R_2 \leftarrow C+D$ )

MUL X, R<sub>1</sub>, R<sub>2</sub> ( $X \leftarrow R_1 * R_2$ )

\* results in short programs

\* instruction becomes long

(2) Two Address Instructions =>

① Eg. => Program =>  $(A+B)*(C+D)$

$\text{MOV R1, A}$  ( $R1 \in m[A]$ )  
 $\text{ADD R1, B}$  ( $R1 \in R1 + m[B]$ )  
 $\text{MOV R2, C}$  ( $R2 \in m[C]$ )  
 $\text{ADD R2, D}$  ( $R2 \leftarrow R2 + m[D]$ )  
 $\text{MUL R1, R2}$  ( $R1 \leftarrow R1 * R2$ )  
 $\text{MOV X, R1}$  ( $m[X] \leftarrow R1$ )

(3) One Address Instructions  $\Rightarrow$  uses an implied AC register for all data manipulation.

$\circ$  Eg.  $\Rightarrow (A+B)* (C+D)$   
 $\text{LOAD A}$  ( $AC \in m[A]$ )  
 $\text{ADD B}$  ( $AC \leftarrow AC + m[B]$ )  
 $\text{STORE T}$  ( $m[T] \leftarrow AC$ )  
 $\text{LOAD C}$  ( $AC \in m[C]$ )  
 $\text{ADD D}$  ( $AC \leftarrow AC + m[D]$ )  
 $\text{MULT T}$  ( $AC \leftarrow AC * m[T]$ )  
 $\text{STORE X}$  ( $m[X] \leftarrow AC$ )

(4) Zero Address Instructions  $\Rightarrow$  (TOS  $\Rightarrow$  Top of Stack)

$\text{PUSH A}$  (TOS A)  
 $\text{PUSH B}$  (TOS B)  
 $\text{ADD}$  (TOS (A+B))  
 $\text{PUSH C}$   
 $\text{PUSH D}$   
 $\text{ADD}$  (TOS (C+D))  
 $\text{MUL}$  (TOS (C+D)\*(A+B))

\* stack is used.

\* Notations  $\Rightarrow$

- Arithmetic Expression  $\Rightarrow A+B$
- Infix Notation  $\Rightarrow A+B$
- Postfix / Reverse Polish notation  $\Rightarrow AB+$  (most suitable)
- Prefix / Polish notation  $\Rightarrow +AB$

\* Eg.  $\Rightarrow A * B + C * D \Rightarrow AB * CD * +$

\* Eg.  $\Rightarrow (3*4)+(5*6) \Rightarrow 34*56*+$

$\Rightarrow$  On stack =



\* Addressing Modes  $\Rightarrow$

(1) Implicit  $\Rightarrow$  Eg.  $\Rightarrow$  INCR (increment)

COMP (complement)

(2) Immediate  $\Rightarrow$  Data is directly transferred to a memory / registers

location. Eg.  $\Rightarrow \text{MOV A, } \#32H$

$\text{ADD A, } \#22H$

(3) Direct Addressing  $\Rightarrow$





\* whatever at address 26, will be moved to A.

(A) Base registers  $\Rightarrow R \rightarrow \text{index}$

(B) Indexing  $\Rightarrow A \rightarrow \text{starting base add.}$   
 $R \rightarrow \text{index}$ .

\* 

|        |      |         |
|--------|------|---------|
| opcode | mode | operand |
|--------|------|---------|

○ Mode  $\Rightarrow$  The way operands are chosen during program execution is dependent on addressing mode of instruction.

- $\Rightarrow$  An addressing mode specifies how to calculate effective memory address of an operand by using info. held in registers and / or constants contained within an instruction.
- $\Rightarrow$  Addressing mode specifies a rule for interpreting or modifying address field of instruction before operand is actually reserved.

(A) Implied Addressing Mode  $\Rightarrow$

$\Rightarrow$  Operands are specified implicitly in definition of instruction.

○ Eg  $\Rightarrow$  Complement Accumulators

Comp A  $\Rightarrow$  acc. is complemented  
\* All register reference instructions that use an Accumulator are implied mode instructions.

(B) Immediate Addressing Mode  $\Rightarrow$  No Address field  $\times$

\* Operand field  $\ominus$

○ Eg.  $\Rightarrow$  LOAD R1, #50H

- Advantage  $\Rightarrow$  (1) fastest Execution  
(2) no memory reference to fetch data.
  - Disadvantage  $\Rightarrow$  (1) less flexible  
(2) Low range
  - (C) Direct Addressing mode  $\Rightarrow$  Operands resides in memory, and its address is given directly by address field of instruction.
  - Eg.  $\Rightarrow$  LOAD R1, [1000]  
 $R1 \leftarrow H[1000]$
  - Advantage  $\Rightarrow$ 
    - single memory ref. to access data.
    - no additional calc. to calculate effective address.
  - (D) Indirect Addressing Mode  $\Rightarrow$  Address field of instruction given the address where the effective address is stored. (address of operand).
  - $\Rightarrow$  Control fetches the instruction from memory and uses its address part to access memory again to read eff. address.
- Add  $\rightarrow$  Adds.  
 $\boxed{\quad} \rightarrow \boxed{\quad}$   
 $\downarrow$   
 (operand)



E.A.  $\Rightarrow$  Address stored in register

\* Eg.  $\Rightarrow$  LOAD R1, B



○ Advantages 2

(1) less no. of bits are required to specify registers.

(2) one fewer memory access than indirect addressing.

(G) Autoincrement / Autodecrement Addressing mode  $\Rightarrow$

MOV R1, #R2 | Relative Addressing Mode

INR RI

DEC RI

(H) Relative Addressing Mode  $\Rightarrow$  Content of program counter (PC) is added to address part of inst. in order to obtain E.A.

$$E.A. = PC + \text{Address part of instruction}$$

$\Rightarrow$  E.As position in memory is relative to the address of the next instruction.

$\Rightarrow$  This mode is often used with branch type instruction where the branch address is in the area surrounding instruction.

○ Eg.  $\Rightarrow$  JUMP.

(I) Index Addressing Mode  $\Rightarrow$  Content of an index register is added to the address part of instruction to obtain two E.A.

$\Rightarrow$  Index register stores an index value.

$\Rightarrow$  Address field of instruction defines the starting address (base) of data array in memory.

$\Rightarrow$  Each operand in array is stored in memory relative to base address.



$\Rightarrow$  LOAD R, 1000[B]

$\downarrow$  (base add.)  $\rightarrow$  (Displacement)

(J) Base Register Addressing Mode  $\Rightarrow$  Base register holds a base address and address field of instruction gives a displacement relative to base address.

LOAD RI, X[200]

$\downarrow$  (displacement)

Scanned by TapScanner

\* Execution of a Complete Instruction  $\Rightarrow$

$\Rightarrow$  Single bus architecture.



\* Execution  $\Rightarrow$  (1) Fetch the instruction from memory to IR.

- (2) Fetch the 1st operand from memory.
- (3) Perform addition operation
- (4) Load result into RI.

$$\left. \begin{array}{l} \text{ADD } (R_3), R_1 \\ R_3 - 1000 \\ R_1 - 30 \end{array} \right\}$$

\* Control Sequence  $\Rightarrow$

- (1) PCout, MARin Read, Select4, Add, Zin
- (2) Zout, PCin, WMFC (wait for memory functn. to compl.)
- (3) MDRout, IRin
- (4) R3out, MARin, Read
- (5) R1out, Y, WMFC
- (6) MDRout, Select4, Add, Zin
- (7) Zout, RIin, EN (end).

# Hardwired v/s Microprogrammed CPU  $\Rightarrow$

(Design of control unit)

$\Rightarrow$  To execute an instruction, control unit of CPU must generate the seq. signal in proper sequence.

\* Hardwired CU  $\Rightarrow$  In this, control logic is implemented with gates, flipflops, decoders and digital circuits.

\* fixed logic circuit



=> Hardwired CU consists of =>

- (1) Instruction registers
- (2) No. of control logic gates
- (3) Two decoders
- (4) 4-bit seq. counter

\* Hardwired is faster than u-prog. control.

\* Controllers using this approach can approach at high speed.

\* For large CU => this organization is complicated.

\* Modification of all combinational circuit is very difficult.

\* Microprogrammed CU =>



Next address information

# Pipelining =>

\* Instruction Pipeline => In instruction pipeline, a stream of instruction can be executed by overlapping fetch, decode and execute phases of an instruction cycle.

\* allows us to execute multiple instructions in a single amnt. of time.

○ throughput => no. of instructions / per unit of time.

=> This type of pipelining is used to increase the throughput of computer system.

\* Process of execution of instruction involves the foll. steps =>

- (1) fetch the inst. from main memory.
- (2) Decode the instruction.
- (3) fetch the operands
- (4) Execute the decoded instruction.



\* Advantages => (1) more efficient use of processor.

(2) It can save circuitry.

\* Disadvantages => (1) Pipelining involves adding ALU to the chip which increase cost of system.

(2) inability to continuously run the pipeline at full speed because of pipeline hazard, which disrupt the smooth execution of pipeline.

○ Pipelining is an implementation technique where multiple instrn. are overlapped in execution.

\* Used where operation can be partitioned into subtasks.

\* To improve CPU performance  $\Rightarrow$  2 options  $\Rightarrow$

- (1) Improve hardware by introducing faster circuits, which is costly.
- (2) Pipelining.

\* Example  $\Rightarrow$



\* Design of Pipeline  $\Rightarrow$

$\Rightarrow$  A pipeline has 2 ends  $\Rightarrow$  the ilp end and olp end. B/w these ends there are multiple stages such that olp of one stage is connected to ilp of next stage and each stage performs specific operation.

\* Performance of Pipeline  $\Rightarrow$

- \* Speed up ratio ( $S_R$ )
- \* frequency ( $f$ )
- \* Efficiency ( $E_R$ )
- \* Throughput ( $H_R$ )

\* Speed up ratio = non pipeline execution time / pipelined exec. time

$$S_R = \frac{n \times t_n}{(R+n-1) \times T}$$

Inst. 1



Inst. 2



Inst. 3



IF  $\Rightarrow$  inst. fetch

ID  $\Rightarrow$  inst. decode

OF  $\Rightarrow$  operand fetch

OD  $\Rightarrow$  operand decode

OS  $\Rightarrow$  result store

$$T = \max_i \left( \frac{t_i}{\tau} \right) + T_0$$

Speed up ratio ( $S_R$ )

$$\frac{n \times t_n}{(R+n-1) \times T}$$



$$* If n \gg R \text{ so } R+n-1 \approx n \Rightarrow (S_R) = \frac{n \times t_n}{n \times T} = \frac{t_n}{T}$$

\* Types of Pipelines  $\Rightarrow$

(1) Arithmetic Pipelining  $\Rightarrow$  It is mostly used and found in most of the computers.

$\Rightarrow$  used for floating point arithmetic operations, multiplication of fixed point nos. etc.

\* It can be applied to various complex and slow arithmetic ops. to speed up processing time.

\* Eg.  $\Rightarrow$  floating point Arithmetic Pipelining.



$$* \text{frequency}(f) = \frac{1}{T}$$

$$* \text{Efficiency } (E_R) = \frac{S_R}{R}$$

$$S_R = \frac{n \times t_n}{(R+n-1) \times T}$$

$$t_n = R \times t_p \text{ or } R \times T$$

$$E_R = \frac{S_R}{R} = \frac{n}{R+n-1}$$

$$* \text{Throughput } (H_R) = \frac{n}{(R+n-1) T}$$

$$\frac{n}{R+n-1}$$

\* Amdahl's Law  $\Rightarrow$  \* used to obtain performance gain.

\* It states that the performance improvement to be gained from using some faster mode of execution is limited by fraction of time the faster mode can be used.

\* Speed up =  $\frac{\text{Performance for entire task using enhancement when possible}}{\text{--- without using enhancement.}}$

OR

Speed up =  $\frac{\text{Execution time without enhancement}}{\text{Exe. time using enhancement.}}$

$\Rightarrow$  Amdahl law gives a quick way to find the speedup from some enhancement which depends on two factors  $\Rightarrow$

(1) The functn. of the computation time in the original computer that can be converted to take advantage of the enhancement -  $\Rightarrow$  value which we call fraction enhanced is always less than 1.

(2) The improvement gained by enhanced mode, i.e., how much faster the task would run if enhanced mode were used for entire program.

\* This value is the time of the original mode over the time of enhanced mode. If enhanced mode takes 2 sec. for the portion of program, while it is 5 sec. in original mode, improvement is  $5/2$ . we will call this value (greater than 1) , speedup enhanced.

$$\textcircled{O} \text{ Execution Time}_{(\text{new})} = \text{Ex. time}_{(\text{old})} \times \left[ \frac{(1-\text{fraction})}{\text{enhanced}} + \frac{\text{frac. enhanced}}{\text{speedup enhanced}} \right]$$

$$* \text{ Speedup overall} = \frac{\text{Ex. time old}}{\text{Ex. time new}}$$

$$\# \text{ Performance of Processor} = \frac{\text{Ex. time old}}{\text{Ex. time new}}$$



$F \Rightarrow$  freq. | clock rate

$CC \Rightarrow$  cycle count

$CT \Rightarrow$  cycle time

$$\text{Time period} = \frac{1}{f}$$

$CPI \Rightarrow$  cycle per instruction

$IC \Rightarrow$  inst. count

$MFLOPS \Rightarrow$  million floating point operation per sec.

$MIPS \Rightarrow$  millions inst. per second.

$\text{Execution time} \Rightarrow CC \times CT$

$$\Rightarrow IC \times CPI \times CT$$

$$\Rightarrow \frac{IC \times CPI}{f}$$

$CPI \Rightarrow$  Total CPU clk cycle for program / Instruction count.

$$\Rightarrow \sum_{i=1}^n \frac{CPI_i \times I_i}{IC}$$

$MFLOPS \Rightarrow$  no. of floating point operations in prog. / Ex. time x

$$MIPS \Rightarrow \frac{IC}{\text{Execution time} \times 10^6} = \frac{F}{CPI \times 10^6}$$

Q Eg.  $\Rightarrow$  Calculate CPI and MIPS for a CPU with 200MHz freq. which is executing a performance program with full. inst. avg.  $\Rightarrow$

| Inst. Category | Performance of occurrence | No. of cycle per instruction |
|----------------|---------------------------|------------------------------|
| ALU            | 38                        | 1                            |
| Load & Store   | 15                        | 3                            |
| Branch         | 42                        | 4                            |
| Others         | 5                         | 5                            |
|                | 100                       |                              |

$$\underline{\text{Soh.}} \Rightarrow CPI = \sum_{i=1}^n \frac{CPI_i \times I_i}{\text{Instruction Count}}$$

$$\Rightarrow \frac{38 \times 1 + 15 \times 3 + 42 \times 4 + 5 \times 5}{100} \Rightarrow \frac{276}{100} = 2.76$$

$$MIPS = \frac{\text{clock rate}(t)}{CPI \times 10^6} \Rightarrow \frac{200 \times 10^6}{2.76 \times 10^6} = 70.24$$