

Bhalodia Shruti

Computer System Architecture

Sem -IV

Computer Architecture / Computer Organization

what to do  
(ex. For building, civil  
architec plans where  
should be living room,  
kitchen )

## How to operate (ex. control signals)

- Actual implementation of various functional units required to build compt.

## Three notions of Reasonable computer

1. computer should provide the answers whose programs are given.
  2. It should operate in a finite speed.
  3. It can't store answers to all possible problems.



|    | Present state | I/P | R/W  | next state      |
|----|---------------|-----|------|-----------------|
| 1. | S0            | b   | R    | S1              |
| 2. | S1            | 1   | R    | S1              |
| 3. | S1            | b   | 1    | S2 (left shift) |
| 4. | S2            | 1   | L    | S2              |
| 5. | S2            | b   | R    | S3              |
| 6. | S3            | 1   | b    | S3              |
| 7. | S3            | b   | Hold | S3              |

we get continuous 11111111 which is the sum of n1 & n2.

### → Instruction set architecture

ADD R1, R2 where R1 & R2 are registers



- Selection is done by CU (control unit)
- Selection of operation (add, subtract...) is done by CU.
- Select which operation is done (R/W)

ex.  $R_1 \leftarrow R_1 + R_2$

Read from R1 & R2, write on R1

→ OPcode - Operation code

(for every operation there is code associated with it)

37

## TAS Machine:

Block diagram:

control signals



size of instruction = 20 bits

→ There are three types of bus

- (1) Address bus - processor to memory
- (2) data bus - memory to processor /
- (3) Control bus - R/W operation

→ Programs are stored into the memory. (In sequence). These might be 100 of programs into memory.

→ 18 bits | 12 bits | → 20 bits. (1 instruction)  
 opcode add. of memory

→ Total size of memory =  $2^{12}$  = 4K (0 to 4095)

→ Memory size = 8 bits

8 different memory location.

so  $2^3 = 8$ , [3] bits are required



→ At 1 memory location, we can store 2 instruction

→ IR → 8 bits

AR → 12 bits

PC → increment counter in memory

(If we are at M(0) → it will incremented by one (M(1))

eg

Addition of data at M(101) + M(102) and result should be stored at M(103)

$$M(103) \leftarrow M(101) + M(102)$$

Instruction  
and data is of 40 bits.

$$\textcircled{1} \quad AC := M(101)$$

[load the content into AC]

$$\textcircled{2} \quad AC := AC + M(102)$$

[Add the content of 102 to the AC]

$$\textcircled{3} \quad M(103) := AC$$

[store the content of AC  
in M(103)]

- ① LOAD  $M(101)$  ( $AC = M_{101}$ )
- ② ADD  $M(102)$  ( $AC = AC + M_{102}$ )
- ③  $M(103) = AC$  (STORE)
- ④ Hault

|     |               |        |
|-----|---------------|--------|
|     | 101           | 102    |
| 80: | load M        | load M |
| 81  | Add and store | Hault  |
| 101 | 63            |        |
| 102 | 45            |        |

### → Data transfer Instruction

1.  $AC = MQ$

Move the data from  $MQ$  to  $AC$ .

2.  $AC = M(X)$  (load inst<sup>1</sup>)

Move the data at  $X$  into  $AC$

3.  $M(X) = AC$  (store inst<sup>2</sup>)

4.  $M(Q) - MQ = M(X)$

5.  $AC = -M(X)$

6.  $AC = |M(X)|$

7.  $AC = -|M(X)|$

→ For every loc. opcode would be provided which is of 8 bit.

## → Data Processing Inst<sup>n</sup>

1. ADD M(X)  $\overline{AC = AC + M(X)}$

↳ one operand is in  $M(X)$  and other is by default in  $AC$ . and after performing add. result is by default in  $AC$ .

2.  $AC = AC + |M(X)|$

3.  $AC = AC - M(X)$

4.  $AC = AC - |M(X)|$

7.  $AC = AC \times 2$  (left shift)

5.  $AC \cdot M(X) = M(4) \times M(4)$

8.  $AC = AC / 2$  (right shift)

6.  $M(X) \cdot AC = AC / M(X)$



↳ add. of next memory

by default in  
AC

→  $n$  bit no.  $\times n$  bit no. =  $2n$  bit no.

## → Program control Inst<sup>n</sup>

1. go to  $M(X, 0 : 19)$

(0 to 19) 20 bits

2. go to  $M(X, 20 : 39)$

3. If  $AC > 0$  then

go to  $M(X, 0 : 19)$

4. If  $AC > 0$  then go to  $M(X, 20 : 39)$ .

5.  $M(X, 8 : 19) = AC (28 : 39)$

↳ contains address

6.  $M(X, 28 : 39) = AC (28 : 39)$

## Drawbacks of 1st gen. comp.

~~IAS~~~~Comp.~~~~Not in~~~~5<sup>th</sup> genera-~~~~non comp.~~

1)

Text processing was not included.

2)

I/O operations can't done.

3)

Index Registers was not provided.

4)

Structural programming can't done.

5)

No. of Registers were very less.

Critique

- In 2<sup>nd</sup> generation of computers, Transistors.

(Digital) SMDA = DA + 08

(Microscopic) SDA = DA + 08

(X) MUX DM = DA + 08

→ It also uses flag registers.

Indicates  
status of  
each  
operation

→ To increase speed of processor

1) Pipelining

2) Cache

3) Parallel processing

t = ① IF (Fetch)  
IF (Decoding)

② IF (Decoding)  
(Fetch)

③ EX (Execution)  
(Op. Read)

④ SR (Store Reg.)

⑤ end of  
Instruction

pipelining

## 2) Cache.

cache memory < size of main memory &

static RAM  
(costlier)

built using dynamic RAM

speed of processor > main memory

## 3) parallel processing

- no. of processor is placed in single IC.

① Von Neumann architecture      ② Harvard

processor  
having  
two  
types  
of  
architecture

- Inst<sup>n</sup> / Data placed  
in same common  
memory

- Inst<sup>n</sup> / Data placed  
in separate memory

(all earlier systems uses  
Von Neumann structure)

- Sections of code & data are at separate mem address.

32 bit data

| 31   | 24   | 16 | 8  | 8  | 8  | 0 | 4 bits |
|------|------|----|----|----|----|---|--------|--------|--------|--------|--------|--------|
| 0001 | 0010 | 12 | 34 | 56 | 78 | 0 | 0001   | 0000   | 78     | 12     | 34     | 56     |
| 0002 | 0011 | 12 | 34 | 56 | 78 | 0 | 0001   | 0001   | 56     | 12     | 34     | 56     |
| 0003 | 0100 | 12 | 34 | 56 | 78 | 0 | 0001   | 0002   | 34     | 12     | 34     | 56     |
|      |      |    |    |    |    |   |        |        |        |        |        |        |

- little endian - LSB is placed

is stored at the lower add. (0000) is  
known as little endian      (Intel)

- Big endian - stored at the higher address (0003)  
(Motorola)

## → Factors affecting performance of a CPU

### (1) clock rate



for performing one inst<sup>n</sup> (i)

$$5T = 5 \times \frac{1}{2 \times 10^9}$$

from previous  
 $t=1$  to 5

$$(i) 5T = 5 \times \frac{1}{5 \times 10^9}$$

Therefore, (ii) is more faster

### (2) Instruction count

### (3) Cycles Per Inst<sup>n</sup> (CPI) Cycles/Inst<sup>n</sup>

ex ACT+DR  $\rightarrow$  less

ACT+MCX  $\rightarrow$  more

→ Execution time of program depends on

- VLSI
- Computer organization
- ISA (Instruction set Architecture)
- Compiler Technology
- Size of the program

| Instruction<br>Count | Clock per<br>Instruction<br>(CPI) | Clockrate |
|----------------------|-----------------------------------|-----------|
| VLSI                 | X                                 | ✓         |
| Com. org.            | X                                 | ✓         |
| ISA                  | ✓                                 | ✓         |
| Com. Tech.           | ✓                                 | ✓         |
| Program              | ✓                                 | ✓         |

→ Execution Time =  $IC \times CPI \times \text{clock rate}$   
of a program

→ performance of  $= \frac{1}{\text{Execution time}}$

→ speedup  $= \frac{\text{Performance of CPU A}}{\text{Performance of CPU B}}$

$$\rightarrow \frac{\text{Execution time of CPU A}}{\text{Execution time of CPU B}}$$

Ex. 1

A program is run in machine A, B, C.  
And execution time of it is given that  
10, 25, 75 units.

Processor A is 2.5 times faster than B.

Processor A is 7.5 times faster than C.

$T_{CI} \Rightarrow$  Instruction count of type i



1 ADD AC, DR

2 JMP next  $\leftarrow$  unconditional jump

3 MUL AC, DR

4 next

$CPI_i \Rightarrow$  Cycles / Instruction

$$\text{CPU clock cycles} = \sum_{i=1}^n T_{CI} \times CPI_i$$

$$\text{Inst}^n \text{ count} = \sum_{i=1}^n T_{CI}$$

$$\text{CPI for entire Inst}^n \text{ set} = \frac{\sum_{i=1}^n CPI \times CPI_i}{IC}$$

$$\text{CPI} = \sum_{i=1}^n \left( \frac{IC_i}{IC} \right) \times CPI_i$$

Ex 2 Consider an implementation of ISA where instructions are classified into 4 diff. types with CPI values of 1, 2, 3, 4. Two code sequences have foll<sup>n</sup> inst<sup>n</sup>-count (IC)

|                 | 1 clock cycle   | 2               | 3               | 4               |
|-----------------|-----------------|-----------------|-----------------|-----------------|
| IE <sub>i</sub> | IC <sub>1</sub> | IC <sub>2</sub> | IC <sub>3</sub> | IC <sub>4</sub> |
| code seq. 1     | 20              | 15              | 15              | 2               |
| code seq. 2     | 10              | 12              | 10              | 4               |

Find which is more better?

$$\sum_{i=1}^4 IC_i \times CPI_i$$

$$\begin{aligned} CPI_{1,2} &= (20 \times 1) + (15 \times 2) + (15 \times 3) + (2 \times 4) \\ \text{cycles} &= 73 \\ (\text{clock}) & \end{aligned}$$

$$\begin{aligned} CPI &= 73 / \text{Instruction} = \frac{73}{42} = 1.74 \\ \text{for entire instruction set} & \\ \text{Inst<sup>n</sup> set} & \end{aligned}$$

$$\begin{aligned} CPU \frac{\text{clock}}{\text{cycles}} &= (10 \times 1) + (12 \times 2) + (10 \times 3) + (4 \times 4) \\ & = 80 \end{aligned}$$

$$CPI = \frac{80}{36} = 2.22$$

Therefore A is better.

EX.3

Machine A executes programs with avg CPI of 2.3, Machine B executes with same inst<sup>n</sup> set but better compiler a same program with 20% less inst<sup>n</sup>s and CPI of 1.7 at 1.0 GHz. What should be the clock rate of A so that two machines are same performance?

$$\rightarrow (\text{Execution time})_A = (\text{Execution time})_B$$

$$\therefore ICA \times CPI_A \times \frac{\text{clock}}{\text{rate}_A} = ICB \times CPI_B \times \frac{\text{clock}}{\text{rate}_B}$$

~~$$ICA \times 2.3 \times \frac{\text{clock}}{\text{rate}_A} = (ICA - 0.2) \times 1.7 \times 1.0 \text{ GHz}$$~~

~~$$\therefore \frac{\text{clock}}{\text{rate}_A} = 0.49.$$~~

$$\boxed{(\text{clock rate})_A = 2.04 \text{ GHz}}$$



RISC / CISC

Reduced Inst<sup>n</sup> set computers

Complex Inst<sup>n</sup> set computers



- Pipelining is done easily

- More no. of clock cycles

- Increase No. of Inst<sup>n</sup> per program

- Fetching, executing is done in one cycle

- Decrease CPI and clock cycle time as inst<sup>n</sup>s and program

- Pipelining is difficult

From experiments,

RISC architecture gives better performance.

Ex. For implementation of RISC FSA machine,

| Instruction | Type | freq. | CPI |
|-------------|------|-------|-----|
| Load        |      | 20%   | 4   |
| Store       |      | 8%    | 3   |
| ALU         |      | 60%   | 1   |
| Branch      |      | 12%   | 2   |

$$\text{Overall CPI} = \text{Avg. } \sum_{i=1}^n \left( \frac{IC_i}{IC} \right) \times CPI_i$$

$$= 11.88$$

$$= (0.2 \times 4) + (0.08 \times 3) +$$

$$(0.6 \times 1) + (0.12 \times 2)$$

Ex Suppose that a program is running on a machine with a following instruction types,

| Type | CPI | freq. |
|------|-----|-------|
|------|-----|-------|

|   |     |     |
|---|-----|-----|
| A | 1.3 | 60% |
|---|-----|-----|

|   |     |     |
|---|-----|-----|
| B | 2.2 | 10% |
|---|-----|-----|

|   |     |     |
|---|-----|-----|
| C | 2.0 | 30% |
|---|-----|-----|

The CPU designer gives two options, first is reduce CPI of type A to 1.1 (i) Reduce CPI of type B to 1.6 (ii) Reduce CPI of type C to 1.8

→ CPI Avg. =  $(1.1 \times 0.6) + (2.2 \times 0.1) + (2.0 \times 0.3)$   
 in option (i) = 1.48

CPI Avg. =  $(1.8 \times 0.6) + (1.6 \times 0.1) + (2.0 \times 0.3)$   
 in option (ii) = 0.84 1.54

(i) is better than (ii) as  $1.48 < 1.54$

→ MIPS (Millions of Inst<sup>n</sup> per second)

$$\text{MIPS} = \frac{\text{IC}}{\text{execution time}} \times 10^{-6}$$

→ Performance measurement of

MFLOPS (Millions of floating point Inst<sup>n</sup>/sec)

→ MIPS depends upon: (factors)

1) Inst<sup>n</sup> set.

2) MIPS varies from one program running on same processor to other program running on the same processor.

3) Higher MIPS may not mean better performance

(Compilers should also be same)

→ MIPS is only valid if full<sup>2</sup> cond<sup>n</sup>s are satisfied

1) same program is used

2) same ISA

3) same compiler is used.

Ex Consider a processor with three inst<sup>n</sup> classes A, B, C with the corresponding CPIs are 1, 2, 3. The processor runs at 1GHz. For a given prog. written in C two compilers produce the foll<sup>n</sup> inst<sup>n</sup> count

|             | JCA | JCB | JCC |
|-------------|-----|-----|-----|
| Compiler 1. | 7   | 2   | 1   |
| Compiler 2  | 12  | 1   | 1   |

Compute the MIPS rating and CPU time for two program versions

→ For compiler 1,

$$CPI_1 = (7 \times 1) + (2 \times 2) + (1 \times 3)$$

$$= 7 + 4 + 3 = 14 \text{ CPI}$$

$$CPI_1 = 1.4$$

(MIPS rating)<sub>1</sub> = CPU frequency

unit of freq  
2. Hz = cycles/sec

unit of CPI  
cycles/instr

cycles/sec

$$= \frac{1 \times 10^9}{1.4} = 714.3 \text{ MIPS}$$

→ For compiler 2,

$$CPI_2 = (12 \times 1) + (1 \times 2) + (1 \times 3) = 1.21$$

$$(MIPS rating)_2 = \frac{1 \times 10^9}{1.21} = 826.45 \text{ MIPS}$$

$$\rightarrow \text{CPU time using compiler} = \frac{IC \times CPJ \times \text{clock rate}}{\text{no. of instructions}}$$

$$= (7+2+1) \times 1.4 \times \frac{10^6}{10^9}$$

$$= 0.014 \text{ secs}$$

$$\rightarrow \text{CPU time using compiler} = (12+1+1) \times 1.21 \times \frac{1}{10^9}$$

$$= 0.016 \text{ secs}$$

By looking CPU time, compiler is more faster

$\rightarrow$  SPEC (standard performance evaluation corporation) - used for comparing two programs

Execution time = no. of seconds / program

$$IC = \frac{\text{clock cycle} \times \text{secs}}{\text{Inst}^n \text{ click}}$$

$\rightarrow$  It contains set of Bench mark programs.

$$IC = \frac{(EX1+H)(CX1)+(IX1)}{Inst^n}$$

$$IC = \frac{(EX1+H)(CX1)+(IX1)}{Inst^n}$$

### 3. Processor Basics

DATE: \_\_\_\_\_  
PAGE: \_\_\_\_\_

placed bet<sup>1</sup>  
CPU and MM



→ Processor working in two modes by OS.

1.) User mode — written by users

2) supervisor mode — written by OS

→ If resources are not enough there is switching bet<sup>2</sup> these two modes.



→ If MM is full and CPU requires prog. T4 then there is switching bet<sup>2</sup> prog. of Hard disk and prog. of Main Memory (which is not required for now).

This switching is bet<sup>2</sup> user mode & supervisor mode (carried out by OS).

→ every I/O device is accessed by some port address

→ 2 types of I/O inst<sup>3</sup> to access the I/O ports

1) Memory mapped I/O

2) I/O memory mapped

Programs are classified into two groups

- ① User programs / App<sup>2</sup> programs
- ② OS programs

DATE:

PAGE:

## 1) Memory mapped I/O

LDA 3000H

$$AC = M(3000H)$$

↑  
add. bit<sup>1</sup> 0000 to ffff  
therefore this is known

as memory. (Same set of inst<sup>n</sup>'s are used).

LDAH fff000 ← memory mapped I/O, device

## 2) I/O mapped I/O

separate set of inst<sup>n</sup> for memory and I/O device.

LDA / STA → inst<sup>n</sup> for memory add.

IN / OUT → inst<sup>n</sup> for I/O device

No separately reserved

space in based upon MM

types of inst<sup>n</sup> it will have been differentiated

→ Begin →

Are there any Inst<sup>n</sup> to be processed

Fetch the Inst<sup>n</sup>

Execute the Inst<sup>n</sup>

Are there any interrupts

↓ Yes

Interrupt service prog.

Overview  
of behavior  
of processor  
processor  
basic

→ Hypothetical System which is known as Accumulator based processor

### Block diagram:



### Data transfer Inst<sup>n</sup>

$$LOAD \quad AC = M(X)$$

$$STORE \quad M(X) = AC$$

$$\text{Move to Reg. } DR = AC$$

$$MOVE \quad DR, AC \quad ] 1 \text{ bit}$$

$$ADD \quad AC = AC + DR$$

$$SUB \quad AC = AC - DR$$

$$AND \quad AC = AC \text{ and } DR$$

$$OR \quad AC = AC \text{ or } DR$$

$$XOR \quad AC = AC \text{ xor } DR$$

$$NOT \quad AC = \neg AC$$

$$SHL \quad AC = AC \ll 1$$

$$SHR \quad AC = AC \gg 1$$

$$SWAP \quad AC = \text{swap}(AC)$$

$$ROT \quad AC = \text{rotate}(AC)$$

$$ROTL \quad AC = \text{rotate left}(AC)$$

$$ROTR \quad AC = \text{rotate right}(AC)$$

NOT  $AC = -AC$

Program Control : Branch

Branch zero

$PC = M(adr)$

if  $AC = 0$  then

$PC = M(adr)$

$$\rightarrow z = x + y : LD \ X$$

$MOV DR, AC$

ABP Proc.

$LD \ X$

$ADD AC, M(Y)$

|   |    |
|---|----|
| X | 5  |
| Y | 8  |
| Z | 13 |

$LD \ Y$

$ADD$

$ST Z$

$M \uparrow$

operation can  
be perform  
directly with the  
memory

can't perform  
directly with the  
memory (need to  
store and load)

## Data Representation:

Data can be either fixed point or floating point number. It also numeric or non-numeric no.

fixed or floating using ASCII

Any Binary no. can be represented by BCD or can be Binary format.

Weight of 1 in BCD are changing

$\begin{array}{r} 3 \\ 2 2 2 2 \\ \hline 0 1 0 0 \end{array} \quad \begin{array}{r} 3 \\ 2 2 2 2 \\ \hline 0 1 1 0 \end{array} + 3A = DA = 00A$

while in case of Binary, weight is continue

- BCD is not preferable in addition of 2 BCD number because answer which we get can be wrong.

→ signed number : (+ve) as well as (-ve)

MSB before denotes the no. is +ve or -ve.

Unsigned number : the numbers only.

→ signed number : range is between  $(-2^{n-1} - 1)$  to  $(2^{n-1} - 1)$

signed      0000 : +0      1000 : -0

Magnitude    0001 : +1      1001 : -1

Rep.          0010 : +2      1010 : -2

              0011 : +3      1011 : -3

              0100 : +4      1100 : -4

              0101 : +5      1101 : -5

              0110 : +6      1110 : -6

              0111 : +7      1111 : -7

$$\text{Range} = -(2^{n-1} - 1) \text{ to } (2^{n-1} - 1)$$

- For +ve numbers all rep.s remains same  
(signed magnitude Rep, 1's comp, 2's comp.)

• 1's complement :

$$1000 : -7$$

$$1100 : -3$$

$$1001 : -6$$

$$1101 : -2$$

$$1010 : -5$$

$$1110 : -1$$

$$1011 : -4$$

$$1111 : -0$$

2's complement: ordinary form of any.

1000 : -8

1100 : -4

10010 : -7 (9+1) : 11010 : -3 (8+1)

10100 : -6 (7+1) : 11001 : -2 (7+1)

1011 : -5

1111 : -1

What's minimum & maximum bit values

In signed magnitude and 1's comp.  
we have two diff. rep. for +0 or -0

so there is comparison. Set<sup>3</sup> them and  
we need extra bits (space) for it.

while in 1's complement range is  
bet<sup>n</sup> + (2<sup>n-1</sup>-1) to -2<sup>n-1</sup> [-8 to +7]

(7) 1011 (-8) 1100

(-7) 1011 (+8) 1100

(-8) 1011 (+7) 1100

(-9) 1011 (+6) 1100

(-10) 1011 (+5) 1100

(-11) 1011 (+4) 1100

(-12) 1011 (+3) 1100

(-13) 1011 (+2) 1100

(-14) 1011 (+1) 1100

(-15) 1011 (+0) 1100

(-16) 1011 (-1) 1100

(-17) 1011 (-2) 1100

(-18) 1011 (-3) 1100

(-19) 1011 (-4) 1100

(-20) 1011 (-5) 1100