

## Average CPI in COA

| Instruction Category | Numbers of Instructions | CPI |
|----------------------|-------------------------|-----|
| ALU                  | 40                      | 2   |
| Load & Store         | 15                      | 3   |
| Branch               | 35                      | 4   |
| Other                | 10                      | 5   |

$$CPI_{avg} = \frac{\sum_{i=1}^x n_i CPI_i}{\sum_{i=1}^x n_i}$$

$$CPI_{avg} = \frac{40 \times 2 + 15 \times 3 + 35 \times 4 + 10 \times 5}{100}$$

$$= 3.15$$



## Average CPI in COA

| Instruction Category | Numbers of Instructions | CPI |
|----------------------|-------------------------|-----|
| ALU                  | 40                      | 2   |
| Load & Store         | 15                      | 3   |
| Branch               | 35                      | 4   |
| Other                | 10                      | 5   |

$$\text{CPI}_{\text{Avg}} = \frac{40 \times 2 + 15 \times 3 + 35 \times 4 + 10 \times 5}{100} = 3.15$$



- GATE CS – If memory size is 64GB. If word size is 32bits. Then find the memory capacity in Words, Bytes and bits. Also find the numbers of bits required to represent words, bytes and bits.

$$\Rightarrow \text{No of locations } \{ \text{Address} \}_{\text{word}} = \frac{\text{Size of Memory}}{\text{Size of word}} = \frac{64 \text{ GB}}{32} = \frac{64 \times 2^{30}}{32} = \frac{2^2 \times 2^{30}}{2^4} = 2^{34}$$

$\Rightarrow n_{\text{word}} = 34 \text{ bits.}$

$$\Rightarrow \text{No of locations } \{ \text{Address} \}_{\text{Byte}} = \frac{\text{Size of Memory}}{\text{Size of word}} = \frac{64 \text{ KB}}{8} = 64 \text{ K} = 2^6 \times 2^{30}$$

$\Rightarrow n_{\text{byte}} = 36 \text{ bits.}$

$$\Rightarrow \text{No of locations } \{ \text{Address} \}_{\text{bits}} = \frac{\text{Size of Memory}}{\text{Size of word}} = \frac{64 \text{ GB}}{1} = 64 \times 2^{30} \times 2^3 \\ = 2^6 \times 2^{30} \times 2^3 \\ = 2^{39} = 512 \text{ G}$$

## CPU Execution Time in COA

Program → Instructions → Micro Operation/ Instruction.  
↓  
 $\text{CPI}$   
[ Clock per instruction ].

$$\rightarrow \text{Execution time, } t = \frac{n \times \text{CPI}_{\text{avg}}}{f_{\text{clock}}}.$$



- GATE CS – How many 64K X 8 RAM chips are required to build 1MB of RAM. Also Calculate the size of decoder.

$$\Rightarrow \text{Number of Chips} = \frac{\text{Total Size}}{\text{Available Size}} = \frac{1\text{ MB}}{64\text{ K} \times 8} = \frac{2^{20} \times 2^3}{2^6 \times 2^{10} \times 2^3} = 2^4 = 16 \text{ chips}$$

$$\Rightarrow \boxed{\frac{4}{\uparrow} \times \frac{16}{\uparrow}} \quad \leftarrow \text{Size of decoder.}$$



□ GATE CS – If the size of memory is 512GB. Size of word is 16bits. Find the numbers of bits required to represent each word uniquely. Also Find the number of locations required to represent each bit uniquely.

$$\begin{aligned} \rightsquigarrow \text{No of Locations} &= \frac{\text{Size of Memory}}{\text{Size of Location}} \\ &= \frac{512 \text{ GB}}{16 \text{ B}} \\ &= 256 \text{ K} \\ &= 2^8 \times 2^{30} \\ &= 2^{38} \end{aligned}$$

$$\rightsquigarrow n = 38 \text{ bits.}$$

$$\begin{aligned} \rightsquigarrow \text{Number of Locations} &= \frac{\text{Size of Memory}}{\text{Size of Location}} \\ &= \frac{512 \text{ GB}}{1} \\ &= 2^9 \times 2^{30} \times 2^3 \\ &= 2^{42} \end{aligned}$$

$$\rightsquigarrow n = 42 \text{ bits.}$$



## Memory Interfacing Questions of COA

- GATE CS – If there are 512G locations in the memory, then find the number of bits to represent each location.

$$\begin{aligned} \rightarrow \text{No of Locations} &= 512 \\ &= 2^9 \cdot 2^{30} \\ &= 2^{39} = 2^n \end{aligned}$$

$$\rightarrow n = 39 \text{ bits}$$

- GATE CS – If there are 26 bits available to represent each word of memory, then how many locations can be addressed uniquely?

$$\rightarrow n = 26 \text{ bits}$$

$$\begin{aligned} \rightarrow \text{No. of Locations (Address)} &= 2^n \\ &= 2^{26} \\ &= 2^6 \times 2^{20} \\ &= 64 M \end{aligned}$$

## Memory Interfacing Questions of COA

- GATE CS – How many 256MB RAM chips required to build 4GB RAM?

$$\begin{aligned}\text{Number of chips} &= \frac{\text{Total size}}{\text{Available size}} \\ &= \frac{4\text{GB}}{256\text{ MB}} = \frac{2^2 \times 2^{30}}{2^8 \times 2^{20} \times 2^3} = 2^4 = 16 \text{ chips}\end{aligned}$$

- GATE CS – How many 128KB RAM chips required to build 1MB RAM?

$$\begin{aligned}\text{Number of chips} &= \frac{\text{Total size}}{\text{Available size}} \\ &= \frac{1\text{MB}}{128\text{ KB}} = \frac{2^{20}}{2^7 \times 2^{10} \times 2^3} = 2^3 = 8 \text{ chips}\end{aligned}$$

## MIPS in COA

→ MIPS - Million Instructions per second.

$$\begin{aligned}\rightarrow \text{MIPS} &= \frac{n}{t \times 10^6} \\ &= \frac{\cancel{n}}{\cancel{t} \frac{\text{CP1avg}}{f_{\text{clock}}} \times 10^6} \\ \Rightarrow \text{MIPS} &= \boxed{\frac{f_{\text{clock}}}{\text{CP1avg} \times 10^6}}\end{aligned}$$



- GATE CS – If memory is byte addressable and size of memory is 512KB. Calculate the number of bits required to represent each word.

↑  
 with each  
 address we have  
 one byte data

Address ↑  
 data ↑

$$\begin{aligned}
 \rightarrow \text{No of Locations } \{ \text{Address} \} &= \frac{\text{Size of Memory}}{\text{Size of each location}} \\
 &= \frac{512 \text{ KB}}{8} \\
 &= 512 \text{ K} \\
 &= 2^9 \times 2^{10} \\
 &= 2^{19}
 \end{aligned}$$



$$\rightarrow n = 19 \text{ bits.}$$

□ **GATE CS** – The stage delays in a 4 stage pipeline are 5ns, 6ns, 4ns and 5ns. The second stage is replaced with a functionally equivalent design involving two stages with respective delays 4ns and 5ns. The throughput increase of the pipeline is .....%.

$$\rightarrow \underline{\underline{P-1}} \quad \{ 5\text{ns}, \underline{6\text{ns}}, 4\text{ns}, 5\text{ns} \} \quad \rightarrow \underline{\underline{P-2}} \quad \{ \underline{5\text{ns}}, 4\text{ns}, 5\text{ns}, 4\text{ns}, 5\text{ns} \}.$$

$$\rightarrow \text{Throughput} = \frac{n}{(n+k-1)t_c} = \frac{1}{t_c} \text{ Ideally.}$$

$$\rightarrow t_{c1} = 6\text{ns}, \quad t_{c2} = 5\text{ns}$$

$$\rightarrow TP_1 = \frac{1}{t_{c1}} = \frac{1}{6\text{ns}}$$

$$\rightarrow TP_2 = \frac{1}{t_{c2}} = \frac{1}{5\text{ns}}.$$

$$\begin{aligned} \rightarrow \% \uparrow_{th} &= \frac{TP_2 - TP_1}{TP_1} \times 100 \\ &= \frac{V_5 - V_6}{V_6} \times 100 \\ &= \frac{6}{5} - 1 = 0.2 \times 100 = 20\%. \end{aligned}$$



GATE CS – A Machine has 24 bits instruction format. It has 32 registers and each of which is 32 bits long. It needs to support 49 instructions. Each instruction has two register operands and one immediate operand. If the immediate operand is signed integer, the minimum value of immediate operand is .....

- A. -64       $\rightarrow$  Size of Inst. = 24 bits.
- B. -128      Reg. = 32
- C. -256
- D. -32      Inst. = 49

| Opcode | R.O. 1 | R.O. 2  | Immediate Operand |
|--------|--------|---------|-------------------|
| 6 bits | 5 bits | 5 bits. |                   |

$$\rightarrow \text{Immediate Operand} = 24 - 6 - 5 - 5 = 8 \text{ bits.}$$

$$\begin{array}{rcl} -2^{\frac{n_1}{2}} & \xrightarrow{\text{to}} & +\frac{(2-1)}{2^{\frac{n_1}{2}}} \\ -128 & \xrightarrow{\text{to}} & +127 \end{array}$$



- GATE CS – Assume the frequency of CPU is given as 4.5GHz. Assume we have different type of instructions which have different numbers of instruction count and CPI. Calculate

1. Average instruction execution time
2. MIPS

3. Program execution time

| Instruction Type | Instruction Count | CPI |
|------------------|-------------------|-----|
| Load             | 100               | 10  |
| Store            | 200               | 8   |
| Arithmetic       | 300               | 5   |
| Logical          | 150               | 4   |
| Shift            | 250               | 3   |
| Branch           | 400               | 2   |

$$\rightarrow \text{CPI}_{\text{avg}} = \frac{\sum_{i=1}^x n_i \cdot \text{CPI}_i}{\sum_{i=1}^x n_i} = \frac{100 \times 10 + 200 \times 8 + 300 \times 5 + 150 \times 4 + 250 \times 3 + 400 \times 2}{100 + 200 + 300 + 150 + 250 + 400}$$

$$\Rightarrow \text{CPI}_{\text{avg}} = 4.4642$$

$$\rightarrow t_{\text{inst}} = \frac{\text{CPI}_{\text{avg}}}{f_{\text{clock}}} = \frac{4.4642}{4.5 \times 10^9} = 0.99 \times 10^{-9} \text{ sec}$$

$$\rightarrow \text{MIPS} = \frac{f_{\text{clock}}}{\text{CPI}_{\text{avg}} \times 10^6} = \frac{4.5 \times 10^9}{4.4642 \times 10^6} = 1.008 \times 10^3 \text{ MIPS}$$

$$\rightarrow t_{\text{prog}} = n \cdot t_{\text{inst}}$$

$$= (100 + 200 + 300 + 150 + 250 + 400) \times 0.99 \times 10^{-9}$$

$$= 1386 \times 10^{-9}$$

- GATE 2014 CS – Consider two processor P1 and P2 executing the same instruction set. Assume that under identical conditions, for the same input, a program running on P2 takes 25% less time but incurs 20% more CPI as compared to the program running on P1. If the clock frequency of P1 is 1GHz, then the clock frequency of P2 (in GHz) is ..... ?

| <u>P<sub>1</sub></u>  | <u>P<sub>2</sub></u> |
|-----------------------|----------------------|
| $n_1 = n$             | $n_2 = n$            |
| $t_1 = t$             | $t_2 = 0.75t$        |
| $CPI_1 = CPI$         | $CPI_2 = 1.2 CPI$    |
| $f_1 = 1 \text{ GHz}$ | $f_2 = ?$            |

$$\rightarrow t = \frac{n \text{ CPI}_{\text{avg}}}{f_{\text{clock}}} \Rightarrow n = \frac{t f_{\text{clock}}}{\text{CPI}_{\text{avg}}}.$$

$$\Rightarrow n_1 = n_2$$

$$\Rightarrow \frac{t_1 f_1}{CPI_1} = \frac{t_2 f_2}{CPI_2}$$

$$\Rightarrow \frac{\cancel{n} \times 1 \text{ GHz}}{\cancel{CPI}} = \frac{0.75 \cancel{n} \times f_2}{1.2 \cancel{CPI}}$$

$$\Rightarrow f_2 = \frac{1 \times 1.2}{0.75} = 1.6 \text{ GHz}.$$



Questions on Registers of Computer: Solved Problems and Explanations | COA

- GATE 2020 CS – Consider the following data path diagram

- Execute :  $R_0 \leftarrow R_1 + R_2$

- The following steps are used to execute it over the given data path. Assume that PC is incremented appropriately. The subscripts r and w indicate read and write operations, respectively

1. R2<sub>r</sub>, TEMP1<sub>r</sub>, ALU<sub>ADD</sub>, TEMP2<sub>w</sub> (4)
  2. R1<sub>r</sub>, TEMP1<sub>w</sub> (3)
  3. PC<sub>r</sub>, MAR<sub>w</sub>, MEM<sub>r</sub> (1)
  4. TEMP2<sub>r</sub>, R0<sub>w</sub> (5)
  5. MDR<sub>r</sub>, IR<sub>w</sub> (2) ←

- Find the correct order:

- A. 2 1 4 5 3 X  
B. 1 2 4 3 5 X  
C. 3 5 2 1 4  
D. 3 5 1 2 4



## Examples on Instruction Formats in COA

- GATE CS – A Processor has 512 distinct instructions and 80 general purpose registers. A 24 bits instruction word has an opcode, one register operand and a memory operand. How many bits are reserved for the memory operand field.
- A. 6  
B. 7  
C. 8  
D. 9



## Examples on CPU Performance in COA

- **GATE CS** – In a certain processor, we have a program which takes 1 million instructions to execute with processor whose speed is given as 2GHz. Suppose 40% of the instruction takes 2 clock cycles, 30% of the instructions takes 3 clock cycles and remaining 30% takes 5 clock cycles for their respective execution. What is the total execution time for program?

$$\rightarrow n = 10^6$$
$$f_{clock} = 2 \times 10^9 \text{ Hz}$$
$$\rightarrow CPI_{avg} = \frac{\sum_{i=1}^x n_i CPI_i}{\sum_{i=1}^x n_i} = \frac{0.4 \times 2 + 0.3 \times 3 + 0.3 \times 5}{1} = 3.2$$
$$\rightarrow t_{exe} = \frac{n CPI_{avg}}{f_{clock}} = \frac{10^6 \times 3.2}{2 \times 10^9} = 1.6 \times 10^{-3} = 1.6 \text{ msec.}$$



GATE 2014 CS – Consider the following processors. Assume that the pipeline registers have zero latency.

- P1: Four stage pipeline with stage latencies 1ns, 2ns, 2ns, 1ns.  $\sum t_{P1} = 2\text{ nsec}$
- P2: Four stage pipeline with stage latencies 1ns, 1.5ns, 1.5ns, 1.5ns.  $\sum t_{P2} = 1.5\text{ nsec}$
- P3: Five stage pipeline with stage latencies 0.5ns, 1ns, 1ns, 0.6ns, 1ns.  $\sum t_{P3} = 1\text{ nsec}$
- P4: Five stage pipeline with stage latencies 0.5ns, 0.5ns, 1ns, 1ns, 1.1ns.  $\sum t_{P4} = 1.1\text{ nsec}$

Which processor has the highest peak clock frequency?

$$f_{P3} = \frac{1}{t_{P3}} = \frac{1}{1\text{ nsec}} = 1\text{ GHz}$$

$$f_{P4} = \frac{1}{t_{P4}} = \frac{1}{1.1\text{ nsec}} =$$





$$\rightarrow T_{wp} = nk t_c$$

$$T_p = (n+k-1)t_c$$

$$\rightarrow \text{Speed up } S = \frac{T_{wp}}{T_p} = \frac{nk t_c}{(n+k-1)t_c} = \left\lceil \frac{nk}{(n+k-1)} \right\rceil$$

$$\rightarrow \text{Throughput} = \frac{n}{(n+k-1)t_c} = \left( \frac{1}{t_c} \right) \leftarrow \text{ideal}$$



## CPU Performance Parameters in COA

→ CPU performance is based on,

1) Execution time  $t$ ,  $P \propto \frac{1}{t}$

2) Throughput.

→ Speed up factor  $S$ ,

$$S = \frac{P_x}{P_y} = \frac{1/t_x}{1/t_y} = \frac{t_y}{t_x} \dots \text{Performance of } x \text{ w.r.t. } y.$$

→ If processor 1 takes 1 msec to execute one program and processor 2 takes 5 msec to execute same program. Then,

1) Speed up factor of 1 w.r.t. 2

2) Speed up factor of 2 w.r.t. 1.

$$\therefore S_{12} = \frac{t_2}{t_1} = \frac{5}{1} = 5 \quad \rightarrow S_{21} = \frac{t_1}{t_2} = \frac{1}{5} = 0.2$$



## Examples on CPU Performance in COA

- GATE CS – Find CPI<sub>avg</sub> and MIPS for given specifications. {Clock = 1GHz}
- Also Find Execution time for 350 instructions.

| Instruction Category | % Occurrence | Cycles of Instruction |
|----------------------|--------------|-----------------------|
| ALU                  | 40           | 2                     |
| Load & Store         | 15           | 3                     |
| Branch               | 35           | 4                     |
| Other                | 10           | 5                     |

$$f_{clock} = 1 \text{ GHz}$$

$$\rightarrow CPI_{avg} = \frac{\sum_{i=1}^x n_i CPI_i}{\sum_{i=1}^x n_i} = \frac{40 \times 2 + 15 \times 3 + 35 \times 4 + 10 \times 5}{100} = 3.15$$

$$\rightarrow MIPS = \frac{f_{clock}}{CPI_{avg} \times 10^6} = \frac{10^9}{3.15 \times 10^6} = 317.46 \text{ MIPS}$$



- **GATE CS** – Assume a pipeline P which operates at 3GHz clock rate. It has a speed up factor of 10 and efficiency of 40%. Calculate the number of stages in the above pipeline.

$$\rightarrow f_{clock} = 3 \text{ GHz}$$

$$S = 10$$

$$\eta = 40\% = 0.4$$

$$\rightarrow S = K\eta \Rightarrow K = \frac{10}{0.4} = 25$$



## Examples on Pipelining in COA

- GATE CS – Consider a 5 stage pipeline with cycle time of 5ns. Calculate the execution time of 100 instructions and Speed up due to pipeline. Also find the utilization.

$$\rightarrow k = 5$$

$$t_c = 5 \text{ ns}$$

$$n = 100$$

$$\rightarrow T_p = (n+k-1)t_c = (100+5-1)5 = 520 \text{ nsec}$$

$$\rightarrow S = \frac{n k}{n+k-1} = \frac{100 \times 5}{100+5-1} = 4.8077$$

$$\rightarrow \eta = \frac{n}{n+k-1} = \frac{100}{100+5-1} = 0.9415$$



GATE 2022 CS – Let R1 and R2 be two 4 bits registers that stores numbers in 2's complement form. For the operation R1 + R2, which one of the following values of R1 and R2 gives an arithmetic overflow?

- A. R1 = 1011 and R2 = 1110
- B. R1 = 1100 and R2 = 1010
- C. R1 = 0011 and R2 = 0100
- D. R1 = 1001 and R2 = 1111

$$\begin{array}{r} \textcircled{1}\textcircled{0} \\ 1011 \\ 1110 \\ \hline 1001 \\ 1\oplus 1 = 0 \end{array}$$

$$\begin{array}{r} \textcircled{1}\textcircled{0} \\ 1100 \\ 1010 \\ \hline 0110 \\ 1\oplus 0 = 1 \end{array}$$

$$\begin{array}{r} \textcircled{0}\textcircled{0} \\ 0011 \\ 0100 \\ \hline 0111 \\ 0\oplus 0 = 0 \end{array}$$

$$\begin{array}{r} \textcircled{0}\textcircled{1} \\ 1001 \\ 1111 \\ \hline 1000 \\ 1\oplus 1 = 0 \end{array}$$



- **GATE CS** – Assume we have two pipelines P1 and P2, respectively. P1 has 6 stages, having execution time of 12ns, 14ns, 19ns, 20ns, 22ns and 25ns. P2 has 4 stages each having execution time of 10ns. Calculate the time that can be saved while using P2 pipeline over P1 pipeline, If 2000 instructions are executed.

$$\rightarrow P_1, K_1 = 6, t_{c_1} = 25 \text{ nsec}$$

$$\rightarrow P_2, K_2 = 4, t_{c_2} = 10 \text{ nsec}$$

$$\rightarrow n = 2000$$

$$\rightarrow T_{P_1} = (n + K_1 - 1) t_{c_1} = (2000 + 6 - 1) 25 = 50125 \text{ nsec}$$

$$\rightarrow T_{P_2} = (n + K_2 - 1) t_{c_2} = (2000 + 4 - 1) 10 = 20030 \text{ nsec}$$

$$\rightarrow \Delta T = T_{P_1} - T_{P_2} = 50125 - 20030 = 30095 \text{ nsec}$$



- GATE CS – Consider a 2GHz processor which is used to execute a benchmark program with the following instruction and clock cycle count:

| Instruction Count | CPI |
|-------------------|-----|
| 80000             | 2   |
| 40000             | 3   |
| 5000              | 3   |
| 10000             | 4   |

- What is the effective CPI of the program and total execution time?

$$\rightarrow CPI_{avg} = \frac{\sum_{i=1}^x n_i CPI_i}{\sum_{i=1}^x n_i} = \frac{80000 \times 2 + 40000 \times 3 + 5000 \times 3 + 10000 \times 4}{80000 + 40000 + 5000 + 10000}$$

$$\rightarrow CPI_{avg} = 2.4814$$

$$\rightarrow t_{exec} = \frac{n \cdot CPI_{avg}}{f_{clock}} = \frac{(80000 + 40000 + 5000 + 10000) \times 2.4814}{2 \times 10^9} = 167.494 \times 10^{-9}$$

$$= 167.494 \times 10^{-6}$$

**GATE 2021 CS** – If the numerical value of a 2 byte unsigned integer on a little endian computer is 255 more than that on a big endian computer, which of the following choices represents the unsigned integer on a little endian computer?

- A. 0x6665.
- B. 0x0001
- C. 0x4243
- D. 0x0100

$$\begin{array}{r} \text{5 } \overset{16}{\text{6 }} \text{6 } \overset{16}{\text{5 }} \\ \underline{\text{6 } \overset{16}{\text{6 }} \text{6 } \overset{16}{\text{5 }}} \\ \text{6 } \overset{16}{\text{5 }} \text{6 } \overset{16}{\text{6 }} \\ \hline \text{00 FF} \\ = 255 \end{array}$$

L.E.

B.E.



□ GATE 2021 CS – If the numerical value of a 2 byte unsigned integer on a little endian computer is 255 more than that on a big endian computer, which of the following choices represents the unsigned integer on a little endian computer?

- A. 0x6665
- B. 0x0001
- C. 0x4243
- D. 0x0100

$$\begin{array}{rcl}
 & \begin{array}{c} 16 \\ 5\ 5\ 16 \\ \hline 6\ 6\ 6\ 5 \end{array} & \text{L.E.} \\
 & \begin{array}{c} 6\ 5\ 6\ 6 \\ \hline 0\ 0\ \text{FF} \end{array} & \text{B.E.} \\
 & = 255 &
 \end{array}$$

$$\begin{array}{rcl}
 & \begin{array}{c} 0\ 1\ 5\ 16 \\ 0\ 1\ 0\ 0 \\ \hline 0\ 0\ 0\ 1 \\ \hline 0\ 0\ \text{FF} \end{array} & \text{L.E.} \\
 & \begin{array}{c} 4\ 2\ 4\ 3 \\ 4\ 3\ 4\ 2 \\ \hline -X \end{array} & \text{B.E.} \\
 & = 265 &
 \end{array}$$



**GATE CS** – Which of the following register contains the address of a main memory location from where instruction has to be fetched?

- A. PC
- B. MAR
- C. IR
- D. MDR

**GATE CS** – Which of the following register contains the contents found at the address held by MAR

- A. PC
- B. AC
- C. IR
- D. MDR

## Examples on Number Representation

GATE 2019 CS – In 16 bit 2's Complement representation, the decimal number -28 is:

- A. 1111 1111 0001 1100
  - B. 0000 0000 1110 0100
  - C. 1111 1111 1110 0100
  - D. 1000 0000 1110 0100

$$\begin{array}{r}
 & 32 & 16 & 8 & 4 & 2 & 1 \\
 \dots & 0 & 0 & 1 & 1 & \underline{1} & 0 & 0 \\
 & & & & & & = & 28 & \overline{01} \\
 \downarrow & & & & & & & & \\
 & & & & & & 2's \text{ Complement} \\
 \dots & 1 & 1 & 0 & 0 & 1 & 0 & 0
 \end{array}$$



## Examples on CPU Performance in COA

- **GATE CS** – Assume in hypothetical system cache is 20 times faster than main memory and CPU refers to the cache 80% of the time. Calculate the overall speed up of system.

1 Example of CPU Performance in COA



## **Parameters of Pipelining in Computer Organization & Architecture**

- Pipeline is used to improve speed of execution of program.
- Lets assume ARM9 architecture for 5 stage pipeline.
  - Fetch {Instruction Fetch}
  - Decode {Instruction Decode}
  - Execute {Instruction Execution}
  - Memory {Memory Write}
  - Write Back {Register Write}
- Lets have 8 instructions execution from I1 to I8.



## Examples on Pipelining in COA

- GATE 2015 CS – Consider a non pipelined processor with a clock rate of 2.5GHz and average cycles per instruction of four. The same processor is upgraded to a pipelined processor with five stages; but due to internal pipeline delay, the clock speed is reduced to 2GHz. Assume that there are no stalls in the pipeline. The speed up achieved in this pipelined processor is .....

$$\rightarrow f_{c_1} = 2.5 \text{ GHz}, \text{ CPI} = 4$$

$$\rightarrow K = 5, f_{c_2} = 2 \text{ GHz}$$

$$\rightarrow S = \frac{T_{np}}{T_p} = \frac{4/2.5 \times 10^9}{1/2 \times 10^9} = \frac{8}{2.5} = \underline{\underline{3.2}}$$



- GATE 2016 CS – A Processor has 40 distinct instructions and 24 general purpose registers. A 32 bits instruction word has an opcode, two register operands and an immediate operand. The numbers of bits available for the immediate operand is .....

- A. 16      → Instructions = 40  
B. 17      Registers = 24  
C. 18      Size of Inst. = 32 bits.  
D. 19

| Opcode | R.O.-1 | R.O.-2 | Immediate Operand. |
|--------|--------|--------|--------------------|
| 6 bits | 5 bits | 5 bits |                    |

$$\rightarrow \text{Immediate Operand} = 32 - 6 - 5 - 5 \\ = 16 \text{ bits.}$$



# Memory Interfacing Questions of COA

□ GATE CS – We are given with 64K X 2 RAM chips. How many chips required to build 128KB RAM. Also calculate the size of Decoder.

$$\Rightarrow \text{No of chips} = \frac{\text{Total Size}}{\text{Available Size}} = \frac{128 \text{ KB}}{64 \text{ K} \times 2} = \frac{2^7 \times 2^{10} \times 2^3}{2^6 \times 2^{10} \times 2^1} = 2^3 = 8 \text{ chips}$$



$$\Rightarrow \text{Decoder Output} = \text{No of chips} \times \frac{\text{Available data lines}}{\text{Req'd data lines}} = 8 \times \frac{2}{8} = 2$$

**GATE CS – Which of the following register contains the address of a main memory location from where instruction has to be fetched?**

- A. PC
- B. MAR
- C. IR
- D. MDR

**GATE CS – Which of the following register contains the contents found at the address held by MAR**

- A. PC
- B. AC
- C. IR
- D. MDR

**ISRO 2008 CS – The Memory Address Register**

- A. Is a hardware ~~memory~~ device which denotes the location of the current instruction being executed
- B. Is a group of electrical circuit, that performs the intent of instructions fetched from memory
- C. Contains the address of memory location that is to be read from or store into
- D. Contains a copy of the designated memory location specified by the MAR after “READ” or the new contents of the memory prior to a “Write”

- **GATE CS** – In hypothetical system, Floating point instructions are improved to run five times as fast, but only 40% of the time was spent on these instructions originally. How much speed up is achieved by new machine?

$$\rightarrow F = 0.4$$

$$S = 5$$

$$\begin{aligned}\rightarrow S_{\text{new}} &= \frac{1}{(1-F) + \frac{F}{S}} \\ &= \frac{1}{0.6 + \frac{0.4}{5}} \\ &= \frac{5}{3.4} \\ &= 1.47\end{aligned}$$



# Questions on Registers of Computer

GATE 2020 CS – Consider the following data path diagram

Execute :  $R_0 \leftarrow R_1 + R_2$

The following steps are used to execute it over the given data path. Assume that PC is incremented appropriately. The subscripts r and w indicate read and write operations, respectively

1.  $R_2_r, TEMP1_r, ALU_{ADD}, TEMP2_w$  ④
2.  $R_1_r, TEMP1_w$  ③
3.  $PC_r, MAR_w, MEM_r$  ①
4.  $TEMP2_r, R_0_w$  ⑤
5.  $MDR_r, IR_w$  ②

Find the correct order:

- A. 2 1 4 5 3 X  
B. 1 2 4 3 5 X  
C. 3 5 2 1 4  
D. 3 5 1 2 4



## Amdahl's Law in COA

→ It explains overall speed up factor of given CPU after some



# Memory Interfacing Questions of COA

- GATE CS – We are given with 64K X 2 RAM chips. How many chips required to build 128KB RAM. Also calculate the size of Decoder.

$$\Rightarrow \text{No of chips} = \frac{\text{Total Size}}{\text{Available Size}} = \frac{128 \text{ KB}}{64 \text{ K} \times 2} = \frac{2^7 \times 2^{10} \times 2^3}{2^6 \times 2^{10} \times 2^1} = 2^3 = 8 \text{ chips}$$



□ GATE 2013 CS – A RAM chip has capacity of 1024 words of eight bits each (1K X 8). The Number of 2 X 4 decoders with enable input line needed to construct a 16K X 16 from 1K X 8 RAM is .....

- A. 4
- B. 5       $\Rightarrow \text{No of chips} = \frac{\text{Total Size}}{\text{Available Size}} = \frac{16 \times 16^2}{1 \times 8} = 32 \text{ chips}$
- C. 6
- D. 7

$$\Rightarrow \text{No of Outputs with decoder} = \text{No of chips} \times \frac{\text{Available data lines}}{\text{Req'd data lines}}$$

$$\Rightarrow \frac{4}{n} \times \frac{16}{2^n} = 32 \times \frac{8}{16} = 16$$



## **Memory Interfacing Questions of COA**

**GATE 2009 CS / ISRO 2015 CS – How many 32K X 1 RAM chips are needed to provide memory capacity of 256K bytes?**

- A. 8
- B. 32
- C. 64
- D. 128



- **GATE 2011 CS** – Consider an instruction pipeline with four stages (S1, S2, S3 & S4) each with combinational circuit only. The pipeline registers are required between each stage and at the end of the last stage. Delay for the stages and for the pipeline registers are as given in the figure.



- What is the approximate speed up of the pipeline in steady state under ideal conditions when compared to the corresponding non pipeline implementation?

$$S = \frac{T_{cycP}}{T_p} = \frac{[5 + 6 + 11 + 8]}{[11 + 1]} = \underline{\underline{2.5}}$$



## Examples on CPU Performance in COA

- GATE CS – Find CPI<sub>avg</sub> and MIPS for given specifications. {Clock = 1GHz}
- Also Find Execution time for 350 instructions.

| Instruction Category | % Occurrence | Cycles of Instruction |
|----------------------|--------------|-----------------------|
| ALU                  | 40           | 2                     |
| Load & Store         | 15           | 3                     |
| Branch               | 35           | 4                     |
| Other                | 10           | 5                     |

$$\rightarrow \text{CPI}_{\text{avg}} = \frac{\sum_{i=1}^x n_i \text{CPI}_i}{\sum_{i=1}^x n_i} = \frac{40 \times 2 + 15 \times 3 + 35 \times 4 + 10 \times 5}{100} = 3.15$$

$$\rightarrow \text{MIPS} = \frac{f_{\text{clock}}}{\text{CPI}_{\text{avg}} \times 10^6} = \frac{10^9}{3.15 \times 10^6} = 317.46 \text{ MIPS}$$

$$\rightarrow t = \frac{n \times \text{CPI}_{\text{avg}}}{f_{\text{clock}}} = \frac{350 \times 3.15}{10^9} = 1102.5 \times 10^{-9} \text{ sec} = 1.1025 \text{ ns}$$

$f_{\text{clock}} = 1 \text{ GHz}$



## **Examples on Instruction Formats in COA**

- GATE CS** – A Processor has 512 distinct instructions and 80 general purpose registers. A 24 bits instruction word has an opcode, one register operand and a memory operand. How many bits are reserved for the memory operand field.

- A. 6       $\rightarrow$  Instruction = 512  
 B. 7      Registers = 80  
 C. 8  
 D. 9      Size of Inst." = 24 bits.

| Opcode | Reg. Operand | Mem. Operand. |
|--------|--------------|---------------|
| 9 bits | 7 bits       | ?             |

$$\rightarrow \text{Mem. Operand} = 24 - 7 - 7 \\ \rightarrow 8 \text{ bits}$$



## Memory Interfacing Questions of COA

□ GATE 2009 CS / ISRO 2015 CS – How many 32K X 1 RAM chips are needed to provide memory capacity of 256K bytes?

- A. 8
- B. 32
- C. 64
- D. 128

→ No of chips =  $\frac{\text{Total Size}}{\text{Available Size}}$

$$= \frac{256 \text{ KB}}{32 \text{ K} \times 1}$$
$$= \frac{2^8 \times 2^10 \times 2^3}{2^5 \times 2^6}$$
$$= 2^6$$
$$= 64 \text{ chips}$$

□ GATE CS – How many 256M X 8 RAM chips are required to build 32GB of RAM. Also Calculate the size of decoder.

$$\begin{aligned} \text{Number of Chips} &= \frac{\text{Total Size}}{\text{Available size}} = \frac{32 \text{ GB}}{256 \text{ M} \times 8} \\ &= \frac{2^5 \times 2^{30}}{2^8 \times 2^{20} \times 2^3} \\ &= 2^7 \\ &= \underline{128 \text{ chips}} \end{aligned}$$

$$\begin{array}{c} \text{V/P} \quad \text{O/P} \\ \boxed{7 \times 128} \leftarrow \underline{7 \times 128} \\ \uparrow \quad \uparrow \\ n \quad 2^n \end{array}$$

- **GATE 2004 CS** – A 4 stage pipeline has the stage delays as 150, 120, 160 and 140 nanoseconds, respectively. Registers that are used between the stages have a delay of 5 ns each. Assuming constant clock rate, the total time taken to process 1000 data items on this pipeline will be ..... Microseconds

$$n = 1000$$

$$k = 4$$

$$t_c = 160 + 5 = 165 \text{ nsec}$$

$$\begin{aligned} \rightarrow T_p &= (n+k-1)t_c \\ &= (1000+4-1)165 \\ &= 165495 \text{ nsec} \\ &= 165495 \text{ microsec} \end{aligned}$$



- **GATE CS** – Consider a 5 stage pipeline. Delay of each stage is given as per 10ns, 16ns, 12ns, 11ns & 14ns, respectively. Calculate the execution time of 100 instructions and Speed up due to pipeline.

$$\rightarrow K = 5$$

$$t_c = 16 \text{ nsec}$$

$$n = 100$$

$$\rightarrow T_p = (n + K - 1) t_c = (100 + 5 - 1) 16 = 1664 \text{ nsec} = 1.664 \text{ usec}$$

$$\rightarrow S = \frac{T_{opt}}{T_p} = \frac{100(10 + 16 + 12 + 11 + 14)}{1664} = 3.786$$



- GATE 2011 CS** – Consider an instruction pipeline with four stages (S1, S2, S3 & S4) each with combinational circuit only. The pipeline registers are required between each stage and at the end of the last stage. Delay for the stages and for the pipeline registers are as given in the figure.



- What is the approximate speed up of the pipeline in steady state under ideal conditions when compared to the corresponding non pipeline implementation?

## CPU Performance Parameters in COA

→ CPU performance is based on,

1) Execution time  $t$ ,  $P \propto \frac{1}{t}$

2) Throughput.

→ Speed up factor  $S$ ,

$$S = \frac{P_x}{P_y} = \frac{1/t_x}{1/t_y} = \frac{t_y}{t_x} \dots \text{Performance of } x \text{ w.r.t. } y.$$

→ If processor 1 takes 1 msec to execute one program and processor 2 takes 5 msec to execute same program. Then,

1) Speed up factor of 1 w.r.t. 2

2) Speed up factor of 2 w.r.t. 1.

•  $S$



## Examples on CPU Performance in COA

- GATE CS – Find CPI<sub>avg</sub> and MIPS for given specifications. {Clock = 1GHz}
- Also Find Execution time for 350 instructions.

| Instruction Category | % Occurrence | Cycles of Instruction |
|----------------------|--------------|-----------------------|
| ALU                  | 40           | 2                     |
| Load & Store         | 15           | 3                     |
| Branch               | 35           | 4                     |
| Other                | 10           | 5                     |

$$f_{clock} = 1 \text{ GHz}$$

$$\rightarrow CPI_{avg} = \frac{\sum_{i=1}^x n_i CPI_i}{\sum_{i=1}^x n_i} = \frac{40 \times 2 + 15 \times 3 + 35 \times 4 + 10 \times 5}{100} = 3.15$$

$$\rightarrow MIPS = \frac{f_{clock}}{CPI_{avg} \times 10^6} = \frac{10^9}{3.15 \times 10^6} = 317.46 \text{ MIPS}$$

$$\rightarrow t = \frac{n \times CPI_{avg}}{f_{clock}} = \frac{350 \times 3.15}{10^9} = 1102.5 \times 10^{-9} \text{ sec} = 1.1025 \times 10^{-6} \text{ sec} = 1.1025 \text{ usec.}$$



$$\rightarrow T_{wp} = nK t_c$$

$$T_p = (n+K-1)t_c$$

$$\rightarrow \text{Speed up } S = \frac{T_{wp}}{T_p} = \frac{nK t_c}{(n+K-1)t_c} = \boxed{\frac{nK}{(n+K-1)}}$$

$$\rightarrow \text{Throughput} = \frac{n}{(n+K-1)t_c} = \left(\frac{1}{t_c}\right) \leftarrow \text{ideal}$$

$$\rightarrow \text{Utilization or Efficiency } \eta = \frac{nK}{(n+K-1)K} = \frac{n}{n+K-1}$$

$$\rightarrow \boxed{S = K\eta}$$



## Examples on CPU Performance in COA

- **GATE CS** – Assume in hypothetical system cache is 20 times faster than main memory and CPU refers to the cache 80% of the time. Calculate the overall speed up of system.

$$\rightarrow F = 0.8$$

$$S = 20$$

$$\rightarrow S_{\text{all}} = \frac{1}{(1-F) + \frac{F}{S}}$$

$$= \frac{1}{0.2 + \frac{0.8}{20}}$$

$$= \frac{20}{4.8}$$

$$= 4.1666$$



## Examples on Pipelining in COA

- **GATE CS** – Consider a pipeline having 5 stages with duration 10ns, 30ns, 45ns, 80ns and 35ns. If the buffer delay is 20ns, calculate the speed up of pipelined processor.

$$S = \frac{T_{NP}}{T_P} = \frac{[10 + 30 + 45 + 80 + 35]}{[80 + 20]} = 2$$





$$\rightarrow T_{wp} = nk t_c$$

$$T_p = (n+k-1)t_c$$

$$\rightarrow \text{Speed up } S = \frac{T_{wp}}{T_p} = \frac{nk t_c}{(n+k-1)t_c} = \boxed{\frac{nk}{(n+k-1)}}$$



- **GATE CS** – Assume a microprocessor given which is widely used for scientific applications. It has both integer and floating point instructions. Now floating point instructions are enhanced and made 3 times faster than before and integer instructions are unenhanced. If there are 20% of floating point instructions in the program then find the overall speed up.

$$\rightarrow F = 0.2$$

$$S = 3$$

$$\begin{aligned}\rightarrow S_{\text{all}} &= \frac{1}{(1-F) + \frac{F}{S}} \\ &= \frac{1}{0.8 + \frac{0.2}{3}} \\ &= \frac{3}{2.6} \\ &\approx 1.1538\end{aligned}$$



- **GATE CS** – We are given with  $16K \times 4$  RAM chips. How many chips required to build 64KB RAM. Also calculate the size of Decoder.

$$\rightarrow \text{No of chips} = \frac{\text{Total Size}}{\text{Available Size}} = \frac{64 \text{ KB}}{16 \text{ K} \times 4} = \frac{2^6 \times 2^{10} \times 2^3}{2^4 \times 2^{10} \times 2^2} = 2^3 = 8 \text{ chips}$$

$$\begin{aligned} \rightarrow \text{Decoder Outputs} &= \text{No of chips} \times \frac{\text{Available data lines}}{\text{Req'd data lines}} \\ &= 8 \times \frac{4}{8} = 4 \end{aligned}$$

$$\begin{array}{c} \text{I/P} \quad \text{O/P} \\ \rightarrow \frac{2}{\uparrow} \times \frac{4}{\uparrow} \\ n \quad 2^n \end{array}$$

## Amdahl's Law in COA

→ It explains overall speed up factor of given CPU after some enhancement.



$$\rightarrow S_{\text{all}} = \frac{t_{\text{old}}}{t_{\text{new}}} = \frac{1}{\frac{1-F+F/S}{1}}$$

↑ Unenhanced



## Instruction Space time Diagram with Pipelining



- GATE CS – Assume the frequency of CPU is given as 4.5GHz. Assume we have different type of instructions which have different numbers of instruction count and CPI. Calculate

1. Average instruction execution time
2. MIPS
3. Program execution time

| Instruction Type | Instruction Count | CPI |
|------------------|-------------------|-----|
| Load             | 100               | 10  |
| Store            | 200               | 8   |
| Arithmetic       | 300               | 5   |
| Logical          | 150               | 4   |
| Shift            | 250               | 3   |
| Branch           | 400               | 2   |

$$\rightarrow \text{CPI}_{\text{avg}} = \frac{\sum_{i=1}^x n_i \times \text{CPI}_i}{\sum_{i=1}^x n_i} = \frac{100 \times 10 + 200 \times 8 + 300 \times 5 + 150 \times 4 + 250 \times 3 + 400 \times 2}{100 + 200 + 300 + 150 + 250 + 400}$$

$$\Rightarrow \text{CPI}_{\text{avg}} = 4.4642$$

$$\rightarrow t_{\text{inst}} = \frac{\text{CPI}_{\text{avg}}}{f_{\text{clock}}} = \frac{4.4642}{4.5 \times 10^9} = 0.99 \times 10^{-9} \text{ sec}$$

$$\rightarrow \text{MIPS} = \frac{f_{\text{clock}}}{\text{CPI}_{\text{avg}} \times 10^6} = \frac{4.5 \times 10^9}{4.4642 \times 10^6} = 1.008 \times 10^3$$

$$\begin{aligned} \rightarrow t_{\text{prog}} &= n \cdot t_{\text{inst}} \\ &= (100 + 200 + 300 + 150 + 250 + 400) \times 0.99 \times 10^{-9} \\ &= 1386 \times 10^{-9} = 1.386 \times 10^{-6} \text{ sec.} \end{aligned}$$



## Instruction Space time Diagram with Pipelining



$$\rightarrow T_{wp} = nKt_c$$

$$T_p = (n+K-1)t_c$$



**GATE CS** – Which of the following register is connected to system bus?

- A. PC
- B. MAR
- C. IR
- D. MDR

**GATE CS** – Which of following register always points to the next instruction to be executed?

- A. PC
- B. MAR
- C. IR
- D. MDR

**GATE CS** – Which of following register contains the most recently fetched instruction?

- A. PC
- B. MAR
- C. IR
- D. MDR