

**GATE 2020  
PSUs 2020**



**MADE EASY**  
Publications

Revised & Updated Edition

# **POSTAL STUDY PACKAGE**



**COMPUTER SCIENCE & IT**  
**Computer Organization & Architecture**

**Objective Practice Sets**

# **POSTAL** **Study Package**

# **2020**

## **Computer Science & IT**

### **Objective Practice Sets**

#### **Computer Organization & Architecture**

*Contents*

| <b>Sl. Topic</b>                           | <b>Page No.</b> |
|--------------------------------------------|-----------------|
| 1. Basics of Computer Design .....         | 2               |
| 2. CPU Design .....                        | 13              |
| 3. Instruction Pipelining .....            | 19              |
| 4. Memory Hierarchy Design .....           | 29              |
| 5. Input-Output and Secondary Memory ..... | 44              |
| 6. Data Representation .....               | 49              |



**MADE EASY**  
Publications

Note: This book contains copyright subject matter to MADE EASY Publications, New Delhi. No part of this book may be reproduced, stored in a retrieval system or transmitted in any form or by any means. Violators are liable to be legally prosecuted.

# 1

## CHAPTER

# Basics of Computer Design

1. The computer performs all mathematical and logical operations inside its
    - (a) memory unit
    - (b) central processing unit
    - (c) output unit
    - (d) visual display unit
  2. Which of the following is not involved in a memory write operation?
    - (a) MAR
    - (b) PC
    - (c) MDR
    - (d) data bus
  3. In Flynn's classification of computers, the vector and array classes of machines belong to
    - (a) Single instruction/single data category
    - (b) Single instruction/multiple data category
    - (c) Multiple instruction/single data category
    - (d) Multiple instruction/multiple data category
  4. The following are four statements regarding what a CPU with only a set of 32 bit registers can perform.
    1. Hold and operate on 32 bit integers
    2. Hold and operate on 16 bit integers
    3. Hold and operate on 64 bit floating point arithmetic
    4. Hold and operate on 16 bit UNICODE charactersWhich of the following is true about such a CPU?
    - (a) All are true
    - (b) 1, 2 and 3 only
    - (c) 1, 2 and 4 only
    - (d) 1, 3 and 4 only
  5. The following are four statements about Reduced Instruction Set Computer (RISC) architectures.
    1. The typical RISC machine instruction set is small, and is usually a subset of a CISC instruction set.
    2. No arithmetic or logical instruction can refer to the memory directly.
  3. A comparatively large number of user registers are available.
  4. Instructions can be easily decoded through hard-wired control units.
- Which of the above statements is true?
- (a) 1 and 3 only    (b) 1, 3 and 4 only  
(c) 1, 2 and 3 only    (d) all of these
6. Which of the following statements is false about CISC architectures?
    - (a) CISC machine instructions may include complex addressing modes, which require many clock cycles to carry out.
    - (b) CISC control units are typically micro-programmed, allowing the instruction set to be more flexible.
    - (c) In the CISC instruction set, all arithmetic/logic instructions must be register based.
    - (d) CISC architectures may perform better in network centric applications than RISC.
  7. Consider the following register-transfer language
$$R_3 \leftarrow R_3 + M[R_1 + R_2]$$
where  $R_1$ ,  $R_2$  are the CPU registers and  $M$  is a memory location in primary memory. Which addressing mode is suitable for above register-transfer language?
    - (a) immediate
    - (b) indexed
    - (c) direct
    - (d) displacement
  8. Consider a high-level language statement while  $[*i--]$  then which addressing mode is suitable for it?
    - (a) autoincrement
    - (b) indexed
    - (c) displacement
    - (d) autodecrement
  9. Match List-I with List-II and select the correct answer using the codes given below the lists:

**List-I**

- $\text{Regs}[R_4] \leftarrow \text{Regs}[R_4] + \text{Regs}[R_3]$
- $\text{Regs}[R_4] \leftarrow \text{Regs}[R_4] + 3$
- $\text{Regs}[R_4] \leftarrow \text{Regs}[R_4] + \text{Mem}[\text{Regs}[R_1]]$

**List-II**

- Immediate
- Register
- Displacement

**Codes:**

| A     | B                 | C |
|-------|-------------------|---|
| (a) 3 | 2                 | 1 |
| (b) 2 | 1                 | 3 |
| (c) 1 | 2                 | 3 |
| (d)   | None of the above |   |

10. Consider an  $(n + k)$  bit instruction with a  $k$ -bit opcode and single  $n$ -bit address. Then this instruction allow \_\_\_\_\_ operations and \_\_\_\_\_ addressable memory cells.
- $2^k, 2^{n+k}$
  - $2^{n+k}, 2^{n+k}$
  - $2^k, 2^n$
  - $2^{n+k}, 2^{n+1}$

11. The following diagram shows, which addressing mode?



- immediate addressing mode
- indirect addressing mode
- extended addressing mode
- none of the above

12. A CPU has an arithmetic unit that adds bytes and then sets its  $V$ ,  $C$  and  $Z$  flag bits as follows. The  $V$ -bit is set if arithmetic overflow occurs (in 2's complement arithmetic). The  $C$ -bit is set if a carry-out is generated from the most significant bit during an operation. The  $Z$ -bit is set if the result is zero. What are the values of the  $V$ ,  $C$  and  $Z$  flag bit after 8-bit byte 1100 1100 and 1000 1111 are added?

| V     | C | Z |
|-------|---|---|
| (a) 0 | 0 | 0 |
| (b) 1 | 1 | 0 |
| (c) 1 | 1 | 1 |
| (d) 0 | 1 | 0 |

13. Consider the following sequence of instructions intended for execution on a stack machine. Each arithmetic operation pops the second operand, then pops the first operand, operates on them, and then pushes the result back onto the stack

|      |     |
|------|-----|
| Push | $b$ |
| Push | $x$ |
| add  |     |
| Pop  | $c$ |
| Push | $c$ |
| Push | $y$ |
| add  |     |
| Push | $c$ |
| sub  |     |
| Pop  | $z$ |

Which of the following statements is/are true?

- If push and pop instructions each require 5 bytes of storage, and arithmetic operations each require 1 byte of storage then the instruction sequence as a whole requires a total of 40 bytes of storage.
- At the end of execution,  $z$  contains the same value as  $y$ .
- At the end of execution, the stack is empty.
- 1 only
- 2 only
- 2 and 3 only
- 1 and 3 only

14. Match List-I with List-II and select the correct answer using the codes given below the lists:

**List-I**

- MOV X, R1
  - STORE X
  - POP X
  -
- Three-address instruction
  - Zero-address instruction
  - One-address instruction
  - Two-address instruction

**Codes:**

| A     | B             | C |
|-------|---------------|---|
| (a) 4 | 3             | 2 |
| (b) 3 | 2             | 1 |
| (c) 2 | 3             | 4 |
| (d)   | None of these |   |

15. Match List-I with List-II and select the correct answer using the codes given below the lists:

**List-I**

- A. 0-address instruction
- B. 1-address instruction
- C. 2-address instruction
- D. 3-address instruction

**List-II**

1.  $T = \text{TOP} (T - 1)$
2.  $Y = Y + X$
3.  $Y = A - B$
4.  $ACC = ACC - X$

**Codes:**

| A     | B | C | D |
|-------|---|---|---|
| (a) 1 | 2 | 3 | 4 |
| (b) 3 | 2 | 4 | 1 |
| (c) 2 | 3 | 1 | 4 |
| (d) 1 | 4 | 2 | 3 |

16. A certain processor executes the following set of machine instructions sequentially.

```
MOV R0, #0
MOV R1, 100 (R0)
ADD R1, 200 (R0)
MOV 100 (R0), R1
```

Assuming that memory location 100 contains the value 35 (Hex), and the memory location 200 contains the value A4 (Hex), what could be said about the final result?

- (a) Memory location 100 contains value A4
- (b) Memory location 100 contains value DA
- (c) Memory location 100 contains value D9
- (d) Memory location 200 contains value 35

17. A stack-based processor executes the following set of machine instructions sequentially.

```
PUSH 100
PUSH 200
ADD
POP 300
```

Assuming that

1. memory location 100 contains the value 53 (Hex) and memory location 200 contains the value 4C (Hex),
2. the stack is byte organized and the stack pointer is at 00FF, and that

3. all PUSH and POP instructions have a memory operand.

Which of the following could the final result be?

- (a) Memory location 300 contains the value 9F
- (b) Memory location 00FD contains the value 9F
- (c) Memory location 00FF contains a value 100
- (d) Memory location 00FE contains a value 200

18. In a certain processor, a 2 byte Jump instruction is encountered at memory address 3010H, the Jump instruction is in PC relative mode. The instruction is JMP - 7 where - 7 is signed byte. Determine the Branch Target Address

- (a) 300B H
- (b) 3009 H
- (c) 3003 H
- (d) 3007 H

19. Processor XYZ supports only the immediate and the direct addressing modes. Which of the following programming language data structures cannot be implemented on this processor?

1. Pointers
  2. Arrays
  3. Records
- (a) 1, 2 and 3
  - (b) 2 and 3
  - (c) 1 and 2
  - (d) Only 1

20. Word 20 contains 40  
Word 30 contains 50  
Word 40 contains 60  
Word 50 contains 70

Which of the following instructions loads 60 into the accumulator?

- (a) load immediate 20
- (b) load direct 30
- (c) load indirect 20
- (d) load indirect 30

21. Match List-I with List-II and select the correct answer using the codes given below the lists:

**List-I**

- A. A[1] = B[J];
- B. while [\*A++];
- C. int temp = \*x;

**List-II**

1. Indirect addressing
2. Indexed addressing
3. Autoincrement

**Codes:**

| A     | B | C |
|-------|---|---|
| (a) 3 | 2 | 1 |
| (b) 1 | 3 | 2 |
| (c) 2 | 3 | 1 |
| (d) 1 | 2 | 3 |

22. Consider the following program segment. Here  $R_1$ ,  $R_2$  and  $R_3$  are the general purpose registers.

| Instruction      | Operation                  | Instruction Size<br>(no. of words) |
|------------------|----------------------------|------------------------------------|
| MOV $R_1$ , 3000 | $R_1 \leftarrow M[3000]$   | 2                                  |
| Loop:            |                            |                                    |
| MOV $R_2, (R_1)$ | $R_2 \leftarrow M[R_1]$    | 1                                  |
| ADD $R_2, R_1$   | $R_2 \leftarrow R_1 + R_2$ | 1                                  |
| MOV $(R_3), R_2$ | $M[R_3] \leftarrow R_2$    | 1                                  |
| INC $R_3$        | $R_3 \leftarrow R_3 + 1$   | 1                                  |
| DEC $R_1$        | $R_1 \leftarrow R_1 - 1$   | 1                                  |
| BNZ LOOP         | Branch on not zero         | 2                                  |
| Halt             | Stop                       | 1                                  |

Assume that the content of memory location 3000 is 10 and the content of the register  $R_3$  is 2000. The content of each of the memory locations from 2000 to 2010 is 100. The program is loaded from the memory location 100. All the numbers are in decimal.

Assume that the memory is word addressable. The number of memory references for accessing the data in executing the program completely is

- (a) 10                         (b) 11  
(c) 20                         (d) 21

23. A CPU has 24-bit instructions. A program starts at address 300 (in decimal). Which one of the following is a legal program counter (all values in decimal)?

- (a) 400                         (b) 500  
(c) 600                         (d) 700

24. The memory locations 1000, 1001 and 1020 have data values 18, 1 and 16 respectively before the following program is executed.

MOVI Rs, 1;                  Move immediate  
LOAD Rd, 1000(Rs);       Load from memory  
ADDI Rd, 1000;              Add immediate  
STOREI 0(Rd), 20;           Store immediate

Which of the statements below is TRUE after the program is executed?

- (a) memory location 1000 has value 20  
(b) memory location 1020 has value 20  
(c) memory location 1021 has value 20  
(d) memory location 1001 has value 20

25. Consider the following assembly language program for a hypothetical processor.  $A$ ,  $B$  and  $C$  are 8 bit integers. The meaning of various instructions are shown as comments.

MOV  $B, \#0$ ;               $B \leftarrow 0$   
MOV  $C, \#8$ ;               $C \leftarrow 8$   
 $Z : \text{CMP } C, \#0$ ;    compare  $C$  with 0  
JZX;                      jump to  $X$  if zero flag is set  
SUBC, #1;                 $C \leftarrow C - 1$   
LRCA, #1;                Left rotate  $A$  through carry by  
                              one bit. Thus if the initial value  
                              of  $A$  and the carry flag are  
                               $a_7a_6\dots a_0$  and  $c_0$  respectively,  
                              their values after the execution  
                              will be  $a_6a_5\dots a_0c_0$  and  $a_7$   
                              respectively.  
JC  $Y$ ;                      Jump to  $Y$  if carry flag is set  
JMP  $Z$ ;                      Jump to  $Z$   
 $Y : \text{ADD } B, \#1$ ;     $B \leftarrow B + 1$   
JMP  $Z$ ;                      Jump to  $Z$   
 $X$ :

If the initial value of register  $A$  is  $A_0$ , the value of register  $B$  after the program execution will be

- (a) The number of 0 bits in  $A$   
(b) The number of 1 bits in  $A$   
(c)  $A_0$   
(d) 8

26. In the program below, the number of times the FIRST and SECOND JNZ instructions cause the control to be transferred to LOOP respectively.

|          |          |
|----------|----------|
| MVI      | H, 02H   |
| MVI      | L, 05H   |
| LOOP :   | DCR L    |
| FIRST :  | JNZ LOOP |
|          | DIR H    |
| SECOND : | JNZ LOOP |

- (a) 5 and 2                     (b) 4 and 1  
(c) 259 and 1                 (d) 260 and 1

27. Consider the following program segment for a hypothetical CPU having three user registers  $R_1$ ,  $R_2$  and  $R_3$ .

| Instruction   | Operation                           | Instruction Size<br>(no. of words) |
|---------------|-------------------------------------|------------------------------------|
| MOV R1, 5000; | $R1 \leftarrow \text{Memory}[5000]$ | 2                                  |
| MOV R2, (R1); | $R2 \leftarrow \text{Memory}[(R1)]$ | 1                                  |
| ADD R2, R3;   | $R2 \leftarrow R2 + R3$             | 1                                  |
| MOV 6000, R2; | $\text{Memory}[6000] \leftarrow R2$ | 2                                  |
| HALT;         | Machine halts                       | 1                                  |

Let the clock cycles required for various operations be as follows:

Register to / from memory transfer: 3 clock cycles

**ADD with both operands in register: 1 clock cycle**

**Instruction fetch and decode:** 2 clock cycles per word

The total number of clock cycles required to execute the program is

28. Consider a three word machine instruction

**ADD A[R<sub>0</sub>], @ B**

The first operand (destination) "A[R<sub>0</sub>]" uses indexed addressing mode with R<sub>0</sub> as the index register. The second operand (source) "@B" uses indirect addressing mode. A and B are memory addresses residing at the second and the third words, respectively. The first word of the instruction specifies the opcode, the index register designation and the source and destination addressing modes. During execution of ADD instruction, the two operands are added and stored in the destination (first operand).

The number of memory cycles needed during the execution cycle of the instruction is



then how many one address instructions can be formulated?



| Instruction (log) | CPI |
|-------------------|-----|
| A                 | 1   |
| B                 | 2   |
| C                 | 3   |

| Code sequence | Instruction count |   |   |
|---------------|-------------------|---|---|
|               | A                 | B | C |
| 1             | 2                 | 1 | 2 |
| 2             | 4                 | 1 | 1 |

- (a) Code sequence 1 is faster  
(b) Code sequence 2 is faster  
(c) Both are same  
(d) None of these

33. Consider a 16-bit processor in which the following one address Instruction appears in main memory starting at location 200.

|     |                  |
|-----|------------------|
| 200 | Opcode           |
| 201 | 500              |
| 202 | Next Instruction |
| :   |                  |
| 500 | 999              |

There is also a base register that contains the value 100.

**Match List-I (Mode) with List-II (Effective Address) and select the correct answer using the codes given below the lists:**

| List-I |                 | List-II |     |
|--------|-----------------|---------|-----|
| A.     | Immediate       | 1.      | 600 |
| B.     | Direct          | 2.      | 999 |
| C.     | Memory Indirect | 3.      | 201 |
| D.     | PC-relative     | 4.      | 500 |
| E.     | Base-register   | 5.      | 702 |

Codes:

| A     | B | C | D | E |
|-------|---|---|---|---|
| (a) 3 | 5 | 2 | 4 | 1 |
| (b) 4 | 5 | 2 | 1 | 3 |
| (c) 3 | 4 | 2 | 5 | 1 |
| (d) 3 | 4 | 5 | 1 | 2 |

34. Consider the evaluation of the following expression tree on a machine in which memory can be accessed only through load and store instructions. The variables *a*, *b*, *c*, *d*, *e*, and *f* are initially stored in memory. The binary operators used in this tree can be evaluated by the machine only when the operands are in registers. The instructions produce result only in a register. If no intermediate results can be stored in memory, what is the minimum number of registers needed to evaluate this expression?



- (a) 5  
(b) 4  
(c) 3  
(d) 2

35. A MADEEASY machine runs at a clock frequency of 1 GHz. Two programs CSE and IT executions are listed in the following table with their CPI's and their dynamic frequencies. Assume CSE and IT programs contain a total of 100 and 200 dynamic instructions respectively.

| Instruction | CPI | Frequency   |            |
|-------------|-----|-------------|------------|
|             |     | CSE Program | IT Program |
| Load/Store  | 15  | 30%         | 40%        |
| Add/Sub     | 10  | 20%         | 20%        |
| Mul./Div.   | 15  | 20%         | 10%        |
| Jump/Branch | 5   | 15%         | 20%        |
| Other       | 5   | 15%         | 10%        |

Find the time taken to execute the programs CSE and IT respectively? (In  $\mu$ sec)  
 (a) 0.1 and 2.2      (b) 1.1 and 0.2  
 (c) 0.1 and 1.1      (d) 1.1 and 2.2

36. Consider the following program segment used to execute on a hypothetical processor. Consider all the registers are of 16 bit size

$I_1$  MOV CX,0005 ; CX  $\leftarrow$  0005  
 $I_2$  MOV BX,OFF7H ; BX  $\leftarrow$  OFF7H  
 $I_3$  MOV AX,OBCAH ; AX  $\leftarrow$  OBCAH  
 $I_4$  OR BX,AX ; BX  $\leftarrow$  BX (OR) AX  
 $I_5$  AND DX,AX ; DX  $\leftarrow$  DX (AND) AX  
 $I_6$  LOOP  $I_3$  ; LOOP till CX = 0

Processor clock frequency is 1 MHz in which data transfer operations takes 6 cycles, data manipulation operations takes 4 cycles and transfer of control operations takes 2 cycles to execute. How much time is required to execute the program (in  $\mu$  sec)?

37. A machine has 24 bit instruction format. It has 32 registers and each of which is 32 bit long. It needs to support 49 instructions. Each instruction has two register operands and one immediate operand. If the immediate operand is signed integer, find the minimum value of immediate operand.

38. Consider the following program segment used to execute on a hypothetical processor.

$I_1$  LOAD  $r_1, (r_0)$  ;  $r_1 \leftarrow M[(r_0)]$   
 $I_2$  ADD  $r_1, r_2$  ;  $r_1 \leftarrow r_1 + r_2$   
 $I_3$  AND  $r_3, r_2$  ;  $r_3 \leftarrow r_3 \& \& r_2$   
 $I_4$  ADD  $r_3, r_4$  ;  $r_3 \leftarrow r_3 + r_4$   
 $I_5$  LOAD  $(r_0), r_3$  ;  $M[(r_0)] \leftarrow r_3$   
 $I_6$  SUB  $r_1, r_3$  ;  $r_1 \leftarrow r_1 - r_3$   
 $I_7$  HALT ; HALTS  
 $I_8$  OR  $r_2, r_1$  ;  $R_2 \leftarrow R_2 || R_1$   
 $I_9$  ADD  $r_2, r_1$  ;  $R_2 \leftarrow R_2 + R_1$

In the processor data transfer operations are 64 bit instruction, ALU operations are 32 bit instructions and branch instructions are 16 bit instructions.

Program has been loaded in the memory with a starting address of 2000 decimal onwards. If an interrupt is occurred during the execution of the halt instruction what could be the return address pushed onto the stack.

39. A PC-relative mode branch instruction is 8 byte long. The address of instruction in decimal is '2016'. If signed displacement in the instruction is -11 then what is the branch target address of instruction?
 

|          |          |
|----------|----------|
| (a) 2014 | (b) 2024 |
| (c) 2013 | (d) 2023 |
40. Consider 1GHz clock frequency processor, uses different operand access modes shown below:

| Operand Access Mode | Frequency (%) |
|---------------------|---------------|
| Register            | 20            |
| Immediate           | 20            |
| Memory Indirect     | 40            |
| Auto Indexed        | 20            |

Assume that 8 cycle consumed for memory reference, 4 cycles consumed for arithmetic computation and 0 cycles consumed when the operand is in register instruction itself. What is the average operand fetch rate (in million words/sec) of the processor?

- (a) 117.45 M words/sec
- (b) 113.63 M words/sec
- (c) 217.45 M words/sec
- (d) 316.45 M words/sec

41. The format of double operand instruction of a CPU consist of 5 bits opcode and 6 bits for source and destination. 26 double operand instructions and 184 single operand instructions must be implemented. What will be the total number of zero operand instructions can be implemented?
  - (a) 819200
  - (b) 409600
  - (c) 12800
  - (d) None of these

42. Consider the hypothetical processor which has 256 words memory. A 19 bits instruction is placed in 1 memory cell. It supports 2-address, 1-address and 0-address instructions. It uses expanding opcode technique. If there exist five 2-address instructions and 760 1-address instructions, then number of 0-address instructions are \_\_\_\_\_.


**Answers**   **Basics of Computer Design**

1. (b)
2. (b)
3. (b)
4. (c)
5. (d)
6. (c)
7. (b)
8. (d)
9. (b)
10. (c)
11. (c)
12. (b)
13. (c)
14. (a)
15. (d)
16. (c)
17. (a)
18. (a)
19. (c)
20. (c)
21. (c)
22. (d)
23. (c)
24. (d)
25. (b)
26. (c)
27. (b)
28. (c)
30. (d)
31. (b)
32. (b)
33. (c)
34. (c)
35. (d)
39. (c)
40. (b)
41. (a)

## Explanations Basics of Computer Design

7. (b)

The Register Transfer Language (RTL) is

$$R_3 \leftarrow R_3 + M[R_1 + R_2]$$

In the given statement  $R_1$  is the base of an array and  $R_2$  represents an index so indexed addressing mode is suitable for given RTL.

8. (d)

In given while loop  $i$  is a pointer variable so autodecrement addressing mode is suitable.

11. (c)

The diagram shows, extended addressing mode. In this, the effective memory address directly specified and it is used by some of the processor and address specified is 16 bit address.

12. (b)

Initially

|       |          |         |
|-------|----------|---------|
| V C Z | 1 1 0 0  | 1 1 0 0 |
| 0 0 0 | 1 0 0 0  | 1 1 1 1 |
| After | Addition |         |
| 1 1 0 | 0 1 0 1  | 1 0 1 1 |

V-bit is set to 1 due to arithmetic overflow.

C-bit is set to 1 because most significant digits generates a carry.

Z-bit is set to 0 because the result of addition is not zero.

13. (c)

There are 7 pushes and pops for a cost of 35 bytes plus 3 arithmetic instructions for a total of 38 bytes storage.

After Instruction Stack Contains New Variables

|        |                                 |
|--------|---------------------------------|
| Push b | b                               |
| Push x | b, x                            |
| add    | b + x                           |
| pop c  | c = b + x                       |
| push c | b + x      c = b + x            |
| push y | b + x, Y      c = b + x         |
| add    | b + x + y      c = b + x        |
| push c | b + x + y, b + x      c = b + x |
| sub    | y      c = b + x                |
| pop z  | c = b + x, z = y                |

14. (a)

1. Two address instruction
2. One address instruction
3. Zero address instruction

15. (d)

3 address instruction

Two operand locations and a result location are explicitly contained in the instruction word.

$$\text{e.g. } Y = A - B$$

2 address instruction

One of the addresses is used to specify both an operand and the result location.

$$\text{e.g. } Y = Y + X$$

1 address instruction

Two addresses are implied in the instruction and accumulator based operations.

$$\text{e.g. } ACC = ACC + X$$

0 address instructions

They are applicable to a special memory organization called a stack. It interact with a stack using push and pop operations. All addresses are implied as in register based operations

$$T = \text{Top}(t-1)$$

17. (a)

PUSH 100

PUSH 200

ADD

POP 300

PUSH 100 will push the value at location 100 ( $53_{16}$ ) onto stack.

PUSH 200 will push the value at location 200 ( $4C_{16}$ ) onto stack.

ADD will add the value and push the result onto stack

$$+ \begin{pmatrix} 53 \\ 4C \\ \hline 9F \end{pmatrix}_{16}$$

POP 300 will pop the top value which is (9 F) from stack and store the value at location 300.

18. (a)

The Jump instruction is at address 3010 H and instruction is 2 bytes. Therefore, PC points to 3012 H on execution of this instruction.

Now Branch Target PC = PC + (-7) = 3012 H - 7  
 $H = 300 \text{ BH}$

20. (c)

The given information can be understood as

|    |    |
|----|----|
| 20 | 40 |
| 30 | 50 |
| 40 | 60 |
| 50 | 70 |

Now load indirect 20 will load 60 into as follows



Hence (c) is correct option.

22. (d)

|              |                         |                         |                    |
|--------------|-------------------------|-------------------------|--------------------|
| MOV R1, 3000 | $R1 \leftarrow M[3000]$ | 1 memory reference      |                    |
| Loop:        | MOV R2, R1              | $R2 \leftarrow M[R3]$   | 1 memory reference |
|              | ADD R2, R1              | $R2 \leftarrow R1 + R2$ |                    |
|              | MOV (R3), R2            | $M[R3] \leftarrow R2$   | 1 memory reference |
|              | INC R3                  | $R3 \leftarrow R3 + 1$  |                    |
|              | DEC R1                  | $R1 \leftarrow R1 - 1$  |                    |
|              | BNZ loop                |                         |                    |
|              | HALT                    |                         |                    |

Total memory references = 2 × Number of times loop runs + 1

$$= 2 \times 10 + 1 = 21$$

23. (c)

As the instruction are 24 bit or 3 bytes, the value of program counter at any time should be multiple of 3 starting from 300 like 300, 303, 306 ... from options, '600' is multiple of 3 or is included in above series.

24. (d)

|                                              |                                             |
|----------------------------------------------|---------------------------------------------|
| $M[1000] = 18$                               |                                             |
| $M[1001] = 1$                                |                                             |
| $M[1020] = 16$                               |                                             |
| MOV R <sub>s</sub> , 1                       | $R_s = 1$                                   |
| LOAD R <sub>d</sub> , 1000 (R <sub>s</sub> ) | $R_d \leftarrow M[1000+R_s]$                |
|                                              | $R_d \leftarrow M[1000+1] = M[1001]$        |
|                                              | $R_d \leftarrow 1$                          |
| ADD I R <sub>d</sub> , 1000                  | $R_d \leftarrow R_d + 1000 = 1+1000 = 1001$ |
| STORE I 0(R <sub>d</sub> ), 20               | $M[R_d] = 20$                               |
|                                              | $M[1001] = 20$                              |

25. (b)

|                               |                      |
|-------------------------------|----------------------|
| $A, B, C$ are 8 bit integers. |                      |
| MOV B, # 0                    | $B \leftarrow 0$     |
| MOV C, # 8                    | $C \leftarrow 8$     |
| Z: CMP C, # 0                 | Compare C with 0     |
| JZX                           |                      |
| SUB C, # 1                    | $C \leftarrow C - 1$ |
|                               | $C = 7$              |
| LRCA, # 1                     |                      |
| JCY                           |                      |
| JMPZ                          |                      |
| Y: ADD B, # 1                 | $B \leftarrow B + 1$ |
| JMPZ                          |                      |
| X                             |                      |

After execution of LRCA, #1, JCY is executed and control goes to the instruction Y: ADD B, #1. Control will come to this condition that number of times as number of '1' bits in A. Initial value of B is 0 and when the statement  $B \leftarrow B + 1$  is executed, 1 is added to B.

So finally will contain the value which represents the number of '1' bits in A.

28. (c)

ADD A[R<sub>0</sub>], @ B

A[R<sub>0</sub>] : It uses 2 memory cycles, one to get the value in R<sub>0</sub> and 2<sup>nd</sup> to get contents of A[ ].

@ B : It value uses 2 memory cycles for indirect addressing mode.

Finally, after addition the result is stored back in A[R<sub>0</sub>] and it will take 1 memory cycle as the values are already known (instruction is fetched and decoded).

29. (369)

4 byte instruction storage



$$\text{Effective address} = \text{PC} + \text{Relative value}$$

$$\text{Relative value} = \text{EA} - \text{PC}$$

$$= 885 - 516$$

$$= (369)_{10}$$

30. (d)



$2^{14}$  two address instructions are possible.

Here 400 two addresses are needed.

So  $(2^{14} - 400)$  op-codes are free.

We can store  $(2^{14} - 400) \times 2^9$  one address instructions.

31. (b)



$$N = 2^4 = 16$$

$$\# \text{Free opcodes} = 16 - 2 = 14$$

$$\# \text{1-Address Instruction} = 14 \times 64 = 896$$

$$\text{Free opcodes} = 896 - 60 = 836$$

$$\# \text{Zero Address Instruction} = 836 \times 64 = 53,504$$

32. (b)

$$\text{CPU clock cycles} = \Sigma \text{CPI} \times c,$$

This yields

$$\text{CPU clock cycles}_1 = 2 \times 1 + 1 \times 2 + 2 \times 3 = 10$$

$$\text{CPU clock cycles}_2 = 4 \times 1 + 1 \times 2 + 1 \times 3 = 9$$

So sequence 2 is faster it takes less clock cycles.

33. (c)

$$\text{PC-relative} \rightarrow \text{PC} + 500 = 202 + 500 = 702$$

$$\text{Base-register} \rightarrow 100 + 500 = 600$$

34. (c)



Load  $r_1, e$

Load  $r_2, f$

Add  $r_1, r_2$

Load  $r_2, d$

Sub  $r_2, r_1$

Load  $r_1, a$

Load  $r_3, b$

Add  $r_1, r_3$

Load  $r_3, c$

Sub  $r_1, r_3$

Add  $r_1, r_2$

35. (d)

| Instruction | CPI | # instructions |          | #cycle |      |
|-------------|-----|----------------|----------|--------|------|
|             |     | CSE (100)      | IT (200) | CSE    | IT   |
| Load/Store  | 15  | 30             | 80       | 450    | 1200 |
| Add/Sub     | 10  | 20             | 40       | 200    | 400  |
| Mul./Div.   | 15  | 20             | 20       | 300    | 300  |
| Jump/Branch | 5   | 15             | 40       | 75     | 200  |
| Other       | 5   | 15             | 20       | 75     | 100  |

$$\text{Cycle time} = \frac{1}{1\text{GHz}} = 1 \text{nsec}$$

Total # cycles of CSE = 1100

⇒ Time taken to execute CSE

$$= 1100 * 1 \text{nsec} = 1.1 \mu\text{sec}$$

Total # cycles of IT = 2200

⇒ Time taken to execute CSE

$$= 2200 * 1 \text{nsec} = 2.2 \mu\text{sec}$$

36. (92)

$I_1$  6 cycles

$I_2$  6 cycles

Label  $I_3$  6 cycles

$I_4$  4 cycles

$I_5$  4 cycles

$I_6$  2 cycles

} 5 times

**Example:**

$$\text{Time} = 12 \text{ cycles} + 80 \text{ cycles} = 92 \text{ cycles}$$

$$\text{Cycle time} = \frac{1}{1\text{MHz}} \text{ sec} = 1 \mu\text{sec}$$

$$\text{Program execution time} = 92 \text{ cycles} \times 1 \mu\text{sec} \\ = 92 \mu\text{sec.}$$

**37. (-128)**

49 instructions  $\Rightarrow$  6 bits needed for Op-code  
 32 registers  $\Rightarrow$  5 bits needed for register operands  
 Immediate operand bits = 8  
 Minimum value =  $-2^7 = -128$   
 Since 1 bit is gone for sign representation.

**38. (2032)**

- 2000-2007:  $I_1$
- 2008-2011:  $I_2$
- 2012-2015:  $I_3$
- 2016-2019:  $I_4$
- 2020-2027:  $I_5$
- 2028-2031:  $I_6$
- 2032-2033:  $I_7$

Return address pushed on to the stack is 2032.

**39. (c)**

8 byte instruction storage:



$$\text{Effective address} = \text{PC} + \text{Relative value} \\ = 2024 + (-11) = 2013$$

**40. (b)**

Average of time

$$= ((0.2 \times 0) + (0.2 \times 0) + (0.4 \times 16) + (0.2 \times 12))$$

$$= \{6.4 + 2.4\} = 8.8 \text{ cycles}$$

So, average of time = 8.8 nsec

1 operand 8.8 nsec

Number of number of operands in 1 sec

$$\text{Number of operands} = \frac{1 \text{ operand}}{8.8 \text{ nsec}}$$

$$= 0.113636 \times 10^9 \text{ operand / sec}$$

Operand fetch rate = 113.636 million words/sec

**41. (a)**

Total number of opcode possible with 5 bit = 32

16 opcode for double operand remaining

$$= 32 - 16 = 16$$

$$\text{Remaining} = 2^6 \times 6 = 384$$

Out of 384, 184 single operand instruction, so remaining are  $384 - 184 = 200$

Total number of free instruction (zero operand)

$$= 200 \times 2^6 = 2^6 \times 200 = 12800$$

**42. (2048)**

Instruction format



Since number of words = 256

So number of bits used for address =  $\log_2(256)$

$$= 8 \text{ bits}$$

$$\text{Number of opcodes} = 2^3 = 8$$

$$\text{Number of free opcode} = 8 - 5 = 3$$

Number of 1-address instructions

$$= 3 \times 2^8 = 256 \times 3 = 768$$

Number of free opcode after allotting 1-address instruction =  $[768 - 760] = 8$

Number of 0-address instructions

$$= 8 \times 2^8 = 8 \times 256 = 2048$$

# 2

## CHAPTER

# CPU Design

1. The bus system of a machine has the following propagation delay times: 40 ns for the signals to propagate through the multiplexers, 90 ns to perform the ADD operation in the ALU, 30 ns delay in the destination decoder, and 20 ns to store the data into the destination register. What is the minimum cycle time that can be used for the clock?  
(a) 120 ns      (b) 150 ns  
(c) 160 ns      (d) 180 ns
2. The given circuit represents \_\_\_\_\_ states  
(a)  $n$  states      (b)  $(n - 1)$  states  
(c)  $2^n$  states      (d)  $2^{n-1}$  states
3. If the last operation performed on a computer with an 8-bit word was an addition in which the two operands were 00000010 and 00000011, what would be the value of the Overflow ,Sign and Half-Carry flags respectively?  
(a) 0, 0, 0      (b) 0, 1, 0  
(c) 1, 0, 1      (d) 0, 1, 1
4. Which set of instruction transfers the memory word specified by the effective address to AC or Load to AC?  
(a)  $DR \leftarrow M[AR]$   
 $AC \leftarrow AC + DR, E \leftarrow C_{out}, SC \leftarrow 0$   
(b)  $DR \leftarrow M[AR]$   
 $AC \leftarrow DR, SC \leftarrow 0$   
(c)  $M[AR] \leftarrow AC, SC \leftarrow 0$   
(d)  $DR \leftarrow M[AR]$   
 $AC \leftarrow AC \wedge DR, SC \leftarrow 0$

5. What is the control unit's function in the CPU?  
(a) to decode program instructions  
(b) to transfer data to primary storage  
(c) to perform logical operations  
(d) to store program instructions
6. Which of following registers processor used for fetch and execute operations?
  1. Program counter
  2. Instruction register
  3. Address register

(a) 1 and 3      (b) 1 and 2  
(c) 2 and 3      (d) 1, 2 and 3

The processor used program counter and instruction register for fetch and execute operations.
7. Microinstruction length is determined by \_\_\_\_\_.
  1. The maximum number of simultaneous micro operations that must be specified.
  2. The way in which the control information is represented or encoded.
  3. The way in which the next microinstruction address is specified.

(a) 1 and 2      (b) 2 and 3  
(c) 1 and 3      (d) all of these
8. Following are some statements associated with microprocessors. Identify the false statement.
  - (a) The control unit is the component of the CPU that implements the microprocessors instruction set.
  - (b) The instruction received by the CPU is decoded by the arithmetic logic unit.
  - (c) A CPU register is used to hold a binary value temporarily for storage, for manipulation, and/or for simple calculations.
  - (d) The control units are programmed and not hardwired.



The above circuit represents

1. Hard-wired control unit  
2. Micro programmed control unit  
(a) Only 1                   (b) Only 2  
(c) Both 1 and 2           (d) Either 1 or 2

15. The sequence of events that happen during a typical fetch operation is
  - (a) PC → Mar → Memory → MDR → IR
  - (b) PC → Memory → MDR → IR
  - (c) PC → Memory → IR
  - (d) PC → Mar → Memory → IR
  
16. Determine the width of Micro-instruction having following Control signal field, in a Vertical Micro-programmed Control Unit
  1. Next Address field of 7 Bits
  2. ALU Function field selecting 1 out of 13 ALU Function
  3. Register-in field selecting 1 out of 13 ALU Function
  4. Register-out field selecting 1 out of 8 Registers
  5. Shifter field selecting no shift, right shift or left shift
  6. Auxiliary control field of 4 bits
    - (a) 22                          (b) 23
    - (c) 24                          (d) 25
  
17. Hardwired control units are faster than Micro-programmed control unit because
  - (a) They do not consist of slower memory elements
  - (b) They do not have slower element such as Gates and Flip-flops
  - (c) They are made using faster VLSI design technology
  - (d) They contain high speed digital components
  
18. The microinstructions stored in the control memory of a processor have a width of 26 bits. Each microinstruction is divided into three fields: a micro-operation field of 13 bits, a next address field ( $X$ ), and a MUX select field ( $Y$ ). There are 8 status bits in the inputs of the MUX.



How many bits are there in the  $X$  and  $Y$  fields, and what is the size of the control memory in number of words?

- (a) 10, 3, 1024                  (b) 8, 5, 256
- (c) 5, 8, 2048                   (d) 10, 3, 512

19. An instruction set of a processor has 125 signals which can be divided into 5 groups of mutually exclusive signals as follows:

Group 1 : 20 signals, Group 2 : 70 signals, Group 3 : 2 signals, Groups 4 : 10 signals, Group 5 : 23 signals.

How many bits of the control words can be saved by using vertical microprogramming over horizontal microprogramming?

- (a) 0                                  (b) 103
- (c) 22                                (d) 55

20. A hardwired CPU uses 10 control signals  $S_1$  to  $S_{10}$  in various time steps  $T_1$  to  $T_5$  implement 4 instructions  $I_1$  to  $I_4$  as shown below.

|       | $T_1$           | $T_2$              | $T_3$           | $T_4$      | $T_5$      |
|-------|-----------------|--------------------|-----------------|------------|------------|
| $I_1$ | $S_1, S_3, S_5$ | $S_2, S_4, S_6$    | $S_1, S_7$      | $S_{10}$   | $S_3, S_8$ |
| $I_2$ | $S_1, S_3, S_5$ | $S_8, S_9, S_{10}$ | $S_5, S_6, S_7$ | $S_6$      | $S_{10}$   |
| $I_3$ | $S_1, S_3, S_5$ | $S_7, S_8, S_{10}$ | $S_2, S_6, S_9$ | $S_{10}$   | $S_1, S_3$ |
| $I_4$ | $S_1, S_3, S_5$ | $S_2, S_6, S_7$    | $S_2, S_{10}$   | $S_6, S_9$ | $S_{10}$   |

Which of the following pairs of expressions represent the circuit for generating control signals  $S_5$  and  $S_{10}$  respectively [ $I_j + I_k \cdot T_n$ ]  $T_n$  indicates that the control signal should be generated in time step  $T_n$  if the instruction being executed is  $I_j$  or  $I_k$ ?

- (a)  $S_5 = T_1 + T_2 \cdot T_3$  and  $S_{10} = (I_1 + I_3) \cdot T_4 + (I_2 + I_4) \cdot T_5$
- (b)  $S_5 = T_1 + (I_2 + I_4) \cdot T_3$  and  $S_{10} = (I_1 + I_3) \cdot T_4 + (I_2 + I_4) \cdot T_5$
- (c)  $S_5 = T_1 + (I_2 + I_4) \cdot T_3$  and  $S_{10} = (I_2 + I_3 + I_4) \cdot T_2 + (I_1 + I_3) \cdot T_4 + (I_2 + I_4) \cdot T_5$
- (d)  $S_5 = T_1 + (I_2 + I_4) \cdot T_3$  and  $S_{10} = (I_2 + I_3) \cdot T_2 + I_4 \cdot T_3 + (I_1 + I_3) \cdot T_4 + (I_2 + I_4) \cdot T_5$

21. In a register/memory type CPU, the instruction lengths are typically variable. This presents a problem when the program counter (PC) of the CPU is incremented during the fetch-execute cycle. Which statements is/are true with regard to PC incrementing?

- (a) PC is incremented by the largest possible fixed value, irrespective of the variability.

(b) Increment value is known when the current instruction is decoded within the IR.

(c) Increment value is known when the current instruction has completed execution.

(d) The binary loader overcomes the problem by positioning instructions at word boundaries so that PC can be incremented by a constant amount.

22. Which statement is false in case of microprogram control?

(a) In a microprogram control, the control variables that initiate microoperations are stored in memory. The control memory is usually a ROM, since the control sequence is permanent and needs no alteration

(b) The control variables stored in memory are read one at a time to initiate the sequence of microoperations for the system

(c) The words stored in a control memory are micro instructions, and each micro instruction specifies one or more microoperations for the components in the system

(d) Once these microoperations are executed, the control unit must determine its next address. Therefore, all bits of the micro instruction are used to control the generation of the address for the next micro instruction

23. Consider the following data path of a CPU

The, ALU, the bus and all the registers in the data path are of identical size. All operations including incrementation of the PC and the GPRs are to be carried out in the ALU. Two clock cycle are needed for memory read operation – the first one for loading address in the MAR and the next one for loading data from the memory but into the MDR.

The instruction "add R0, R1" has the register transfer interpretation  $R0 \leftarrow R0 + R1$ . The minimum number of clock cycles needed for execution cycle of this instruction is



25. Assume that the control memory is 30 bits wide. The microinstruction format is divided into 3 fields. A micro operation field of 15 bits specifies the micro-operations to be performed. An address selection field specifies a condition based on flags and control memory address field. There are 8 flags. How many bits are there in address selection field, address field and what is the size of control memory in words respectively?

(a) 2, 12, 2048      (b) 2, 10, 2048  
(c) 3, 12, 4096      (d) 3, 10, 1024



**Answers CPU Design**

1. (b) 2. (c) 3. (a) 4. (b) 5. (a) 6. (b) 7. (d) 8. (b) 9. (c)  
 10. (c) 11. (b) 12. (a) 13. (d) 14. (c) 15. (a) 16. (b) 17. (a) 18. (a)  
 19. (b) 20. (d) 21. (c) 22. (d) 23. (b) 25. (c)

**Explanations CPU Design**

1. (b)  
 $(40 + 90 + 20) = 150 \text{ ns}$   
 Selection of destination register that takes 30 ns can be done parallelly with other operations.
3. (a)
- |       |   |   |   |   |   |   |   |
|-------|---|---|---|---|---|---|---|
| 0     | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 0     | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| <hr/> |   |   |   |   |   |   |   |
| 0     | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
- Overflow = 0, sign = 0, Half carry = 0  
 Half carry indicate addition of packed decimal numbers. When carry takes place out of the lower digit order, this flag is set.  
 Auxiliary carry is also known as half carry.
6. (b)  
 The processor used program counter and instruction register for fetch and execute operations.
10. (c)  
 Initially  $R_0 \leftarrow X, R_1 \leftarrow Y, R_2 \leftarrow Z$   
 The sequence of operations to compute  $XY + XYZ + YZ$  is as follows
- | Operation           | Meaning                        | # clock cycles |
|---------------------|--------------------------------|----------------|
| MUL $R_0, R_1, R_3$ | $R_3 \leftarrow XY$            | 1              |
| MUL $R_1, R_2, R_5$ | $R_5 \leftarrow YZ$            | 1              |
| MUL $R_2, R_3, R_4$ | $R_4 \leftarrow XYZ$           | 1              |
| ADD $R_3, R_4, R_6$ | $R_6 \leftarrow XY + XYZ$      | 1              |
| ADD $R_5, R_6, R_7$ | $R_7 \leftarrow XY + XYZ + YZ$ | 2              |
- Total 6 clock cycles
11. (b)
- | Instructions      | No. of clock cycles |
|-------------------|---------------------|
| $AC := M(X)$      | LOAD A 1            |
| $AC := AC + M(Y)$ | ADD Y 1             |
| $M(Z) := AC$      | STORE Z 1           |
- Total = 3
13. (d)  
 Horizontal microinstructions have following attributes  
 1. Long formats  
 2. Ability to express a high degree of parallelism  
 3. Little encoding of the control information
14. (c)  
 Since PLA is neither a hard-wired model nor a memory.  
 $\therefore$  We can say control unit with PLA as Hard-wired or micro programmed control unit.
16. (b)  
 Next Address Field = 7 Bits  
 ALU Function Field is of 4 Bits (to select possible 1 of 13 ALU fns)  
 Register-in and Register-out Fields of 3 Bits each  
 Shifter Field of 2 bits to select 1 out of three possibilities  
 Auxiliary Field of 4 Bits  
 Total Bits in  $\mu$ -instruction  
 $= 7 + 4 + 3 + 3 + 2 + 4 = 23$
17. (a)  
 Hardwired CU use digital logics while Micro-programmed CU uses Control Store Gate propagation delays are much less as compared to Memory access times of control stores.
19. (b)  
 $G1 : 20 \text{ signals}$     $G2 : 70 \text{ signals}$   
 $G3 : 2 \text{ signals}$     $G4 : 10 \text{ signals}$   
 $G5 : 23 \text{ signals}$   
 Using horizontal microprogramming, 125 bits of control words are used. But using vertical microprogramming, the number of bits according to groups are:  
 $G1 : 20 \text{ signals} = 5 \text{ bits}$

**G2** : 70 signals = 7 bits

**G3** : 2 signals = 1 bits

**G4** : 10 signals = 4 bits

**G5** : 23 signals = 5 bits

Total = 22 bits

Total 22 bits are needed for vertical micro-programming.

So, total bits saved =  $125 - 22 = 103$

23. (b)

Add  $R_0, R_1$

$$R_0 \leftarrow R_0 + R_1$$

|  |  |  |
|--|--|--|
|  |  |  |
|--|--|--|

1 1 1 clock cycles.

Total 3 clock cycles are needed to execute the instruction.

24. (1.27)

Let total operation be 100 and total time taken is  $t = 100$  sec.

$$\text{Floating point operations are } = \frac{6}{10} \times 100 = 60$$

$$\text{Fixed point operations are } = \frac{4}{10} \times 100 = 40$$

Let time taken by floating point is  $t_1$  and time taken by fixed point is  $t_2$ .

$$\text{So } t_1 + t_2 = 100 \quad \dots(1)$$

Time taken by 60 floating point operation =  $t_1$

$$1 \text{ floating operation} = \frac{t_1}{60}$$

$$\text{Similarly 1 fixed point operation takes} = \frac{t_2}{40}$$

$$\therefore \frac{t_1}{60} = 2 \frac{t_2}{40}$$

$$2t_1 = 6t_2 \quad \dots(2)$$

From equation 1 and 2

$$t_1 = \frac{600}{8} \text{ and } t_2 = \frac{200}{8}$$

After speed-up

$$t'_1 = \frac{t_1}{1.3} = \frac{600}{1.3 \times 8} = \frac{6000}{104}$$

$$t'_2 = \frac{t_2}{1.2} = \frac{100}{1.2 \times 8} = \frac{2000}{96}$$

$$t' = t'_1 + t'_2 = \frac{6000}{104} + \frac{2000}{96}$$

$$S_2 = \frac{t_1}{t'} \times S_1$$

$$S_2 = \frac{100}{\left[ \frac{6000}{104} + \frac{2000}{96} \right]} \times S_1$$

$$= \frac{100}{57.69 + 20.83} \times S_1$$

$$S_2 = \frac{100}{78.52} \times S_1 = 1.27 S_1$$

25. (c)



$$\text{CM size} = 2^{12} \text{ CW} = 4096 \text{ CW}$$

# 3

## CHAPTER

# Instruction Pipelining

1. Pipelining improves CPU performance due to
  - (a) reduced memory access time
  - (b) increased clock speed
  - (c) the introduction of parallelism
  - (d) additional functional units
2. An instruction cycle refers to
  - (a) fetching an instruction
  - (b) clock speed
  - (c) fetching, decoding and executing an instruction
  - (d) executing an instruction
3. A 5 stage pipeline with the stages taking 1, 1, 3, 1, 1, units of time has a throughput of
  - (a) 1/3
  - (b) 1/7
  - (c) 7
  - (d) 3
4. Given a 5 stage pipeline with stages taking 1, 2, 3, 1, 1 units of time, the clock period of the pipeline is
  - (a) 8
  - (b) 1/8
  - (c) 1/3
  - (d) 3
5. Which of the following statements is false with regard to instruction pipelining?
  - (a) The basic goal of instruction pipelining is to achieve a CPI of 1.
  - (b) Instruction set architectures having simple registry/register addressing modes can be easily pipelined than those having complex addressing modes.
  - (c) By the use of hardware special features and compiler design techniques, pipeline hazards can be removed.
  - (d) The basic goal of instruction pipelining is to achieve a CPI of more than 1.
6. The performance of a pipelined processor suffers if
  - (a) the pipeline stages have different delays
  - (b) consecutive instructions are dependent on each other
  - (c) the pipeline stages share hardware resources
  - (d) all of the above
7. Consider an unpipelined processor assume that it has a 1 ns clock cycles and that it uses 4 cycles for ALU operation and branches and 5 cycles for memory operations. Assume that the relative frequencies of these operations are 40%, 20% and 40% respectively. Suppose due to clock skew and setup, pipelining the processor adds 0.2 ns of overhead to the clock. Ignore any latency impact. What speedup gain after pipelined the processor?
  - (a) 3.4 ns
  - (b) 3.8 ns
  - (c) 3.7 ns
  - (d) 1.2 ns
8. Consider a case where ( $k$ ) 4-segment pipeline with a clock cycle time ( $tp$ ) 20 ns in each sub operation to execute ( $n$ ) 100 tasks. Assume that a non pipeline unit that can perform the same operation. Pipelined system will take how much time to complete the task?
  - (a) 2060 ns
  - (b) 2000 ns
  - (c) 20 ns
  - (d) 1901 ns
9. Pipelined operation is interrupted whenever two operations being attempted in parallel need same hardware component. Such a conflict can occur if
  - (a) The execution phase of an instruction requires access to the main memory, which also contains the next instruction that should be fetched at the same time
  - (b) The prefetch phase of an instruction requires access to the main memory, which also contains the next instruction that should be fetched at the same time

- (c) The decode phase of an instruction requires access to the main memory, which also contains the next instruction that should be fetched at the same time  
 (d) None of these
10. Consider the unpipelined machine with 10 ns clock cycles. It uses four cycles for ALU operations and branches, whereas five cycles for memory operations. Assume that the relative frequencies of these operations are 40%, 20%, 40% respectively. Suppose that due to clock skew and setup, pipelining the machine adds 1 ns overhead to the clock. How much speed up in the instruction execution rate will we gain from a pipeline?  
 (a) 5 times                                 (b) 8 times  
 (c) 4 times                                     (d) 4.5 times
11. Using a sequential implementation, it takes a total of 320 ns for each instruction, 300 ns for the combinational logic to complete, and 20 ns to store the result (in a register). This means that a throughput will be about 3.12 millions instructions/second. Assuming you switch to a 3 stage pipeline by splitting the combinational logic into three equal parts and all registers take 20 ns to store results.
- 
- How long will it take for a single instruction to execute in the pipelined implementation?  
 (a) 300 ns                                     (b) 360 ns  
 (c) 100 ns                                     (d) 380 ns
12. A 5 stage pipeline is used to overlap all the instructions except the branch instructions. The target of the branch can't be fetched till the current instruction is completed. What is throughput of the system if 20% of instructions are branch instructions? Ignore the overhead of buffer register. Each stage is having same amount delay. The pipeline clock is 10 ns. Branch penalty is of 4 cycles.
13. Assume that the time required for the eight functional units, which operate in each of the eight cycles, are as follows :  
 5 ns, 8 ns, 6 ns, 10 ns, 15 ns, 12 ns, 6 ns, 8 ns. Assume that pipe lining adds 1 ns of overhead. Find the speedup versus the single cycle data path.  
 (a) 4.67                                     (b) 4.375  
 (c) 4.44                                     (d) 4.285
14. Consider the unpipelined machine with 12 ns clock cycles. It uses four cycles for ALU operations and branches, whereas five cycles for memory operations. Assume that the relative frequencies of these operations are 30%, 20%, 20% respectively. Suppose that due to clock skew and setup, pipelining the machine adds 1 ns overhead to the clock. How much speed up in the instruction execution rate will we gain from a pipeline?  
 (a) 2.0 times                                     (b) 3.1 times  
 (c) 2.8 times                                     (d) 3.5 times
15. In a 32 bit machine to execute an instruction the following steps are carried out: Fetch, Decode, Execute and Store, each of which takes one clock period. In a pipelined execution of a four-step task a new instruction is read and its nanosecond and there are 100 instructions in sequence then what is the speedup ratio of pipeline processing system over an equivalent non-pipeline processing system?  
 (a) 3.88                                     (b) 1.88  
 (c) 3.68                                     (d) 2.723
16. A 4-stage pipeline has the stage delays as 150, 120, 160 and 140 nanoseconds respectively. Registers that are used between the stages have a delay of 5 nanoseconds each. Assuming constant clocking rate, the total time taken to process 1000 data items on this pipeline will be  
 (a) 120.4 microseconds  
 (b) 160.5 microseconds  
 (c) 165.5 microseconds  
 (d) 590.0 microseconds

17. Consider a pipeline processor with 4 stages S1 to S4. We want to execute the following loop:

for ( $i = 1$ ;  $i \leq 1000$ ;  $i++$ ) {I1, I2, I3, I4}

Where the time taken (in ns) by instructions I1 to I4 for stages S1 to S4 are given below.

|      | S1 | S2 | S3 | S4 |
|------|----|----|----|----|
| I1 : | 1  | 2  | 1  | 2  |
| I2 : | 2  | 1  | 2  | 1  |
| I3 : | 1  | 1  | 2  | 1  |
| I4 : | 2  | 1  | 2  | 1  |

The output of I1 for  $i = 2$  will be available after

- (a) 11 ns                          (b) 12 ns  
(c) 13 ns                            (d) 28 ns

18. The IF, ID and WB stages take 1 clock cycle each. The EX stage takes 1 clock cycle each for the ADD, SUB and STORE operations, and 3 clock cycles each for MUL and DIV operations. Operand forwarding from the EX stage to the ID stage is used. The number of clock cycles required to complete the sequence of instructions is

- (a) 10                                (b) 12  
(c) 14                                (d) 16

19. We have two designs D1 and D2 for a synchronous pipeline processor. D1 has 5 pipeline stages with execution times of 3 nsec, 2 nsec, 4 nsec, 2 nsec and 3 nsec while the design D2 has 8 pipeline stages each with 2 nsec execution time. How much time can be saved using design D2 over design D1 for executing 100 instructions?

- (a) 214 nsec                        (b) 202 nsec  
(c) 86 nsec                            (d) 200 nsec

20. Suppose that an unpipelined processor has a cycle time of 25ns, and that its datapath is made up of modules with latencies of 2, 3, 4, 7, 3, 2 and 4 ns (in that order). In pipelining this processor, it is not possible to rearrange the order of the modules (for example, putting the register read stage before the instruction decode stage) or to divide a module into multiple pipeline stages (for complexity reasons). Given pipeline latches with 1 ns latency:

If the processor is divided into the fewest number of stages that allow to achieve the minimum latency from part 1, what is the latency of the pipeline?

- (a) no latency                      (b) 35 ns latency  
(c) 40 ns latency                    (d) 56 ns latency

21. We have two designs  $P_1$  and  $P_2$  for a synchronous pipeline processor.  $P_1$  has 8 pipeline stages with execution time of 3 nsec, 2 nsec, 4 nsec, 8 nsec, 2 nsec, 5 nsec, 4 nsec and 1 nsec while design  $P_2$  has 5 stages each with 5 nsec execution time. How much time (in  $\mu$ sec) can be saved using design  $P_2$  over design  $P_1$  for executing 500 instructions? (upto 3 digit).

- (a) 2.536                            (b) 1.365  
(c) 1.536                            (d) 1.653

22. A branch mark program is running on a 40 MHz processor. The executed program consists of 100,000 instruction executions, with the following instruction mix and clock cycle count.

| Instruction Type   | Instruction Count | Cycles per Instruction |
|--------------------|-------------------|------------------------|
| Integer arithmetic | 45000             | 1                      |
| Data transfer      | 32000             | 2                      |
| Floating point     | 15000             | 2                      |
| Control transfer   | 8000              | 2                      |

The execution time in msec is \_\_\_\_\_.

23. Consider the following instructions.

- $I_1 : R_1 = 100$   
 $I_2 : R_1 = R_2 + R_4$   
 $I_3 : R_2 = R_4 + 25$   
 $I_4 : R_4 = R_1 + R_3$   
 $I_5 : R_1 = R_1 + 30$

Calculate sum of (WAR, RAW and WAW) dependencies the above instructions.

- (a) 10                                (b) 12  
(c) 7                                    (d) 8

- Q.24 Consider a pipelined system with four stages: IF, ID, EX, WB. Following chart shows the clock cycles required by each instruction to complete each stage.

| Instructions | Instruction Fetch (IF) | Instruction Decode (ID) | Instruction Execute (EX) | Write Back (WB) |
|--------------|------------------------|-------------------------|--------------------------|-----------------|
| $I_0$        | 1                      | 1                       | 2                        | 1               |
| $I_1$        | 2                      | 2                       | 3                        | 1               |
| $I_2$        | 2                      | 2                       | 2                        | 2               |
| $I_3$        | 2                      | 1                       | 1                        | 1               |
| $I_4$        | 3                      | 2                       | 1                        | 2               |

How many clock cycles are required to complete the above instructions?

- (a) 16                                  (b) 9  
 (c) 14                                    (d) 13

25. Suppose that in 1000 memory references there are 150 misses in first level and 100 miss in second level cache. Assume that miss penalty from  $L_2$  cache to memory is 120 cycles. The hit time of  $L_2$  cache is 50 cycles.

If there are 4 memory references per instruction, the average stall per instruction is \_\_\_\_\_.

26. Two processor M-5 and M-7 implement the same instruction set. Processor M-5 uses a 5-stage pipeline and a clock cycle of 10 nanoseconds. Processor M-7 uses a 7-stage pipeline and a clock cycles of 5 nanoseconds. Which is true?

1. M-7's pipeline has better maximum throughput than M-5's pipeline.
  2. The latency of a single instruction is shorter on M-7's pipeline than on M-5's pipeline.
  3. Program executing on M-7 will always run faster than program executing on M-5.
- (a) 1 only                                (b) 2 only  
 (c) 1 and 3 only                        (d) 1, 2 and 3

27. Consider an instruction pipeline with four stages ( $S_1, S_2, S_3$  and  $S_4$ ) each with combinational circuit only. The pipeline registers are required between each stage and at the end of the last stage. Delays for the stages and for the pipeline registers are as given in the figure.



What is the approximate speed up of the pipeline in steady state under ideal conditions when compared to the corresponding non-pipeline implementation?

- (a) 4.0                                    (b) 2.5  
 (c) 1.1                                    (d) 3.0

28. Consider a 5 stages instruction pipeline, where all stages are perfectly balanced. Assume that there is no cycle time overhead of pipelining. When application is executed on 5 stage pipeline, then how many instructions (in %) incur 3 pipeline stall cycles if the speedup achieved with respect to non pipeline is 3?

29. Assume the individual stages of the data path have the following latencies.

|                         |         |
|-------------------------|---------|
| Instruction Fetch (IF)  | : 10 ns |
| Instruction Decode (ID) | : 8 ns  |
| Execution (EX)          | : 6 ns  |
| Memory (MEM)            | : 9 ns  |
| Write Back (WB)         | : 5 ns  |

Find the clock cycle time in a non-pipelined (single cycle) processor?

- (a) 10 ns                                (b) 18 ns  
 (c) 28 ns                                (d) 38 ns

30. Consider a pipeline 'x' consist of 5 stages named as IF, ID, OF, EX and WB with the respective stage delays of 2 ns, 5 ns, 6 ns, 8 ns and 1 ns. The alternative pipeline 'y' contain the same number of stages but EX stage is divided into 4 sub stages, (EX1, EX2, EX3 and EX4) with equal delay i.e. (8 ns/4) and ID stage is divided into 2 sub stages (ID1 and ID2) with equal delays of (5 ns/2). In the pipeline x and y memory reference instructions are not overlapped so the penalty of memory reference instructions in the pipeline 'x' is 4 cycles and in the pipeline 'y' is 8 cycles. If the program contain 30% of the instructions which are memory based instructions, what is the ratio of speedup of x to speedup of y?

31. Consider execution of 100 instructions on a 5 stage pipeline. Let P be the probability of an instruction being a branch. What must be the value of P such that speed up is atleast 4?

(Assume each stage takes 1 cycle to perform its task and branch is predicted on fourth stage of the pipeline).

32. A 5 stage pipelined processor has IF, ID, EXE, MEM and WB. WB stage operation is divided into two parts. In the first part register write operation and in the second part register read operation is performed. The latencies of all those stages are 300, 400, 500, 500 and 300 (in nano second) respectively. Consider the following code is executed on this processor

$I_1 : ADD R_3, R_2, R_4; R_3 \leftarrow R_2 + R_4$   
 $I_2 : SUB R_6, R_4, R_3; R_6 \leftarrow R_4 - R_3$   
 $I_3 : ADD R_7, R_5, R_3; R_7 \leftarrow R_5 + R_3$   
 $I_4 : SUB R_1, R_7, R_4; R_1 \leftarrow R_7 - R_4$

Find minimum number of nop instructions (no operation) to eliminate hazards without using operand forwarding. (Assume each instruction takes one cycle to complete its operation in every stage).

33. Assume  $SUB x, y, z$ ; denote  $x \leftarrow y - z$ ,  $ADD x, y, z$ ; denote  $x \leftarrow y + z$ . Consider the following assembly code given below:

ADD,  $R_1, R_2, R_3$ ;  
 SUB  $R_3, R_1, R_2$ ;  
 ADD  $R_2, R_1, R_3$ ;  
 ADD  $R_1, R_2, R_3$ ;

What is the number of (RAW) read after write dependencies in the execution of the above code?

- (a) 2                                  (b) 3  
 (c) 4                                    (d) 5

34. A non-pipeline processor has a clock rate 3 GHz and an average CPI of 4. An upgrade to the processor introduce 5 stage pipeline. However due to internal delay the clock rate of the new processor has to be reduces to 2 GHz. What is the speed-up of pipeline over non-pipeline?

- (a) 3.1                                (b) 3.3  
 (c) 3.5                                (d) 3.8

35. An instruction pipeline consists of following 5 stages:

IF = Instruction Fetch,  
 ID = Instruction Decode  
 EX = Execute  
 MA = Memory Access and  
 WB = Register Write Back.

Now consider the following code:

1. LOAD  $R_8, 0(R_5); R_8 = \text{Memory}[R_5]$
2. LOAD  $R_9, 4(R_5); R_9 = \text{Memory}[R_5 + 4]$
3. ADD  $R_7, R_8, R_9; R_7 = R_8 + R_9$
4. SUB  $R_6, R_7, R_8; R_6 = R_7 - R_8$

Assume that each stage takes 1 clock cycle for all the instructions. How many cycles are required to execute the code, without operand forwarding over a bypass network?

- (a) 9                                    (b) 10  
 (c) 11                                (d) 14

36. Consider a 6 stage instruction pipeline, where all stages are perfectly balanced. Assume that there is no cycle time overhead of pipelining when an application is executing on this 6 stage pipeline. If speedup achieved with respect to non-pipeline execution is 3, then what % of instructions incur 2 pipeline stall cycle \_\_\_\_\_.

37. Consider a machine with 20 nsec clock and it takes 10 clock cycle per ALU instructions, 10 clock cycles per branch instructions, 12 clock cycles per memory instructions. There exist 40% ALU instructions, 20% branch instructions and 40% memory instructions. If 12 nsec overhead is present in the pipeline implementation then the speedup achieved by system \_\_\_\_\_ (upto 2 decimal place).

38. Assume that execution of 200 instruction on a 6 stage pipeline where target address will be available at 4<sup>th</sup> stage. Let X be the probability of an instruction not being branch. The value of such that speedup is atleast 5 is \_\_\_\_\_.

39. Assume a stage pipeline with the following stages:

- $S_1$ : Instruction fetch  
 $S_2$ : Instruction decode  
 $S_3$ : Execute  
 $S_4$ : Memory write  
 $S_5$ : Write back

Program consists of 16 instructions ( $I_1, I_2, I_3, \dots, I_{16}$ ). In which  $I_6$  is a unconditional branch instruction transfer the controls of  $I_{12}$ . In the pipeline, branch target address will be available

at the end of execute state. Each instruction spends the same amount of time in all the pipeline stages. The cycle time of the pipeline is 10 nsec. What is the time (in nsec) required to execute the above program without using branch prediction?

- (a) 170                          (b) 180  
(c) 200                            (d) 140

### Answers Instruction Pipelining

1. (c)    2. (c)    3. (a)    4. (d)    5. (d)    6. (d)    7. (c)    8. (a)    9. (a)  
10. (c)    11. (b)    12. (a)    13. (b)    14. (c)    15. (a)    16. (c)    17. (c)    18. (a)  
19. (b)    20. (c)    21. (c)    23. (c)    24. (a)    26. (d)    27. (b)    29. (d)    33. (b)  
34. (b)    35. (d)    39. (a)

### Explanations Instruction Pipelining

#### 1. (c)

In pipelining, a number of functional units are employed in sequence to perform a single computation.

#### 3. (a)

First output will come after 5 time units, then after every 3 time units, we will get the output.  
So throughput = 1/3.

#### 4. (d)

Clock period of pipeline = Maximum delay of stages  
= Max (1, 2, 3, 1, 1) = 3

#### 6. (d)

The performance of a pipelined processor depends upon delays of different stage and its hardware resources also it depends upon consecutive instructions format.

#### 7. (c)

Average instruction time pipelined

$$= 1 + 0.2 = 1.2 \text{ ns}$$

$$\text{Speedup from pipelined} = \frac{4.4}{1.2} = 3.7 \text{ ns}$$

#### 9. (a)

Pipelined operation is interrupted whenever two operations being attempted in parallel need the same hardware component, such a conflict can occur if the execution phase of an instruction requires access to the main memory, which also contains the next instruction that should be fetched at the same time.

#### 10. (c)

Average instruction execution time

$$\begin{aligned} &= \text{Clock cycle} * \text{Average CPI} \\ &= 10 \text{ ns} * [40\% + 20\%] * 4 + 40\% * 5 \\ &= 10 \text{ ns} * 4.4 \\ &= 44 \text{ ns} \end{aligned}$$

In the pipelined implementation, the clock must run at the speed of the slowest stage plus overhead which will be 10 + 1 or 11 ns, this is the average instruction execution time. Thus, the speed up from pipelining is (Speed up) pipelining

$$\begin{aligned} &= \frac{\text{Average instruction time unpipelined}}{\text{Average instruction time pipelined}} \\ &= \frac{44 \text{ ns}}{11 \text{ ns}} = 4 \text{ times} \end{aligned}$$

11. (b)

Pipelined version will take  $3 \times 100 + 3 \times 20 = 360$  ns per instruction.

12. (a)

$$T_{\text{avg}} = (1 + \text{stall freq} * \text{stall cycles}) * T_{\text{clk}} = (1 + 20\% * 4) * 10 \text{ n.s} = 18 \text{ n.s}$$

∴ Throughput = Number of instructions executed/sec

18 ns ..... one instruction

$10^9$  ns .....?

$$\Rightarrow \frac{10^9}{18} = 55 \text{ Mips}$$

13. (b)

Since the unpipelined machine executes all instructions in a single clock cycle, its average time per instruction is simply clock time. The clock is equal to the sum of the times for each step in the execution.

$$\therefore \text{Average instructions execution time} = 5 + 8 + 6 + 10 + 15 + 12 + 6 + 8 = 70$$

The clock cycle time on the pipelined machine must be the largest time for any stage in the pipeline (15 ns) + the overhead of 1 ns for a total of 16 ns.

∴ Speed from pipelining

$$= \frac{\text{Average instruction time unpipelined}}{\text{An instruction time pipelined}} = \frac{70 \text{ ns}}{16 \text{ ns}} = 4.375 \text{ times}$$

14. (c)

Average instruction execution time

= Clock cycle \* Average CPI

$$= 12 \text{ ns} * [(30\% + 20\%) * 4] + (20\% * 5)$$

$$= 12 \text{ ns} * 3 = 36 \text{ ns}$$

15. (a)

Total number of stages =  $K = 4$ .

Total number of instructions/tasks =  $n = 100$ .

$$\text{Speedup} = \frac{nk}{n+k-1} = \frac{100 \times 4}{100+4-1}$$

$$= \frac{400}{103} = 3.88$$

16. (c)

Number of stages =  $K = 4$

( $\tau_i$ ) stage delays = 150, 120, 160, 140 ns

(d) register delay = 5 ns

$$\text{Cycle time } \tau = \text{Max } (\tau_i) + d = 160 + 5 = 165 \text{ ns}$$

Number of instructions =  $n = 1000$

$$T_k = (k + n - 1)\tau$$

$$= (4 + 1000 - 1)165 \text{ ns}$$

$$= 1003 \times 165 \text{ ns}$$

$$\text{or } = \frac{1003 \times 165}{1000} \mu\text{s} = 165.5 \mu\text{s}$$

17. (c)

For ( $i = 1, i \leq 1000; i++$ )  $\{I_1, I_2, I_3, I_4\}$

|                  | S <sub>1</sub> | S <sub>2</sub> | S <sub>3</sub> | S <sub>4</sub> |
|------------------|----------------|----------------|----------------|----------------|
| I <sub>1</sub> : | 1              | 2              | 1              | 2              |
| I <sub>2</sub> : | 2              | 1              | 2              | 1              |
| I <sub>3</sub> : | 1              | 1              | 2              | 1              |
| I <sub>4</sub> : | 2              | 1              | 2              | 1              |

For  $i = 2$ , the output of  $I_1$  will be available after:

| 1              | 2              | 3              | 4              | 5              | 6              | 7              | 8              | 9              | 10             | 11             | 12             | 13             | 14             | 15 |
|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----|
| I <sub>1</sub> | I <sub>2</sub> | I <sub>2</sub> | I <sub>3</sub> | I <sub>4</sub> | I <sub>5</sub> | I <sub>1</sub> | I <sub>1</sub> | I <sub>1</sub> |                |                |                |                |                |    |
| S <sub>2</sub> |                | I <sub>1</sub> | I <sub>2</sub> | I <sub>3</sub> | I <sub>4</sub> |                |                |                |                |                |                |                |                |    |
| S <sub>3</sub> |                |                | I <sub>1</sub> | I <sub>2</sub> | I <sub>2</sub> | I <sub>3</sub> | I <sub>3</sub> | I <sub>4</sub> | I <sub>4</sub> | I <sub>1</sub> | I <sub>1</sub> | I <sub>1</sub> |                |    |
| S <sub>4</sub> |                |                |                | I <sub>1</sub> | I <sub>1</sub> | I <sub>2</sub> | I <sub>2</sub> | I <sub>2</sub> | I <sub>3</sub> | I <sub>3</sub> | I <sub>4</sub> | I <sub>1</sub> | I <sub>1</sub> |    |

$i = 1$

$i = 2$

After 13 clock cycles, output of  $I_1$  will be available for  $i = 2$ .

19. (b)

D1: Number of stages,  $k = 5$ .

$$\tau_i = 3, 2, 4, 2, 3 \text{ ns}$$

$i = 1 \text{ to } 5$

$$\tau = \max(\tau_i) + d \text{ here } d \text{ is negligible}$$

$$\tau = 4 \text{ ns}$$

D2: Number of stages,  $k = 8$

$$\tau_i = 2 \text{ ns each}$$

$i = 1 \text{ to } 8$

$$\tau = \max(\tau_i) = 2 \text{ nsec}$$

Number of instructions,  $n = 100$ .

$$\text{D1: } T_k = (n + k - 1)\tau$$

$$= (100 + 5 - 1) \times 4$$

$$= 104 \times 4 = 416 \text{ ns}$$

$$\text{D2: } T_k = (n + k - 1)\tau = (100 + 8 - 1) \times 2$$

$$= 107 \times 2 = 214 \text{ ns}$$

$$\text{Total time saved} = 416 - 214 = 202 \text{ ns}$$

21. (c)

Execution time for pipeline =  $(k + n - 1) \times tp$   
 where  $k$  = Number of stages  
 $n$  = Number of instruction  
 $tp$  = Execution time = Max (all stages)  
 $P_1 = [8 + 500 - 1] \times 8 = 4056$   
 $P_2 = [5 + 500 - 1] \times 5 = 2520$   
 Time saved using  
 $P_2 = 4056 - 2520 = 1536 \text{ nsec}$   
 $= 1.536 \mu\text{sec}$

22. (3.87) [3.86-3.88]

$$\begin{aligned} \text{CPI} &= \frac{\sum (J_i \times CP_i)}{I_C} \\ &\approx \frac{[45000 \times 1 + 32000 \times 2 + 15000 \times 2 + 8000 \times 2]}{10^5} \\ &= \frac{155000}{10^5} = \frac{155}{10^2} = 1.55 \\ \text{Execution time} &= \frac{I_C \times CPI}{f} = \frac{10^5 \times 1.55}{40 \times 10^6} \\ &= 3.87 \text{ msec.} \end{aligned}$$

23. (c)

| RAW    | WAR                   | WAW                   |
|--------|-----------------------|-----------------------|
| In-Out | Out-In                | Out-Out               |
|        | $I_3 \rightarrow I_2$ | $I_2 \rightarrow I_1$ |
|        | $I_4 \rightarrow I_2$ | $I_5 \rightarrow I_1$ |
|        | $I_4 \rightarrow I_3$ | $I_5 \rightarrow I_2$ |
|        | $I_5 \rightarrow I_4$ |                       |
| = 0    | = 4                   | = 3                   |

$$\text{Sum} = (0 + 4 + 3) = 7$$

24. (a)

| 1     | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
|-------|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| $I_4$ | IF | ID | EX | EX | WB |    |    |    |    |    |    |    |    |    |    |
| $I_4$ |    | IF | ID | ID | EX | EX | EX | WB |    |    |    |    |    |    |    |
| $I_2$ |    |    | IF | IF | ID | ID |    |    |    |    |    |    |    |    |    |
| $I_3$ |    |    |    |    | IF | IF | ID |    | EX |    |    | WB |    |    |    |
| $I_4$ |    |    |    |    |    |    | IF | IF | ID | ID | ID | EX | WB | WB |    |

It requires 16 clock cycles.

25. (78)

4 memory references  $\rightarrow$  1 Instruction

1000 memory references  $\rightarrow$  ? instructions.

$$\text{Number of instructions} = \frac{1000}{4} = 250$$

$$\begin{aligned} \left[ \frac{\# \text{ memory stalls}}{\# \text{ Instructions}} \right] &= \left[ \frac{\# \text{ misses } L_1}{\# \text{ Instructions}} \times \text{hit } L_2 \right] + \\ &\quad \left[ \frac{\# \text{ misses } L_2}{\# \text{ Instructions}} \times \text{miss penalty } L_2 \right] \\ &= \left[ \frac{150}{250} \times 50 \right] + \left[ \frac{100}{250} \times 120 \right] \\ &= [30 + 48] = 78 \text{ cycles} \end{aligned}$$

26. (d)

Throughput  $\times$  Number of stages  $\Rightarrow$  1 is true  
 Latency will be less for M-7  $\Rightarrow$  2 is true  
 Since, the throughput is higher for M-7, program will execute faster than that on M-5  $\Rightarrow$  3 is true.

27. (b)

Speed up  $\approx$  Nonpipe line execution time / Pipeline execution time.

Non pipeline Execution time

$$\begin{aligned} &= S1 \text{ delay} + S2 \text{ delay} + S3 \text{ delay} + S4 \text{ delay} \\ &= 5 + 6 + 11 + 8 = 30 \text{ ns} \end{aligned}$$

Pipeline Execution time

$$\begin{aligned} &\approx \text{Max. (stage delay)} + \text{buffer delay} \\ &= \text{max. (5 ns, 6 ns, 11 ns, 8 ns)} + 1 \text{ ns} \\ &= 11 \text{ ns} + 1 \text{ ns} = 12 \text{ ns} \end{aligned}$$

$$\text{Hence Speedup} = \frac{30}{12} = 2.5$$

28. (22.2)

Assume  $x$  % of instructions incur 3 stall cycles.

$$\text{Speedup} = \frac{\text{Pipeline depth}}{(1 + \# \text{stall/ Instruction})}$$

$$\begin{aligned} \text{Number of stall/Instructions} &= (\% \text{ Instruction don't} \\ &\text{incur stall} \times 0 + \% \text{ Instruction incur stall} \times 3) \\ &= (1 - x) \times 0 + x(3) = 3x \end{aligned}$$

$$\therefore 3 = \frac{5}{1+3x}$$

$$3 + 9x = 5$$

$$9x = 5 - 3 = 2$$

$$x = \frac{2}{9} = 0.222$$

$$\ln \% = 0.222 \times 100 = 22.2$$

29. (d)

$$10 + 8 + 6 + 9 + 5 = 38 \text{ ns in non-pipelined.}$$

30. (0.859)

$$S_x = \frac{\text{Pipeline depth}}{(1 + \text{Frequency} \times \#stalls per instruction)}$$

$$= \frac{5}{1 + (0.3 \times 4)} = 2.27$$

$$S_y = \frac{9}{1 + (0.3 \times 8)} = 2.64$$

$$\frac{S_x}{S_y} = 0.859$$

31. (0.083) [0.08-0.09]

$$\text{Speed up} = \frac{\text{Pipeline depth}}{(1 + \text{Branch frequency} \times \text{Branch penalty})} \geq 4$$

$$\frac{5}{1 + P \times 3} \geq 4$$

$$4 + 12P \leq 5$$

$$12P \leq 1$$

$$P \leq \frac{1}{12} = 0.083$$

32. (4)



35. (d)

| 1  | 2  | 3  | 4  | 5  | 6  | 7 | 8  | 9  | 10 | 11 | 12 | 13 | 14 |
|----|----|----|----|----|----|---|----|----|----|----|----|----|----|
| IF | ID | EX | MA | WB |    |   |    |    |    |    |    |    |    |
|    | IF | ID | EX | MA | WB |   |    |    |    |    |    |    |    |
|    |    | IF | ID | S  | S  | S | EX | MA | WB |    |    |    |    |
|    |    |    | IF | ID | S  | S | S  | S  | S  | S  | EX | MA | WB |

S = Stall

14 cycles are required.

Since  $I_1$  and  $I_2$  are dependent. So  $I_2$  will enter into execute stage when  $I_1$  completes WB stage. It creates 2 nop. Similarly  $I_3$  and  $I_4$  are dependent. They create 2 nop.

Total 4 nop instructions are required to eliminate hazards.

33. (b)



RAW (In-Out) dependency between consecutive instruction is considered only.

34. (b)

$$ET_{\text{non-pipe}} = \text{Average CPI} \times \text{Cycle time (non-pipe)} \\ = 4 \times 0.33 \text{ nsec} = 1.65 \text{ nsec}$$

$$ET_{\text{pipe}} = \text{Average CPI}_{\text{pipe}} \times \text{Cycle time (pipe)} \\ = 1 \times 0.5 \text{ nsec} = 0.5 \text{ nsec}$$

$$\text{Speed-up} = \frac{ET_{\text{non-pipe}}}{ET_{\text{pipe}}}$$

$$= \frac{1.65}{0.5} = 3.3$$

36. (50)

$$\text{Speedup (S)} = \frac{\text{Pipeline depth}}{[1 + \# \text{stalls}/\text{Instruction}]}$$

$$\text{No. of stalls/Instruction} = (1 - x) \times 0 + x \times 2 = 2x$$

$$\text{Speedup (S)} = \frac{6}{1+2x}$$

$$3 = \frac{6}{1+2x}$$

$$3 + 6x = 6$$

$$6x = 6 - 3$$

$$x = \frac{3}{6} = 0.5$$

$$\text{In \%} = 0.5 \times 100 = 50$$

37. (6.75)

$$\begin{aligned}\text{Average access time for non-pipeline system} \\ &= (10 \times 0.4 + 10 \times 0.2 + 12 \times 0.4) \times 20 \text{ nsec} \\ &= (4 + 2 + 4.8) \times 20 \text{ nsec} \\ &= (10.8) \times 20 \text{ nsec} = 216 \text{ nsec}\end{aligned}$$

For pipeline system in every clock cycle one instruction will get executed and overhead of 12 nsec.

So average time =  $(20 \text{ nsec} + 12 \text{ nsec}) = 32 \text{ nsec}$

$$\text{Speedup} = \frac{\text{Non-pipe}}{\text{Pipeline}} = \frac{216 \text{ nsec}}{32 \text{ nsec}} = 6.75$$

38. (0.933)

$$\frac{\text{Pipeline depth}}{(1 + \% \text{Branch} \times \text{Branch penalty})} \leq \text{Speedup}$$

$$\Rightarrow \frac{6}{1 + (1 - X) \times 3} \leq 5$$

$$\Rightarrow [1 + (3 \times 3X)]5 \leq 6$$

$$5 + 15 - 15X \leq 6$$

$$-15X \leq 6 - 20$$

$$+15X \leq +14$$

$$\Rightarrow X \leq \frac{14}{15} = 0.933$$

39. (a)

|                       | <b>C<sub>1</sub></b> | <b>C<sub>2</sub></b> | <b>C<sub>3</sub></b> | <b>C<sub>4</sub></b> | <b>C<sub>5</sub></b> | <b>C<sub>6</sub></b> | <b>C<sub>7</sub></b> | <b>C<sub>8</sub></b> | <b>C<sub>9</sub></b> | <b>C<sub>10</sub></b> | <b>C<sub>11</sub></b> | <b>C<sub>12</sub></b> | <b>C<sub>13</sub></b> | <b>C<sub>14</sub></b> | <b>C<sub>15</sub></b> |
|-----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------|
| <b>I<sub>1</sub></b>  | IF                   | ID                   | EX                   | MW                   | WB                   |                      |                      |                      |                      |                       |                       |                       |                       |                       |                       |
| <b>I<sub>2</sub></b>  |                      | IF                   | ID                   | EX                   | MW                   | WB                   |                      |                      |                      |                       |                       |                       |                       |                       |                       |
| <b>I<sub>3</sub></b>  |                      |                      | IF                   | ID                   | EX                   | MW                   | WB                   |                      |                      |                       |                       |                       |                       |                       |                       |
| <b>I<sub>4</sub></b>  |                      |                      |                      | IF                   | ID                   | EX                   | MW                   | WB                   |                      |                       |                       |                       |                       |                       |                       |
| <b>I<sub>5</sub></b>  |                      |                      |                      |                      | IF                   | ID                   | EX                   | MW                   | WB                   |                       |                       |                       |                       |                       |                       |
| <b>I<sub>6</sub></b>  |                      |                      |                      |                      |                      | IF                   | ID                   | EX                   |                      |                       |                       |                       |                       |                       |                       |
| <b>I<sub>7</sub></b>  |                      |                      |                      |                      |                      |                      | IF                   | ID                   |                      |                       |                       |                       |                       |                       |                       |
| <b>I<sub>8</sub></b>  |                      |                      |                      |                      |                      |                      |                      | IF                   |                      |                       |                       |                       |                       |                       |                       |
| <b>I<sub>12</sub></b> |                      |                      |                      |                      |                      |                      |                      |                      | IF                   |                       |                       |                       |                       |                       |                       |
| <b>I<sub>13</sub></b> |                      |                      |                      |                      |                      |                      |                      |                      |                      | IF                    |                       |                       |                       |                       |                       |
| <b>I<sub>14</sub></b> |                      |                      |                      |                      |                      |                      |                      |                      |                      |                       | IF                    |                       |                       |                       |                       |
| <b>I<sub>15</sub></b> |                      |                      |                      |                      |                      |                      |                      |                      |                      |                       |                       | IF                    |                       |                       |                       |
| <b>I<sub>16</sub></b> |                      |                      |                      |                      |                      |                      |                      |                      |                      |                       |                       |                       | IF                    |                       |                       |

So total 13 instruction are executed.

$$K = 5, n = 13, t_p = 10 \text{ nsec}$$

$$ET = (K + n - 1)t_p$$

$$= (5 + 13 - 1)10 \text{ nsec} = 17 \times 10 = 170 \text{ nsec}$$

**CS**

Objective Practice Sets

# 4

## CHAPTER

# Memory Hierarchy Design

1. Spatial locality refers to the problem that once a location is referenced
  - (a) it will not be referenced again
  - (b) it will be referenced again
  - (c) a nearby location will be referenced soon
  - (d) none of the above
2. The principle of locality justifies the use of
  - (a) interrupts      (b) threads
  - (c) DMA                (d) cache memory
3. A system which has lot of crashes, data should be written to the disk using
  - (a) write-through
  - (b) write-back
  - (c) both of the above
  - (d) none of the above
4. Cache memory enhances
  - (a) memory capacity
  - (b) memory access time
  - (c) effective memory access time
  - (d) secondary storage access time
5. The average memory access time for a machine with a cache hit rate of 90% where the cache access time is 10 ns and the memory access time is 100 ns is
  - (a) 55 ns                (b) 45 ns
  - (c) 90 ns                (d) 19 ns
6. Which of the following lists memory types from highest to lowest speed?
  - (a) secondary storage, main memory, cache, registers
  - (b) registers, cache, secondary storage, main memory
  - (c) registers, cache, main memory, secondary storage
  - (d) cache, registers, main memory, secondary storage
7. According to temporal locality, processes are likely to reference pages that \_\_\_\_\_.
  - (a) have been referenced recently
  - (b) are located at address near recently referenced pages in memory
  - (c) have been preloaded into memory
  - (d) none of the above
8. Given below are some statements associated with cache memory. Identify the correct statement.
  - (a) The Level 1 cache is always faster than the Level 2 cache.
  - (b) The Level 2 cache is used to mitigate the dynamic slowdown every time a Level 1 cache miss occurs.
  - (c) Level 2 cache comes as on board only.
  - (d) In modern day computers, the Level 2 cache is considered an internal cache.
9. Given below are some statements associated with computer memory. Identify the correct statement.
  - (a) Dynamic random access memory is faster than static random access memory.
  - (b) Main memory is used by the processor to store primary active programs and data.
  - (c) The cache memory is of dynamic random access memory.
  - (d) The production cost of dynamic random access memory is higher than that of static random access memory.
10. In caching system, the memory reference made in any short time interval tend to use only a small fraction of the total memory is called \_\_\_\_\_.
  - (a) checker boarding
  - (b) locality principle
  - (c) memory interleaving
  - (d) none of the above

11. Which of the following statements is false about dynamic RAM?
- Power consumption of dynamic RAMs is more than that of static RAMs.
  - Gate density of dynamic RAMs is higher than that of static RAMs.
  - Dynamic RAMs require a memory refresh cycle after each memory read cycle.
  - Main memory of a general purpose computer system is usually built using dynamic RAMs.
12. Which of the following statements is false with regard to fully associative and direct mapped cache organizations?
- Direct mapped caches do not need a cache block replacement policy, whereas fully associative caches do.
  - In a fully associative cache, the full address is stored alongside the data block.
  - A direct mapped cache is organized according to an 'index', and the 'tag' part of the memory address is stored alongside the data block.
  - Direct mapped caches may produce more misses if programs refer to memory words that occupy a single tag value.
13. In a two level memory hierarchy, the access time of the cache memory is 12 nanoseconds and the access time of the main memory is 1.5 microseconds. The hit ratio is 0.98. What is the average access time of the two level memory system?
- 13.5 nsec
  - 42 nsec
  - 7.56 nsec
  - 41.76 nsec
14. Consider the following organization of main memory and cache memory.
- Main memory: 64 K x 16  
 Cache memory: 256 x 16  
 Memory is word-addressable and block size is of 8 words. Determine the size of tag field if direct mapping is used for transforming data from main memory to cache memory.
- 5 bits
  - 6 bits
  - 7 bits
  - 8 bits
15. Match List-I with List-II using memory hierarchy from top to down and select the correct answer using the codes given below the lists:
- | List-I                 | List-II      |
|------------------------|--------------|
| A. Size                | 1. Increases |
| B. Cost/bit            | 2. Decreases |
| C. Access time         | 3. No change |
| D. Frequency of access |              |
- Codes:**
- | A     | B | C | D |
|-------|---|---|---|
| (a) 1 | 1 | 2 | 2 |
| (b) 2 | 1 | 2 | 1 |
| (c) 2 | 2 | 1 | 1 |
| (d) 1 | 2 | 1 | 2 |
16. A computer system has 4K-word cache organized in a block-set-associative manner, with 4 blocks per set, 64 words per block. Memory is word addressable. The number of bits in the SET and WORD fields of the main memory address format is
- 15, 4
  - 6, 4
  - 7, 6
  - 4, 6
17. If the memory clip size is  $256 \times 1$  bits, then how many clips are required to make up 1 KB of memory?
- 8
  - 16
  - 24
  - 32
18. A certain computer system design has a single CPU, a two-level cache, and supports memory mapped I/O for output-only controllers. Which of the following is true?
- The design is impossible, since memory mapped I/O will prevent cache coherence.
  - In two-level caches, the  $L_1$  cache is generally built from SRAM.
  - In two-level caches the  $L_1$  cache is generally larger than  $L_2$  cache.
  - In two-level caches, the  $L_2$  cache generally has a lower latency than the  $L_1$  cache.
19. Assume a fully associative write-back cache with many cache entries that starts empty. We define a sequence of memory operations. The address or locations is in square brackets.

$S_1$  : Write  $M[100]$ ; $S_2$  : Write  $M[100]$ ; $S_3$  : Read  $M[200]$ ; $S_4$  : Write  $M[200]$ ; $S_5$  : Write  $M[100]$ ;

What are the number of hits when using no write allocate versus write allocate?

- (a) 2, 1                   (b) 2, 2  
 (c) 1, 3                   (d) 4, 1

20. Suppose a CPU contains 1000 memory references there are 40 misses in  $L_1$  cache (First Level Cache) and 20 misses in the  $L_2$  cache (Second Level Cache). Assume miss penalty from the  $L_2$  cache to memory is 100 clock cycles the hit time of  $L_2$  cache is 10 clock cycles, the hit time of  $L_1$  cache is 1 clock cycle and there are 1.5 memory references per instruction.

What is the average memory access time?

- (a) 3.4 clock cycles  
 (b) 3.5 clock cycles  
 (c) 5.3 clock cycles  
 (d) 1.8 clock cycles

21. Consider the following fragment of code:

```
for (i = 0 ; i <= 100 ; i++)
  {A[i] = B[i] + C}
```

Assume  $A$  and  $B$  are arrays of 64-bit integers and  $C$  and  $i$  are 64-bit integers. Assume that all data values and their addresses are kept in memory (at addresses 0, 5000, 1500 and 2000 for  $A$ ,  $B$ ,  $C$  and  $i$  respectively) except when they are operated on. Assume that the values in registers are also between iterations of the loop. Also assume instructions are 4 byte in size.

How many instructions are required dynamically?

- (a) 1618                   (b) 400  
 (c) 512                   (d) 1024

22. A 2 level memory has average access time 30 ns with cache and memory access time as 20 ns and 150 ns respectively. What is the hit ratio?

- (a) 80%                   (b) 93%  
 (c) 70%                   (d) 99%

23. The access time of cache is 100  $\mu$ s, the access time of main memory is 90  $\mu$ s, and hit ratio is 95%, then access efficiency related to cache is  
 (a) 0.9569               (b) 0.9869  
 (c) 0.9469               (d) 0.9969

24. Let access times of cache is  $10^{-8}$ . Time required for block access is  $10^{-6}$  and hit ratio is 0.9000. The average time for the CPU to access a word in two level memory is given by  
 (a)  $0.11 \times 10^{-6}$      (b)  $0.91 \times 10^{-6}$   
 (c)  $1.009 \times 10^{-6}$      (d)  $1.001 \times 10^{-6}$

25. The percentage of times that a page number is found in the associative register is called the hit ratio.  
 Assume it takes 50 nanoseconds to search the associative registers and 100 nanoseconds to access main memory. What is the percentage of slowdown in memory-access time if the hit ratio is 90%?  
 (a) 24                   (b) 40  
 (c) 45                   (d) 60

26. Find the average access time experienced by the CPU in a system with two levels of caches.

if  $h_1$  : is the hit rate in the primary cache  
 $h_2$  : is the hit rate in the secondary cache  
 $c_1$  : is the time to access information in the primary cache  
 $c_2$  : is the time to access information in the secondary cache  
 $M$  : is the time to access information in the main memory  
 (a)  $h_1 c_1 + h_2 c_2 + (1 - h_1 h_2)M$   
 (b)  $h_1 c_1 - h_2 c_2 + (1 - h_1 h_2)M$   
 (c)  $h_1 c_1 + (1 - h_2)h_1 c_2 + (1 - h_1)(1 - h_2)M$   
 (d)  $h_1 c_1 + (1 - h_1)h_2 c_2 + (1 - h_1)(1 - h_2)M$

27. A computer system has 4K word cache organized in a block-set-associative manner, with 4 blocks per set, 64 words per block. The number of bits in the SET and WORD fields of the main memory address format is

- (a) 15, 4                   (b) 6, 4  
 (c) 7, 2                   (d) 4, 6

28. A computer has a cache, main memory; and a disk used for virtual memory. If a referenced word is in the cache, 10 ns are required to access it. If it is main memory but not in the cache, total 200 ns are needed to load it into cache, and then the reference is started again. If the word is not in main memory, total 500 ms are required to fetch the word from the disk, followed by 200 ns to copy it to the cache, and then the reference is started again. The cache hit ratio is 99%, and the main memory hit ratio is 90%. What is the average time in ns to access a referenced word on this system? Ignore the time it takes to determine whether or not a reference may be found in cache or main memory. The hit ratio is the percent of the time a reference is found in the given level of the memory hierarchy.
- (a) 732000 ns      (b) 485750 ns  
 (c) 500012 ns      (d) None of these
29. Let a cache has 128 lines each with 16 words and is 2-way set associative. Suppose the main memory has 16 bit address. Both cache and memory is word addressable. How many bits are in the "tag" field in the cache?
- (a) 6      (b) 8  
 (c) 5      (d) 7
30. In a general purpose computer system the CPU, the main memory and the cache may be interconnected via one or more shared system bus(es). However, input/output devices (eg. Hard disk, network interfaces) may only be connected to the system bus through an I/O controller. The following are four statements regarding the requirement for an I/O controller.
1. The capacities of I/O devices are magnitude order larger than that of main memory and hence direct interfacing is impossible.
  2. The response times of I/O devices are magnitude order slower than that of CPU and hence direct interfacing is impossible.
  3. It is always better to off load the I/O processing to a secondary processor on the I/O controller board then to depend on the primary CPU for I/O processing.
  4. The variety of I/O devices in the market requires that a separate I/O controller exist for each device.
- What statement(s) best explain the requirement for an I/O controller?
- (a) Only 1 is true  
 (b) Only 3 is true  
 (c) Only 2 and 3 are true  
 (d) Only 2, 3 and 4 are true
31. The following are three statements about the "locality of reference" principle used in memory systems.
1. Says it is more likely that an already accessed memory location is accessed further, and, it is more likely that surrounding (adjacent) memory locations will also be accessed.
  2. An observation equally widely used to implement virtual memory and cache memory systems.
  3. Says that the bytes of a word be interleaved on several physical memory modules for better performance.
- Which of them is/are true?
- (a) Only 1 is true      (b) Only 2 is true  
 (c) Only 3 is true      (d) Only 1 and 2 are true
32. A hierachal cache has cache takes 80 n.s, to access the index field of each set and 10 n.s to access the tag field with in a set, Memory access time of 500 n.sec, 80% of memory references are reads and 20% for writes. Hit ratio is 0.9 for read access and 0.80 for write access. What is the average access time if write through policy is used?
- (a) 212 n.sec      (b) 183.6 n.sec  
 (c) 158 n.sec      (d) None of these
33. The access time of a cache memory is 50 ns and that of main memory is 500 ns. It is estimated that 80 percent of memory requests are for read and remaining for write. The hit ratio for read access only is 0.9, and for write is 0.8. What is the average access time if write through policy is used?
- (a) 75 ns      (b) 90 ns  
 (c) 95 ns      (d) 140 ns

34. In a Fully Associative Cache Memory consisting of 256 cache lines of 16 Bytes each, a Tag field is of 14 Bits. Determine the Size of Cache Memory and Main Memory  
(a) 2 KB and 128 KB (b) 4 KB and 256 KB  
(c) 8 KB and 1 MB (d) None of these

35. A cache has a hit rate of 95 percent, 128-byte lines, and a cache hit latency of 5 ns. The main memory takes 100 ns to return the first word (32 bits) of a line, and 10 ns to return each subsequent word.

What is  $T_{miss}$  for this cache? (Assume that the cache waits until the line has been fetched into the cache and then reexecutes the memory operation, resulting in a cache hit. Neglect the time required to write the line into the cache once it has been fetched from the main memory. Also assume that the cache takes the same amount of time to detect that a miss has occurred as to handle a cache hit)

- (a) 410 ns (b) 420 ns  
(c) 240 ns (d) 540 ns

36. Consider a small two-way set-associative cache memory, consisting of four blocks. For choosing the block to be replaced, use the least recently used (LRU) scheme. The number of cache misses for the following sequence of block addresses is 8, 12, 0, 12, 8

- (a) 2 (b) 3  
(c) 4 (d) 5

37. The cache can hold 512 KB. Data is transferred between main memory and cache in blocks of 32 byte each. The main memory consist 4 Gbyte. If the cache memory is 2 way set associative then the hexadecimal main memory address  $(ABABABAB)_H$  is mapped to which cache set?  
(a) 1 B 5 B (b) 1 D 5 B  
(c) 1 D 5 D (d) 1 B 5 D

38. A cache has 64 KB capacity, 128 byte line, and is 4-way set-associative. The system containing the cache uses 32-bit addresses. If the cache is write-through, \_\_\_\_\_ bits are required for each entry in the tag array, and total storage required is \_\_\_\_\_ for the tag array if an LRU replacement policy is used?

- (a) 20,974 21,456  
(b) 10,572 11,264  
(c) 10,572 21,456  
(d) 20,974 11,264

39. Consider a 4-way set associative cache consisting of 120 lines with a line size of 64 words. The CPU generates a 20-bit address of a word in main memory. The number of bits in the TAG, LINE and WORD fields are respectively

- (a) 9, 6, 5 (b) 7, 7, 6  
(c) 7, 5, 8 (d) 9, 5, 6

40. Consider a direct mapped cache of size 32 KB with block size 32 bytes. The CPU generates 32 bit addresses. The number of bits needed for cache indexing and the number of tag bits are respectively

- (a) 10, 17 (b) 10, 22  
(c) 15, 17 (d) 5, 17

41. A cache contains  $n$  blocks and main memory contains  $m$  blocks. If  $k$  way set associative mapping is used than what will be number of TAG bits

- (a)  $\log_2 \frac{mk}{n}$  (b)  $\log_2 \frac{m}{n}$   
(c)  $\log_2 \frac{nk}{m}$  (d)  $\log_2 \frac{mn}{k}$

42. Which of the following is true in case of lockup free cache?

1. A cache structure allows a CPU to access a cache while a hit is being serviced.
  2. A cache structure allows a CPU to access a cache while a miss is being serviced.
  3. A cache that can support multiple outstanding misses.
  4. A cache that can support multiple outstanding hits.
- (a) Only 1 and 4 (b) Only 2 and 3  
(c) Only 1 (d) Only 2

43. The mapping function is implemented using the address. For purpose of cache access, each main memory address can be viewed as consisting of three fields.

The least significant ' $w$ ' bits identify a unique word or byte within a block of main memory. The remaining ' $s$ ' bits specify one of the  $2^s$  blocks of main memory. The cache logic interprets these ' $s$ ' bits as a tag of ' $s - r$ ' bits and a line field of ' $r$ ' bits. With the above information, the size of tag equals to \_\_\_\_\_.

- (a)  $r$  bits      (b)  $(s + w)$  bits  
 (c)  $s$  bits      (d)  $(s - r)$  bits

44. Consider a computer system that has a cache with 512 blocks, each of which can store 32 bytes. All addresses are byte addresses.

Which cache set will the memory address  $0 \times FBFC$  map to if the cache is 8-way set associative?

- (a) 1 D      (b) 1 F  
 (c) 2 C      (d) 3 E

45. Consider a very small 2-way set-associative cache with a total of 2 sets and a block size of 16 bytes. The cache uses an LRU replacement policy. Assume that the cache is initially empty. The CPU accesses the following memory locations, in that order: C88H, 774H, C00H, 784H, C80H, 718H, 738H, 770H, 774H. All addresses are byte addresses. (H: Hexadecimal). No random order is considered.

The total number of page replacement in the overall operation will be

- (a) 0      (b) 1  
 (c) 2      (d) 3

46. Consider a 2-way set associative cache memory with 4 sets and total 8 cache blocks (0-7) and a main memory with 128 blocks (0-127). What memory blocks will be present in the cache after the following sequence of memory block references if LRU policy is used for cache block replacement.

Assuming that initially the cache did not have any memory block from the current job?

0 5 3 9 7 0 16 55

- (a) 0 3 5 7 16 55      (b) 0 3 5 7 9 16 55  
 (c) 0 5 7 9 16 55      (d) 3 5 7 9 16 55

47. Consider a machine with a byte addressable main memory of  $2^{16}$  bytes. Assume that a direct mapped data cache consisting of 32 lines of 64 bytes each is used in the system. A  $50 \times 50$  two-dimensional array of bytes is stored in the main memory starting from memory location 1100 H. Assume that the data cache is initially empty. The complete array is accessed twice. Assume that the contents of the data cache do not change in between the two accesses.

Which of the following lines of the data cache will be replaced by new blocks in accessing the array

- (a) line 4 to line 11      (b) line 4 to line 12  
 (c) line 0 to line 7      (d) line 0 to line 8

48. A CPU has a cache with block size 64 bytes. The main memory has  $k$  blocks, each block being  $c$ -bytes wide. Consecutive  $c$ -byte chunks are mapped on consecutive blocks with warp-around. All the  $k$  blocks can be accessed in parallel, but two accesses to the same block must be serialized. A cache block access may involve multiple iterations of parallel block accesses depending on the amount of data obtained by accessing all the  $k$  blocks in parallel. Each iteration requires decoding the block numbers to be accessed in parallel and this takes  $k/2$  ns. The latency of one block access is 80 ns.

If  $c = 2$  and  $k = 24$ , then latency of retrieving a cache block starting at address zero from main memory is

- (a) 92 ns      (b) 104 ns  
 (c) 172 ns      (d) 184 ns

49. A cache line is 64 bytes. The main memory has latency 32 ns and bandwidth 1 G.Bytes/s. The time required to fetch the entire cache line from the main memory is

- (a) 32 ns      (b) 64 ns  
 (c) 96 ns      (d) 128 ns

50. Consider two cache-organizations: the first one is 32 kB 2-way set associative with 32 byte block size. The second one is of the same size but direct mapped. The size of an address is 32

bits in both cases. A 2-to-1 multiplexer has latency of 0.6 ns. While a k-bit comparator has a latency of  $k/10$  ns. The hit latency of the set-associative organization is  $h_1$  and that of direct mapped one is  $h_2$ . Find  $h_1/h_2$ .

51. Suppose that a cache is 20 times faster than main memory and cache memory can be used 80% of the time. The speed-up factor that can be achieved by using the cache is \_\_\_\_\_.  
(a) 2.16      (b) 3.16  
(c) 4.20      (d) 4.16
52. Consider the following statements:  
S1 : Compulsory miss can be reduced.  
S2 : Conflict miss can be reduced.  
S3 : Capacity miss can be reduced.  
Which of the above statements are True?  
(a) Only S1      (b) Only S1 and S2  
(c) Only S2 and S3 (d) S1, S2 and S3
53. Consider the hypothetical processor which support 512 k words memory. It uses the memory mapped input output configuration. In which when 2 MSB bits of address are 1 then assigned to input output port. How many number of input output port address and memory addresses are possible in the processor respectively?  
(a)  $1 \times 2^{17}$ ,  $3 \times 2^{17}$  (b)  $3 \times 2^{17}$ ,  $1 \times 2^{17}$   
(c)  $2 \times 2^{17}$ ,  $1 \times 2^{17}$  (d) None of these
54. Consider a system with the main memory access time as 200 ns and cache access time as 10 ns. Hit ratio for read request is 0.8 and 80% of the memory requests are for read. If write through policy is used, then the average time considering both read and write requests is \_\_\_\_\_.  
(a) 169.6 ns      (b) 192.4 ns  
(c) 78.4 ns      (d) None of these
55. A computer has a cache, main memory and a hard disk used for virtual memory. If referenced word is in cache, 20 ns are required to access it. If it is in main memory but not in cache 60 ns are needed to load it into cache and then reference is started again. If word is not in main memory, 12 ms are required to fetch the word from disk

followed by 60 ns to copy into cache, the reference is started again. The cache hit ratio is 0.9 and main memory hit ratio is 0.6. The average time in nano seconds required to access a referenced word on this system is \_\_\_\_\_.

56. Consider the machine with a byte addressable main memory of  $2^{16}$  byte, block size of 16 byte and a 2 way set associative mapped cache having  $2^{10}$  lines. Suppose there are two bytes in main memory i.e. first byte  $[E\ 01\ F]_{16}$  and second byte  $[E\ 208]_{16}$  respectively then the difference of the set value (in decimal) between given two bytes i.e. (SET value of second byte – SET value of 1<sup>st</sup> byte) is \_\_\_\_\_.
57. Consider two cache organizations. The first one is 64 kB way associative with 64 byte block size. The second one is of the 64 kB direct mapped cache. The size of an address is 32 bits in both organization. A 4 to 1 multiplexer has latency of 0.8 ns which k bit comparator has latency of  $k/5$  nsec. The difference between the hit latencies of both cache organization (i.e. associative hit latency – direct mapped hit latency) (in nsec) is \_\_\_\_\_.
58. Let cache of 0.7 hit having average access time 9 times faster than that of memory. If average access time increase 40% from 80 ns. What would be the new hit ratio  
(a) 0.72      (b) 0.36  
(c) 0.53      (d) 0.80
59. Consider a machine with 4-way set associative data cache of size 32 Kbytes and block size 8 byte. The cache is managed using 32 bit virtual addressed and page size is 5 Kbytes. What is the total size of the tags in the cache directory is \_\_\_\_\_ (in K bits).
60. Suppose that a processor has access to two levels of memory. Level 1 contain 1000 words and has an access time of 0.01  $\mu$ sec. Level 2 contains 1,00,000 words and has an access time of 0.1  $\mu$ sec. Assume that if a word to be accessed is in level 1 then the processor accesses it directly.

If it is in level 2 then the word is first transferred to level 1 and then accessed by the processor. Suppose 95% of the memory accesses are found in the cache. What is the average access time.

- (a) 0.025  $\mu$ sec (b) 0.015  $\mu$ sec  
(c) 0.125  $\mu$ sec (d) 0.115  $\mu$ sec

61. The main memory of a computer has  $2^{XY}$  blocks while the cache has  $2^X$  blocks. If the cache uses the set associative mapping scheme with 2 blocks per set, then block K of the main memory maps to the \_\_\_\_\_ set.

- (a)  $(K \bmod Y)$  of the cache  
(b)  $(K \bmod XY)$  of the cache  
(c)  $(K \bmod X)$  of the cache  
(d)  $(K \bmod 2XY)$  of the cache

62. The cache can hold 64 KB. Data is transferred between main memory and cache in blocks of 4 Bytes each. The main memory consists of 16 M Bytes. If the cache memory is 16-way set associative, then the Hexa decimal main memory address AAAAAA is mapped into which set of cache?

- (a) 1 AA (b) 2 AA  
(c) 0 AA (d) 3 AA

63. Suppose after analyzing a new cache design, you discover that the cache has far too many conflict misses and this needs to be resolved. You know that you must increase associativity in order to decrease the number of cache misses.

What are the implications of increasing associativity?

- (a) Slower cache access time  
(b) Increase index bits  
(c) Increase block size  
(d) All of these

64. A processor has 8 bit physical address space and a physically addressed cache. Memory is byte addressable. The cache uses the LRU replacement policy. The processor supplies the following sequence of addresses to cache.

The cache is initially empty. The hit/miss outcome is shown below.

| Address | Outcome |
|---------|---------|
| 0       | Miss    |
| 2       | Hit     |
| 4       | Miss    |
| 128     | Miss    |
| 0       | Hit     |
| 128     | Hit     |
| 64      | Miss    |
| 4       | Hit     |
| 0       | Miss    |
| 32      | Miss    |
| 64      | Hit     |

Which of the following is the possible values of block size (in bytes) and associativity?

- (a) 2, 4 (b) 128, 4  
(c) 4, 128 (d) 4, 2

65. Consider a 4 way set associative cache consisting of 128 lines with a line size of 64 words. The CPU generates 20 bit addresses for words in main memory.

Match the following main memory addresses with the set number.

List-I List-II

- |                             |       |
|-----------------------------|-------|
| A. 11110101 1010 0011 1111  | 1. 16 |
| B. 0000 1111 1100 0011 0011 | 2. 10 |
| C. 1100 1100 1100 1100 0011 | 3. 8  |
| D. 1010 1010 1010 1010 1010 | 4. 19 |

Codes:

- |       |   |   |   |
|-------|---|---|---|
| A     | B | C | D |
| (a) 3 | 1 | 4 | 2 |
| (b) 3 | 1 | 2 | 4 |
| (c) 3 | 2 | 4 | 1 |
| (d) 2 | 3 | 1 | 4 |

66. Which of the following is true about TLB?

- (i) Tag entry in TLB contains a virtual page number and each data entry of the TLB holds a physical page number.  
(ii) TLB misses can be handled either in hardware or in software.  
(iii) TLB miss or a page fault requires using the exception mechanism.  
(a) Only (i) and (ii)  
(b) Only (ii) and (iii)  
(c) Only (iii)  
(d) All of these

67. Consider the direct mapped cache organization which consists of  $m$ -lines with a line size of  $2^w$  words/bytes. Main memory address can be viewed as consisting of three fields. The least significant  $w$ -bits identify a unique word within the block of main memory. The remaining 'S' bits specify one of the  $2^S$  block of main memory. Assume that cache is initially empty. Main memory blocks are referenced by the CPU in a sequential order.

Which one of the following sequence of the blocks are mapped on the cache lines in sequential order from the initial line respectively.

- (a)  $2m + 1, 2m + 2, \dots, 3m - 1$
- (b)  $2^S - m, 2^S - m+1, \dots, 2^S - 1$
- (c)  $1, 2, 3, \dots, m - 1$
- (d)  $0, 2, 4, \dots, (m - 1)$

68. A cache block has 64 kbyte. The main memory has latency 64  $\mu$ sec and bandwidth 1 GBps. What is the total time required to fetch the entire cache block from the main memory (in  $\mu$ sec,  $1 \text{ G} = 10^9$ )?

- (a) 125
- (b) 128
- (c) 135
- (d) None of these

69. A computer has 170 different operations. Word size is 4 bytes and each instruction requires one word and are two address instructions. One address for register and one address for memory. If there are 37 registers then the memory size would be?

- (a) 1 MB
- (b) 32 MB
- (c) 8 MB
- (d) 256 KB

**Answers****Memory Hierarchy Design**

1. (c)
2. (d)
3. (b)
4. (c)
5. (d)
6. (c)
7. (a)
8. (b)
9. (b)
10. (b)
11. (a)
12. (d)
13. (d)
14. (d)
15. (d)
16. (d)
17. (d)
18. (b)
19. (c)
20. (a)
21. (a)
22. (b)
23. (a)
24. (a)
25. (d)
26. (d)
27. (d)
28. (c)
29. (a)
30. (a)
31. (d)
32. (a)
33. (c)
34. (b)
35. (b)
36. (c)
37. (c)
38. (b)
39. (b)
40. (a)
41. (a)
42. (b)
43. (c)
44. (b)
45. (c)
46. (c)
47. (a)
48. (d)
49. (c)
50. (d)
51. (d)
52. (d)
53. (a)
54. (c)
55. (b)
56. (c)
57. (b)
58. (c)
59. (a)
60. (b)
61. (c)
62. (b)
63. (a)
64. (d)
65. (a)
66. (d)
67. (b)
68. (b)

**Explanations****Memory Hierarchy Design**

3. (b)

System should write data immediately.

5. (d)

$$\begin{aligned}\text{Average memory access time} &= \text{Hit Ratio} \times \text{Cache access time} + \text{Miss Ratio} \times \text{Memory access time} \\ &= 0.90 \times 10 \text{ ns} + 0.10 \times 100 \text{ ns} \\ &= 9 \text{ ns} + 10 \text{ ns} = 19 \text{ ns}\end{aligned}$$

10. (b)

It is the definition of locality principle.

13. (d)

$$\begin{aligned}\text{Average access time} &= \text{Hit Ratio} \times \text{Cache access time} + \text{Miss Ratio} \times \text{Memory access time}\end{aligned}$$

Here, Memory access time =  $1.5 \mu\text{s}$  i.e.  $1500 \text{ ns}$

So, Average access time

$$\begin{aligned}&= 0.98 \times 12 \text{ ns} + 0.02 \times 1500 \text{ ns} \\ &= 11.76 \text{ ns} + 30 \text{ ns} = 41.76 \text{ ns}\end{aligned}$$

14. (d)

Memory address: 16 bits

Number of words in cache =  $256 = 2^8$ .



16. (d)

There are 64 words in a block. So the 4k cache has  $(4 \times 1024)/64 = 64$  blocks. Since 1 set has 4 blocks, there are 16 sets. 16 sets needs 4 bits for representation. In a set there are 4 blocks, which needs 2 bits. Each block has 64 words. So the word field has 6 bits.

17. (d)

$$\begin{aligned}\text{Clip size} &= 256 \times 1 \text{ bits} \\ &= 2^8 \text{ bits} = 2^5 \text{ bytes}\end{aligned}$$

For 1 KB Memory,  $\frac{2^{10}}{2^5} = 2^5 = 32$  clips will be required.

19. (c)

For no write allocate, the second or write M[200] is a hit so is in cache so only 1 hit. For write allocate  $S_2, S_4, S_5$  are hit so 3 hits.

So pair contains 1, 3.

20. (a)

Average Memory Access time = Hit Time  $L_1$  + Miss rate  $L_1 \times (\text{Hit time } L_2 + \text{Miss rate } L_2 \times \text{Miss Penalty } L_2)$

$$= 1 + 4\% (10 + 50\% \times 100) \left[ \text{Miss rate } L_1 = \frac{40}{1000} \times 100 = 4\% \right]$$

$$= 1 + 4\% \times 60 \left[ \text{Miss rate } L_2 = \frac{20}{40} \times 100 = 50\% \right]$$

= 3.4 clock cycles

21. (a)

$$\begin{aligned}\text{Instruction executed} &= 2 + (16 \times 101) \\ &= 2 + 1616 = 1618\end{aligned}$$

22. (b)

$$T_c = 20 \text{ ns}$$

$H$ : Hit ratio

$$T_m = 150 \text{ ns}$$

$$T_{avg} = T_c \times H + (1 - H) \times (T_c + T_m)$$

$$\text{or, } 30 = 20 \times H + (1 - H) \times 170$$

$$\therefore H = 93\%$$

23. (a)

$$T_1 = 100 \mu\text{s}, T_2 = 90 \mu\text{s}, H = 95\%$$

$$\therefore \text{Access efficiency} = \frac{T_1}{T_s}$$

$$= \frac{1}{1 + (1 - H) \frac{T_2}{T_1}} = \frac{1}{1 + (1 - 0.95) \frac{90 \mu\text{s}}{100 \mu\text{s}}}$$

$$= \frac{1}{1 + 0.05 \times 0.9} = 0.9569$$

24. (a)

$$\begin{aligned}t_A &= t_{A_1} + (1 - H)t_B \\ &= 10^{-8} + (1 - 0.9000)10^{-6} \\ t_A &= 0.11 \times 10^{-6}\end{aligned}$$

$t_{A_1}$  = access time of  $M_1$  or cache is  $10^{-8}$

$t_B$  = Time required for the block transfer is called the block access or block transfer time

$$t_B = 10^{-6}$$

$$\text{Hit ratio} = H = 0.9000$$

By using above formula

$$t_A = 0.11 \times 10^{-6}$$

25. (d)

Effective access time = (hit ratio) \* (memory access time + search time in associative registers) + (fail ratio) \* (search time in associative registers) =  $2 * \text{Memory access time} = (0.90 * 150) + (0.10 * 250) = 160$  nanoseconds.

26. (d)

The secondary cache can be considerably slower, but it should be much larger to ensure a high hit rate. Its speed is less critical, because it only affects the miss penalty of the primary cache. A workstation computer may include a primary cache with the capacity of tens of kilobytes and a secondary cache of several megabytes.

$$t_{ave} = h_1 C_1 + (1 - h_1) h_1 C_2 + (1 - h_1)(1 - h_2)M$$

The number of misses in the secondary cache, given by the term  $(1 - h_1)(1 - h_2)$ .

27. (d)

There are 64 words in a block. So the 4 K cache has  $(4 \times 1024) / 64 = 64$  blocks. Since 1 set has 4 blocks, there are 16 sets. 16 sets needs 4 bits for representation. In a set there are 4 blocks, which need 2 bits. Each block has 64 words. So, the word field has 6 bits.

28. (c)

| Location of referenced word      | Probability           | Total time for access in ns             |
|----------------------------------|-----------------------|-----------------------------------------|
| In cache                         | 0.99                  | 10                                      |
| Not in cache, but in main memory | $(0.01)(0.9) = 0.009$ | $200 + 10 = 210$                        |
| Not in cache or main memory      | $(0.01)(0.1) = 0.001$ | $500 \text{ ms} + 200 + 10 = 500000210$ |

So the average access time would be:

Average access time

$$= (0.99)(10) + (0.009)(210) + (0.001)(500000210) \\ = 500012 \text{ ns}$$

29. (a)

|             |             |              |
|-------------|-------------|--------------|
| Tag(6 bits) | Set(6 bits) | Word(4 bits) |
|-------------|-------------|--------------|

32. (a)

Average access time

$$= 0.80 \times [0.9 \times 90 + 0.1 \times (90 + 500)] + 0.20 \times 500 \\ = 212 \text{ nsec}$$

33. (c)

|             | $T_{\text{acc}}$ (ns) | Read hit | Write hit |
|-------------|-----------------------|----------|-----------|
| Cache       | 50                    | 90%      | 80%       |
| Main Memory | 500                   | 10%      | 20%       |

Given 80% of memory requests are for read 20% of memory requests are for write. The read access time of the system considering only memory read cycle

$$= \text{Access time cache} + \text{access time for main memory} \\ = (0.9 \times 50) + (0.1 \times 500) = 95 \text{ ns}$$

34. (b)

$$\text{Size of Cache Memory} = 256 \times 16 \\ = 4096 \text{ bytes} = 4 \text{ KB}$$

Now, tag field is of 14 bits, that means the main memory consists of  $2^{14}$  possible cache lines.

$$\text{Size of Cache Memory} = 2^{14} \times 16 \\ = 16 \text{ KB} \times 16 \text{ Bytes} = 256 \text{ KB}$$

35. (b)

The cache has 128-byte lines, which is 32 words. Using the specified memory timings, the cache spends 5 ns detecting that a miss has occurred, and  $100 \text{ ns} + 31(10 \text{ ns}) = 410 \text{ ns}$  fetching a line

from the main memory into the cache. The cache then reexecutes the operation that caused the miss, taking 5 ns. This gives a cache miss time of 420 ns.

36. (c)

2 way set associative cache memory with 4 blocks LRU policy is used.

Number of blocks = 4

Number of sets =  $\frac{4}{2} = 2$  (as the memory is

2 way set associative).

Requests are: 8, 12, 0, 12, 8.



$8 \bmod 2 = 0 \rightarrow S_0$  1 miss

$12 \bmod 2 = 0 \rightarrow S_0$  1 miss

$0 \bmod 2 = 0 \rightarrow S_0$  1 miss

12 already present hit

$8 \bmod 2 = 0 \rightarrow S_0$  1 miss

$\therefore$  Total 4 cache misses.

37. (c)



$$32 - (14+5) = 13 \quad \log_2(14 \text{ bit}) = 4 \quad \log_2 32 = 5$$

Main memory address:



Line offset = 2 B 5 B

2-way set associative cache:

$$\text{Number of set} = \frac{N}{P\text{-way}} = \frac{2^{14}}{2^1} = 2^{13}$$



$$32 - (13+5) = 14 \quad \log_2 14 = 4 \quad \log_2 32 = 5$$



Set offset = 1 D 5 D

39. (b)

128 line or  $2^7$  so require 7 bitsSize of line 64 words =  $2^6$ So 6 bits are needed to identify each word uniquely. So, Tag bit =  $20 - 13 = 7$  bits.

40. (a)

Cache size = 32 kB

Block size = 32 B =  $2^5$  B

$$\text{Number of blocks} = \frac{32\text{kB}}{32\text{B}} = 1024 = 2^{10}$$

32 bit address.

| Tag | No. of blocks | Block size |
|-----|---------------|------------|
| 32  |               |            |
| 17  | 10            | 5          |
| 32  |               |            |

$$\text{Tag bits} = 32 - (10 + 5) = 17$$

41. (a)

$$\text{Number of sets} = n/k$$

Let there are  $x$  words per block in main memory.

| $\log_2 mx$ |     |               |
|-------------|-----|---------------|
| TAG         | SET | word          |
|             |     | $\log_2(n/k)$ |

$$\text{No. of TAG bits} = \log_2(mx) - \left( \log_2 \frac{n}{k} + \log_2 x \right)$$

$$= \log_2(mx) - \log \frac{nx}{k} = \log_2 \frac{mk}{n}$$

46. (c)

No. of blocks in cache = 8 (0 to 7)

No. of blocks in Main Memory = 128 (0 to 127)

|   |    |             |
|---|----|-------------|
| 0 | 0  | $0\%4 = 0$  |
|   | 16 | $55\%4 = 3$ |
| 1 | 5  | $5\%4 = 1$  |
|   | 9  | $3\%4 = 3$  |
| 2 |    | $9\%4 = 1$  |
|   |    | $7\%4 = 3$  |
| 3 | 55 | $0\%4 = 0$  |
|   | 7  | $16\% = 0$  |

Only 3 will be replaced.

Hence, (c) is the correct option.

48. (d)

Catch blocksize = 64 bits

Main memory has  $k$  banks = 24Each bank is 2 byte long because  $C = 2$ 

Total time for parallel access

$$= T = \frac{k}{2} + \text{latency} = \frac{24}{2} + 80 = 92\text{ns}$$

$$\text{Total latency time} = CT = 2 \times 92 = 184\text{ ns}$$

50. (1.411)

For set-associative organization:

| Tag | Set | Block |
|-----|-----|-------|
| 18  | 9   | 5     |

$$\text{So, } h_1 = \left( \frac{k}{10} + 0.6 \text{ for multiplexer} \right)$$

$$= \left( \frac{18}{10} + 0.6 \right) = 2.4\text{ ns}$$

| Tag | Line | Block |
|-----|------|-------|
| 17  | 10   | 5     |

$$\text{So, } h_2 = \left( \frac{17}{10} \right) = 1.7$$

(as number 2-to-1 multiplexer is required for direct mapped cache)

$$\therefore h_1/h_2 = 1.411$$

51. (d)

Apply Amdahl's law

$$F = 80\%, S = 20,$$

$$\text{Overall speed-up} = \left[ (1-F) + \frac{F}{S} \right]^{-1}$$

$$= \left[ (1-0.8) + \frac{0.8}{20} \right]^{-1} = 4.16$$

52. (d)

S1: Compulsory miss can be reduced by increasing the line size i.e., reduce number of lines.

S2: Conflict miss occur when too many blocks are mapped into same line or set. So by increasing the associativity i.e. increases the size of set and increases the number of sets.



S3: Capacity miss can be reduced by increasing the cache memory size.  
All of the three statements are true.

53. (a)

$$512 \text{ kW} = 2^{19} \text{ words} = 19 \text{ bit address}$$



$$\text{So number of input output addresses} = 1 \times 2^7$$

$$\text{Number of memory addresses} = 3 \times 2^{17}$$

54. (c)

$$T_{\text{memory}} = 200 \text{ ns}$$

$$H_{\text{read}} = 0.8$$

$$T_{\text{cache}} = 10 \text{ ns}$$

$H_{\text{write}}$  = 1 (by default for write through)

$$f_{\text{read}} = 80\%$$

$$f_{\text{write}} = 20\%$$

$$T_{\text{avg read}} = (0.8 \times 10) + (0.2 \times 200)$$

$$= 8 + 40 = 48 \text{ ns}$$

$$T_{\text{avg write}} = 1 \times 200 = 200 \text{ ns}$$

$$T_{\text{avg}} = f_{\text{read}} \times T_{\text{avg read}} + f_{\text{write}} \times T_{\text{avg write}}$$

$$= 0.8 \times 48 + 0.2 \times 200 = 78.4 \text{ ns}$$

55. (480026)

There are 3 cases to consider

| Location of referenced word      | Probability         | Total time for access in ns          |
|----------------------------------|---------------------|--------------------------------------|
| In cache                         | 0.9                 | 20                                   |
| Not in cache, but in main memory | $(0.1)(0.6) = 0.06$ | $60 + 20 = 80$                       |
| Not in cache or main memory      | $(0.1)(0.4) = 0.04$ | $12 \text{ ms} + 60 + 20 = 12000080$ |

So average access time should be

$$= 0.9(20) + (0.06)(80) + (0.04)(12000080)$$

$$= 480026 \text{ nsec}$$

56. (31)

$$\text{Block size} = 16 \text{ byte} = 2^4 \text{ byte} = 4 \text{ bits}$$

$$\text{Blocks in main memory} = 2^{10}$$

$$\text{So number of sets} = \frac{2^{10}}{2^4} = 2^6 \Rightarrow 64 \text{ bits}$$

$$\# \text{ of bits in physical address} = 2^{16} \text{ byte}$$

$$\Rightarrow 16 \text{ bits}$$



$$\text{SET value}_1 = 00000001 = \text{Decimal value} = (1)_{10}$$

$$\text{SET value}_2 = 000100000 = \text{Decimal value} = (32)_{10}$$

$$\text{Difference} = \text{SET}_2 - \text{SET}_1$$

$$= 32 - 1 = (31)_{10}$$

57. (1.2)

$$\text{Number of cache lines} = \frac{64 kB}{64 B} = 1 k = 2^{10}$$

$$\text{Number of sets} = \frac{2^{10}}{2^2} = 2^8$$



$$H_1 = 18/5 + 0.8 \text{ ns}$$

$$= 3.6 + 0.8 \text{ ns} = 4.4 \text{ ns}$$

Direct mapped cache:



This is no need of multiplexer in direct mapped cache.

$$\text{So, } H_2 = 16/5 \text{ nsec}$$

$$= 3.2 \text{ nsec}$$

$$\text{Difference} = H_1 - H_2$$

$$= 4.4 - 3.2 = 1.2 \text{ nsec}$$

58. (c)

$$\text{Cache access time} = x \text{ ns}$$

$$\text{Memory access time} = 9x \text{ ns}$$

$$80 = x + (1 - 0.7) * 9x$$

$$80 = 3.7x$$

$$x = 80/3.7 = 21.62 \text{ ns}$$

$$\text{So } 9x = 9 \times 21.62 = 194.60 \text{ ns}$$

Average access time increase 40%

$$112 = 21.62 + (1 - H_1) * 194.60$$

$$H_1 = 0.53$$

59. (76)

4 way set associative cache

Cache memory size = 32 Kbytes

Block size = 8 bytes

$$\text{Number of lines} = \frac{32 \text{ Kbytes}}{8 \text{ bytes}} = 4 \text{ K} = 2^{12}$$

$$\text{Number of sets} = \frac{2^{12}}{2^2} = 2^{10} = 1 \text{ K}$$

32 bit virtual address

| 32                               |                                  |                       |
|----------------------------------|----------------------------------|-----------------------|
| Tag                              | Set                              | Word offset           |
| Tag = $32 - (10+3)$<br>= 19 bits | $\log(1 \text{ K})$<br>= 10 bits | $\log(8)$<br>= 3 bits |

TAG memory size = Number of set × Number of lines in set × Number of tag bits

$$= 4 \times 2^{10} \times 19$$

$$= 76 \times 2^{10} = 76 \text{ K bits}$$

60. (b)

Level 1 : Cache and Level 2 : Main memory

$$T_{\text{avg}} = HT_c + (1 - H)(T_m + T_c)$$

$$T_c = 0.01 \mu\text{sec}, H = 0.95$$

$$T_m = 0.1 \mu\text{sec}$$

$$T_{\text{avg}} = (0.95 \times 0.01) \mu\text{sec} + (0.05)(0.01 + 0.1) \mu\text{sec}$$

$$= (0.0095 + 0.0055) \mu\text{sec}$$

$$= 0.015 \mu\text{sec}$$

61. (c)

Cache has  $2X$  blocks, Each set has 2 blocks

$$\Rightarrow \# \text{ sets} = \frac{2X}{2} = X$$

Block  $K$  maps to  $(K \bmod X)$  of the cache set

62. (b)

Direct cache format

| 24 bit |                             |                       |
|--------|-----------------------------|-----------------------|
| Tag    | LO                          | WO                    |
| 8 bit  | $\log_2 2^{14}$<br>= 14 bit | $\log_2 4$<br>= 2 bit |

MM Address :

|           |                     |      |
|-----------|---------------------|------|
| 1010 1010 | 1010 1010 1010 1010 | 1010 |
| tag       | LO                  | WO   |

LO = 2AAA

16-way set associate cache

$$\# \text{ Sets (S)} = \frac{N}{P-\text{way}}$$

$$\Rightarrow \frac{2^{14}}{16} = 2^{10}$$



MM Address :

|                |                |      |
|----------------|----------------|------|
| 1010 1010 1010 | 1010 1010 1010 | 1010 |
| tag            | SO             | WO   |

Set Offset = 2AA

63. (a)

Increase in the associativity leads to increase in the number of tag comparisons. Hence it leads to increase in cache access time.

64. (d)

Initially cache was empty

0 is miss and block containing 0<sup>th</sup> word would have been brought into cache. Again 2 is hit means 2<sup>nd</sup> address of memory available in same block as 0<sup>th</sup> address. But again 4 miss occurs.  
 $\therefore$  0<sup>th</sup>, 1<sup>st</sup>, 2<sup>nd</sup> words are surely in same block. Since size to be power of '2' than  $2^2 = 4$  bytes (Block size)

0-miss  $\rightarrow$  128-miss  $\rightarrow$  0H  $\rightarrow$  128H  $\rightarrow$  64M  $\rightarrow$  0M  
 64-miss affected 0  $\Rightarrow$  64 maps in the same set as 0  $\Rightarrow$  128 maps in the same set as zero. But the miss of 128 did not affect the presence of '0'.  
 $\Rightarrow$  At least 2-way set associative.

If it was more than 2-way associative then 64 would not have evicted 0.

$\Rightarrow$  Exactly 2-way set associative.

65. (a)  
4 way set associative  
Number of lines = 128

$$\text{Number of sets} = \frac{128}{4} = 32$$

Block size = 64 words



A : 1111 0101 1 010 00 11 1111

In addresses bit numbers (10 – 14) will decide the set number in cache memory.

∴ A – 8, B – 16, C – 19, D – 10

66. (d)  
A TLB miss can be handled in software or hardware because it will require only a short sequence of operations to copy a valid page table entry from memory into TLB. (MIPS traditionally handles TLB miss in software). Handling a TLB miss or a page fault requires using the exception mechanism to interrupt the active process, transfer control to the OS (operating system) and later resuming execution to interrupted process.  
So (i), (ii) and (iii) are correct.

67. (b)

Range of blocks is: 0 to  $2^S - 1$

$2^S - 1$  goes to cache line  $m - 1$

$2^S - 2$  goes to cache line  $m - 2$

⋮

$2^S - m$  goes to cache line 0.

68. (b)

For 1 second it takes  $10^9$  byte

$$\text{So for 64 kbyte it takes} = \frac{64k}{10^9} = 64 \mu\text{sec}$$

Main memory latency = 64  $\mu\text{sec}$

$$\begin{aligned} \text{Total time required to fetch} &= 64 \mu\text{sec} + 64 \mu\text{sec} \\ &= 128 \mu\text{sec} \end{aligned}$$

69. (a)



$$\text{Opcode bit} = \lceil \log_2 170 \rceil = 8$$

$$\text{Register address bit} = \lceil \log_2 37 \rceil = 6$$

$$\begin{aligned} \text{Memory address bit} &= 32 - (8 + 6) = 18 \\ &= 2^{18} \text{ words} \\ &= 2^{18} \times 4 \text{ bytes} \\ &= 2^{20} \text{ bytes} = 1 \text{ MB} \end{aligned}$$



**5****CHAPTER****Input-Output and Secondary Memory**

1. During DMA transfer, DMA controller takes over the buses to manage the transfer
  - (a) directly from CPU to memory
  - (b) directly from memory to CPU
  - (c) indirectly between the I/O device and memory
  - (d) directly between the I/O device and memory
2. Memory mapped I/O involves
  - (a) transferring information between memory locations
  - (b) transferring information between registers and memory
  - (c) transferring information between the CPU and I/O devices in the same way as between the CPU and memory
  - (d) transferring information between I/O devices and memory
3. To prevent signals from colliding on the bus, \_\_\_\_\_ prioritize access to memory by I/O channels and processors.
  - (a) a register
  - (b) interrupts
  - (c) the processor scheduler
  - (d) a controller
4. Which of the following is not valid class of interrupts?
 

|             |                     |
|-------------|---------------------|
| 1. Program  | 2. Timer            |
| 3. I/O      | 4. Hardware failure |
| (a) 1 and 3 | (b) 1, 2 and 4      |
| (c) 2 and 3 | (d) none of these   |
5. System calls are usually invoked by using
  1. An indirect jump
  2. A software interrupt
  3. Polling
  4. A privileged instruction
6. What is signified by the term Direct Memory Access (DMA)?
  - (a) A less efficient I/O mechanism than 'interrupts' for single byte transfer operations.
  - (b) A class of multiprocessor computers.
  - (c) A mechanism used by the I/O system to bypass the memory management functions of an operating system.
  - (d) A more efficient I/O mechanism than 'programmed I/O' for single byte transfer operations.
7. Which one is required while establishing the communication link between CPU and peripherals?
  - (a) synchronization mechanism
  - (b) conversion of signal values
  - (c) operating modes
  - (d) all of the above
8. Which one is not a type of I/O channel?
  - (a) multiplexer
  - (b) selector
  - (c) block-multiplexer
  - (d) none of the above
9. The CPU initializes the DMA by sending \_\_\_\_\_.
  - (a) the starting address of the memory blocks where data is available or where data is to be stored
  - (b) the word count
  - (c) control for mode and start the transfer
  - (d) all of the above
10. Match List-I with List-II and select the correct answer using the codes given below the lists:

## List-I

- A. Stack overflow
- B. Supervisor call
- C. Invalid opcode
- D. Timer

## List-II

1. Software interrupt
2. Internal interrupt
3. External interrupt
4. Machine check interrupt

## Codes:

|     | A | B | C | D |
|-----|---|---|---|---|
| (a) | 2 | 3 | 4 | 1 |
| (b) | 2 | 1 | 2 | 3 |
| (c) | 3 | 1 | 2 | 4 |
| (d) | 3 | 1 | 4 | 2 |

11. Consider the following I/O instruction format for IBM 370 I/O channel

|                |                 |                |
|----------------|-----------------|----------------|
| Operation Code | Channel Address | Device Address |
|----------------|-----------------|----------------|

Then operation code specifies

1. Test I/O
  2. Test channel
  3. Store channel identification
  4. Halt device
- (a) Only 1 and 4    (b) Only 2 and 3
  - (c) 1, 2, 3 and 4    (d) 1, 3 and 4
12. In case of vectored interrupt, interrupt vector means
- (a) The branch information from the source which interrupts the system
  - (b) An address that points to a location in memory where the beginning address of the I/O service routine is stored
  - (c) Both (a) and (b)
  - (d) None of these
13. A computer system contains a main memory of 32 K size with 16 bit words. It also has a 4 K word cache divided into 4 slot sets with 64 words per slot. Assume that the cache is initially empty. The processor fetches words from 0, 1, 2, ..., 4351 in that order repeatedly 10 times. Assume a LRU policy for block replacement. How many miss operations will occur?

14. Daisy Chained and Independent Request bus arbitration requires \_\_\_\_\_ respectively. (Where N stands for number of Bus Master Devices in the Configuration)

- (a) 3 and 7 signal lines
- (b) 3 and  $2^N$  signal lines
- (c)  $2N$  and  $2N$  signal lines
- (d) 3 and  $2N$  signal lines

15. An 8-Bit DMA Device is operating in Cycle Stealing Mode (Single Transfer Mode). Each DMA cycle is of 6 clock states and DMA clock is 2 MHz. Intermediate CPU machine cycle takes 2  $\mu$ s, determine the DMA Data Transfer Rate
- (a) 100 Kbytes/sec (b) 200 Kbytes/sec
  - (c) 350 Kbytes/sec (d) None of these

16. Select the correct option for communication between computer buses with memory and I/O.
- (a) Use two separate buses, one for memory and the other for I/O
  - (b) Use one common bus for memory and I/O but have separate control lines for each
  - (c) Use one common bus for memory and I/O with common control lines
  - (d) All of the above

17. A device employing INTA line for device interrupt puts the CALL instruction on the data bus while
- (a) INTA is active
  - (b) HOLD is active
  - (c) READY is active
  - (d) None of the above

18. An interrupt that can be temporarily ignored by the counter is known as
- (a) vectored interrupt
  - (b) non-maskable interrupt
  - (c) maskable interrupt
  - (d) low priority interrupt

19. Which one of the following is true for a CPU having a single interrupt request line and a single interrupt grant line?
- (a) Neither vectored interrupt nor multiple interrupting devices are possible

- (b) Vectored interrupts are not possible but multiple interrupting devices are possible  
 (c) Vectored interrupts and multiple interrupting devices are both possible  
 (d) Vectored interrupts is possible but multiple interrupting devices are not possible
20. A device with data transfer rate 10 KB/sec is connected to a CPU. Data is transferred byte-wise. Let the interrupt overhead be 4  $\mu$ sec. The byte transfer time between the device interface register and CPU or memory is negligible. What is the minimum performance gain of operating the device under interrupt mode over operating it under program controlled mode?  
 (a) 15                  (b) 25  
 (c) 35                  (d) 45
21. DMA is transferring characters to the processor, from a device transmitting at 16000 bits per second. Assume DMA is using cycle stealing. If the processor needs access to main memory once every microsecond, find how much percentage will the processor be slowed down due to DMA activity?  
 (a) 0.20                  (b) 0.25  
 (c) 0.30                  (d) 0.35
22. Consider a system in which DMA technique is used to transfer 16 MB of data from an I/O device into memory. The bandwidth of I/O device is 128 KB/s. Once the data is filled into interface buffer, the DMA controller takes over the bus and transfer it to main memory in 28 sec.  
 What percentage of time is the CPU in busy mode (approximately)?  
 (a) 17                  (b) 82  
 (c) 35                  (d) 41
23. Consider a system employing interrupt driven input/output for a particular device that transfers data at an average of 16 KB/s on a continuous basis. Assume that interrupt processing takes 50  $\mu$ sec (i.e., the jump to the interrupt service routine (ISR), execute it and return to main program). The fraction of processor time is consumed by this input/output device if it interrupt for every byte is \_\_\_\_\_ (upto 3 decimal places).
24. A computer system that uses memory mapped I/O configuration, has a 32 bit address space. Address with 1's in 4 MSB refers to devices. What is the maximum amount of memory and port addresses that can be referenced in such system respectively?  
 (a)  $15 \times 2^{25}$  and  $1 \times 2^{28}$   
 (b)  $15 \times 2^{32}$  and  $1 \times 2^{32}$   
 (c)  $15 \times 2^{28}$  and  $1 \times 2^{30}$   
 (d)  $12 \times 2^{28}$  and  $1 \times 2^{28}$
25. The minimum time delay between the initiations of two independent memory operations is called  
 (a) access time    (b) cycle time  
 (c) transfer rate    (d) latency time
26. Consider a Disk I/O transfer, in which 1500 bytes are to be transferred, but number of bytes on a track is 1000, and rotation speed of disk is 1500 rps but the average time required to move the disk arm to the required track is 15 ms, then what will be total access time?  
 (a) 16.33 ms                  (b) 15.33 ms  
 (c) 16.33  $\mu$ s                  (d) 15.33  $\mu$ s
27. Consider a disk drive with the following specifications: 16 surfaces, 512 tracks/surface, 512 sectors/track, 1 KB/sector, rotation speed 3000 rpm. The disk is operated in cycle stealing mode whereby whenever one byte word is ready it is sent to memory; similarly, for writing, the disk interface reads a 4 byte word from the memory in each DMA cycle. Memory cycle time is 40 nsec. The maximum percentage of time that the CPU gets blocked during DMA operation is  
 (a) 10                  (b) 25  
 (c) 40                  (d) 50

**Answers** Input-Output and Secondary Memory

1. (d) 2. (c) 3. (d) 4. (d) 5. (a) 6. (a) 7. (d) 8. (d) 9. (d)  
 10. (b) 11. (c) 12. (c) 14. (d) 15. (b) 16. (d) 17. (a) 18. (c) 20. (b)  
 21. (b) 22. (b) 24. (a) 25. (b) 26. (a) 27. (b)

**Explanations** Input-Output and Secondary Memory

4. (d)

1, 2, 3, 4 are valid classes of interrupts.

1. **Program:** Generated by some condition that occurs as a result of an instruction execution, such as arithmetic overflow, division by zero etc.
2. **Timer:** Generated by a timer within the processor. This allows the operating system to perform certain functions on a regular basis.
3. **I/O:** Generated by an I/O controller, to signal normal completion of an operation or to signal a variety of error condition.
4. **Hardware failure:** Generated by failure such as power failure or memories parity error.

7. (d)

To establish the communication link between CPU and peripheral requires following:

1. **Synchronization mechanism:** The data transfer rate of peripherals is usually slower than the transfer rate of the CPU, and consequently a synchronization mechanism may be needed.
2. **Conversion of signal values:** Peripheral are electromechanical and electromagnetic devices and their manner of operation is different from the operation of the CPU and memory, which are electronic devices. Therefore, a conversion of signal values may be required.
3. **Operating modes:** The operating modes of peripheral are different from each other and each must be controlled so as not to disturb the operation of other peripheral connected to the CPU.

9. (d)

The CPU initializes the DMA by sending:  
 The starting address of the memory blocks where data are available (for read) or where data are to be stored. (For write)

The word count, which is the number of words in the memory.  
 Control to specify the mode of transfer such as read or write.

13. (248)

Cache holds 64 blocks without replacement.  
 CPU fetches words from 0 to 4351 (Total 4352 words)

$$\text{Number of blocks required} = \frac{4352}{64} = 68 \text{ blocks}$$

1<sup>st</sup> time = 68 miss operations (initially cache was empty)

For 2<sup>nd</sup> time reference, the available blocks in cache are 64. But the misses ( $B_0, B_1, B_2$  and  $B_3$ ) lead to subsequent misses of ( $B_{16}$  to  $B_{19}$ ), ( $B_{32}$  to  $B_{35}$ ), ( $B_{48}$  to  $B_{51}$ ) and ( $B_{64}$  to  $B_{67}$ ) with LRU policy. The same processes repeated 9 times.

$$\therefore \text{Total} = 68 + 9 \times (4 + 4 + 4 + 4 + 4) = 248$$

18. (c)

Maskable interrupt temporarily ignored by counter.

21. (b)

It is transferring 16000 bits in 1 second.  
 1 character takes 500 micro seconds.  
 Processor accesses main memory in every ms

$$\therefore \frac{1}{500} \times 100 = \frac{1}{5} = 0.25\%$$

22. (b)

$$\text{Time taken by I/O device} = \frac{16 \text{ MB}}{128 \text{ kB}} = 128 \text{ sec}$$

$$\begin{aligned}\text{Percentage time CPU is busy} &= \frac{128}{128 + 28} \times 100 \\ &= 82.05\end{aligned}$$

23. (0.819)

$$\begin{aligned}\text{The device generate } 16 \text{ KB} &= 16 \times 1024 \text{ B/sec} \\ &= 16384 \text{ byte/sec}\end{aligned}$$

$$1 \text{ sec} = 16384 \text{ B}$$

$$1 \text{ byte} = \frac{1}{16384} = 61.03 \times 10^{-6} \text{ sec}$$

$$= 61.03 \mu\text{sec}$$

Given that each interrupt consume 50  $\mu\text{sec}$ .

$\therefore$  Fraction of processes time consumed in

$$= \frac{50}{61.03} = 0.8192$$

24. (a)

|        | $A_{31}$ | $A_{30}$ | $A_{29}$ | $A_{28}$ | $A_3 A_2 A_1 A_0$ |
|--------|----------|----------|----------|----------|-------------------|
| Memory | 0        | 0        | 0        | 0        |                   |
|        | 0        | 0        | 0        | 1        |                   |
|        | 0        | 0        | 1        | 0        |                   |
|        | 0        | 0        | 1        | 1        |                   |
|        | 0        | 1        | 0        | 0        |                   |
|        | M        | M        | M        | M        |                   |
|        | 1        | 1        | 0        | 1        |                   |
|        | 1        | 1        | 1        | 0        |                   |
| I/O    | 1        | 1        | 1        | 1        |                   |

$\therefore$  Memory address space:  $15 \times 2^{28}$

I/O address space =  $1 \times 2^{28}$

26. (a)

$$T_a = T_s + \frac{1}{2r} + \frac{b}{rN}$$

Given:  $T_a \rightarrow$  transfer time

$T_s \rightarrow$  average seek time = 15 ms

$r \rightarrow$  rotation speed in rpm = 1500 rps

$b \rightarrow$  no. of bytes to be transferred = 1500 bytes

$N \rightarrow$  number of bytes on a track = 1000 bytes

$$T_a = 15 \text{ ms} + \frac{1}{2 \times 1500 \text{ rps}} + \frac{1500}{150 \text{ rps} \times 1000}$$

$$= 15 \text{ ms} + \frac{1}{3000 \text{ rps}} + \frac{1}{1000 \text{ rps}}$$

$$= 15 \text{ ms} + \frac{1+3}{3000 \text{ rps}}$$

$$= 15 \text{ ms} + \frac{4}{3000 \text{ rps}} = 16.33 \text{ ms}$$



# Data Representation

1. Consider the following floating point format



What is the normalized representation of decimal number  $0.413 \times 2^{13}$ ?

Assume, the mantissa has an implicit 1 preceding the binary point and only 0's are padded in while shifting a field

- (a) 0A20      (b) 4BA8  
 (c) 4DD0      (d) 4AE8

2. Consider the following two 2's complement numbers, 10001011 and 11110011 and perform the arithmetic addition operation. What is the status of the carry, sign, zero and overflow flags after the operation?

- (a) 1, 0, 1, 1      (b) 1, 0, 0, 1  
 (c) 1, 1, 0, 0      (d) 1, 1, 1, 0

3. Consider the following floating point format



Mantissa is a pure fraction in sign-magnitude form. What is the representation of decimal number  $0.625 \times 2^{18}$  in hexadecimal without normalization and rounding off?

- (a) 52B0      (b) 52A0  
 (c) 52A1      (d) None of these

4. Consider the following statements:

A multiplexer

1. selects one of the several inputs and transmits it to a single output

2. routes the data from a single input to one of many output

3. converts parallel data into serial data

4. is a combinational circuit

Which of these statements are correct?

- (a) 1, 2 and 4      (b) 2, 3, and 4  
 (c) 1, 3 and 4      (d) 1, 2 and 3

5. In a particular number system having base  $B$ ,

$$(\sqrt{41})_B = 5_{10}$$

The value of ' $B$ ' is \_\_\_\_\_

6. Number  $(+48.5)_{10}$  can be represented as a floating-point binary number with 24 bits. The normalized fraction mantissa has 1 bits and the exponent has 8 bits. The mantissa of the number is

- (a) 0101110100000000  
 (b) 1001110100000000  
 (c) 0000011000000000  
 (d) 000000001011101

7. Two numbers -48 and -23 are added using 2's complement. The 2's complement of the result using 8 bit representation is \_\_\_\_\_

- (a) 10111001      (b) 01000111  
 (c) 01101010      (d) 11100111

8. Consider an IBM system which allocate 32 bits for floating point number. The base of the system is 16. Mantissa is normalised sign magnitude fraction. Exponent is in Excess-128 from.

What is the octal pattern which represents the value  $-(7.5)_{10}$  in the register?

- (a) 30057000000      (b) 02005700000  
 (c) 00601360000      (d) 60136000000

9. Consider the following floating point format



Mantissa is a pure fraction in sign – magnitude form.

The decimal number  $0.529 \times 2^{13}$  has the following hexadecimal representation without normalization and rounding off.

- (a) 0D24      (b) 0D4D  
(c) 48D7      (d) 4D3D

10. Consider the following floating point format:

| Sign bit | Exponent | Mantissa |
|----------|----------|----------|
|----------|----------|----------|

The exponent is formatted using excess 16 notation with an implied base of 2. If the contents are:

Sign: 0

Exponent: 10111

Mantissa: 1100100000

What is the exponent value expressed by the above format?

11. Consider the following floating point format

| Sign(s) | Exponent (E) | Mantissa (M) |
|---------|--------------|--------------|
| 1 bit   | 8 bits       | 23 bits      |

The decimal number 42.6875 has following hexadecimal representation with normalization and rounding off

- (a) 422AD000      (b) 422AC000  
(c) 42AAD000      (d) None of these

12. The 2's complement representation of a 16 bit number (One sign bit and 15 magnitude bits) is FFFF. Its magnitude in decimal representation will be \_\_\_\_\_.

13. The value denoted by 1100000111110000...0 in IEEE 754 signal precision standard in decimal is \_\_\_\_\_.

14. Booth's coding in 8-bits for the decimal number -57 is \_\_\_\_\_

- (a) 0 - 100 + 1000  
(b) 0 - 100 + 100 - 1  
(c) 0 - 1 + 100 - 10 + 1  
(d) 00 - 10 + 100 - 1

15. When multiplicand  $Y$  is multiplied by multiplier  $X = x_{n-1}x_{n-2}\dots x_0$  using bit-pair recording in Booth's algorithm, partial products are generated according to the following table.

| Row | $x_{i+1}$ | $x_i$ | $x_{i-1}$ | Partial Product |
|-----|-----------|-------|-----------|-----------------|
| 1   | 0         | 0     | 0         | 0               |
| 2   | 0         | 0     | 1         | $Y$             |
| 3   | 0         | 1     | 0         | $Y$             |
| 4   | 0         | 1     | 1         | $2Y$            |
| 5   | 1         | 0     | 0         | ?               |
| 6   | 1         | 0     | 1         | $-Y$            |
| 7   | 1         | 1     | 0         | $-Y$            |
| 8   | 1         | 1     | 1         | ?               |

The partial products for rows 5 and 8 are

- (a)  $2Y$  and  $Y$       (b)  $-2Y$  and  $2Y$   
(c)  $-2Y$  and 0      (d) 0 and  $Y$

**Answers Data Representation**

1. (b) 2. (b) 3. (b) 4. (c) 5. (6) 6. (a) 7. (b) 8. (a) 9. (c)  
 11. (b) 14. (b) 15. (a)

**Explanations Data Representation****1. (b)**

The decimal number is  $0.413 \times 2^{13}$

Binary representation of 0.413 is  
 $0.01101001101110$

$\therefore$  Number is  $(0.01101001101110) \times 2^{13}$ . After normalization we have  $(1.101001101110) \times 2^{11}$ .

Then the biased exponent =  $11 + 64 = 75$

Representing biased exponent 75 in binary

$$(75)_{10} = (1001011)_2$$

Floating point representation of a number is as follows:

| Sign | Exponent            | Mantissa |
|------|---------------------|----------|
| 0    | 1001011             | 10100110 |
|      | 0100 1011 1010 0110 |          |

↓      ↓      ↓      ↓  
 4      B      A      6

**2. (b)**

$$\begin{array}{r}
 1 \quad 11 \\
 10001011 \\
 11110011 \\
 \hline 01111110
 \end{array}$$

Carry = 1

Sign = 0

Zero = 0

Overflow = 1

**3. (b)**Biased exponent =  $18 + 64 = 82$ 

Representing 82 in binary

$$(82)_2 = (1010010)_2$$

Representing mantissa in binary

$$(0.625)_{10} = (0.10100000)$$

Floating point representation is as follows:

| Sign bit | Exponent | Mantissa |
|----------|----------|----------|
| 0        | 1010010  | 10100000 |

5      2      A      0

**4. (c)**

Multiplexer is a combinational circuit, converts parallel to serial data and it selects one of the several inputs and transmits to a single output.

**5. (6)**

Squaring both sides,  $(\sqrt{41})^2 = (5)^2$

$$(41)_B = (25)_{10}$$

$$(4B+1)_{10} = (25)_{10}$$

$$\Rightarrow B = 6$$

**6. (a)**

$$\text{As } 46.5 = (101110.1)_2$$

$$\begin{array}{r}
 0101110100000000 \\
 \cdot \qquad \qquad \qquad \text{mantissa} \\
 \hline
 00000110 \\
 \cdot \qquad \qquad \qquad \text{exponent}
 \end{array}$$

**7. (b)**

$$\begin{array}{r}
 (-48)_{10} + (-23)_{10} \\
 -48 \quad 11010000 \\
 -23 \quad (+) \quad 11101001 \\
 \hline
 -71 \quad 10111001
 \end{array}$$

2's complement of a 2's complement number is the number itself.

Here answer -71 is in 2's complement.

2's complement of 10111001 is 01000111

**8. (a)**

$$\text{Bias} = 128$$

$$2^{k-1} = 128$$

$$\Rightarrow k = 8$$

$$S = 1$$

$$\begin{aligned}
 (7.5)_{10} &= (111.1)_2 = (.1111)_2 \cdot 2^3 \\
 &= (.01111)_2 \cdot 2^4
 \end{aligned}$$

| 1    | 8        | 23       |
|------|----------|----------|
| Sign | Exponent | Mantissa |
|      |          |          |

32 bits

Mantissa = 01111000.....0  
23 bits

$$\begin{aligned}E-128 &= 1 \\ \Rightarrow E &= 129 = 10000001\end{aligned}$$

| Sign | Exponent | Mantissa      |
|------|----------|---------------|
| 1    | 10000001 | 01111000....0 |

32 bits

11   000   000   101   111  
 ↓       ↓       ↓       ↓       ↓  
 3       0       0       5       7

9. (c)

$$\text{Biased exponent} = 13 + 64 = 77$$

$$\text{Representing } 77 \text{ in binary } (77)_{10} = (1001101)_2$$

Representing mantissa in binary

$$(0.529)_{10} = 0.10000111011011$$

Floating point representation is as follows:

| Sign | Exponent | Mantissa |
|------|----------|----------|
| 0    | 1001101  | 10000111 |

0100   1101   1000   0111  
 ↓       ↓       ↓       ↓  
 4       D       8       7

10. (7)

$$\text{Exponent} = (10111)_2 = (23)_{10}$$

$$\text{Biased Exponent for the format} = 23 - 16 = 7$$

11. (b)

The decimal number is = 42.6875

Binary number representation of 42.6875

$$\text{Normalization form} = 1.010101011 \times 2^5$$

Mantissa field is = 010101011000000000000000

$$\text{Exponent field is} = 5 + 127 = 132 = (10000100)_2$$

Value in given format is

| Sign(s) | Exponent (E) | Mantissa (M)              |
|---------|--------------|---------------------------|
| 0       | 100 00100    | 010 101011000000000000000 |

4      2      2      A      C      0      0      0

So the hexadecimal representation is  $(422AC000)_{16}$ .

12. (1)

2's complement representation of FFFF =  $(1111)_2$



$$\begin{aligned}\therefore \text{Magnitude in decimal representation} \\ &= \text{Magnitude of 2's complement of} \\ &111111111111111 \\ &= (000000000000001)_2 = (1)_{10}\end{aligned}$$

13.  $-(30.5)_{10}$ 

| 1 bit  | 8 bit    | 23 bit        |
|--------|----------|---------------|
| Signal | Exponent | Mantissa      |
| 1      | 1000001  | 111110000...0 |

32

$$\begin{aligned}\text{Value} &= (-1)^3 (1.E) \times 2^{E-127} \\ &= (-1)^1 (1.11111) \times 2^{131-127} \\ &= -(1.11111) \times 2^4 \\ &= -[2^0 \times 1 + 2^{-1} \times 1 + 2^{-2} \times 2^{-3} \times 1 \\ &\quad + 2^{-4} \times 1 + 2^{-5} \times 1] \times 2^4 \\ &= [2^4 + 2^3 + 2^2 + 2^1 + 2^{-1}] \\ &= -(30.5)_{10}\end{aligned}$$

14. (b)

0-100+100-1 evaluates to 57 in Booth's coding as below:

$$\begin{aligned}(0 * 2^7) - (1 * 2^6) + (0 * 2^5) + (0 * 2^4) + (1 * 2^3) \\ + (0 * 2^2) + (0 * 2^1) - (1 * 2^0) \\ = -64 + 8 - 1 = -57\end{aligned}$$

