

## 1

# Machine Instructions and Addressing Modes



## Multiple Choice Questions

### Common Data For Q.1 & Q.2

- Q.1 In the program below, the number of times the FIRST and SECOND JNZ instructions cause the control to be transferred to LOOP respectively

```

MVI H, 02H
MVI L, 05H
LOOP: DCR L; Decrement L by 1
FIRST: JNZ LOOP
       DCR H; Decrement H by 1
SECOND: JNZ LOOP
    
```

- (a) 5 and 2      (b) 4 and 1  
(c) 259 and 1    (d) 260 and 1

- Q.2 In place of DCR L instruction if we can use SUBL, #1. The number of times the FIRST and SECOND JNZ instructions cause the control to be transferred to loop respectively

- (a) 259 and 1      (b) 260 and 1  
(c) 6 and 1          (d) 5 and 2

- Q.3 The most relevant addressing mode to write position independent codes is

- (a) Direct mode    (b) Indirect mode  
(c) Relative mode   (d) Indexed mode

[GATE 1987]

### Common Data For Q.4 & Q.5

Consider the following assembly language program for a hypothetical processor. A, B and C are 8-bit registers. The meanings as various instructions are shown as comments.

```

MOV B, #0; B ← 0
MOV C, #8; C ← 8
    
```

© Copyright Subject matter to MADE EASY Publications, New Delhi. No part of this book may be reproduced or utilised in any form without the written permission.

|     |            |                                          |
|-----|------------|------------------------------------------|
| Z : | CMP C, #0; | Compare C with 0                         |
|     | JZ X;      | Jump to X if zero flag is set            |
| SUB | C, #1;     | C ← C - 1                                |
| RRC | A, #1;     | Rotate right A through carry by one bit. |
| JC  | Y;         | Jump to Y if carry flag is set           |
| JMP | Z;         | Jump to Z                                |
| Y : | Add B, #1; | B ← B + 1                                |
|     | JMP Z;     | Jump to Z                                |
| X : |            |                                          |

- Q.4 If the initial value of register A is  $A_0$ . The value of register B after the program execution will be
- The number of '0' bits in  $A_0$
  - The number of '1' bits in  $A_0$
  - $A_0$
  - 8

[GATE 2003]

- Q.5 Which of the following instructions when inserted at location X will ensure that the value of register A after program execution is the same as its initial value
- RRC A, #1; Rotate right A through carry
  - NOP; No operation
  - LRC A, #1; Left rotate A through carry
  - ADD A, #1; A ← A + 1

[GATE 2003]

- Q.6 Which of the following addressing modes are suitable for program relocation at run time
- Absolute addressing
  - Based addressing
  - Relative addressing
  - Indirect addressing

(a) 1 and 2  
(c) 2 and 3

Common Data  
Solve the problem  
Consider the  
hypothetical C  
and  $R_3$   
Instruction

MOV R<sub>1</sub>, 50  
MOV R<sub>2</sub>, (R<sub>1</sub>)  
ADD R<sub>2</sub>, R<sub>1</sub>  
MOV 6000  
Halt;

Q.7 Consider  
with w  
been  
1000  
CPU  
instru  
stack  
(a) 1  
(c) 1

Q.8 Con  
with  
bee  
100  
AD  
pus  
(a)  
(c)

Q.9 Le  
op  
Re  
Ad  
In  
Th  
is  
(a)  
(c)

- (a) 1 and 4      (b) 1 and 2  
 (c) 2 and 3      (d) 1, 2 and 4

[GATE 2004]

**Common Data For Q.7 to Q.9**

Solve the problems and choose the correct answers. Consider the following program segment for a hypothetical CPU having three user registers  $R_1$ ,  $R_2$  and  $R_3$ .

| Instruction            | Operation                  | Instruction size (words) |
|------------------------|----------------------------|--------------------------|
| MOV $R_1$ , 5000;      | $R_1 \leftarrow M[5000]$   | 2                        |
| MOV $R_2$ , ( $R_1$ ); | $R_2 \leftarrow M[[R_1]]$  | 1                        |
| ADD $R_2$ , $R_3$ ;    | $R_2 \leftarrow R_2 + R_3$ | 1                        |
| MOV 6000, $R_2$ ;      | $M[6000] \leftarrow R_2$   | 2                        |
| Halt;                  | Machine Halts              | 1                        |

- Q.7** Consider that the memory is byte addressable with word size 32 bits, and the program has been loaded starting from memory location 1000 (decimal). If an interrupt occurs when the CPU has been halted after executing the Halt instruction, the return address saved in the stack will be  
 (a) 1007      (b) 1020  
 (c) 1024      (d) 1028

[GATE 2004]

- Q.8** Consider that the memory is word addressable with word size 32 bits and the program has been loaded starting from memory location 1000 (decimal). If an interrupt occurs during the ADD instruction, what will be the return address pushed on to the stack  
 (a) 1007      (b) 1004  
 (c) 1005      (d) 1016

[GATE 2004]

- Q.9** Let the clock cycles required for various operations be as follows:  
 Register to/from memory transfer: 3 cycles  
 Add with both operands in registers : 1 cycle  
 Instruction fetch and decode: 2 cycle per word  
 The total cycles required to execute the program is  
 (a) 29      (b) 24  
 (c) 23      (d) 20

[GATE 2004]

© Copyright Subject matter to MADE EASY Publications, New Delhi. No part of this book may be reproduced or utilised in any form without the written permission.

- Q.10** Match List-I with List-II and select the correct answer using the codes given below the lists:

**List-I**                    **List-II**

- A.  $A[I] = B[J]$       1. Indirect Addressing  
 B. While[\*A<sup>++</sup>]      2. Indexed Addressing  
 C. int temp = \*X      3. Auto Increment

**Codes:**

- | A     | B | C |
|-------|---|---|
| (a) 3 | 2 | 1 |
| (b) 1 | 3 | 2 |
| (c) 2 | 3 | 1 |
| (d) 1 | 2 | 3 |

[GATE 2005]

- Q.11** A CPU has 24-bit instructions. A program starts at address 300 (in decimal). Which one of the following is a legal program counter (all values in decimal)

- (a) 400      (b) 500  
 (c) 600      (d) 700

[GATE 2005]

- Q.12** Which of the following is/are true of the auto increment addressing mode?

1. It is useful in creating self relocating code.
  2. If it is included in an instruction set architecture, then an additional ALU is required for effective address calculation.
  3. The amount of increments depends on the size of the data item accessed.
- (a) 1 only      (b) 2 only  
 (c) 3 only      (d) 2 and 5 only

[GATE 2008]

**Common Data For Q.13 & Q.14**

Consider a hypothetical processor supports both 2 address and one address instructions. It has 128 word memory A 16-bit instruction is placed in the one memory word.

- Q.13** What is the range of two address and one address instructions are supported.

- (a) 1 to 3 and 128 to 364  
 (b) 1 to 3 and 128 to 384  
 (c) 0 to 4 and 120 to 380  
 (d) 0 to 3 and 64 to 256

- Q.14 If two 2-address instructions are already existed. How many one address instructions can be supported.
- 128
  - 2
  - 256
  - 32

**Linked Answer for Q.15 to Q.17**

A computer has 40-bit instruction format. It uses 1 register operand and 1 memory operand. There are 128 general purpose registers and 256 M bytes of RAM.

- Q.15 How many different operations the computer can perform if the indirect bit (mode) is used as the part of the operation code.
- 32
  - 16
  - 17
  - 8

- Q.16 If there are 'n' 2-address instructions which uses both register and memory, then how many one address instructions (which uses only memory) are possible
- $(2^5 - n) * 2^7$
  - $(2^4 - n) * 2^7$
  - $(2^5 - (n - 1)) * 2^7$
  - $(2^4 - (n + 1)) * 2^7$

- Q.17 If there are 'n' two address instructions which uses both register and memory, 'm' one address instructions which uses register operand then how many zero address instructions are possible
- $[(2^5 - n) * 2^7 - m] * 2^{28}$
  - $[(2^4 - n) * 2^{28} - m] * 2^7$
  - $[(2^5 - n) * 2^{28} - m] * 2^7$
  - $[(2^4 - n) * 2^7 - m] * 2^{28}$

- Q.18 The memory locations 1000, 1001 and 1020 have data values 18, 1 and 16 respectively before the following program is executed

|                                              |                  |
|----------------------------------------------|------------------|
| MOVI R <sub>s</sub> , 1;                     | Move immediate   |
| Load R <sub>d</sub> , 1000(R <sub>s</sub> ); | Load from memory |
| ADDI R <sub>d</sub> , 1000;                  | Add immediate    |
| Storl 0(R <sub>d</sub> ), 20;                | Store immediate  |

Which of the statement below is true after the program is executed?

- Memory location 100 has value 20
- Memory location 1020 has value 20
- Memory location 1021 has value 20
- Memory location 1001 has value 20

[GATE 2006]

- Q.19 If we use internal data forwarding to speed up the performance of a CPU (R<sub>1</sub>, R<sub>2</sub> and R<sub>3</sub>) are registers and M[100] is a memory reference then the sequence of operations.

$$R_1 \rightarrow M[100]$$

$$M[100] \rightarrow R_2$$

M[100] → R<sub>3</sub> can be replaced by

- $R_1 \rightarrow R_3$
- $M[100] \rightarrow R_2$
- $R_2 \rightarrow M[100]$
- $R_1 \rightarrow R_2$
- $R_1 \rightarrow R_3$
- $R_2 \rightarrow R_3$
- $R_1 \rightarrow R_2, R_1 \rightarrow R_3$
- $R_1 \rightarrow M[100]$

[GATE 2004]

- Q.20 In which addressing mode the effective address of the operand is generated by adding a constant value to the content of a register?

- Absolute mode
- Indirect mode
- Immediate mode
- Index mode

[ISRO 2009]

- Q.21 Consider a hypothetical processor with an instruction of type LW R<sub>1</sub>, 20(R<sub>2</sub>). Which during execution reads a 32-bit word from memory and stores it in a 32-bit register R<sub>1</sub>. The effective address of the memory location is obtained by the addition of a constant 20 and the contents of register R<sub>2</sub>. Which of the following best reflects the addressing mode implemented by this instruction for the operand in memory

- Immediate addressing
- Register addressing
- Register indirect scaled addressing
- Base indexed addressing

[GATE 2011]

- Q.22 While an instruction is executed the program counter should contain the address to

- the current instruction
- the next sequential instruction
- the operand
- the previous instruction

[DRDO 2009]

Q.23 For which register, the effective address

$$EA = \begin{cases} (BX) \\ (DI) \\ (SI) \end{cases}$$

is

- (a) relative base indexed
- (b) register indirect
- (c) base indexed
- (d) register relative

Q.24 A certain RISC processor has 12 register windows and 16 global registers. Each window has 8 input 16 local and 8 output register. The total number of registers in the processor is

- (a) 312
- (b) 320
- (c) 296
- (d) 304

[DRDO 2008]

Q.25 Which of the following statements about relative addressing mode is false

- (a) It enables reduced instruction set
- (b) It allows indexing of array element with same instruction
- (c) It enables easy relocation of data
- (d) It enables faster address calculation than absolute addressing

[ISRO 2009]

Q.26 Compared to CISC processors, RISC processor contain

- (a) More register and smaller instruction set
- (b) Larger instruction set and less register
- (c) Less registers and smaller instruction set
- (d) More transistor elements

[ISRO 2009]

Q.27 Which of the following must be true for the RFE (Return from Exception) instruction on a general purpose processor

1. It must be a trap instruction\*
  2. It must be a privileged instruction
  3. An exception can not be allowed to occur during execution of an RFE instruction
- (a) 1 only
  - (b) 2 only
  - (c) 1 and 2 only
  - (d) 1, 2 and 3 only

[GATE 2008]

Q.28 Word 20 contains 40  
Word 30 contains 50

Word 40 contains 60  
Word 50 contains 70

Which of the following instructions loads 60 in to the accumulator?

- (a) Load immediate 20
- (b) Load direct 30
- (c) Load indirect 20
- (d) Load indirect 30



### Numerical Data Type Questions

#### Common Data For Q.29 & Q.30

Consider the following program segment. Here  $R_1$ ,  $R_2$  and  $R_3$  are the general purpose registers.

| Instruction                  | Operation                  | Instruction size (words) |
|------------------------------|----------------------------|--------------------------|
| MOV $R_1$ , [3000];          | $R_1 \leftarrow M[3000]$   | 2                        |
| Loop: MOV $R_2$ , ( $R_3$ ); | $R_2 \leftarrow M[[R_3]]$  | 1                        |
| ADD $R_2$ , $R_1$ ;          | $R_2 \leftarrow R_2 + R_1$ | 1                        |
| MOV ( $R_3$ ), $R_2$ ;       | $M[[R_3]] \leftarrow R_2$  | 1                        |
| INC $R_3$ ;                  | $R_3 \leftarrow R_3 + 1$   | 1                        |
| DEC $R_1$ ;                  | $R_1 \leftarrow R_1 - 1$   | 1                        |
| BNZ loop;                    | Branch on non zero         | 2                        |
| Halt;                        | Stop                       | 1                        |

Assume that the content of memory location 3000 is 10 and the content of the register  $R_3$  is 2000. The content of each of the memory locations from 2000 to 2010 is 100. The program is loaded from memory location 1000. All the numbers in decimal.

Q.29 Assume that the memory is word addressable. The number of memory references for accessing the data in executing the program completely is \_\_\_\_\_.

[GATE 2007]

Q.30 Assume that the memory is word addressable. After the execution of this program, the content of memory location 2010 is \_\_\_\_\_.

[GATE 2007]

Q.31 Assume that the memory is byte addressable and the word size is 32 bits. If an interrupt occurs during the execution of the instruction "INC  $R_3$ ", what return address will be pushed on to the stack?

[GATE 2007]

- Q.32** Consider a three word machine instruction ADD A[R<sub>0</sub>], @ B

The first operand (destination) "A[R<sub>0</sub>]" uses indexed addressing mode with R<sub>0</sub> as the index register. The second operand "@ B" uses indirect addressing mode A and B are memory addresses residing at the second and third words respectively.

The first word of the instruction specifies the OP-CODE, the index register designation and the source and destination addressing modes. During execution of ADD instruction the two operands are added and stored in the destination (First Operand)

The number of memory cycles needed during the execution cycle of the instruction is \_\_\_\_\_. [GATE 2005]

- Q.33** A machine support 16-bit instruction format the size of address field is 4-bit. The computer uses expanding OP-CODE techniques and 34 two address instruction and 100 one address instructions. The number of zero address instructions it can support is \_\_\_\_\_.

- Q.34** A computer has 256 K word memory. The instruction format has four fields i.e., OP-CODE, register field to represent one of the 60 processor registers, mode field represents one of 7 addressing modes and memory address field. How many instructions the system supports when a 32-bit instruction is placed in the one memory cell.

#### Linked Answer for Q.35 & Q.36

A PC-relative mode branch instruction is 3 bytes long. The address of the instruction in decimal is 342038.

- Q.35** Determine the branch target address if the signed displacement in the instruction is -31

- Q.36** What is the branch address, if the base and index registers contain the values 480220 and 9 respectively (Assume base with index addressing mode is used)

- Q.37** Consider evaluating the following expression tree on a machine with load-store architecture

© Copyright: Subject matter to MADE EASY Publications, New Delhi. No part of this book may be reproduced or utilised in any form without the written permission.

in which memory can be accessed only through load and store instructions. The variables a, b, c, d and e are initially stored in memory. The binary operators used in this expression tree can be evaluated by the machine only when the operands are in registers. The instructions produce result only in a register. If no intermediate results can be stored in memory. What is the minimum number of registers needed to evaluate this expression.



[GATE 2011]

- Q.38** The program below uses six temporary variables a, b, c, d, e, f

```
a = 1
b = 10
c = 20
d = a + b
e = c + d
f = c + e
b = c + e
e = b + f
d = 5 + e
return d+ f
```

Assuming that all operations take their operands from registers. What is the minimum number of registers needed to execute the program without spilling?

[GATE 2010]

- Q.39** The frequency of different types of instruction executed by a machine is tabulated below

| Operand Accessing Mode | Frequency in % |
|------------------------|----------------|
| Register               | 30             |
| Immediate              | 20             |
| Direct                 | 22             |
| Memory indirect        | 17             |
| Index                  | 11             |

Assuming two cycles are consumed for an operand to be read from the memory one cycle for index arithmetic computation and zero cycles if operands are available in registers or with in the instruction itself, the average operand fetch rate of the machine is \_\_\_\_\_.

[DRDO 2008]



### Try Yourself

- T1. A machine has a 32-bit architecture, with 1-word long instructions. It has 64 registers, each of which is 32 bits long. It needs to support 45 instructions, which have an immediate operand in addition to two register operands. Assuming that the immediate operand is an unsigned integer, the maximum value of the immediate operand is \_\_\_\_\_.

[GATE-2014, Ans: (16383)]

- T2. Consider a processor with byte-addressable memory. Assume that all registers, including Program Counter (PC) and Program Status Word (PSW), are of size 2 bytes. A stack in the main memory is implemented from memory location  $(0100)_{16}$  and it grows upward. The stack pointer (SP) points to the top element of the stack. The current value of SP is  $(016E)_{16}$ . The CALL instruction is of two words, the first word is the op-code and the second word is the starting address of the subroutine (one word = 2 bytes). The CALL instruction is implemented as follows:
- Store the current value of PC in the stack.
  - Store the value of PSW register in the stack.
  - Load the starting address of the subroutine in PC.

The content of PC just before the fetch of a CALL instruction is  $(5FA0)_{16}$ . After execution of the CALL instruction, the value of the stack pointer is

- (a)  $(016A)_{16}$       (b)  $(016C)_{16}$   
 (c)  $(0170)_{16}$       (d)  $(0172)_{16}$

[GATE-2015, Ans: (d)]

- T3. A hypothetical processor uses the instruction format with three fields i.e. opcode, register address fields used to represent one of the  $2^x$  registers and memory address field. A 32 bit instruction is placed in the  $2^y$  addressable cell memory. If there exists, 'z' two address instructions which uses both register and memory reference how many one address register reference instructions possible in the processor.

- (a)  $(2^{(32 - xy)} - z)2^y$       (b)  $(2^{32 - (x+y)} - z)2^y$   
 (c)  $(2^{xy} - z)2^y$       (d)  $(2^{32xy} - z)2^x$

[Ans: (b)]

- T4. Consider a 16 bit processor in which the following appears in main memory starting at location 38246:

|       |     |                  |
|-------|-----|------------------|
| 38246 | JMP | Mode             |
| 38247 |     | -12              |
| 38248 |     | Next Instruction |

The first byte of the instruction specifies the opcode and type of addressing mode used. Second byte of the instruction is the address field. Determine the effective address of the instruction to transfer the control if the mode field uses the PC-relative addressing mode.

- (a) 38234      (b) 38235  
 (c) 38236      (d) 38248

[Ans: (c)]

- T5. Consider a hypothetical process which supports two address instruction format. Instruction takes two operands 1<sup>st</sup> operand using the register addressing mode and 2<sup>nd</sup> operand using the indexed addressing mode. Processor supports  $2^n$  operations and 128 registers including the index register. Base address of the operand will be present in the address field of the instruction as a constant which occupies 20 bits space. What is the length of the instruction

- (a)  $(34 \times n)$  bits      (b)  $(34 + n)$  bits  
 (c)  $(2^n \times 34)$  bits      (d)  $(2^n + 34)$  bits

[Ans: (b)]

- T6. A computer supports 64 kB physical memory with the following contents

| Memory   |
|----------|
| 0000     |
| ...      |
| 4000 23H |
| 4001 12H |
| 4002 OAH |
| 4003 FCH |
| 4004     |
| ...      |
| ffff     |

The following instruction is executed on the two different computers ( $C_1$  and  $C_2$ ). Which are having the above memory specifications.  $C_1$  uses the little-endian mechanism and  $C_2$  uses the big-endian mechanism.

$I_1 : \text{MOV } r_0, @r_1 ; r_0 \leftarrow M[[r_1]]$

It is a 64 bit instruction with a general purpose register size of 16 bits. Register  $r_1$  contains 4002H what is the output of the above instruction when it is running on  $C_1$  and  $C_2$  respectively.

- (a) OAFCH and FCOAH
- (b) 2312H and 1223H
- (c) FCOAH and OAFCH
- (d) 1223H and 2312H

[Ans: (c)]

- T7. Consider the following program segment used to execute on a hypothetical processor. Consider all the registers are of 16 bit size

|       |       |          |                         |                              |
|-------|-------|----------|-------------------------|------------------------------|
| $I_1$ | MOV   | CX,0005  | ; CX $\leftarrow$ 0005  |                              |
| $I_2$ | MOV   | BX,OFF7H | ; BX $\leftarrow$ OFF7H |                              |
| Label | $I_3$ | MOV      | AX,OBCAH                | ; AX $\leftarrow$ OBCAH      |
|       | $I_4$ | OR       | BX,AX                   | ; BX $\leftarrow$ BX (OR)AX  |
|       | $I_5$ | AND      | DX,AX                   | ; DX $\leftarrow$ DX (AND)AX |
|       | $I_6$ | LOOP     | label                   | ; LOOP till CX = 0           |

Processor clock frequency is 1 kHz. In which data transfer operations takes 6 cycles, data manipulation operations takes 4 cycles and transfer of control operations takes 2 cycles to execute. How much time is required to execute the program on a above CPU

- (a) 92 msec
- (b) 28 msec
- (c) 108 msec
- (d) 80 msec

[Ans: (a)]

- T8. The size of the data count register of a DMA controller is 16 bits. The processor needs to transfer a file of 29,154 kilobytes from disk to main memory. The memory is byte addressable. The minimum number of times the DMA controller needs to get the control of the system bus from the processor to transfer the file from the disk to main memory is \_\_\_\_\_.

[GATE-2016, Ans: (456)]

- T9. Consider a processor with 64 registers and an instruction set of size twelve. Each instruction has five distinct fields, namely, opcode, two source register identifiers, one destination register identifier, and a twelve-bit immediate value. Each instruction must be stored in memory in a byte-aligned fashion. If a program has 100 instructions, the amount of memory (in bytes) consumed by the program text is \_\_\_\_\_.

[GATE-2016, Ans: (500)]





## Multiple Choice Questions

**Q.1** Consider a new instruction named branch on bit set (nemonic bbs). The instruction "bbs reg, pos, label" jumps to label if bit in position pos of register operand reg is one. A register is 32 bits wide and the bits are numbered 0 to 31 Bit in position 0 being the least significant. Consider the following emulation of this instruction on a processor that does not have bbs implemented.

temp  $\leftarrow$  reg and mask

Branch to label if temp is non-zero.

The variable temp is a temporary register for correct emulation, the variable mask must be generated by

- (a) mask  $\leftarrow$  0x 1 << Pos
- (b) mask  $\leftarrow$  0x f f f f f f f f >> Pos
- (c) mask  $\leftarrow$  Pos
- (d) mask  $\leftarrow$  0x f

[GATE 2006]

**Q.2** A micro programmed control unit

- (a) is faster than a hard-wired control unit
- (b) facilitates easy implementation of new instructions
- (c) is useful when very small programs are to be run
- (d) usually refers to the control unit of a micro processor

[GATE 1987]

**Q.3** A typical vertical micro instruction format is characterized by

- (a) limited encoding with limited parallelism
- (b) limited encoding with high degree of parallelism
- (c) high degree of encoding with high degree of parallelism
- (d) high degree of encoding with limited parallelism

[DRDO 2009]

## Common Data For Q.4 to Q.6

Consider the following data path of CPU



The ALU, the bus and all the registers in the data path are of identical size. All operations including increment of the PC and the GPR are to be carried out in the ALU. Two clock cycles are needed for memory read operation the first one for loading address in the MAR and the next one for loading data from the memory but into the MDR.

**Q.4** The instruction "add r<sub>0</sub>, r<sub>1</sub>" has the register transfer interpretation

R<sub>0</sub>  $\leftarrow$  R<sub>0</sub> + R<sub>1</sub> the minimum number of clock cycles needed for execution cycle of this instruction is

- |       |       |
|-------|-------|
| (a) 2 | (b) 3 |
| (c) 4 | (d) 5 |

[GATE 2005]

- Q.5** The instructions "call R<sub>n</sub>, Sub" is a two word instruction. Assuming the PC is incremented during the fetch cycle of the first word of the instruction. Its register transfer interpretation is

$R_n \leftarrow PC + 1;$   
 $PC \leftarrow M[PC];$

The minimum number of CPU cycles needed during the execution cycle of this instruction is



[GATE 2005]

- Q.6** The instructions "call R<sub>n</sub>, Sub" is a two word instruction. Assuming the PC is incremented during the fetch cycle of the first word of the instruction. Its register transfer interpretation is

$R_n \leftarrow PC + 1;$   
 $PC \leftarrow M[R_n];$

The minimum number of CPU cycles needed during the execution cycle of this instruction is



- Q.7** Consider a CPU where all instructions require 7 cycles to complete operation. There are 140 instruction in instruction set. It is found that 125 control signals are needed to be generated by the control unit. While designing the horizontal micro programmed control unit. Single address field format is used for branch control logic. What is the minimum size of the control word and control address register



[GATE 2005]

- Q.8** A micro programmed control memory supports 256 instructions. Every instruction on average consumes 8 micro operations. The system supports 16 flag conditions and 48 control signals. If the horizontal micro programming is used, what is the size of each control word let 1 address control instruction is used.



[GATE 2008]

- Q.9** The control field of 1-address control word has to support 2 groups of control signals. In the group-1 it is required to generate either 1 or none of the 63 control signals. In the group-2 at most 4 from the remaining. What will be the minimum number of bits needed for control field.







- Q.11** A hard wired CU uses 10 control signals  $S_1$  to  $S_{10}$  in various time steps  $T_1$  to  $T_5$  implement 4 instruction  $I_1$  to  $I_4$  as shown below:

|                | T <sub>1</sub>                                   | T <sub>2</sub>                                    | T <sub>3</sub>                                   | T <sub>4</sub>                  | T <sub>5</sub>                  |
|----------------|--------------------------------------------------|---------------------------------------------------|--------------------------------------------------|---------------------------------|---------------------------------|
| I <sub>1</sub> | S <sub>1</sub> , S <sub>3</sub> , S <sub>5</sub> | S <sub>2</sub> , S <sub>4</sub> , S <sub>6</sub>  | S <sub>1</sub> , S <sub>7</sub>                  | S <sub>10</sub>                 | S <sub>3</sub> , S <sub>8</sub> |
| I <sub>2</sub> | S <sub>1</sub> , S <sub>3</sub> , S <sub>5</sub> | S <sub>8</sub> , S <sub>9</sub> , S <sub>10</sub> | S <sub>5</sub> , S <sub>6</sub> , S <sub>7</sub> | S <sub>6</sub>                  | S <sub>10</sub>                 |
| I <sub>3</sub> | S <sub>1</sub> , S <sub>3</sub> , S <sub>5</sub> | S <sub>7</sub> , S <sub>8</sub> , S <sub>10</sub> | S <sub>2</sub> , S <sub>6</sub> , S <sub>9</sub> | S <sub>10</sub>                 | S <sub>1</sub> , S <sub>3</sub> |
| I <sub>4</sub> | S <sub>1</sub> , S <sub>3</sub> , S <sub>5</sub> | S <sub>2</sub> , S <sub>6</sub> , S <sub>7</sub>  | S <sub>2</sub> , S <sub>10</sub>                 | S <sub>6</sub> , S <sub>9</sub> | S <sub>10</sub>                 |

Which of the following pairs of expressions represent the circuit for generating control signals  $S_5$  and  $S_{10}$  respectively.

- (a)  $T_1 + T_2 T_3 \& (I_1 + I_3) T_4 + (T_2 + I_4) T_5$   
 (b)  $T_1 + (I_2 + I_4) T_3 \& (I_1 + I_3) T_4 + (I_2 + I_4) T_5$   
 (c)  $T_1 (I_2 + I_4) T_3 \& (I_2 + I_3 + I_4) T_2 + (I_1 + I_3) T_4 (I_2 + I_4) T_5$   
 (d)  $T_1 + I_2 T_3 \& (I_2 + I_3) T_2 + I_4 T_3 + (I_1 + I_3) T_4 + (I_2 + I_4) T_5$

**Q.12** A CPU has only three instruction  $I_1$ ,  $I_2$  and  $I_3$ . Which uses the following signals in time steps  $T_1 - T_5$

- I<sub>1</sub>: T<sub>1</sub>: A<sub>in</sub>, B<sub>out</sub>, C<sub>in</sub>
  - T<sub>2</sub>: PC<sub>out</sub>, B<sub>in</sub>
  - T<sub>3</sub>: Z<sub>out</sub>, A<sub>in</sub>
  - T<sub>4</sub>: PC<sub>in</sub>, B<sub>out</sub>
  - T<sub>5</sub>: End

$I_2 : T_1 : C_{in}, B_{out}, D_{in}$   
 $T_2 : A_{out}, B_{in}$   
 $T_3 : Z_{out}, A_{in}$   
 $T_4 : B_{in}, C_{out}$   
 $T_5 : End$   
 $I_3 : T_1, D_{in}, A_{out}$   
 $T_2 : A_{in}, B_{out}$   
 $T_3 : Z_{out}, A_{in}$   
 $T_4 : Z_{out}, A_{in}$   
 $T_5 : End.$

Which of the following logic functions will generate the hard wired control for the signal  $A_{in}$

- (a)  $T_1 I_1 + T_2 I_3 + T_4 I_3 + T_3$
- (b)  $(T_1 + T_2 + T_3)I_3 + T_1 I_1$
- (c)  $(T_1 + T_2)I_1 + (T_2 + T_4)I_3 + T_3$
- (d)  $(T_1 + T_2)I_2 + (T_1 + T_3)I_1 + T_3$

[GATE 2004]

Q.13 Horizontal micro programming

- (a) does not require use of signal decodes
- (b) results in larger sized micro instructions than vertical micro programming
- (c) use one bit for each control signal
- (d) All of the above

Q.14 The following code segments run on RISC and CISC architectures separately

#### RISC

```

MOV AX, 00
MOV BX, 05
MOV CX, 06
Start: ADD AX, BX
        LOOP Start ; loop till CX = 0
    
```

#### CISC

```

MOV AX, 05
MOV BX, 06
MUL AX, BX
    
```

If the MUL instruction takes 40 clock cycles, which of the following statement is true?

- (a) CISC code will run faster by a factor of 1.8
- (b) RISC code will run faster by a factor of 2.8
- (c) CISC code runs slower by a factor of 0.025
- (d) RISC code will run faster by a factor of 40

© Copyright: Subject matter to MADE EASY Publications, New Delhi. No part of this book may be reproduced or utilised in any form without the written permission.

## Numerical Data Type Questions

### Common Data For Q.15 to Q.17

A 2 ns clock cycle processor consumes 4 cycles for ALU operations, 3 cycles for branches and 5 cycles for memory operations. The relative frequencies of these operations are 45%, 15% and 40% respectively.

Q.15 What is the average instruction execution time (in nano seconds)?

Q.16 What is performance in MIPS?

Q.17 If the program contains  $10^9$  instructions. What is the program execution time (in sec)?

Q.18 The instruction mix in an application and instruction execution speed of a hypothetical machine are shown below:

| Instruction | Speed (cycle) | Occurrence (%) |
|-------------|---------------|----------------|
| ADD         | 8             | 30             |
| SHIFT       | 4             | 20             |
| LOAD        | 12            | 30             |
| STORE       | 12            | 20             |

If the clock frequency is 2.3 GHz, the machine performance in MIPS is \_\_\_\_\_.

### Common Data For Q.19 to Q.21

The micro instructions stored in the control memory of a processor have a width of 26 bits. Each micro instruction is divided into three fields: a micro operation field of 13 bits, a next address field (X) and a MUX select field (Y). Three are 8 status bits in the inputs of the MUX



**Q.19** How many bits are there in X?

**Q.20** How many bits are there in Y?

**Q.21** What is the size of the control memory in number of words?

**Q.22** A control unit has to support 8 groups of mutually exclusive control signals. What will be the minimum number of bits saved with respect to horizontal micro programming.

| Group | G1 | G2 | G3 | G4 | G5 | G6 | G7 | G8 |
|-------|----|----|----|----|----|----|----|----|
| CS    | 1  | 3  | 6  | 7  | 11 | 13 | 3  | 1  |

**Q.23** An instruction set of a processor has 125 signals which can be divided into 5 groups of mutually exclusive signals.

Group 1: 20 signals

Group-2: 70 Signals

Group-3: 2 Signals

Group-4: 10 Signals

Group-5: 23 Signals

How many bits of the control word can be saved by using vertical micro programming over horizontal programming

[GATE 205]



### Try Yourself

**T1.** Consider 1 GHz clock frequency processor, uses different operand accessing modes shown below

| Operand Accessing Mode | Frequency (%) |
|------------------------|---------------|
| Register               | 20            |
| Immediate              | 20            |
| Direct                 | 20            |
| Mem indirect           | 10            |
| Indexed                | 17            |
| Auto indexed           | 13            |

Assume that 4 cycles consumed for memory reference, 2 cycles consumed for arithmetic computation and 0 cycles consumed when the operand is in register or instruction itself. What is the average operand fetch rate of the processor.

- (a) 290 million words/sec
- (b) 294.11 million words/sec
- (c) 394.11 million words/sec
- (d) 390.26 million words/sec

[Ans: (b)]

**T2.** A benchmark program is run on an 80 MHz processor. The executed program consists of 1,00,000 instructions with the following instruction mix and clock cycle count.

| Instruction Type   | Instruction Count | CPI |
|--------------------|-------------------|-----|
| Integer arithmetic | 45000             | 1   |
| Data transfer      | 32000             | 2   |
| Floating point     | 15000             | 2   |
| Control transfer   | 8000              | 2   |

What is the MIPS rate of the processor with reference to the above program

- (a) 51.61
- (b) 60
- (c) 42.8
- (d) 55.16

[Ans: (a)]

**T3.** Consider the following micro program.

$T_1 : MAR \leftarrow PC$

$T_2 : MBR \leftarrow X$

$PC \leftarrow PC + I$

$T_3 : IR \leftarrow Y$

Above  $\mu$ -program is used to perform which one of the following operation.

- (a) Operand fetch
- (b) Conditional jump
- (c) Instruction fetch
- (d) Interrupt subprogram initiation

[Ans: (c)]

**T4.** Consider the following program segment used to execute on a hypothetical processor.

|       |      |              |                                  |
|-------|------|--------------|----------------------------------|
| $I_1$ | LOAD | $r_1, (r_0)$ | ; $r_1 \leftarrow M[[r_0]]$      |
| $I_2$ | ADD  | $r_1, r_2$   | ; $r_1 \leftarrow r_1 + r_2$     |
| $I_3$ | AND  | $r_3, r_2$   | ; $r_3 \leftarrow r_3 \& \& r_2$ |
| $I_4$ | ADD  | $r_3, r_4$   | ; $r_3 \leftarrow r_3 + r_4$     |
| $I_5$ | LOAD | $(r_0), r_3$ | ; $M[[r_0]] \leftarrow r_3$      |
| $I_6$ | SUB  | $r_1, r_3$   | ; $r_1 \leftarrow r_1 - r_3$     |
| $I_7$ | HALT |              | ; HALTS                          |
| $I_8$ | OR   | $r_2, r_1$   | ; $R_2 \leftarrow R_2    R_1$    |
| $I_9$ | ADD  | $r_2, r_1$   | ; $R_2 \leftarrow R_2 + R_1$     |



## Multiple Choice Questions

- Q.1** Individual states of data path have the following latencies

| IF     | ID     | EX     | Mem    | WB     |
|--------|--------|--------|--------|--------|
| 300 Ps | 400 Ps | 350 Ps | 500 Ps | 100 Ps |

What is the clock cycle time in a pipelined processor?

- (a) 500 Ps      (b) 1650 Ps  
 (c) 1050 Ps      (d) None of these

- Q.2** Consider a pipelined processor with the following 4 stages

IF : Instruction fetch

ID : Instruction decode and operand fetch

EX : Execute

WB : Write Back

The IF, ID and WB stages take one cycle each to complete the operation the ADD and SUB instruction need 1 clock cycle and the MUL instruction need 3 cycles in the EX stage operand forwarding is used in the pipelined processor. What is the number of clock cycle taken to complete the following sequence of instructions?

$$\begin{array}{ll} \text{ADD } R_2, R_1, R_0; & R_2 \leftarrow R_1 + R_0 \\ \text{MUL } R_4, R_3, R_2; & R_4 \leftarrow R_3 * R_2 \\ \text{SUB } R_6, R_5, R_4; & R_6 \leftarrow R_5 - R_4 \end{array}$$

- (a) 7      (b) 8  
 (c) 10      (d) 14

- Q.3** Assume there are 4 stages in pipeline with each stage taking 1 ns, 2 ns, 1 ns, 3 ns respectively.

© Copyright Subject matter to MADE EASY Publications, New Delhi. No part of this book may be reproduced or utilised in any form without the written permission.

Operating the system in pipeline mode over non-pipelining mode will be benefited by

- (a) 1.33      (b) 1.66  
 (c) 2      (d) 2.33

- Q.4** For a pipelined CPU with a single ALU, consider the following situations

1. The  $j+1^{\text{st}}$  instruction uses the result of the  $j^{\text{th}}$  instruction as an operand.
2. The execution of a conditional jump instruction.
3. The  $j^{\text{th}}$  and  $j+1^{\text{st}}$  instructions require the ALU at the same time.

Which of the above can cause a Hazard?

- (a) 1 and 2 only      (b) 2 and 3 only  
 (c) 3 only      (d) All the three

[GATE 2003]

- Q.5** A 5-stage pipelined CPU has the following sequence of stages.

IF : Instruction fetch from instruction memory.

RD : Instruction decode and register read

EX : Execute, ALU operation for data and address computation.

MA : Data memory access; for write access the register read at RD stage it used.

WB : Register write back.

Consider the following sequence of instruction.

$I_1 : L R_0, Loc1 ; R_0 \leftarrow M[Loc1]$

$I_2 : A R_0, R_0 ; R_0 \leftarrow R_0 + R_0$

$I_3 : A R_2, R_0 ; R_2 \leftarrow R_2 - R_0$

Let each stage take one clock cycle. What is the number of clock cycles taken to complete the above sequence of instructions starting from the fetch of  $I_1$ .



- Q.12** Consider the unpipelined processor. Assume that it has a 1 ns clock cycle and that it uses 4 cycles for ALU operations and branches and 5 cycles for memory operations. Assume that the relative frequencies of these operations are 40%, 20% and 40% respectively.

Suppose that due to clock skew and set up. Pipelining the processor add 0.2 ns overhead to the clock. Ignoring any latency impact, how much speed up in the instruction execution rate will we gain from a pipeline







- Q.15** A weather forecasting computation requires 250 billion floating point operations. The problem is processed in a super computer that can perform 100 mega flops. How long will it takes to do these calculations?

(a) 21.32 min      (b) 31.42 min  
(c) 41.67 min      (d) 38.67 min



- Q.17** Consider 2-pipeline A and B where pipeline A having 8 stages of uniform delay of 2 ns. The pipeline B is having 5 stages with respective stage delays 2 ns, 3 ns, 1 ns, 2 ns, 2 ns. How much time is saved if 100 instructions are pipelined using A instead of B.

(a) 90 ns                  (b) 98 ns  
 (c) 88 ns                  (d) 0 ns

### Common Data For Q.18 & Q.19

Consider the following 4-stages instruction pipeline where different instructions are spending different amount of time at different stages.

|       | $S_1$ | $S_2$ | $S_3$ | $S_4$ |
|-------|-------|-------|-------|-------|
| $I_1$ | 2     | 1     | 2     | 2     |
| $I_2$ | 1     | 3     | 3     | 1     |
| $I_3$ | 2     | 2     | 2     | 2     |
| $I_4$ | 1     | 2     | 1     | 2     |

- Q.18** What is the speed up factor?

  - (a) 1.7
  - (b) 1.9
  - (c) 2.2
  - (d) 2.4



- Q.20** Consider 2 pipeline with the following specification

| Stall cycles for memory | Pipeline | No. of stages | Memory      |
|-------------------------|----------|---------------|-------------|
| 1                       | A        | K             | Single port |
| 0                       | B        | K             | Dual port   |

The pipeline allows all instructions except memory based instructions. If 2 memory operations can not be done in same clock. The penalty is 1 clock. Let there are 20% memory instructions obtain  $S_A / S_B$

- (a) 0.833      (b) 0.53  
 (c) 0.94      (d) 0.24

- Q.21** We have two designs  $D_1$  and  $D_2$  for a synchronous pipeline processor.  $D_1$  has 5 pipeline stages with execution times of 3 ns, 2 ns, 4 ns, 2 ns and 3 ns while the design  $D_2$  has 8 pipeline stages each with 2 ns execution time. How much time can be saved using design  $D_2$  over design  $D_1$  for executing 100 instructions.

- (a) 214 ns      (b) 202 ns  
 (c) 86 ns      (d) -200 ns

**Common Data For Q.22 & Q.23**

A pipelined processor uses a 4-stages instruction pipeline with the following stages. Instruction fetch (IF), instruction decode (ID), execute (EX) and write back (WB). The arithmetic operations as well as the load and store operations are carried out in the EX stage. The sequence of instructions corresponding to the statement

$$X = (S - R * (P \cdot Q)) / T \text{ is given below:}$$

The values of variables P, Q, R, S, and T are available in the registers  $R_0$ ,  $R_1$ ,  $R_2$  and  $R_4$  respectively; before the execution of the instruction sequence.

|       |                   |                            |
|-------|-------------------|----------------------------|
| ADD   | $R_5, R_0, R_1 ;$ | $R_5 \leftarrow R_0 + R_1$ |
| MUL   | $R_6, R_2, R_5 ;$ | $R_6 \leftarrow R_2 * R_5$ |
| SUB   | $R_5, R_3, R_6 ;$ | $R_5 \leftarrow R_3 - R_6$ |
| DIV   | $R_6, R_5, R_4 ;$ | $R_6 \leftarrow R_5 / R_4$ |
| STORE | $R_6, X ;$        | $X \leftarrow R_6$         |

**Q.22** The number of read-after-write (RAW), write-after-read (WAR) and write-after-write (WAW) dependencies in the sequence of instructions are respectively.

- (a) 2, 2, 4
- (b) 3, 2, 3
- (c) 4, 2, 2
- (d) 3, 3, 2

**Q.23** The IF, ID and WB stages take 1-clock cycle each. The EX stage takes 1-clock cycle each for the ADD, SUB and STORE operations and 3-clock cycles each for MUL and DIV operations. Operand forwarding from the EX stage to the ID stage is used. The number of clock cycles required to complete the sequence of instructions is

- (a) 10
- (b) 12
- (c) 14
- (d) 16

**Q.24** A processor takes 12 cycles to complete an instruction 1. The corresponding pipelined processor uses 6 states with the execution times of 3, 2, 5, 4, 6 and 2 cycles respectively. What is the asymptotic speed up assuming that a very large number of instructions are to be executed.

- (a) 1.83
- (b) 2
- (c) 3
- (d) 6

[GATE 2007]

© Copyright. Subject matter to MADE EASY Publications, New Delhi. No part of this book may be reproduced or utilised in any form without the written permission.

**Q.25** Data forwarding technique can be used to speed up the operation in presence of data dependencies. Consider the following replacements of LHS with RHS

1.  $R_1 \rightarrow \text{LOC}; \text{LOC} \rightarrow R_2 \equiv R_1 \rightarrow R_2, R_1 \rightarrow \text{LOC}$
2.  $R_1 \rightarrow \text{LOC}; \text{LOC} \rightarrow R_2 \equiv R_1 \rightarrow R_2$
3.  $R_1 \rightarrow \text{LOC}; R_2 \rightarrow \text{LOC} \equiv R_1 \rightarrow \text{LOC}$
4.  $R_1 \rightarrow \text{LOC}; R_2 \rightarrow \text{LOC} \equiv R_2 \rightarrow \text{LOC}$

In which of the following options, will the result of executing the RHS be the same as executing the LHS irrespective of the instructions that follow

- (a) 1 and 3
- (b) 1 and 4
- (c) 2 and 3
- (d) 2 and 4

[GATE 2007]



The above circuit represents \_\_\_ Stages.

- (a) n states
- (b)  $(n - 1)$  states
- (c)  $2^n$  states
- (d)  $2^{n-1}$  states

**Q.27** Which of the following are not true in a pipelined processor.

1. By passing can handle all RAW Hazards.
  2. Register renaming can eliminate all register carried WAR Hazards
  3. Control Hazards penalties can be eliminated by dynamic branch prediction.
- (a) 1 and 2
  - (b) 1 and 3
  - (c) 2 and 3
  - (d) 1, 2 and 3

[GATE 2008]

**Q.28** A processor that has carry, overflow and sign bits as part of its Program Status Word (PSW) perform addition of the following two 2's complement numbers 01001101 and 11101001. After the execution of this addition operation. The status of the carry, overflow, and sign flags respectively will be

- (a) 1, 1, 0
- (b) 1, 0, 0
- (c) 0, 1, 0
- (d) 1, 0, 1

[ISRO 2009]

**Common Data For Q.22 & Q.23**

A pipelined processor uses a 4-stages instruction pipeline with the following stages. Instruction fetch (IF), instruction decode (ID), execute (EX) and write back (WB). The arithmetic operations as well as the load and store operations are carried out in the EX stage. The sequence of instructions corresponding to the statement

$$X = (S - R * (P \cdot Q)) / T \text{ is given below:}$$

The values of variables P, Q, R, S, and T are available in the registers  $R_0$ ,  $R_1$ ,  $R_2$  and  $R_4$  respectively; before the execution of the instruction sequence.

|       |                  |                            |
|-------|------------------|----------------------------|
| ADD   | $R_5, R_0, R_1;$ | $R_5 \leftarrow R_0 + R_1$ |
| MUL   | $R_6, R_2, R_5;$ | $R_6 \leftarrow R_2 * R_5$ |
| SUB   | $R_5, R_3, R_6;$ | $R_5 \leftarrow R_3 - R_6$ |
| DIV   | $R_6, R_5, R_4;$ | $R_6 \leftarrow R_5 / R_4$ |
| STORE | $R_6, X;$        | $X \leftarrow R_6$         |

- Q.22** The number of read-after-write (RAW), write-after-read (WAR) and write-after-write (WAW) dependencies in the sequence of instructions are respectively.

- (a) 2, 2, 4
- (b) 3, 2, 3
- (c) 4, 2, 2
- (d) 3, 3, 2

- Q.23** The IF, ID and WB stages take 1-clock cycle each. The EX stage takes 1-clock cycle each for the ADD, SUB and STORE operations and 3-clock cycles each for MUL and DIV operations. Operand forwarding from the EX stage to the ID stage is used. The number of clock cycles required to complete the sequence of instructions is

- (a) 10
- (b) 12
- (c) 14
- (d) 16

- Q.24** A processor takes 12 cycles to complete an instruction 1. The corresponding pipelined processor uses 6 states with the execution times of 3, 2, 5, 4, 6 and 2 cycles respectively. What is the asymptotic speed up assuming that a very large number of instructions are to be executed.

- (a) 1.83
- (b) 2
- (c) 3
- (d) 6

[GATE 2007]

- Q.25** Data forwarding technique can be used to speed up the operation in presence of data dependencies. Consider the following replacements of LHS with RHS

1.  $R_1 \rightarrow \text{LOC}; \text{LOC} \rightarrow R_2 = R_1 \rightarrow R_2, R_1 \rightarrow \text{LOC}$
2.  $R_1 \rightarrow \text{LOC}; \text{LOC} \rightarrow R_2 = R_1 \rightarrow R_2$
3.  $R_1 \rightarrow \text{LOC}; R_2 \rightarrow \text{LOC} = R_1 \rightarrow \text{LOC}$
4.  $R_1 \rightarrow \text{LOC}; R_2 \rightarrow \text{LOC} = R_2 \rightarrow \text{LOC}$

In which of the following options, will the result of executing the RHS be the same as executing the LHS irrespective of the instructions that follow

- (a) 1 and 3
- (b) 1 and 4
- (c) 2 and 3
- (d) 2 and 4

[GATE 2007]



The above circuit represents \_\_\_ Stages.

- (a) n states
- (b)  $(n - 1)$  states
- (c)  $2^n$  states
- (d)  $2^{n-1}$  states

- Q.27** Which of the following are not true in a pipelined processor.

1. By passing can handle all RAW Hazards.
  2. Register renaming can eliminate all register carried WAR Hazards
  3. Control Hazards penalties can be eliminated by dynamic branch prediction.
- (a) 1 and 2
  - (b) 1 and 3
  - (c) 2 and 3
  - (d) 1, 2 and 3

[GATE 2008]

- Q.28** A processor that has carry, overflow and sign bits as part of its Program Status Word (PSW) perform addition of the following two 2's complement numbers 01001101 and 11101001. After the execution of this addition operation. The status of the carry, overflow, and sign flags respectively will be

- (a) 1, 1, 0
- (b) 1, 0, 0
- (c) 0, 1, 0
- (d) 1, 0, 1

[ISRO 2009]



## Numerical Data Type Questions

**Q.29** A 5 stage pipeline is used to overlap all the instructions except the branch instructions. The target of the branch can't be fetched till the current instruction is completed. What is the throughput (in MIPS) of the system if 20% of instructions are branch instructions ignore the overhead of buffer register. Each stage is having same amount of delay. The pipeline clock is 10 ns. Branch penalty of 4 cycles.

**Q.30** A 4-stage pipeline has the stage delays as 150, 120, 160 and 140 ns respectively. Registers that are used between the stages have a delay of 5 ns each. Assuming constant clocking rate; the total time taken to process 1000 data items on this pipeline will be (in  $\mu$ sec) \_\_\_\_\_.  
**[GATE 2004]**

**Q.31** A CPU has 5-stage pipeline and runs at 1 GHz frequency. Instruction fetch happens in the first stage of pipeline. A conditional branch instruction computes the target address and evaluates the condition in the third stage of the pipeline. The processor stops fetching new instruction following a conditional branch until the branch outcome is known. A program executes  $10^9$  instructions out of which 20% are conditions branches. If each instruction takes one cycle to complete on average, then total execution time of the program is (in sec) \_\_\_\_\_.  
**[GATE 2006]**

**Q.32** A non pipeline system takes 50 ns to process a task. The same task can be processed in a six segment pipeline with a clock cycle of 10 nS. Determine the speedup ratio of the pipeline for 100 tasks.

### Common Data For Q.33 to Q.35

Consider 5-stage pipeline which allows overlapping of all instructions except branch instruction. The target of branch instruction is not available until the branch instruction is completed. Let each stage delay is 20 ns and there are 30% branch instruction.

**Q.33** What is the average instruction execution time (ignore the fact that some of them are conditional)? (in ns)

**Q.34** What is the performance gain of the pipeline over nonpipelining?

**Q.35** Among the branch instructions 30% conditional and 70% of them does not satisfy the condition (branch not taken). If there is no stall due to them. What is average instruction execution time (in ns)?

**Q.36** Consider a pipeline processor with 4 stages  $S_1$  to  $S_4$ . We want to execute the following loop: for ( $i = 1, i \leq 1000; i++$ ) { $I_1, I_2, I_3, I_4$ } where the time taken (in ns) by instructions  $I_1$  to  $I_4$  for stage  $S_1$  to  $S_4$  are given below:

|       | $S_1$ | $S_2$ | $S_3$ | $S_4$ |
|-------|-------|-------|-------|-------|
| $I_1$ | 1     | 2     | 1     | 2     |
| $I_2$ | 2     | 1     | 2     | 1     |
| $I_3$ | 1     | 1     | 2     | 1     |
| $I_4$ | 2     | 1     | 2     | 1     |

The output of  $I_1$  for  $i = 2$  will be available after \_\_\_\_\_ (in ns).

**[GATE 2004]**

**Q.37** A 5-stage pipelined processor has instruction fetch (IF), instruction decode (ID) operand fetch (OF) perform operation (PO) and write back (WB) stages. The IF, ID, OF and WB stages take 1 clock cycle each for any instruction. The PO stage takes 1 clock cycle for ADD and SUB instructions. 3 clock cycle for MUL instruction and 6 clock cycle for DIV instruction respectively. Operand forwarding is used in the pipeline. What is the number of clock cycles needed to execute the following sequence of instructions.

| Instruction               | Meaning of Instruction     |
|---------------------------|----------------------------|
| $I_0 : MUL R_2, R_0, R_1$ | $R_2 \leftarrow R_0 * R_1$ |
| $I_1 : DIV R_5, R_3, R_4$ | $R_5 \leftarrow R_3 / R_4$ |
| $I_2 : ADD R_2, R_5, R_2$ | $R_2 \leftarrow R_5 + R_2$ |
| $I_3 : SUB R_5, R_2, R_6$ | $R_5 \leftarrow R_2 - R_6$ |

**[GATE 2010]**

- Q.38** An instruction pipeline has 5 stages where each stage takes 2 ns and all instructions use all 5 stages. Conditional branch instructions are not overlapped i.e., the instruction after branch is not fetched till the branch instruction is completed consider ideal conditions.

If a branch instruction is a conditional branch instruction, the branch need not be taken, if the branch is not taken, the following instruction can be overlapped.

When 80% of the instructions are conditional branch instruction and 50% of the conditional branch instruction are such that the branch is taken calculate average instruction execution time (in ns).



### Try Yourself

- T1.** Consider a 6-stage instruction pipeline, where all stages are perfectly balanced. Assume that there is no cycle-time overhead of pipelining. When an application is executing on this 6-stage pipeline, the speedup achieved with respect to non-pipelined execution if 25% of the instructions incur 2 pipeline stall cycles is \_\_\_\_\_.

[GATE-2014, Ans: (4)]

- T2.** Consider the following processors (ns stands for nanoseconds). Assume that the pipeline registers have zero latency.

**P1:** Four-stage pipeline with stage latencies 1 ns, 2 ns, 2 ns, 1 ns.

**P2:** Four-stage pipeline with stage latencies 1 ns, 1.5 ns, 1.5 ns, 1.5 ns.

**P3:** Five-stage pipeline with stage latencies 0.5 ns, 1 ns, 1 ns, 0.6 ns, 1 ns.

**P4:** Five-stage pipeline with stage latencies 0.5 ns, 0.5 ns, 1 ns, 1 ns, 1.1 ns.

Which processor has the highest peak clock frequency?

- (a) P1
- (b) P2
- (c) P3
- (d) P4

[GATE-2014, Ans: (c)]

- T3.** An instruction pipeline has five stages, namely, instruction fetch (IF), instruction decode and register fetch (ID/RF), instruction execution (EX), memory access (MEM), and register writeback (WB) with stage latencies 1ns, 2.2ns, 2ns, 1ns, and 0.75ns, respectively (ns stands for nanoseconds). To gain in terms of frequency, the designers have decided to split the ID/RF stage into three stages (ID, RF1, RF2) each of latency 2.2/3 ns. Also, the EX stage is split into two stages (EX1, EX2) each of latency 1ns. The new design has a total of eight pipeline stages. A program has 20% branch instructions which execute in the EX stage and produce the next instruction pointer at the end of the EX stage in the old design and at the end of the EX2 stage in the new design. The IF stage stalls after fetching a branch instruction until the next instruction pointer is computed. All instructions other than the branch instruction have an average CPI of one in both the designs. The execution times of this program on the old and the new design are  $P$  and  $Q$  nanoseconds, respectively. The value of  $P/Q$  is \_\_\_\_\_.

[GATE-2014, Ans: (1.54)]

- T4.** Consider a non-pipelined processor with a clock rate of 2.5 gigahertz and average cycles per instruction of four. The same processor is upgraded to a pipelined processor with five stages; but due to the internal pipeline delay, the clock speed is reduced to 2 gigahertz. Assume that there are no stalls in the pipeline. The speed up achieved in this pipelined processor is \_\_\_\_\_.

[GATE-2015, Ans: (3.2)]

- T5.** Consider the sequence of machine instructions given below:

|     |            |
|-----|------------|
| MUL | R5, R0, R1 |
| DIV | R6, R2, R3 |
| ADD | R7, R5, R6 |
| SUB | R8, R7, R4 |

In the above sequence, R0 to R8 are general purpose registers. In the instructions shown, the first register stores the result of the operation performed on the second and the third registers. This sequence of instructions is to be executed in a pipelined instruction processor with the following 4 stages: (1) Instruction Fetch and Decode (IF), (2) Operand Fetch (OF), (3) Perform Operation (PO) and (4) Write back the Result (WB). The IF, OF and WB stages take 1 clock cycle each for any instruction. The PO stage takes 1 clock cycle for ADD or SUB instruction, 3 clock cycles for MUL instruction and 5 clock cycles for DIV instruction. The pipelined processor uses operand forwarding from the PO stage to the OF stage. The number of clock cycles taken for the execution of the above sequence of instructions is \_\_\_\_\_.

[GATE-2015, Ans: (13)]

- T6. Consider the 4 stage pipeline with the following stages.

$S_1$ : Instruction fetch.

**S<sub>3</sub>:** Instruction decode.

### S<sub>2</sub>: Execute.

S.: Write back.

Program consists of 16 instructions ( $I_1, I_2, \dots, I_{16}$ ). In which  $I_6$  is a unconditional branch instruction transfer the control to  $I_{12}$ . In the pipeline branch target address will be available at the end of execution state. Each instruction spends the same amount of time in all the pipeline stages. The cycle time of the pipeline is 4 ns. How much time is required to execute the above program without using the branch prediction?



[Ans: (c)]

- T7. Consider pipeline 'x' consist of 5 stages named as IF, ID, OF, EX and WB with the respective stage delays of 2 ns, 5 ns, 6 ns, 8 ns and 1 ns. The alternative pipeline 'y' contain the same number of stages but EX stage is divided into

4 sub stages, (EX1, EX2, EX3 and EX4) with equal delay i.e. (8 ns/4) and ID stage is divided into 2 substages (ID1 and ID2) with equal delay of (5 ns/2). In the pipeline x and y memory reference instruction are not overlapped so the penalty of memory reference instruction in the pipeline 'x' is 4 cycles and in the pipeline 'y' is 8 cycles. If the program contain 30% of the instruction as memory based instructions what is the value of ( $S_x / S_y$ ). i.e. speed up x / speed up of y.

- (a) 1.544      (b) 1.16  
 (c) 0.589      (d) 0.75

[Ans: (b)]

- T8. Consider the program segment consists of 16 instructions ( $I_1, I_2, \dots, I_{16}$ ) used to execute on a pipelines A and B. In the program segment all instructions are consumes same amount of time to execute.

Pipeline A is having 6 stages ( $S_1, S_2, S_3, S_4, S_5, S_6$ ) with the respective stage delays of 2 ns, 6 ns, 5 ns, 2 ns, 8 ns and 1 ns. Pipeline B is designed with same number of stages but stage 5 will be divided into 4 sub stages using the equal delay i.e. (8 ns/4). If the program execution time of pipeline 'A' is  $x$  and the program execution time of pipeline 'B' is  $y$ . What is the value of  $(x/y)$ .



[Ans: (a)]

- T9. Consider an instruction pipeline with the following stages.

IF : Instruction fetch.

ID : Instruction decode.

OF : Operand fetch.

**EX** : Execute (ALU operations are performed).

**MA** : Memory access (Data memory is accessed).

WB : Write back.

The following program segment is executed on the pipeline.

$I_1 : \text{Load } r_0, 3(r_1); r_0 \leftarrow M[3 + [r_1]]$

$I_2 : \text{Add } r_3, r_0, r_2; r_3 \leftarrow r_0 + r_2$

$I_3 : \text{Sub } r_4, r_3, r_5; r_4 \leftarrow r_3 - r_5$

$I_4 : \text{MUL } r_6, r_4, r_7; r_6 \leftarrow r_4 + r_7$

Operand forwarding is used in the pipeline. All the instructions will take one cycle on each stage to complete the execution. How long will it take to complete the above instructions in the pipeline.

- (a) 10 cycles      (b) 9 cycles  
(c) 8 cycles      (d) 12 cycles

[Ans: (a)]

T10. Consider the following code sequence having five instructions  $I_1$  to  $I_5$ . Each of these instructions has the following format.

OP R<sub>j</sub>, R<sub>j</sub>, R<sub>k</sub>

where operation OP is performed on contents of registers R<sub>j</sub> and R<sub>k</sub> and the result is stored in register R<sub>j</sub>.

$I_1 : \text{ADD } R_1, R_2, R_3$

$I_2 : \text{MUL } R_7, R_1, R_3$

$I_3 : \text{SUB } R_4, R_1, R_5$

$I_4 : \text{ADD } R_3, R_2, R_4$

$I_5 : \text{MUL } R_7, R_8, R_9$

Consider the following three statements:

S1: There is an anti-dependence between instructions  $I_2$  and  $I_5$ .

S2: There is an anti-dependence between instructions  $I_2$  and  $I_4$ .

S3: Within an instruction pipeline an anti-dependence always creates one or more stalls.

Which one of above statements is/are correct?

- (a) Only S1 is true  
(b) Only S2 is true  
(c) Only S1 and S3 are true  
(d) Only S2 and S3 are true

[GATE-2015, Ans: (b)]

T11. The stage delays in a 4-stage pipeline are 800, 500, 400 and 300 picoseconds. The first stage (with delay 800 picoseconds) is replaced with a functionally equivalent design involving two stages with respective delays 600 and 350 picoseconds. The throughput increase of the pipeline is \_\_\_\_\_ percent.

[GATE-2016, Ans: (33.28)]

T12. Consider a 3 GHz (gigahertz) processor with a three-stage pipeline and stage latencies  $\tau_1$ ,  $\tau_2$ , and  $\tau_3$  such that  $\tau_1 = 3\tau_2/4 = 2\tau_3$ . If the longest pipeline stage is split into two pipeline stages of equal latency, the new frequency is GHz, ignoring delays in the pipeline registers.

[GATE-2016, Ans: (\*)]



4

# Memory Organization



## **Multiple Choice Questions**



**Q.2** Consider the following two designs for a little cache.

Design 1 is a direct cache of 8, 1-word cache lines. The miss penalty is 8 clock cycles.

Design 2 can store the same total number of items as design 1 but it is a two way set associative cache of 1 word cache line. Least recently used is utilized to determine which items should be removed from the cache. The miss penalty is 10 clock cycles.

Suppose that the following eight memory reference arrive.

0, 3, 14, 11, 4, 11, 8, 0

How much time will these designs spend on cache miss penalties, assuming the cache starts empty?

- (a) design 1 spends 56 cycles and  
design 2 spends 60 cycles
  - (b) design 1 spends 56 cycles and  
design 2 spends 70 cycles
  - (c) design 1 spends 48 cycles and  
design 2 spends 70 cycles
  - (d) design 1 spends 64 cycles and  
design 2 spends 60 cycles

© Copyright: Subject matter to MADE EASY Publications, New Delhi. No part of this book may be reproduced or utilised in any manner without written permission.

Common Data For Q.8 & Q.9

Consider two cache organizations: The first one is 32 kB 2-way set associative with 32 B block size. The second one is of the same size but direct mapped. The size of an address is 32-bit in both cases. A 2 to 1 multiplexer has latency of 0.6 ns while a k-bit comparator has a latency of  $k/10$  ns. The hit latency of the set-associative organization is  $h_1$ , while that of the direct mapped one is  $h_2$ .

**Q.8** The value of  $h_1$  is

- (a) 2.4 ns      (b) 2.3 ns  
 (c) 1.8 ns      (d) 1.7 ns

**Q.9** The value of  $h_2$  is



**Q.10** Consider a 4-way set associative cache consisting of 128 lines with a line size of 64 words. The CPU generates a 20-bit address of a word in main memory. The number of bits in the tag, Line and word fields are respectively,



[GATE 2008]

**Q.11** For inclusion to hold between two cache levels  $L_1$  and  $L_2$  in a multilevel cache hierarchy, which of the following are necessary

1.  $L_1$  must be a write through cache
  2.  $L_2$  must be write through cache
  3. The associativity of  $L_2$  must be greater than that of  $L_1$
  4. The  $L_2$  cache must be atleast as large as the  $L_1$  cache

(a) 4 only                          (b) 1 and 4 only  
(c) 1, 2 and 4                      (d) All of these

### Common Data For Q.12 & Q.13

Consider a machine a 2-way set associative data cache of size 64 kB and block size 16 bytes. The cache is managed using 32-bit virtual addressed and the page size is 4 kB. A program to be run on this machine begins as follows:

Double APR [1024][1024]

```

int i, j;
/* initialize array APR to 0.0*/
for (i = 0; i < 1024; i++)
for (j = 0; j < 1024; j++)
APR [i] [j] = 0.0;

```

The size of double 8 bytes. Array APR is in memory starting at the beginning of virtual page OX ff000 and stored in row major order. The cache is initially empty and no prefetching is done. The only data memory references made by the program are those to array APR.

**Q.12** The total size of the tags in the cache directly is

- (a) 32 Kbits                          (b) 34 Kbits  
(c) 64 Kbits                          (d) 68 Kbits

[GATE 2007]

**Q.13** Which of the following array elements has the same cache index as APR[0][0]

- (a) APR [0][4]      (b) APR[4][0]  
(c) APR[0][5]      (d) APR[5][0]

[GATE 2007]

**Q.14** The cache hit ratio for this initialization loop is



[GATE 2007]

**Q.15** In a 2 level memory, if the level 1 memory is a 5 times faster than level 2 and its access time is 10 ns less than avg access time. Let the level 1 memory access time is 20 ns. What is the hit ratio.



### Common Data For Q.16 to Q.18

Suppose that in 1000 memory references there are 40 misses in the first level cache and 20 misses in the second level cache. Assume miss penalty from the L<sub>2</sub> cache to memory is 100 cycles the hit time of the L<sub>2</sub> cache is 10 clock cycles. The hit time of the L<sub>1</sub> cache is 1 clock cycle.

**Q.16** What are global and local miss rates of level 1 and level 2 caches respectively

- (a)  $L_1 = 0.04$  and  $0.04$   
 $L_2 = 0.04$  and  $0.02$
- (b)  $L_1 = 0.04$  and  $0.04$   
 $L_2 = 0.02$  and  $0.5$
- (c)  $L_1 = 0.02$  and  $0.5$   
 $L_2 = 0.04$  and  $0.2$
- (d)  $L_1 = 0.05$  and  $0.5$   
 $L_2 = 0.04$  and  $0.02$

**Q.17** What is the average memory access time?

- (a) 3.4 cycles      (b) 3.2 cycles
- (c) 3.1 cycles      (d) 3.5 cycles

**Q.18** If there are 1.5 memory references per instruction. What is the average stall cycles per instruction

- (a) 3.4 cycles      (b) 3.5 cycles
- (c) 3.2 cycles      (d) 3.6 cycles

**Q.19** The cache line is 64 bytes. The main memory has latency 32 ns and bandwidth 1 GB/Sec. The time required to fetch the entire cache line from the main memory is

- (a) 32 ns      (b) 64 ns
- (c) 96 ns      (d) 128 ns

#### Common Data For Q.20 & Q.21

A computer system has an  $L_1$  cache, and  $L_2$  cache, and a main memory unit connected as shown below.

The block size in  $L_1$  cache is 4 words.

The block size in  $L_2$  cache is 16 words.

The memory access times are 2 ns, 20 ns and 200 ns for  $L_1$  cache,  $L_2$  cache and main memory unit respectively.



**Q.20** When there is a miss in  $L_1$  cache and a hit in  $L_2$  cache, a block is transferred from  $L_2$  cache to  $L_1$  cache. What is the time taken for this transfer

- (a) 2 ns      (b) 20 ns
- (c) 22 ns      (d) 88 ns

[GATE 2010]

**Q.21** When there is a miss in both  $L_1$  and  $L_2$  cache, first a block is transferred from main memory to  $L_2$  cache and then a block is transferred from  $L_2$  cache to  $L_1$  cache. What is the total time taken for these transfers

- (a) 222 ns      (b) 888 ns
- (c) 902 ns      (d) 968 ns

[GATE 2010]

**Q.22** Consider a 4-way set associative cache (initially empty) with total 16 cache lines. The main memory consists of 256 blocks and the request for memory blocks is in following order:

0, 255, 1, 4, 3, 8, 133, 159, 216, 129, 63, 8, 48, 32, 73, 92, 155

Which one of the following memory block will NOT be in cache if LRU replacement policy is used

- (a) 3      (b) 8
- (c) 129      (d) 216

[GATE 2009]

**Q.23** Consider a 2-way set associative cache memory with 4 sets and total 8 cache blocks (0-7) and a main memory with 128 blocks (0-127).

What memory blocks will be present in the cache after the following sequence of memory block references if LRU policy is used for cache block replacement. Assuming that initially the cache did not have any memory block from the current job

0, 5, 3, 9, 7, 0, 16, 55

- (a) 0, 3, 5, 7, 16, 55      (b) 0, 3, 5, 7, 9, 16, 55
- (c) 0, 5, 7, 9, 16, 55      (d) 3, 5, 7, 9, 16, 55

**Q.24** Consider a small two way set associative cache memory, consisting of 4 blocks. For choosing the block to be replaced, use the least recently used (LRU) scheme. The number of cache misses for the following sequence of block addresses is

8, 12, 0, 12, 8

- (a) 2      (b) 3
- (c) 4      (d) 5

### Common Data For Q.25 & Q.26

Consider a computer system that has a cache with 512 blocks each of which can store 32 bytes. All addresses are byte addresses.

Q.25 Which cache set will the memory address 0x FBFC map to if the cache is direct mapped

- (a) 1 DF
- (b) 1 CF
- (c) DF 1
- (d) DAC

Q.26 Which cache set will the memory address 0x FBFC map to if the cache is 8-way set associative

- (a) 1 D
- (b) 1 F
- (c) 2 F
- (d) 3 E

Q.27 Which of the following is true in case of lock up free cache?

1. A cache structure allows a CPU to access a cache while a hit is being serviced.
  2. A cache structure allows a CPU to access a cache while a miss is being serviced.
  3. A cache that can support multiple outstanding misses.
  4. A cache that can support multiple outstanding hits.
- (a) 1 and 4
  - (b) 2 and 3
  - (c) 1 only
  - (d) 2 only

Q.28 A hierarchical cache memory subsystem has a cache access time of 50 ns, the main storage access time of 500 ns. With the read hit ratio of 0.9. The average access time of the system considering only memory read cycle in the write through scheme will be

- (a) 10 ns
- (b) 100 ns
- (c) 50 ns
- (d) 500 ns

Q.29 Consider a memory system that uses a 20 bit address to address at the byte level and a cache that uses a 256 byte line size. Assume a direct map cache with tag field of 6-bit in the address.

The number of blocks in main memory and the number of lines in cache respectively are

- (a) 1024, 128
- (b) 4096, 64
- (c) 4096, 128
- (d) 8192, 64

Q.30 In set associative mapping the cache is divided into V sets, each of which consists of K lines. The required relationship are

$$m = V * K \text{ and } i = j \bmod V.$$

where

- (a)  $i$  = cache line number,  $j$  = main memory block number;  $m$  = number of lines in the cache
- (b)  $i$  = main memory block number  
 $j$  = cache set number  
 $m$  = number of blocks in main memory
- (c)  $i$  = cache set number  
 $j$  = main memory block number  
 $m$  = number of lines in the cache
- (d)  $i$  = cache set number  
 $j$  = cache line number  
 $m$  = number of set in the cache

Q.31 In the NONIX operating system the time required for various file read operations are given below:

Disk seek time : 25 msec

Disk latency time : 8 msec

Disk transfer time : 1 msec/K byte

Operating system overhead : 1 msec/1 K byte + 10 msec

What is the time required to retrieve a block of 2 K bytes?

- (a) 45 msec
- (b) 47 msec
- (c) 90 msec
- (d) 94 msec

Q.32 Consider a 4 block cache with the following main memory block references.

4, 5, 7, 12, 4, 5, 13, 4, 5, 7

Assume cache is initially emptying identify the hit ratio. Using FIFO, LRU, directed mapping and 2-way set associative with LRU respectively.

- (a) 0.2, 0.4, 0.3 and 0.4
- (b) 0.4, 0.2, 0.4 and 0.3
- (c) 0.5, 0.2, 0.4 and 0.6
- (d) 0.7, 0.8, 0.3 and 0.4

### Common Data For Q.33 & Q.34

Consider a cache memory having hit ratios for read and write operations are 80% and 90% respectively. If there is miss (either for read or for write) entire 2

word block is to be through from main memory to cache. Let there are 30% updations. The cache access time is 20 ns/word main memory access time is 100 ns/word.

**Q.33** What is the efficiency of the cache using the write through scheme?

- (a) 13.8 million words/sec
- (b) 12.8 million words/sec
- (c) 14.8 million words/sec
- (d) 11.8 million words/sec

**Q.34** What is the efficiency of the cache using the write back scheme?

- (a) 15.5 million words/sec
- (b) 16.5 million words/sec
- (c) 14.5 million words/sec
- (d) 13.5 million words/sec

### Numerical Data Type Questions

**Q.35** An 8 kB direct mapped write-back cache is organized as multiple blocks each of size 32 B. The processor generates 32-bit addresses. The cache controller maintains the tag information for each cache block comprising of the following.

- 1 Valid bit
- 1 modified bit

As many bits as the minimum needed to identify the memory block mapped in the cache. What is the total size of memory (in bits) needed at the cache controller to store meta data (tags) for the cache.

[GATE 2011]

**Q.36** Amount of time required to access  $L_1$  cache is 4 ns having hit ratio 80% and main memory access time is 1 msec. What is average access time (in ns)?

**Q.37** Let cache of 0.7 hit having access time 9 times faster than that of memory. If average access time increase 40% from 80 ns. What would be the new hit ratio?

© Copyright: Subject matter to MADE EASY Publications, New Delhi. No part of this book may be reproduced or utilised in any form without the written permission.

**Q.38** A CPU has cache with block size 64 B. The main memory has K banks, each bank being C-bytes wide. Consecutive C-byte chunks are mapped on consecutive banks with wraparound. All the K-banks can be accessed in parallel, but to access to the same bank must be serialized. A cache block access may involve multiple iterations of parallel bank accesses depending on the amount of data obtained by accessing all the K-banks in parallel. Each iteration requires decoding the bank numbers to be accessed in parallel and this takes  $K/2$  ns. The latency of one bank access is 80 ns.

If  $C = 2$  and  $K = 24$ , then latency of retrieving a cache block starting at address zero from main memory is \_\_\_\_\_ ns.

[GATE 2006]

### Common Data For Q.39 & Q.40

A CPU has a 32 kB direct mapped cache with 128 B block size. Suppose A is a two dimensional array of size  $512 \times 512$  with elements that occupy 8 bytes each. Consider the following two C-CODE segments

$P_1$  and  $P_2$

```
 $P_1 :$  for ( $i = 0$ ;  $i < 512$ ;  $i++$ )
{
    for ( $j = 0$ ;  $j < 512$ ;  $j++$ )
    {
         $x += A[i][j]$ 
    }
}
 $P_2 :$  for ( $i = 0$ ;  $i < 512$ ;  $i++$ )
{
    for ( $j = 0$ ;  $j < 512$ ;  $j++$ )
    {
         $x += A[j][i];$ 
    }
}
```

$P_1$  and  $P_2$  are executed independently with the same initial state, namely the array A is not in the cache and i, j, x are in registers. Let the number of cache miss experienced by  $P_1$  be  $M_1$  and that for  $P_2$  be  $M_2$ .

**Q.39** The

**Q.40** The

Common

Consider

memory

data ca

used in t

of bytes

memory

is initia

twice. A

not of c

Q.41

Q.42

Q.43

Q.44

T

Q 39 The value of  $M_1$  is \_\_\_\_\_.

[GATE 2008]

Q.40 The value of the ratio  $M_1/M_2$  is \_\_\_\_\_.

### Common Data For Q.41 & Q.42

Consider a machine with a byte addressable main memory of  $2^{16}$  bytes. Assume that a direct mapped data cache consisting of 32 lines of 64 bytes. Each is used in the system. A  $50 \times 50$  two dimensional array of bytes is stored in the main memory starting from memory location 1100 H. Assume that the data cache is initially empty. The complete array is accessed twice. Assume that the contents of the data cache do not change in between the two accesses.

**Q.41** How many data cache misses will occur in total?

[GATE 2006]

Q.42 Line \_\_\_\_\_ to Line \_\_\_\_\_ of the data cache will be replaced by new blocks in accessing the array.

[GATE 2006]

**Q.43** How many  $32\text{ K} \times 1$  RAM chips are needed to provide a memory capacity of  $256\text{ kB}$ ?

[GATE 2010]

**Q.44** In a certain system the main memory access time is 100 ns. The cache is 10 times faster than the main memory and uses the write through protocol. If the hit ratio for read request is 0.92 and 85% of the memory requests generated by the CPU are for read, the remaining being for write; then the average time considering both read and write requests is \_\_\_\_\_ ns.



## Try Yourself

- T1. An access sequence of cache block address of length  $N$  and contains  $n$  unique block addresses. The number of unique block addresses between two consecutive accesses to the same block address is bounded above by  $k$ .

© Copyright: Subject matter to MADE EASY Publications, New Delhi. No part of this book may be reproduced or utilised in any form without the written permission

What is the miss ratio if the access sequence is passed through a cache of associativity  $A \geq k$  exercising least-recently used replacement policy?

- (a)  $n/N$       (b)  $1/N$   
 (c)  $1/A$       (d)  $k/n$

[GATE-2014, Ans: (a)]

- T2. A 4-way set-associative cache memory unit with a capacity of 16 KB is built using a block size of 8 words. The word length is 32 bits. The size of the physical address space is 4 GB. The number of bits for the TAG field is \_\_\_\_\_.

[GATE-2014, Ans: (20)]

- T3. In designing a computer's cache system, the cache block (or cache line) size is an important parameter. Which one of the following statements is correct in this context?

- (a) A smaller block size implies better spatial locality
  - (b) A smaller block size implies a smaller cache tag and hence lower cache tag overhead
  - (c) A smaller block size implies a larger cache tag and hence lower cache hit time
  - (d) A smaller block size incurs a lower cache miss penalty

[GATE-2014, Ans: (d)]

- T4. If the associativity of a processor cache is doubled while keeping the capacity and block size unchanged, which one of the following is guaranteed to be NOT affected?

- (a) Width of tag comparator
  - (b) Width of set index decoder
  - (c) Width of way selection multiplexor
  - (d) Width of processor to main memory data bus

[GATE-2014, Ans: (d)]

- T5. Consider a main memory system that consists of 8 memory modules attached to the system bus, which is one word wide. When a write request is made, the bus is occupied for 100

nanoseconds (ns) by the data, address, and control signals. During the same 100 ns, and for 500 ns thereafter, the addressed memory module executes one cycle accepting and storing the data. The (internal) operation of different memory modules may overlap in time, but only one request can be on the bus at any time. The maximum number of stores (of one word each) that can be initiated in 1 millisecond is \_\_\_\_\_.

[GATE-2014, Ans: (10000)]

- T6. The memory access time is 1 nanosecond for a read operation with a hit in cache, 5 nanoseconds for a read operation with a miss in cache, 2 nanoseconds for a write operation with a hit in cache and 10 nanoseconds for a write operation with a miss in cache. Execution of a sequence of instructions involves 100 instruction fetch operations, 60 memory operand read operations and 40 memory operand write operations.

The cache hit-ratio is 0.9. The average memory access time (in nanoseconds) in executing the sequence of instructions is \_\_\_\_\_.

[GATE-2014, Ans: (1.68)]

- T7. Assume that for a certain processor, a read request takes 50 nanoseconds on a cache miss and 5 nanoseconds on a cache hit. Suppose while running a program, it was observed that 80% of the processor's read requests result in a cache hit. The average read access time in nanoseconds is

[GATE-2015 Ans. (14)]

- T8. Consider a machine with a byte addressable main memory of  $2^{20}$  bytes, block size of 16 bytes and a direct mapped cache having  $2^{12}$  cache lines. Let the addresses of two consecutive bytes in main memory be  $(E201F)_{16}$  and  $(E2020)_{16}$ .

© Copyright: Subject matter to MADE EASY Publications, New Delhi. No part of this book may be reproduced or utilised in any form without written consent.

What are the tag and cache line address (in hex) for main memory address  $(E201F)_{16}$ ?



[GATE-2015, Ans: (a)]

- T9. Consider the direct mapped cache organization consists of m-lines with a line size of  $2^w$  words/bytes. Main memory address can be viewed as consisting of three fields. The least significant w-bits identify a unique word within the block of main memory. The remaining 'S' bits specify one of the  $2^S$  block of main memory. Assume that cache is initially empty. Main memory blocks are referenced by the CPU in a sequential order.

Which one of the following sequence of the blocks are mapped on the cache lines in sequential order from the initial line respectively.

- (a)  $2m + 1, 2m + 2, \dots, 3m - 1$
  - (b)  $2^S - m, 2^S - m+1, \dots, 2^S - 1$
  - (c)  $1, 2, 3, \dots, m - 1$
  - (d)  $0, 2, 4, \dots, (m - 1)$

[Ans: (b)]

- T10. A two way set associative cache has lines of 16 byte and a total size of 8 K bytes. The 256 M byte main memory is byte addressable. Which one of the following main memory block is mapped on to the set '0' of the cache memory.

- (a)  $(CFED09B)_{16}$       (b)  $(FCED90C)_{16}$   
 (c)  $(CFED00B)_{16}$       (d)  $(FECD10C)_{16}$

[Ans: (c)]

- T11. Consider a machine with a byte addressable main memory of  $2^{16}$  bytes and block size of 8 bytes. Assume that a direct mapped cache consisting of 32 lines is used with this machine. Into what lines would bytes with each of the following addresses be stored respectively.

|                                       |      |      |      |
|---------------------------------------|------|------|------|
| 0001                                  | 0001 | 1001 | 1011 |
| 1100                                  | 0011 | 0010 | 0100 |
| 1010                                  | 1010 | 0110 | 1010 |
| (a) $(4)_{10}, (19)_{10}, (13)_{10}$  |      |      |      |
| (b) $(19)_{10}, (4)_{10}, (13)_{10}$  |      |      |      |
| (c) $(F)_{16}, (IC)_{16}, (A)_{16}$   |      |      |      |
| (d) $(11101)_2, (01011)_2, (01110)_2$ |      |      |      |

[Ans: (b)]

- T12. A computer has a 256 K bytes, 8 way set associative write back data cache with block size of 32 bytes. The processor sends 40 bit address to the cache controller each cache tag directory entry contains in addition to address tag 4 valid bits, 1 modified bits and 2 replacement bits. What is the size of the cache tag directory.

(a) 64 K bits      (b) 256 K bits  
 (c) 32 K bits      (d) 68 K bits

[Ans: (b)]

- T13. Hit ratio of the cache memory read request is 85% and the cache memory is 5 times faster than main memory. Block size in memory organization is words. The access time of the main memory is 72 ns per word. Write through protocol is used in the system. CPU generates 60% of the read requests to read the data and the remaining for write operation.

What is the average access time of the memory when considering both read and write operations.

- (a) 89.28 ns      (b) 62.2 ns  
 (c) 42.624 ns      (d) 72 ns

[Ans: (c)]

- T14. A file system uses an in-memory cache to cache disk blocks. The miss rate of the cache is shown in the figure. The latency to read a block from the cache is 1 ms and to read a block from the disk is 10 ms. Assume that the cost of checking whether a block exists in the cache is negligible. Available cache sizes are in multiples of 10 MB.



The smallest cache size required to ensure an average read latency of less than 6 ms is \_\_\_\_\_ MB.

[GATE-2016, Ans: (30)]



## 5

# Input-Output Interface



## Multiple Choice Questions

- Q.1** A certain device dumps data into its interface register every 200 ns. The main memory access time is 50 ns. If the CPU were interfaced to this device in cycle stealing mode. What percentage of time does the CPU be in the hold state.
- 20
  - 25
  - 50
  - None of these

- Q.2** A certain microprocessor requires 4.5 microseconds to respond to an interrupt. Assuming that the three interrupts  $I_1$ ,  $I_2$  and  $I_3$  require the following execution time after the interrupt is recognized.
- $I_1$  requires 25  $\mu$ sec
  - $I_2$  requires 35  $\mu$ sec
  - $I_3$  requires 20  $\mu$ sec

$I_1$  has the highest priority and  $I_3$  has the lowest. What is the possible range of time for  $I_3$  to be executed assuming that it may or may not occur simultaneously with other interrupts.

- 24.5  $\mu$ sec to 39.5  $\mu$ sec
- 24.5  $\mu$ sec to 93.5  $\mu$ sec
- 4.5  $\mu$ sec to 24.5  $\mu$ sec
- 29.5  $\mu$ sec to 93.5  $\mu$ sec

[ISRO 2009]

- Q.3** Which one of the following is true for a CPU having single interrupt request line and a single interrupt grant line.
- Neither vectored interrupt nor multiple interrupting devices are possible
  - Vectored interrupts are not possible but multiple interrupting devices are possible

© Copyright: Subject matter to MADE EASY Publications, New Delhi. No part of this book may be reproduced or utilised in any form without the written permission.

- Vectored interrupts and multiple interrupting devices are both possible
- Vectored interrupts is possible but multiple interrupting devices are not possible

[GATE 2005]

- Q.4** Consider a device of 1 MBPS is operating in a cycle stealing mode of DMA. Whenever 16 B word is available it is transferred into memory in 4  $\mu$ sec. What is the % of time the processor is blocked due to DMA
- 10%
  - 20%
  - 80%
  - 90%

- Q.5** A device employing INTA line for device interrupt puts the CALL instruction on the data bus while
- $\overline{\text{INTA}}$  is active
  - HOLD is active
  - Ready is active
  - None of these

[GATE 2002]

- Q.6** On receiving an interrupt from an IO device, the CPU
- Halts for a predetermined time
  - Branches off to the interrupt service routine after completion of the current instruction
  - Branches off to the interrupt service routine immediately
  - Hands over control of address bus and data bus to the interrupting device

[ISRO 2009]

- Q.7** In a n-CPU shared bus system, if  $z$  is the probability that any CPU request the bus in a given cycle, the probability that only one CPU uses the bus is given by
- $nz(1-z)^{n-1}$
  - $z(1-z)^{n-1}$
  - $n(1-z)^n$
  - $(n-1)z(1-z)^n$

[DRDO 2008]

MADE E

Q.8 A vari  
state  
state  
jump  
state  
state  
the v  
vari  
(a)  
(b)  
(c)  
(d)

Q.9 T

Q.1

- Q.8** A variable X has been assigned fresh values in statements numbered 6, 9 and 12 in a 25 statement program which does not have any jump instructions. This variable is used in statements numbered 7, 8, 10, 16 and 17. The statement ranges where the register, used by the variable X, could be assigned to some other variable are

  - (a) 8–9, 10–12, 17–25
  - (b) 11, 18–25
  - (c) 17–25
  - (d) None of these

[DRDO 2008]

- Q.9** The address bus of a computer has 16 address lines  $A_{15-0}$ . If the address assigned to one device is  $7CA4_{16}$  and the address decoder for that device ignores lines  $A_8$  and  $A_9$ . What are all the addresses to which this device will respond.

  - (a)  $7CA4_{16}, 7CA5_{16}, 7CA6_{16}$  and  $7CA7_{16}$
  - (b)  $7C84_{16}, 7C94_{16}, 7CA4_{16}$  and  $7CB4_{16}$
  - (c)  $7CA4_{16}, 7DA4_{16}, 7EA4_{16}$  and  $7FA4_{16}$
  - (d)  $7CA4_{16}, 6CA4_{16}, 5CA4_{16}$  and  $4CA4_{16}$

[DRDO 2009]



[DRDO 2008]

- 1.11 An assembly language application contains 1200 assembly language instructions. It takes

12 second to run on benchmark. The application programmer then works on the assembly language code to make it better after this the application takes 10 seconds to run. Calculate the speed up



- Q.12** Normally user programs are prevented from handling IO directly by IO instructions in them. For CPUs having explicit IO instructions, such protection is ensured by having the IO instructions privileged. In a CPU with memory mapped IO, there is no explicit IO instructions. Which one of the following is true for a CPU with memory mapped IO.

  - (a) IO protection is ensured by operating system routine
  - (b) IO protection is ensured by H/W trap
  - (c) IO protection is ensured during system configuration
  - (d) IO protection is not possible

[GATE 2005]

- Q.13** Consider a disk pack with 16 surfaces, 128 tracks per surface and 256 sector per track. 512 bytes of data are stored in a bit serial manner in a sector. The capacity of the disk pack and the number of bits required to specify a particular sector in the disk are respectively

(a) 256 MB, 19 bits (b) 256 MB, 28 bits  
(c) 512 MB, 20 bits (d) 64 GB, 28 bits

[GATE 2007]

### Common Data For Q.14 & Q.15

A digital computer has memory unit with 24 bits word. The instruction set consists of 150 different operations. All instructions have an operation code part and an address part. Each instruction is stored in one word of memory.

- Q.14** How many bits are needed for the OP-CODE and how many bit are left for the address of the instruction

  - (a) 4, 8
  - (b) 8, 16
  - (c) 8, 64
  - (d) 16, 64

- Q.15** What is the maximum available size for memory and the largest unsigned binary number that can be accommodated in one word of memory.

(a)  $2^{16}, 2^{24} + 1$       (b)  $2^{16}, 2^{24}$   
(c)  $2^{16}, 2^{24} - 1$       (d) None of these



- Q.17** A computer handles several interrupt sources of which the following are relevant for this question.

  1. Interrupt from CPU temperature sensor (raises interrupt if CPU temperature is too high)
  2. Interrupt from mouse (raises interrupt if the mouse is moved or a button is pressed)
  3. Interrupt from keyboard (raises interrupt if a key is pressed or released)
  4. Interrupt from hard disk (raises interrupt when a disk read is completed)

Which one of the these will be handled at the highest priority?

- (a) Interrupt from hard disk
  - (b) Interrupt from mouse
  - (c) Interrupt from keyboard
  - (d) Interrupt from CPU temperature sensor

[GATE 2011]

- Q.18** A processor that communicates with remote terminals over telephone and other communication media in a serial fashion is called

  - (a) IO processor
  - (b) DMA controller
  - (c) Data communication processor
  - (d) UART

- Q.19** Which of the following statements true about firewire

  1. The firewire interface is a high speed serial interface that transfer data in packets.

© Copyright: Subject matter to MADE EASY Publications, New Delhi. No part of this book may be reproduced or utilised in any form without the written permission.






[GATE 2008]

- Q.22** If a disk system has an average seek time of 30 ns and rotation rate of 360 RPM. Each track of the disk has 512 sectors each of size 512 Bytes. What is the time taken to read '4' successive sectors, also compute the effective Data transfer rate?

  - (a) 0.0843 sec, 1536 kbps
  - (b) 0.123 sec, 1436 kbps
  - (c) 0.156 sec, 1326 kbps
  - (d) 0.135 sec, 1252 kbps

Common Data for Q. 23 and Q. 24

Consider a disk system with the following specifications.

Number of surfaces = 8, outer diameter = 12 cm. Inner diameter = 4 cm, number of sectors per track is 20. Sector size = 4 KB, inter track distance = 0.2 mm.

- Q.23** What is the approximate total capacity of the disk?

  - (a) 6.4 MB
  - (b) 128 MB
  - (c) 228 MB
  - (d) 32 MB

**Q.24** What is the total number of sectors per surface on the disk?

- (a) 200
- (b) 2000
- (c) 4000
- (d) 1600

**Q.25** If a Disk system has an average seek time of 60 ns and rotation rate of 3600 RPM. Each track of the disk has 256 sectors each of size 2 KB. What is the approximate time required to read 1200 random sectors, and also compute, what is the data transfer rate?

- (a) 20 sec, 60 mbps
- (b) 10 sec, 30 mbps
- (c) 15 sec, 50 mbps
- (d) 40 sec, 28 mbps

**Q.26** Consider a disk pack with 16 surfaces, 128 tracks per surface and 256 sectors per track. 512 bytes of data are stored in a bit serial manner in a sector. The capacity of the disk pack and the number of bits required to specify a particular sector in the disk are respectively.

- (a) 256 MB, 19 bits
- (b) 256 MB, 28 bits
- (c) 512 MB, 20 bits
- (d) 64 GB, 28 bits.

**Q.27** Which of the following storage allocation is faster in general?

- (a) Best fit
- (b) First fit
- (c) Worst fit
- (d) All of these



### Numerical Data Type Questions

**Q.28** A hard disk with a transfer rate of 10 MBPS is constantly transferring data to memory using DMA. The processor runs at 600 MHz and takes 300 and 900 cycles to initiate and complete DMA transfer respectively. If the size of the transfer is 20 kB. What is the percentage of processor time consumed for the transfer operation.

[GATE 2004]

**Q.29** A device with data transfer rate 10 KBPS is connected to a CPU. Data is transferred byte-wise. Let the interrupt overhead be 4  $\mu$ sec. The

byte transfer time between the device interface register and CPU or memory is negligible. What is the minimum performance gain of operating the device under interrupt mode over operating it under program controlled mode

[GATE 2005]

**Q.30** The two numbers given below are multiplied using the Booth's algorithm

Multiplicand : 0101 1010 1110 1110

Multiplier : 0111 0111 1011 1101

How many additions/subtractions are required for multiplication of the above two numbers?

[ISRO 2009]

**Q.31** Consider a disk drive with the following specifications. 16 surfaces, 512 tracks/surface, 512 sectors/track 1 kB/sector, rotation speed 3000 rpm. The disk is operated in cycle stealing mode where by whenever one byte work is ready it is sent to memory, similarly for writing the disk interface reads a 4 byte word from the memory in each DMA cycle. Memory cycle time 40 nsec. The maximum percentage of time that the CPU gets blocked during DMA operation is \_\_\_\_\_.

[GATE 2005]

**Q.32** On a non pipelined sequential processor a program segment, which is a part of the interrupt service routine is given to transfer 500 bytes from an IO device to memory.

Initialize the address register

Initialize the count to 500

loop: Load a byte from devices

Store in memory at address given by address register

Increment the address register

Decrement the count

If count! = 0 go to loop

Assume that each statement in this program is equivalent to a machine instruction which takes one cycle to execute if it is a non load-store instruction. The load store instruction takes two cycles to execute.

The designer of the system also has an alternate approach of using the DMA controller to

implement the same transfer. The DMA controller require 20 cycles for initialization and other overheads. Each DMA transfer cycle takes 2 cycles to transfer one byte of data from the device to the memory. What is the approximate speed up when the DMA controller based design is used in place of the interrupt driven program based input-output.

[GATE 2011]

**Try Yourself**

- T1. Which one of the following is true.
- In the programmed IO mode processor utilization is efficient.
  - In the interrupt driven IO mode processor undergoes waiting until completion of the IO operation. So processor utilization is poor.
  - In the DMA-mode bulk amount of data will be transferred from IO to main memory without involvement of the CPU.
  - Buffer is a non programmable interface in the CPU design.
- (a) (i), (ii) and (iii)      (b) (ii) and (iii)  
 (c) (iii) and (iv)      (d) (ii), (iii) and (iv)
- [Ans: (c)]

- T2. For the daisy chain scene of connecting I/O devices, which of the following statements is true?
- It gives non-uniform priority to various devices
  - It gives uniform priority to all devices
  - It is only useful for connecting slow devices to a processor device
  - It requires a separate interrupt pin on the processor for each device
- [Ans: (a)]

- T3. I/O redirection
- Implies changing the name of a file
  - Can be employed to use an existing file as input file for a program

© Copyright: Subject matter to MADE EASY Publications, New Delhi. No part of this book may be reproduced or utilised in any form without the written permission.

- T4. (c) Implies connection of 2 programs through a pipe  
 (d) None of the above

[Ans: (c)]

- T4. Consider the following statements.  
 S<sub>1</sub>: Change the program counter  
 S<sub>2</sub>: Change the page table register content.  
 S<sub>3</sub>: Disable interrupts  
 S<sub>4</sub>: Initiate I/O operation on a disk using memory mapped I/O.

Which of the above statements are only executed in system mode.

- S<sub>1</sub>, S<sub>3</sub> and S<sub>4</sub>
- S<sub>2</sub>, S<sub>3</sub> and S<sub>4</sub>
- S<sub>1</sub>, S<sub>2</sub> and S<sub>3</sub>
- S<sub>2</sub>, S<sub>4</sub> and S<sub>1</sub>

[Ans: (b)]

- T5. In cycle stealing mode, the DMA gets control over \_\_\_\_\_ buses from CPU.
- Data bus and Address bus
  - Address bus and Control bus
  - Control bus and Data bus
  - None of the above

[Ans: (a)]

- T6. Suppose DMA is employed in cycle stealing mode and transfer of bus control from CPU to I/O or I/O to CPU takes 300 ns. Assume the system in which bus cycle takes 700 ns, and one of the I/O device has a data transfer rate of 50 KB/sec and employs DMA.

Data are transferred one byte at a time. Find the time taken for the I/O device to transfer a block of 100 bytes? (Approx.)

- 100 µs
- 110 µs
- 120 µs
- 130 µs

[Ans: (d)]

- T7. DMA is transferring characters to the processor, from a device transmitting at 16000 bits per second. Assume DMA is using cycle stealing. If the processor needs access to main memory once every microsecond, find how much percentage will the processor be slowed down due to DMA activity?

- (a) 0.20      (b)  
 (c) 0.30      (d)

- T8. Assume that interrupt driven I/O for data an average of 8 basis. Find percentage consumed by the for every byte?  
 (a) 20  
 (c) 60

- T9. Suppose DMA is transfer of bus control. CPU takes 300 ns bus cycle takes 700 ns. The device has a data transfer rate of 50 KB/sec and employs DMA. Find the time taken to transfer a block of 100 bytes.  
 (a) 1 msec  
 (c) 3 msec

- T10. One way to prevent a disk from being removed. Suppose a file requires writing the file, followed by writing the header. Assume a delay of 4 ms and an access time of How long does the memory location take?  
 (a) 109 ms  
 (c) 218 ms

[Ans: (b)]



- (b) 40
- (d) 80

[Ans: (c)]

- T9. Suppose DMA is employed in burst mode and transfer of bus control from CPU to I/O or I/O to CPU takes 300 ns. Assume the system in which bus cycle takes 700 ns, and one of the I/O device has a data transfer rate of 50 KB/sec and employs DMA. Data are transferred one byte at a time.

Find the time taken for the I/O device to transfer a block of 100 bytes? (Approx.)

- (a) 1 msec                  (b) 2 msec  
(c) 3 msec                  (d) None of these

[Ans: (b)]

- T10. One way to preserve contiguous allocation of the disk is to do compaction every time a file is removed. Since all files are contiguous, copying a file requires a seek and rotation delay to read the file, followed by the transfer at full speed. Writing the file back requires the same work. Assume a seek time of 5 m sec. a rotational delay of 4 msec, a transfer rate of 10 MB/sec and an average file size of 1 MB.

How long does it take to read a file into main memory than write it back to the disk in a new location?



[Ans: (\*)]

- T11. An application loads 100 libraries at start-up loading each library requires exactly one disk access. The seek time of the disk to a random location is given as 10 ms. Rotational speed of disk is 6000 rpm. If all 100 libraries are loaded from random locations on the disk, how long does it take to load all libraries?

(The time to transfer data from the disk block once the head has been positioned at the start of block may be neglected).

- (a) 0.50 sec      (b) 1.50 sec  
 (c) 1.25 sec      (d) 1.00 sec

[GATE-2011]

- T12. A hard disk system has the following parameters. Number of tracks = 500  
Number of sectors/track = 100  
Number of bytes/sector = 500  
Time taken by the head to move from one track to adjacent track = 1 m sec  
Rotation speed = 600 rpm.

What is the average time taken for transferring 250 bytes from the disk?

- (a) 300.5 ms      (b) 255.5 ms  
(c) 255 ms      (d) 300 ms

[GATE-IT-2007]

- T13. Consider a disk pack with time 4 ms and rotational speed of 10000 rpm. It has 600 sectors per track and each sector can store 512 bytes of data. Consider a file stored in the disk. The file contains 2000 sectors. Assume that every sector access necessitates a seek and the average rotational latency for accessing each sector is half fo the time for one complete rotation. The total time needed to read the entire file is \_\_\_\_\_ (in ms).

[Ans: (34000)]

- T14. The width of the physical address on a machine is 40 bits. The width of the tag field in a 512 KB 8-way set associative cache is \_\_\_\_\_ bits.

[GATE-2016, Ans: (21)]



## 6

## Floating Point Number System (IEEE)



### Multiple Choice Questions

- Q.1** In the IEEE floating point representations the hexadecimal value  $0 \times 00000000$  corresponds to  
 (a) the normalized value  $2^{-127}$   
 (b) the normalized value  $2^{-126}$   
 (c) the normalized value +0  
 (d) the special value +0
- Q.2** In IEEE floating point representations, all the exponent bits are 1 and mantissa bits are non-zero. This represent  
 (a) 0 (b) infinity  
 (c) denormalized value (d) error
- Q.3** What is the result of evaluating the following two expression using three-digit floating point arithmetic with rounding  
 $(113. + -111.) + 7.51$   
 $113. + (-111. + 7.51)$   
 (a) 9.51 and 10.0 respectively  
 (b) 10.0 and 9.51 respectively  
 (c) 9.51 and 9.51 respectively  
 (d) 10.0 and 10.0 respectively

#### Common Data For Q.4 & Q.5

Consider the following floating point format.

| 30 | 31 | 23 | 0 |
|----|----|----|---|
| S  | E  | M  |   |

Bias = 31

Assume only 0 are padded while shifting a field.

- Q.4** The decimal number  $0.15625 \times 2^5$  has a following hexadecimal representation without normalization

© Copyright Subject matter to MADE EASY Publications, New Delhi. No part of this book may be reproduced or utilised in any form without the written permission.

- (a) 24 28 00 00 (b) 24 10 00 00  
 (c) 21 28 00 00 (d) 21 10 00 00

- Q.5** The normalized representation of above number is hexadecimal format

- (a) 24 40 00 00 (b) 21 52 00 00  
 (c) 21 40 00 00 (d) 24 52 00 00

- Q.6** Find the decimal equivalent of the bit pattern 1 01111111010000000000000000000000 as per the IEEE 754 standard. In IEEE 754 a single bit precision binary floating point format is stored in 32 bits

|           |            |             |
|-----------|------------|-------------|
| S (1 bit) | E (8 bits) | M (23 bits) |
|-----------|------------|-------------|

Exponent is biased with 128.

- (a) 1.625 (b) -1.875  
 (c) -1.625 (d) .625

- Q.7** The following is a scheme for floating point number representation using 16 bits

| 15 | 14 | 9 8 | 0 |
|----|----|-----|---|
| S  | E  | M   |   |

Sign Exponent Mantissa

Let 'S', 'E' and 'M' be the number represented in binary in the sign exponent and mantissa field respectively. Then the floating point number is represented as

$$(-1)^s (1 + M \times 2^{-9}) 2^{E-31} ; \text{ if exponent } \neq 1111110 \\ ; \text{ otherwise}$$

What is maximum difference between two successive real number representation in this system

- (a)  $2^{-40}$  (b)  $2^{-9}$   
 (c)  $2^{22}$  (d)  $2^{31}$

- Q.8 Using Booth's algorithm for multiplication the multiplier -57 will be recoded as  
 (a) 0-100 100-1    (b) 11000111  
 (c) 0-1001000    (d) 0100-1001  
 [GATE 2005]



### Numerical Data Type Questions

Common Data For Q.9 & Q.10

Consider the following floating point format



Mantissa is a pure fraction in sign magnitude form.

- Q.9 The decimal number  $0.239 \times 2^{13}$  has the following hexadecimal representation without normalization and rounding off

- Q.10 The normalized representation of the above format is specified as follows. The mantissa has an implicit 1 preceding the binary (radix) point. Assume that only 0's are padded in while shifting a field. The normalized representation of above number is  $(0.239 \times 2^{13})$  is \_\_\_\_\_.

- Q.11 Consider the following floating point format



The value of floating value number in this system is

$$V = (-1)^5 \times 2^{E-127} \times 1.F$$

Find the value of the corresponding decimal value if the floating point stored is 3F800000?



### Try Yourself

- T1. The value of a *float* type variable is represented using the single-precision 32-bit floating point format of IEEE-754 standard that uses 1 bit for sign, 8 bits for biased exponent and 23 bits for mantissa. A *float* type variable *X* is assigned

© Copyright: Subject matter to MADE EASY Publications, New Delhi. No part of this book may be reproduced or utilised in any form without the written permission.

the decimal value of -14.25. The representation of *X* in hexadecimal notation is

- (a) C1640000H    (b) 416C0000H  
 (c) 41640000H    (d) C16C0000H

[GATE-2014, Ans: (a)]

- T2. If the decimal number is  $3.248 \times 10^4$  then its 32 bit equivalent floating number is

- (a) 

|   |          |                          |
|---|----------|--------------------------|
| 0 | 10001101 | 111110111000000000000000 |
|---|----------|--------------------------|

  
 (b) 

|   |          |                          |
|---|----------|--------------------------|
| 1 | 10001101 | 111110111000000000000000 |
|---|----------|--------------------------|

  
 (c) 

|   |          |                          |
|---|----------|--------------------------|
| 0 | 11010101 | 111110111000000000000000 |
|---|----------|--------------------------|

  
 (d) 

|   |          |                          |
|---|----------|--------------------------|
| 1 | 11010101 | 111110111000000000000000 |
|---|----------|--------------------------|

[Ans: (a)]

- T3. A computer uses 8 digit mantissa and 2 digit exponent. If  $a = 0.052$  and  $b = 28E+11$  then  $b+a-b$  will

- (a) result in an overflow error  
 (b) result in an underflow error  
 (c) be 0  
 (d) be  $5.28E+11$

[Ans: (c)]

- T4. A single precision floating point number contains the sequence of bits 10011110000000000000000000000000. Information is stored in the following left to right order: sign bit, exponent (bias 127) and mantissa.

Which of the following representation in decimal is equivalent

- (a)  $2^{31}(1 + 2^{-12})$   
 (b)  $(-1) \times 2^{31}(1 + 2^{-12})$   
 (c)  $(-1) \times 2^{-65}(1 + 2^{-10})$   
 (d)  $(-1) \times 2^{96}(1 + 2^{-11})$

[Ans: (c)]

- T5. Sign extension is a step in

- (a) floating point multiplication  
 (b) signed 16 bit integer addition  
 (c) arithmetic left shift  
 (d) converting a signed integer from one size to another

[Ans: (d)]