

Q1. A processor $X_1$ operating at 2 GHz has a standard 5-stage RISC instruction pipeline with a base CPI (cycles per instruction) of one in the absence of pipeline hazards. For a given program P in which 30% of the instructions are branches, control hazards incur a 2-cycle stall for every branch. A new version of the processor, $X_2$, operating at the same clock frequency, has an additional branch predictor unit (BPU) that completely eliminates stalls for correctly predicted branches. There is neither any saving nor any additional stall for wrong predictions. There are no structural or data hazards for $X_1$ and $X_2$. If the BPU has a prediction accuracy of 90%, the speedup (rounded off to two decimal places) obtained by $X_2$ over $X_1$ in executing P is \_\_\_\_\_.

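Ans: 1.51 (CPI of $X_1$ = 1 + 0.3 × 2 = 1.6; CPI of $X_2$ = 1 + 0.3 × 0.1 × 2 = 1.06; speedup = 1.6 / 1.06 ≈ 1.51)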
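A quick check of the Q1 arithmetic. It assumes, as the question states, that every mispredicted branch still pays the full 2-cycle stall, and that both processors run at the same clock, so the speedup is simply the ratio of the CPIs. The helper name `cpi` is illustrative only.

```python
def cpi(base_cpi, branch_frac, stall_cycles, mispredict_rate=1.0):
    # Every branch that is not correctly predicted pays the full stall.
    return base_cpi + branch_frac * mispredict_rate * stall_cycles

cpi_x1 = cpi(1.0, 0.30, 2)                         # no predictor: every branch stalls
cpi_x2 = cpi(1.0, 0.30, 2, mispredict_rate=0.10)   # 90% accurate BPU
# Same clock frequency, so speedup is just the CPI ratio.
speedup = cpi_x1 / cpi_x2
print(f"CPI X1 = {cpi_x1:.2f}, CPI X2 = {cpi_x2:.2f}, speedup = {speedup:.2f}")
```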

Q2: A five-stage pipeline has stage delays of 150, 120, 150, 160 and 140 nanoseconds. The registers that are used between the pipeline stages have a delay of 5 nanoseconds each. The total time to execute 100 independent instructions on this pipeline, assuming there are no pipeline stalls, is \_\_\_\_\_ nanoseconds.

Ans: **17160**
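The Q2 result follows from two facts: the clock period is the slowest stage plus the register delay, and with no stalls $n$ instructions on a $k$-stage pipeline take $k + n - 1$ cycles. A minimal sketch (the function name is hypothetical):

```python
def pipeline_time_ns(stage_delays, latch_delay, n_instructions):
    # Clock period is set by the slowest stage plus the inter-stage register delay.
    cycle = max(stage_delays) + latch_delay
    k = len(stage_delays)
    # The first instruction needs k cycles; each later one completes one cycle after the previous.
    return (k + n_instructions - 1) * cycle

total = pipeline_time_ns([150, 120, 150, 160, 140], 5, 100)
print(total)  # 17160
```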

Q3. Consider a non-pipelined processor operating at 2.5 GHz. It takes 5 clock cycles to complete an instruction. You are going to make a 5-stage pipeline out of this processor. Overheads associated with pipelining force you to operate the pipelined processor at 2 GHz. In a given program, assume that 30% of the instructions are memory instructions, 60% are ALU instructions and the rest are branch instructions. 5% of the memory instructions cause stalls of 50 clock cycles each due to cache misses, and 50% of the branch instructions cause stalls of 2 cycles each. Assume that there are no stalls associated with the execution of ALU instructions. For this program, the speedup achieved by the pipelined processor over the non-pipelined processor (rounded off to 2 decimal places) is \_\_\_\_\_.

Ans: **2.16**
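The Q3 answer compares time per instruction: 5 cycles at 2.5 GHz for the non-pipelined processor versus (1 + average stall cycles) at 2 GHz for the pipelined one. This sketch treats the 10% remainder as the branch fraction, as the question implies.

```python
# Non-pipelined: 5 cycles per instruction at 2.5 GHz.
t_nonpipe_ns = 5 / 2.5            # 2.0 ns per instruction

# Pipelined at 2 GHz: ideal CPI of 1 plus average stall cycles per instruction.
stall_mem = 0.30 * 0.05 * 50      # memory instructions that miss in the cache
stall_branch = 0.10 * 0.50 * 2    # branch instructions that stall
cpi_pipe = 1 + stall_mem + stall_branch
t_pipe_ns = cpi_pipe / 2.0        # cycle time is 1 / (2 GHz) = 0.5 ns

speedup = t_nonpipe_ns / t_pipe_ns
print(f"CPI = {cpi_pipe:.2f}, speedup = {speedup:.2f}")
```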

Q4. The instruction pipeline of a RISC processor has the following stages: Instruction Fetch (IF), Instruction Decode (ID), Operand Fetch (OF), Perform Operation (PO) and Writeback (WB). The IF, ID, OF and WB stages take 1 clock cycle each for every instruction. Consider a sequence of 100 instructions. In the PO stage, 40 instructions take 3 clock cycles each, 35 instructions take 2 clock cycles each, and the remaining 25 instructions take 1 clock cycle each. Assume that there are no data hazards and no control hazards.

The number of clock cycles required for completion of execution of the sequence of instructions is \_\_\_\_\_.

Ans: 219
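One way to see the Q4 count: the first instruction spends 3 cycles in IF, ID and OF before reaching PO. From then on, the 1-cycle upstream stages always have an instruction ready, so PO is never idle until the last instruction leaves it. Then one more cycle is needed for the last WB. This sketch encodes that reasoning under the question's no-hazard assumption. The order of the 3-, 2- and 1-cycle instructions does not affect the total.

```python
po_cycles = [3] * 40 + [2] * 35 + [1] * 25   # PO-stage cycles for the 100 instructions
cycles_before_po = 3   # IF, ID, OF of the first instruction
cycles_after_po = 1    # WB of the last instruction

# Once the first instruction enters PO, another instruction is always waiting
# for PO, so PO stays busy until the last instruction leaves it.
total = cycles_before_po + sum(po_cycles) + cycles_after_po
print(total)  # 219
```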

Q5. Consider a non-pipelined processor with a clock rate of 2.5 gigahertz and an average CPI (cycles per instruction) of four. The same processor is upgraded to a pipelined processor with five stages, but due to the internal pipeline delay, the clock speed is reduced to 2 gigahertz. Assume that there are no stalls in the pipeline. The speedup achieved by this pipelined processor is \_\_\_\_\_.

Ans: 3.2
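The Q5 speedup is the ratio of average time per instruction. Because there are no stalls, the pipelined CPI is 1.

```python
t_nonpipe_ns = 4 / 2.5   # CPI 4 at 2.5 GHz -> 1.6 ns per instruction
t_pipe_ns = 1 / 2.0      # CPI 1 (no stalls) at 2 GHz -> 0.5 ns per instruction
speedup = t_nonpipe_ns / t_pipe_ns
print(round(speedup, 2))  # 3.2
```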

Q6. An instruction pipeline has five stages, namely, instruction fetch (IF), instruction decode and register fetch (ID/RF), instruction execution (EX), memory access (MEM) and register writeback (WB), with stage latencies of 1 ns, 2.2 ns, 2 ns, 1 ns and 0.75 ns, respectively (ns stands for nanoseconds). To gain in terms of frequency, the designers have decided to split the ID/RF stage into three stages (ID, RF1, RF2), each of latency 2.2/3 ns. Also, the EX stage is split into two stages (EX1, EX2), each of latency 1 ns. The new design has a total of eight pipeline stages. A program has 20% branch instructions, which execute in the EX stage and produce the next instruction pointer at the end of the EX stage in the old design and at the end of the EX2 stage in the new design. The IF stage stalls after fetching a branch instruction until the next instruction pointer is computed. All instructions other than the branch instruction have an average CPI of one in both designs. The execution times of this program on the old and the new design are P and Q nanoseconds, respectively. The value of P/Q is \_\_\_\_\_.

Ans: 1.54
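A sketch of the Q6 computation. A branch resolved at the end of stage $k$ (counting IF as stage 1) holds up the next fetch for $k - 1$ cycles, and the slowest stage sets the clock. Because both designs run the same program, the ratio of per-instruction times equals P/Q. The function and parameter names are illustrative.

```python
def time_per_instr(stage_latencies, branch_stage, branch_frac):
    cycle = max(stage_latencies)      # slowest stage sets the clock period
    stall = branch_stage - 1          # fetch waits until the resolving stage finishes
    cpi = 1 + branch_frac * stall
    return cpi * cycle

old = [1, 2.2, 2, 1, 0.75]                       # IF, ID/RF, EX, MEM, WB
new = [1, 2.2/3, 2.2/3, 2.2/3, 1, 1, 1, 0.75]    # IF, ID, RF1, RF2, EX1, EX2, MEM, WB
P = time_per_instr(old, branch_stage=3, branch_frac=0.2)   # resolved at end of EX
Q = time_per_instr(new, branch_stage=6, branch_frac=0.2)   # resolved at end of EX2
print(round(P / Q, 2))   # 1.54
```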

Q7. Consider a 6-stage instruction pipeline, where all stages are perfectly balanced. Assume that there is no cycle-time overhead of pipelining. When an application is executing on this 6-stage pipeline, the speedup achieved with respect to non-pipelined execution if 25% of the instructions incur 2 pipeline stall cycles is\_\_\_\_\_.

Ans: 4
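For Q7, balanced stages with no latch overhead mean that one non-pipelined instruction takes exactly 6 pipeline cycles. The pipelined CPI is 1 plus the average stall cycles per instruction.

```python
stages = 6
stall_cpi = 0.25 * 2            # average stall cycles per instruction
# Balanced stages, no overhead: non-pipelined time per instruction = 6 pipeline cycles.
speedup = stages / (1 + stall_cpi)
print(speedup)  # 4.0
```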

Q8. Consider an instruction pipeline with four stages (S1, S2, S3 and S4), each consisting of combinational logic only. Pipeline registers are required between each pair of stages and at the end of the last stage. The delays for the stages and for the pipeline registers are as given in the figure.



What is the approximate speed up of the pipeline in steady state under ideal conditions when compared to the corresponding non-pipeline implementation?

Ans: 2.5

Q9. Consider a pipelined processor with the following four stages:

IF: Instruction Fetch

ID: Instruction Decode and Operand Fetch

EX: Execute

WB: Write Back

The IF, ID and WB stages take one clock cycle each to complete the operation. The number of clock cycles for the EX stage depends on the instruction.

The ADD and SUB instructions need 1 clock cycle and the MUL instruction needs 3 clock cycles in the EX stage. Operand forwarding is used in the pipelined processor. What is the number of clock cycles taken to complete the following sequence of instructions?

| Instruction    | Operation               |
|----------------|-------------------------|
| ADD R2, R1, R0 | $R2 \leftarrow R1 + R0$ |
| MUL R4, R3, R2 | $R4 \leftarrow R3 * R2$ |
| SUB R6, R5, R4 | $R6 \leftarrow R5 - R4$ |

Ans: 8
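The Q9 timing can be checked with a small cycle-level model. It assumes that forwarding makes a result usable in the cycle right after the producer's last EX cycle, that there is a single non-pipelined EX unit, and that a stalled instruction keeps its current stage occupied. These modelling choices and all names in the code are illustrative, not part of the question.

```python
STAGES = ["IF", "ID", "EX", "WB"]
EX_CYCLES = {"ADD": 1, "SUB": 1, "MUL": 3}   # EX-stage latency per opcode
EX = STAGES.index("EX")

# (opcode, destination, sources) for the sequence in the question
PROGRAM = [
    ("ADD", "R2", ["R1", "R0"]),
    ("MUL", "R4", ["R3", "R2"]),
    ("SUB", "R6", ["R5", "R4"]),
]

def total_cycles(program):
    ready = {}                      # register -> first cycle it can be forwarded into EX
    prev_start = prev_lat = None    # stage start cycles/latencies of the previous instruction
    last_cycle = 0
    for op, dst, srcs in program:
        lat = [1, 1, EX_CYCLES[op], 1]
        start = []
        for s in range(len(STAGES)):
            # Earliest start: cycle 1, or right after this instruction's previous stage.
            t = 1 if s == 0 else start[s - 1] + lat[s - 1]
            if prev_start is not None:
                # Stage s is free only once the previous instruction has left it.
                if s + 1 < len(STAGES):
                    t = max(t, prev_start[s + 1])
                else:
                    t = max(t, prev_start[s] + prev_lat[s])
            if s == EX:
                # Forwarding: a source is usable the cycle after its producer's EX ends.
                t = max([t] + [ready.get(r, 1) for r in srcs])
            start.append(t)
        ready[dst] = start[EX] + lat[EX]
        prev_start, prev_lat = start, lat
        last_cycle = start[-1] + lat[-1] - 1   # cycle in which WB completes
    return last_cycle

print(total_cycles(PROGRAM))  # 8
```

Here MUL occupies EX in cycles 4 to 6, so SUB, which needs R4, cannot start EX before cycle 7 and finishes WB in cycle 8.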

Q10. A CPU has a five-stage pipeline and runs at 1 GHz. Instruction fetch happens in the first stage of the pipeline. A conditional branch instruction computes the target address and evaluates the condition in the third stage of the pipeline. The processor stops fetching new instructions following a conditional branch until the branch outcome is known. A program executes $10^9$ instructions, of which 20% are conditional branches. If each instruction takes one cycle to complete on average, the total execution time of the program is \_\_\_\_\_.

Ans: 1.4 seconds
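A check of Q10. A branch resolved in stage 3 stops fetching for 2 cycles, which gives a CPI of 1 + 0.2 × 2 = 1.4 at 1 ns per cycle. Like the usual solution, this sketch ignores the handful of pipeline-fill cycles, which are negligible for $10^9$ instructions.

```python
n_instr = 10**9
branch_frac = 0.20
stall = 3 - 1              # branch resolved in stage 3, so fetch waits 2 cycles
cpi = 1 + branch_frac * stall
cycle_s = 1 / 1e9          # 1 GHz clock -> 1 ns per cycle
total_s = n_instr * cpi * cycle_s
print(f"{total_s:.2f} s")  # 1.40 s
```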

Q11: A 5-stage pipelined CPU has the following sequence of stages:

IF: Instruction fetch from instruction memory

RD: Instruction decode and register read

EX: Execute (ALU operation for data and address computation)

MA: Data memory access (for a write access, the register read in the RD stage is used)

WB: Register write back

Consider the following sequence of instructions:

I1: L R0, Loc1; R0 <= M[Loc1]

I2: A R0, R0; R0 <= R0 + R0

I3: S R2, R0; R2 <= R2 - R0

Let each stage take one clock cycle.

What is the number of clock cycles taken to complete the above sequence of instructions starting from the fetch of I1?

Ans: 10

Q11: A 4-stage pipeline has stage delays of 150, 120, 160 and 140 nanoseconds, respectively. The registers used between the stages have a delay of 5 nanoseconds each. Assuming a constant clock rate, the total time taken to process 1000 data items on this pipeline will be \_\_\_\_\_.

Ans: 165.5 microseconds
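The clock period here is the slowest stage plus the register delay (160 + 5 = 165 ns). 1000 items on a 4-stage pipeline therefore take 4 + 1000 − 1 = 1003 cycles. A minimal check:

```python
stage_delays_ns = [150, 120, 160, 140]
latch_ns = 5
n_items = 1000

cycle_ns = max(stage_delays_ns) + latch_ns      # 165 ns clock period
k = len(stage_delays_ns)
total_ns = (k + n_items - 1) * cycle_ns         # fill the pipeline, then one item per cycle
total_us = total_ns / 1000
print(f"{total_ns} ns = {total_us:.3f} us")     # 165495 ns = 165.495 us
```

The exact figure, 165.495 µs, rounds to the 165.5 µs stated above.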

Q12: Consider a RISC machine in which each instruction is 4 bytes long. Conditional and unconditional branch instructions on this machine use the PC-relative addressing mode, with the Offset to the branch target specified in bytes.

Note that the Offset is always relative to the address of the next instruction in the program sequence.

Consider the following program consisting of 4 instructions:

Instruction i: ADD R2,R3,R4

Instruction i+1: SUB R5,R6,R7

Instruction i+2: SEQ R1,R9,R10

Instruction i+3: BEQZ R1,Offset

If the target of the branch instruction is i, what is the decimal value of Offset?

**Answer:** -16
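The Q12 offset can be checked by laying out the four instructions at consecutive 4-byte addresses. The base address 0x1000 is a hypothetical value, and any base gives the same offset. The offset is the target address minus the address of the instruction after the branch.

```python
INSTR_BYTES = 4
base = 0x1000                      # hypothetical address of instruction i
addr = {k: base + k * INSTR_BYTES for k in range(4)}   # i, i+1, i+2, i+3

branch = 3                         # BEQZ is instruction i+3
next_pc = addr[branch] + INSTR_BYTES   # offset is relative to the next instruction
target = addr[0]                   # branch target is instruction i
offset = target - next_pc
print(offset)  # -16
```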

Q13: Consider a pipeline x consisting of 5 stages, IF, ID, OF, EX and WB, with stage delays of 2 ns, 6 ns, 5 ns, 8 ns and 1 ns, respectively. An alternative pipeline y has the same stages, except that the EX stage is divided into 2 substages (EX1 and EX2) with equal delays (8 ns/2) and the ID stage is divided into 3 substages (ID1, ID2 and ID3) with equal delays (6 ns/3). In pipelines x and y, memory reference instructions are not overlapped, so the penalty for a memory reference instruction is 4 cycles in pipeline x and 8 cycles in pipeline y. If 20% of the program's instructions are memory reference instructions, what is the ratio of the speedup of x to the speedup of y?