

### Assignment 5

Q1) Fill in the blanks

- For a complex instruction that needs to be translated into more than 4 micro-operations, the decoder transfers the task to the microcode ROM.
- The front-end module of the Pentium 4 consists of the front-end branch predictor, the trace cache and the microcode ROM.
- A feature of the Pentium 4 is its cache technology.
- The SPARC processor has three modules: IU, FPU & CU.
- In the Pentium processor, write-back cases occur in the burst bus cycle.

Q2) Choose correct options.

- Which of the following processors has a 64-bit data bus?

Ans: a) Pentium

- Which of the following modes of the processor is a characteristic of virtual memory?

Ans: b) Virtual mode

- In Superscalar architecture, which of the following decides which instructions are to be issued concurrently at run time?

Ans: d) Hardware.

- a) Which of the following direction-based predictions is used to reduce the branch penalty?

Ans: 2. Predict branch/jump instructions AND branch direction not taken.

- 5) Speculatively execute instructions along the predicted path.

Ans: a) 1235

Q3) State whether the following statements are true or false.

- a) The speed of integer arithmetic of the Pentium is increased to a large extent by a 4-stage pipeline.

Ans: True

- b) In the fetch-decode unit, the number of parallel decoders that accept the stream of fetched instructions and decode them is 2.

Ans: False

- c) During the execution of instructions, if an instruction is executed, then the next instruction is executed only when the data is read by the bus interface unit.

Ans: True

Q4) Name the following or define or design the following.

a) Explain Hyper-Threading Technology and its use in Pentium IV.

Ans: Hyper-Threading (HT) Technology allows a single physical processor to appear to the OS and the application programs as two separate processors.

- The technology enables a single physical processor to execute two or more separate code streams (threads) concurrently using shared execution resources.
- HT Technology is one form of hardware multi-threading capability in the IA-32 processor families.
- It differs from multi-processor capability, which uses separate, physically distinct packages, each mated with its own physical socket. HT Technology provides hardware multi-threading within a single package by sharing the execution resources of one processor core.
- A processor with HT Technology consists of two or more logical processors, each of which has its own IA-32 architectural state. Each logical processor has a full set of IA-32 data registers, segment registers, control registers, debug registers and most of the MSRs.
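The idea of several logical processors, each with private architectural state but sharing one set of execution resources, can be illustrated with a toy model. This is only a minimal sketch, not how the hardware works: the `LogicalProcessor` class, the `run_shared_core` function and the strict round-robin issue order are all assumptions made for illustration (a real HT core can issue from both threads in the same cycle).

```python
from dataclasses import dataclass, field


@dataclass
class LogicalProcessor:
    """Hypothetical logical processor: private registers and program counter."""
    name: str
    program: list                                   # list of (register, increment) pairs
    registers: dict = field(default_factory=dict)   # private architectural state
    pc: int = 0

    def done(self):
        return self.pc >= len(self.program)


def run_shared_core(logical_processors):
    """Let the logical processors take turns on one shared execution unit.

    Returns the order in which the logical processors used the shared unit.
    """
    trace = []
    while not all(lp.done() for lp in logical_processors):
        for lp in logical_processors:
            if lp.done():
                continue
            reg, inc = lp.program[lp.pc]
            # The "execution unit" is shared, but the result goes to private state.
            lp.registers[reg] = lp.registers.get(reg, 0) + inc
            lp.pc += 1
            trace.append(lp.name)
    return trace
```

Both logical processors can use a register with the same name (for example `eax`) without interfering, because each one keeps its own copy of the architectural state.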

(b) Write the branch prediction logic in Pentium IV.

Ans:



Branch Prediction Logic.

- If the branch is predicted to be taken, then the active queue is no longer used. Instead, the prefetcher starts fetching instructions from the branch target address & stores them into the second queue, which now becomes the active queue. This queue now starts feeding instructions into the two pipes.
- If the branch is predicted to be not taken, then nothing changes: the active queue remains active & instructions are fetched from the sequentially next locations.
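The queue switching described above can be sketched as a small model. This is an illustrative sketch under assumptions: the `Prefetcher` class, the queue size of 4 and the dictionary used as instruction memory are all hypothetical and chosen only to keep the example short.

```python
class Prefetcher:
    """Toy model of a prefetcher with two queues, one of which is active."""

    def __init__(self, memory, queue_size=4):
        self.memory = memory            # dict: address -> instruction (assumed layout)
        self.queue_size = queue_size    # assumed size, for illustration only
        self.queues = [[], []]
        self.active = 0                 # index of the active queue

    def fill(self, start_addr):
        """Fill the active queue with sequential instructions from start_addr."""
        self.queues[self.active] = [
            self.memory[addr]
            for addr in range(start_addr, start_addr + self.queue_size)
            if addr in self.memory
        ]

    def on_branch(self, predicted_taken, target_addr):
        """Apply a prediction and return the queue that feeds the pipes."""
        if predicted_taken:
            # Switch to the other queue and fetch from the branch target.
            self.active = 1 - self.active
            self.fill(target_addr)
        # Predicted not taken: nothing changes, the active queue stays active.
        return self.queues[self.active]
```

Note that the previously active queue keeps its sequential instructions after a switch, so in this model they are still available if the prediction turns out to be wrong.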

(c) Draw & explain the Pentium IV NetBurst microarchitecture.

Ans:

Features of NetBurst are as follows:

- Rapid Execution Engine.
- Enhanced branch prediction.
- Hyper-pipelined technology.
- New cache subsystem.
- Advanced Dynamic Execution.



Q5) Answer the following questions in brief (20 to 30 words)

(a) Justify which floating-point pipeline stages are used in the Pentium processor.

Ans:

- Prefetch - Identical to the integer prefetch stage.
- Instruction Decode 1 - Identical to the integer D1 stage.
- Instruction Decode 2 - Identical to the integer D2 stage.
- Execution stage (EX) - Register read, memory read or memory write performed as required by the instruction (to access an operand).
- FP Execution 1 stage - Information from register or memory is written into an FP register. Data is converted to floating-point format before being loaded into the floating-point unit.
- FP Execution 2 stage - Floating-point operation performed within the floating-point unit.
- Write FP stage - Results are rounded & written to the target floating-point register.
- Error Reporting - If an error is detected, an error reporting stage is entered where the error is reported & the FPU status word is updated.
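How instructions overlap in these eight stages can be shown with a small timing sketch. It assumes an ideal pipeline: one instruction enters per clock and no stage ever stalls, which real floating-point instructions often violate. The short stage labels (`PF`, `D1`, `D2`, `EX`, `X1`, `X2`, `WF`, `ER`) and both function names are shorthand chosen for this sketch.

```python
# Shorthand labels for the eight floating-point pipeline stages, in order.
FP_STAGES = ["PF", "D1", "D2", "EX", "X1", "X2", "WF", "ER"]


def pipeline_schedule(num_instructions, stages=FP_STAGES):
    """Return {cycle: {stage: instruction_index}} for an ideal, stall-free pipeline."""
    schedule = {}
    for instr in range(num_instructions):
        for position, stage in enumerate(stages):
            # Instruction i occupies stage number s during cycle i + s + 1.
            cycle = instr + position + 1
            schedule.setdefault(cycle, {})[stage] = instr
    return schedule


def total_cycles(num_instructions, stages=FP_STAGES):
    """Cycles needed to finish all instructions when nothing stalls."""
    return len(stages) + num_instructions - 1
```

Under these assumptions, the first instruction needs all 8 cycles and every further instruction adds only one more cycle, which is the benefit of pipelining.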

(e) Compare 8086, 80386, Pentium I, Pentium II, Pentium III

Ans:

| Attribute          | 8086    | 80386   | Pentium |
|--------------------|---------|---------|---------|
| 1. Processor size  | 16-bit  | 32-bit  | 32-bit  |
| 2. Data bus        | 16-bit  | 32-bit  | 64-bit  |
| 3. Memory banks    | 2 banks | 4 banks | 8 banks |
| 4. Address bus     | 20-bit  | 32-bit  | 32-bit  |
| 5. Memory size     | 1 MB    | 4 GB    | 4 GB    |
| 6. Pipeline stages | 2       | 3       | 5       |
| 7. ALU size        | 16-bit  | 32-bit  | 32-bit  |
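Two rows of the table follow directly from the bus widths: the memory size is 2 raised to the number of address lines (in bytes), and the number of byte-wide memory banks is the data bus width divided by 8. The short sketch below checks this arithmetic; the function names are illustrative only.

```python
def addressable_memory_bytes(address_bus_bits):
    """Bytes addressable with the given number of address lines."""
    return 2 ** address_bus_bits


def memory_banks(data_bus_bits):
    """Number of byte-wide memory banks needed to fill the data bus."""
    return data_bus_bits // 8


def human_size(num_bytes):
    """Format an exact power-of-two byte count, e.g. 1048576 as '1 MB'."""
    for unit in ("B", "KB", "MB", "GB"):
        if num_bytes < 1024 or unit == "GB":
            return f"{num_bytes} {unit}"
        num_bytes //= 1024


for name, addr_bits, data_bits in [("8086", 20, 16), ("80386", 32, 32), ("Pentium", 32, 64)]:
    size = human_size(addressable_memory_bytes(addr_bits))
    print(name, size, memory_banks(data_bits), "banks")
```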

c) Justify how the problem of pipeline flushing is minimized in the Pentium architecture.

Ans:

1. The performance gain from pipelining can be reduced by the presence of program transfer instructions.
2. They change the instruction sequence, making all the instructions that entered the pipeline after the program transfer instruction invalid.
3. Suppose instruction I3 is a conditional jump to I50 at some other address (the target address); then the instructions that entered after I3 are invalid & a new sequence beginning with I50 needs to be loaded.
4. This causes bubbles in the pipeline, where no work is done while the pipeline stages are reloaded.
5. To avoid this problem, the Pentium uses a scheme called Dynamic Branch Prediction.
6. In this scheme, a prediction is made concerning the branch instruction currently in the pipeline.
7. The prediction will be either taken or not taken.
8. If the prediction turns out to be true, the pipeline will not be flushed & no clock cycles will be lost. If the prediction turns out to be false, the pipeline is flushed & started over with the correct instruction.
9. A misprediction results in a 3-cycle penalty if the branch is executed in the u-pipeline & a 4-cycle penalty in the v-pipeline.
10. The prediction is implemented using a 4-way set-associative cache with 256 entries. This is referred to as the Branch Target Buffer (BTB).
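The scheme in the steps above can be sketched as a small simulation. The sizes (256 entries, 4 ways) and the 3-cycle and 4-cycle penalties come from the answer; everything else is an assumption made for illustration: the 2-bit saturating counter, allocating an entry only for taken branches with the counter set to 3, the oldest-first replacement, and the names `BranchTargetBuffer` and `lost_cycles`.

```python
class BranchTargetBuffer:
    """Toy 4-way set-associative BTB with a 2-bit counter per entry (assumed)."""

    def __init__(self, entries=256, ways=4):
        self.ways = ways
        self.num_sets = entries // ways               # 256 / 4 = 64 sets
        self.sets = [[] for _ in range(self.num_sets)]  # entry: [tag, target, counter]

    def _find(self, addr):
        branch_set = self.sets[addr % self.num_sets]
        tag = addr // self.num_sets
        for entry in branch_set:
            if entry[0] == tag:
                return branch_set, tag, entry
        return branch_set, tag, None

    def predict(self, addr):
        """Return (predicted_taken, target). A BTB miss predicts not taken."""
        _, _, entry = self._find(addr)
        if entry is None:
            return False, None
        return entry[2] >= 2, entry[1]

    def update(self, addr, taken, target):
        """Record the actual outcome of the branch at addr."""
        branch_set, tag, entry = self._find(addr)
        if entry is None:
            if taken:                                  # assumed: allocate on taken only
                if len(branch_set) == self.ways:
                    branch_set.pop(0)                  # assumed: evict the oldest entry
                branch_set.append([tag, target, 3])
            return
        if taken:
            entry[1] = target
            entry[2] = min(3, entry[2] + 1)
        else:
            entry[2] = max(0, entry[2] - 1)


def lost_cycles(btb, trace, pipe="u"):
    """Sum the misprediction penalty over a trace of (addr, taken, target)."""
    penalty = 3 if pipe == "u" else 4
    lost = 0
    for addr, taken, target in trace:
        predicted_taken, _ = btb.predict(addr)
        if predicted_taken != taken:                   # wrong direction: flush
            lost += penalty
        btb.update(addr, taken, target)
    return lost
```

For a loop branch that is taken nine times and then falls through, this model mispredicts only the first and the last execution, so most iterations lose no clock cycles.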