

# **MICROPROCESSOR**

## **CSC 405**



**Subject Incharge**  
**Dakshata Panchal**  
Assistant Professor  
email: [dakshatapanchal@sfit.ac.in](mailto:dakshatapanchal@sfit.ac.in)



# **CSC 405 Microprocessor**

## **Module 5**

### Pentium Processor



# Contents as per syllabus

---

- Introduction
- Pentium Architecture
- Superscalar Operation
- Integer & Floating-Point Pipeline Stages
- Branch Prediction Logic
- Cache Organization
- MESI Model



# Salient features of Pentium Processor

---

- 1) It is a **32 bit Microprocessor**.
- 2) It has a **64 bit data bus**.
- 3) It has **8 memory banks**.
- 4) It has a **32 bit address bus**.
- 5) It can access **4 GB of physical memory**.
- 6) It has **5 Pipeline stages for integer operations**.
- 7) It has an **internal Floating point unit**.
- 8) It has an **8 stage Floating point Pipeline**.
- 9) It is **2 way superscalar**. This means it has two pipes called the u-pipe and the v-pipe.
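
The bus figures in this list are tied together by simple arithmetic, which can be checked with a short sketch. The function names below are illustrative and not part of the syllabus material; the sketch assumes byte-addressable memory with one byte-wide bank per byte lane of the data bus.

```python
ADDRESS_BUS_BITS = 32  # from the feature list
DATA_BUS_BITS = 64     # from the feature list


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses selectable with this many address lines."""
    return 2 ** address_bits


def memory_banks(data_bus_bits: int) -> int:
    """Assumes one byte-wide bank per byte lane of the data bus."""
    return data_bus_bits // 8


if __name__ == "__main__":
    gigabytes = addressable_bytes(ADDRESS_BUS_BITS) // 2 ** 30
    print(f"Physical memory: {gigabytes} GB")
    print(f"Memory banks: {memory_banks(DATA_BUS_BITS)}")
```

A 32-bit address bus gives 2^32 bytes = 4 GB, and a 64-bit data bus moves 8 bytes at a time, hence 8 banks.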



# Salient features of Pentium Processor

---

- 10) The first Pentium chips operated at **60 MHz and 66 MHz**; later versions reached **100 MHz** and beyond.
- 11) The **Integer Pipeline** stages are called:  
**PF – Prefetch; D1 – Decode; D2 – Address Generation;**  
**EX – Execute; WB – Write Back.**
- 12) It has **31,00,000 (3.1 million) transistors**.
- 13) It was released in the year **1993**.
- 14) It has a protection mechanism with **4 privilege levels**.
- 15) It has **on-chip L1 Code cache and L1 Data Cache both 8 KB each**.
- 16) It has a **branch prediction logic** with a **256 entry Branch Target Buffer (BTB)**



# Pentium Architecture

---

Pentium has a 2-way superscalar architecture, so it can issue up to two instructions in the same clock cycle.

It has two pipes called the u-pipe and the v-pipe. Each is a 5-stage integer pipeline.

## Bus Unit

- 1) **The Bus unit is responsible for transferring data in and out of the µP.**
- 2) It is connected to the external memory and I/O devices, using the system bus.

## L1 Code Cache

- 1) Pentium has an on-chip **8 KB L1 Code cache**. It is 2-way set associative.
- 2) It contains the most recently used instructions.
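
To see what "8 KB, 2-way set associative" means for an address, the following sketch derives the number of sets and splits a 32-bit address into tag, set index, and byte offset. The 32-byte line size is an assumption that does not appear in these slides, and the function names are illustrative.

```python
CACHE_BYTES = 8 * 1024  # from the slides
WAYS = 2                # from the slides
LINE_BYTES = 32         # assumption: line size is not stated in the slides


def cache_geometry(cache_bytes, ways, line_bytes, address_bits=32):
    """Return (sets, offset_bits, index_bits, tag_bits) for a set-associative cache."""
    sets = cache_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1  # log2 for powers of two
    index_bits = sets.bit_length() - 1
    tag_bits = address_bits - index_bits - offset_bits
    return sets, offset_bits, index_bits, tag_bits


def split_address(addr, offset_bits, index_bits):
    """Split an address into (tag, set index, byte offset within the line)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


if __name__ == "__main__":
    sets, off_bits, idx_bits, tag_bits = cache_geometry(CACHE_BYTES, WAYS, LINE_BYTES)
    print(sets, off_bits, idx_bits, tag_bits)
    print(split_address(0x12345678, off_bits, idx_bits))
```

Under these assumptions the cache has 128 sets, so an address uses 5 offset bits, 7 index bits, and 20 tag bits; the two ways of the selected set are then compared against the tag.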

## Prefetch Unit

- 1) It prefetches instructions from the L1 Code cache.
- 2) It has **two queues each of 32 bytes**.
- 3) One queue acts as the active queue, whereas the other is used during branch prediction.



# Pentium Architecture

---

## Decode Unit

- 1) It decodes two instructions simultaneously for the U and V pipes.
- 2) Simple instructions are decoded by the hardwired control unit.
- 3) Complex instructions are decoded by the micro programmed control unit.

## Integer Execution Unit

- 1) It can handle two integer instructions simultaneously.
- 2) The first one goes to the u-pipe and the second to the v-pipe.
- 3) There are address generation units for each pipe.
- 4) If the instruction uses a memory operand, the address generation unit generates the physical address of the operand and fetches it from the **8 KB L1 Data Cache**.
- 5) There are two separate ALUs for U and V Pipes.
- 6) The **U-pipe ALU is equipped with a barrel shifter** and hence can handle complex arithmetic like MUL and DIV. Both ALUs are 32-bits each.
- 7) The integer unit uses 32-bit integer registers like EAX, EBX etc.



# Pentium Architecture

---

## Floating Point Unit

- 1) It performs Floating Point operations.
- 2) It uses **80-bit F.P. Registers**.
- 3) It has its own F.P. Control unit and independent circuits for F.P. arithmetic operations.

## Branch Prediction Logic

- 1) Pentium does branch prediction to minimize the pipeline penalty during branch operations.
- 2) It uses a **Branch Target Buffer with 256 entries**.
- 3) It stores the history of previous branches, which helps in **predicting the outcome of the next branch instruction**.
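
How a history table can drive prediction is sketched below. This is a simplified teaching model, not the Pentium's actual circuit: the slides give only the entry count (256), so the direct-mapped indexing, the 2-bit saturating counter per entry, and the initial counter value are all assumptions made for illustration.

```python
class BranchTargetBuffer:
    """Simplified BTB: direct-mapped, one 2-bit saturating counter per entry (assumed)."""

    def __init__(self, entries=256):
        self.entries = entries
        self.table = [None] * entries  # each slot holds [branch_addr, target, counter]

    def _slot(self, branch_addr):
        return branch_addr % self.entries

    def predict(self, branch_addr):
        """Return the predicted target if 'taken' is predicted, else None."""
        entry = self.table[self._slot(branch_addr)]
        if entry is not None and entry[0] == branch_addr and entry[2] >= 2:
            return entry[1]
        return None

    def update(self, branch_addr, taken, target):
        """Record the real outcome of the branch once it is known."""
        slot = self._slot(branch_addr)
        entry = self.table[slot]
        if entry is None or entry[0] != branch_addr:
            if taken:  # allocate an entry only for taken branches (assumption)
                self.table[slot] = [branch_addr, target, 2]
            return
        if taken:
            entry[1] = target
            entry[2] = min(3, entry[2] + 1)
        else:
            entry[2] = max(0, entry[2] - 1)


def count_correct(btb, branch_addr, target, outcomes):
    """Replay a sequence of taken/not-taken outcomes and count correct predictions."""
    correct = 0
    for taken in outcomes:
        predicted_taken = btb.predict(branch_addr) is not None
        if predicted_taken == taken:
            correct += 1
        btb.update(branch_addr, taken, target)
    return correct


if __name__ == "__main__":
    loop_branch = [True] * 9 + [False]  # loop back 9 times, then fall through
    print(count_correct(BranchTargetBuffer(), 0x1000, 0x0F00, loop_branch))
```

In this model a loop branch is mispredicted on its first encounter (no history yet) and on loop exit, and predicted correctly in between.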



# Integer Pipelining stages

---

Pentium performs integer instructions in a **five-stage pipeline**.

**PF** - **Prefetch**

**D1** - **Instruction Decode**

**D2** - **Address Generate**

**EX** - **Execute - ALU and Cache Access**

**WB** - **Write-Back**
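
The benefit of overlapping these five stages can be illustrated with a small timing sketch. It models a single ideal pipe with no stalls and no pairing, which is a simplification; the function names are illustrative.

```python
STAGES = ["PF", "D1", "D2", "EX", "WB"]


def total_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles needed in an ideal pipeline: fill time plus one cycle per extra instruction."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


def pipeline_diagram(n_instructions):
    """Text diagram: each instruction enters the pipe one cycle after the previous one."""
    rows = []
    for i in range(n_instructions):
        cells = ["  "] * i + STAGES  # shift each row right by one stage width
        rows.append(f"I{i + 1}: " + " ".join(cells))
    return "\n".join(rows)


if __name__ == "__main__":
    print(pipeline_diagram(3))
    print("Pipelined cycles:", total_cycles(3))
    print("Unpipelined cycles:", 3 * len(STAGES))
```

With five stages, n instructions finish in 5 + n - 1 cycles instead of 5n, so throughput approaches one instruction per cycle per pipe.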



# Integer Pipelining stages

---

## Stage 1: Prefetch

- Here instructions are fetched from the **L1 Cache** and stored in the Prefetch queue.
- The Prefetch queue is **32 bytes** long because it must hold at least two full instructions to feed the two pipelines, and the maximum size of an instruction is **15 bytes**.
- There are two Prefetch queues but **only one of them is active** at a time. It supplies the instructions to the two pipes. The other one is used when branch prediction logic predicts a branch to be "**taken**".
- Since the bus from L1 cache to the prefetcher is of **256 bits** (32 Bytes), the entire queue can be fetched in **1 Cycle**. (T State)
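
The sizing argument for the prefetch queue is plain arithmetic, sketched below using only the figures quoted above. The constant and function names are illustrative.

```python
MAX_INSTRUCTION_BYTES = 15  # from the slides
PIPES = 2                   # u-pipe and v-pipe
QUEUE_BYTES = 32            # from the slides
CACHE_BUS_BITS = 256        # L1 cache to prefetcher, from the slides


def worst_case_pair_bytes(pipes=PIPES, max_len=MAX_INSTRUCTION_BYTES):
    """Bytes needed to hold one maximum-length instruction per pipe."""
    return pipes * max_len


def bytes_per_fetch(bus_bits=CACHE_BUS_BITS):
    """Bytes delivered by the cache bus in a single transfer."""
    return bus_bits // 8


if __name__ == "__main__":
    print("Worst-case pair:", worst_case_pair_bytes(), "bytes")
    print("Fits in queue:", worst_case_pair_bytes() <= QUEUE_BYTES)
    print("Queue filled in one transfer:", bytes_per_fetch() == QUEUE_BYTES)
```

Two 15-byte instructions need 30 bytes, so a 32-byte queue is the next convenient size, and a 256-bit bus delivers exactly those 32 bytes in one transfer.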

## Stage 2: Decode

- The decode stage decodes the instruction opcode.
- It also checks for instruction pairing and performs branch prediction.
- Certain rules are provided for instruction pairing. Not all instructions are pairable.
- If the two instructions can be paired, the first one is given to the u pipe and the second one to the v pipe. If not, then the first one is given to the u pipe and the second one is held back and then paired with the forthcoming instruction.



# Integer Pipelining stages

---

## **INSTRUCTION PAIRING ALGORITHM (ISSUE ALGORITHM):**

**Consider two consecutive instructions I1 and I2, decoded by the µP...**

**If all the following are true:**

**I1 is a Simple instruction**

**I2 is a Simple instruction**

**I1 is not a Jump instruction**

**Destination of I1 is not the same as Source of I2**

**Destination of I1 is not the same as Destination of I2**

**Then**

**Issue I1 to U-Pipe and I2 to V-Pipe**

**Else**

**Issue I1 to U-Pipe**
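
The issue algorithm above can be expressed as a short sketch. The `Instr` record and its fields are hypothetical names introduced for illustration; the sketch checks only the five conditions listed here and ignores any further pairing restrictions of the real processor.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instr:
    text: str
    dest: Optional[str] = None       # destination register, if any
    sources: Tuple[str, ...] = ()    # registers read by the instruction
    simple: bool = True              # "simple" as used in the pairing rules
    is_jump: bool = False


def can_pair(i1: Instr, i2: Instr) -> bool:
    """True if I1 can go to the U-pipe and I2 to the V-pipe in the same cycle."""
    no_read_after_write = i1.dest is None or i1.dest not in i2.sources
    no_write_after_write = i1.dest is None or i1.dest != i2.dest
    return (i1.simple and i2.simple and not i1.is_jump
            and no_read_after_write and no_write_after_write)


def issue(stream):
    """Group a decoded instruction stream into per-cycle issue groups."""
    groups = []
    i = 0
    while i < len(stream):
        if i + 1 < len(stream) and can_pair(stream[i], stream[i + 1]):
            groups.append((stream[i].text, stream[i + 1].text))  # U-pipe, V-pipe
            i += 2
        else:
            groups.append((stream[i].text,))  # U-pipe only; next one is held back
            i += 1
    return groups


if __name__ == "__main__":
    program = [
        Instr("MOV EAX, EBX", "EAX", ("EBX",)),
        Instr("ADD ECX, EDX", "ECX", ("ECX", "EDX")),
        Instr("MOV ESI, 1", "ESI"),
        Instr("ADD EDI, ESI", "EDI", ("EDI", "ESI")),
    ]
    for group in issue(program):
        print(group)
```

In this example the first two instructions are independent and issue together, while `ADD EDI, ESI` reads the register written by `MOV ESI, 1`, so those two issue in separate cycles.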

## **Branch Prediction**

- The Pentium processor includes branch prediction logic.
- This avoids flushing the pipelines on a branch, provided the prediction is correct.
- When a branch operation is correctly predicted, no performance penalty is incurred.



# Integer Pipelining stages

---

## Stage 3: Decode 2 or Address Generation Stage

- It performs address generation where it generates the physical address of the required memory operand using segment translation and page translation.
- Protection checks are also performed at this stage.
- The address calculation is fast due to segment descriptor caches and TLB.
- In most cases the address translation is performed in 1 cycle itself.
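
The two translation steps named above can be sketched as follows. The 10/10/12-bit split of the linear address is the standard layout for 32-bit x86 paging with 4 KB pages; it is not spelled out in these slides. The `page_tables` dictionary is a hypothetical stand-in for the in-memory page directory and page tables, and protection checks are omitted.

```python
def linear_address(segment_base: int, offset: int) -> int:
    """Segment translation: add the offset to the segment base (32-bit wraparound)."""
    return (segment_base + offset) & 0xFFFFFFFF


def split_linear(linear: int):
    """Split a linear address into (directory index, table index, page offset)."""
    directory = (linear >> 22) & 0x3FF
    table = (linear >> 12) & 0x3FF
    offset = linear & 0xFFF
    return directory, table, offset


def translate(linear: int, page_tables: dict) -> int:
    """Page translation using a dict {(directory, table): frame_base} (hypothetical)."""
    directory, table, offset = split_linear(linear)
    frame_base = page_tables[(directory, table)]
    return frame_base | offset


if __name__ == "__main__":
    tables = {(0, 0x102): 0x00ABC000}  # one made-up mapping for the demo
    lin = linear_address(0x00100000, 0x2345)
    print(hex(lin), [hex(x) for x in split_linear(lin)])
    print(hex(translate(lin, tables)))
```

The segment descriptor caches hold the segment base so the first step needs no memory access, and the TLB plays the role of the dictionary lookup here, which is why the whole translation usually fits in one cycle.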

## Stage 4: Execution Stage

- The Execution stage mainly uses the ALU.
- The **U pipeline's ALU has a barrel shifter**, while the **V pipeline's does not**.
- Instructions that involve shifting, such as MUL and DIV, can **only be executed in the U pipeline**.
- Operands are either provided by registers or by data cache (assuming a hit).
- Both the u and v pipes can access the **data cache simultaneously**.
- During execution, if the u pipe instruction stalls, the v pipe instruction must also stall. But if the v pipe instruction stalls, the u pipe instruction can continue.

## Stage 5: Write-Back stage

- As the name suggests, the **result is written back** into the appropriate registers.
- The **flags are updated** accordingly.



# Floating point instruction pipeline stages

| Stage                 | Description                                                                                                                                                     |
|-----------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Prefetch              | Identical to the integer prefetch stage.                                                                                                                        |
| Instruction Decode 1  | Identical to the integer D1 stage.                                                                                                                              |
| Instruction Decode 2  | Identical to the integer D2 stage.                                                                                                                              |
| Execution Stage (EX)  | Register read, memory read, or memory write performed as required by the instruction (to access an operand).                                                    |
| FP Execution 1 Stage  | Information from register or memory is written into a FP register. Data is converted to floating-point format before being loaded into the floating-point unit. |
| FP Execution 2 Stage  | Floating-point operation performed within floating-point unit.                                                                                                  |
| Write FP Result       | Floating-point results are rounded and the result is written to the target floating-point register.                                                             |
| Error Reporting       | If an error is detected, an error reporting stage is entered where the error is reported and the FPU status word is updated.                                    |

# Floating point instruction pipeline stages

Most floating point instructions are issued singly to the U pipeline and cannot be paired with integer instructions. The floating point pipeline consists of eight stages. The first four stages are shared with the integer pipeline and the last four reside within the floating point unit itself.
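
The relationship between the two pipelines can be shown by comparing their stage lists. The abbreviations X1, X2, WF and ER for the last four floating point stages are commonly used short names and are not given in these slides, so treat them as assumed labels.

```python
INTEGER_STAGES = ["PF", "D1", "D2", "EX", "WB"]
# X1, X2, WF, ER: assumed short names for FP Execution 1, FP Execution 2,
# Write FP Result, and Error Reporting
FP_STAGES = ["PF", "D1", "D2", "EX", "X1", "X2", "WF", "ER"]


def shared_prefix(a, b):
    """Leading stages that two pipelines have in common."""
    shared = []
    for x, y in zip(a, b):
        if x != y:
            break
        shared.append(x)
    return shared


if __name__ == "__main__":
    shared = shared_prefix(INTEGER_STAGES, FP_STAGES)
    print("Shared stages:", shared)
    print("FPU-only stages:", FP_STAGES[len(shared):])
```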



# Comparison of various processors

| S No | Attribute           | 8085       | 8086     | 80286    | 80386    | 80486     | Pentium   |
|------|---------------------|------------|----------|----------|----------|-----------|-----------|
| 1    | Processor Size      | 8 – bit    | 16 – bit | 16 – bit | 32 – bit | 32 – bit  | 32 – bit  |
| 2    | Data Bus            | 8 – bit    | 16 – bit | 16 – bit | 32 – bit | 32 – bit  | 64 – bit  |
| 3    | Memory Banks        | --- NA --- | 2 banks  | 2 banks  | 4 banks  | 4 banks   | 8 banks   |
| 4    | Address Bus         | 16 – bit   | 20 – bit | 24 – bit | 32 – bit | 32 – bit  | 32 – bit  |
| 5    | Memory Size         | 64 KB      | 1 MB     | 16 MB    | 4 GB     | 4 GB      | 4 GB      |
| 6    | Pipeline Stages     | --- NA --- | 2        | 3        | 3        | 5         | 5         |
| 7    | ALU Size            | 8 – bit    | 16 – bit | 16 – bit | 32 – bit | 32 – bit  | 32 – bit  |
| 8    | No of Transistors   | 6500       | 29,000   | 1,34,000 | 2,75,000 | 11,80,235 | 31,00,000 |
| 9    | Year of Release     | 1976       | 1978     | 1982     | 1985     | 1989      | 1993      |
| 10   | Operating Frequency | 3 MHz      | 6 MHz    | 12 MHz   | 33 MHz   | 60 MHz    | 100 MHz   |

