

CPU :

- Microcontroller : In-order.
- Microprocessor : Out-of-order (OOO).

OOO Pipeline Structure :







### FETCH STAGE :



### DECODE STAGE :



## Decode (Module 2)




### RAT, Dispatch and Issue stage :



Rename




## Reorder Buffer

(Module 6)

mul P13, P12, P0  
add P14, P1, #4  
ldr P15, [P14]  
b #252

## Commit Register Map Table

X0 is in P0  
X1 is in P2  
X2 is in P10  
X3 is in P11  
X4 is in P4  
X5 is in P5  
X6 is in P6  
X7 is in P7

Not Finished  
Finished



mul P13, P12, P0  
add P14, P1, #4  
ldr P15, [P14]

Issue window /Queue  
(Module 5)

## Reorder Buffer

(Module 6)

str P13, [P4]  
b #252  
ldr P15, [P4]  
add P14, P1, #4  
mul P13, P12, P0  
tst P12, #0  
eor P12, P4, #4  
mul P1, P10, P11

## Commit Register Map Table

X0 is in P0  
X1 is in P2  
X2 is in P10  
X3 is in P11  
X4 is in P4  
X5 is in P5  
X6 is in P6  
X7 is in P7

Not Finished  
Finished



Issue Window /Queue  
(Module 5)



## EXECUTE STAGE :



## MEMORY ACCESS STAGE :



## Writeback Stage :







## ARM Architecture : Cortex-A72.



**Figure 1. Cortex-A720 cores example configuration**





## Intel x86 : CPU



## Intel x86 : Lion Cove









AGU : Address Generation Unit.

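In theory, I read about ld -> st (WAR) and st -> ld (RAW) dependencies through memory.

ld r4, 10[r5]

st r3, 10[r2]

If r5 and r2 hold the same value, both instructions access the same address, so the addresses conflict.

The LSQ (load-store queue) was introduced to handle this. It supplies values to loads early through forwarding from pending stores, instead of fetching the value from the d-cache.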
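The forwarding idea for the st -> ld (RAW) case can be sketched roughly as below. This is a minimal illustration, not how any particular core implements it: the class name `LoadStoreQueue`, the dictionary standing in for the d-cache, and the example addresses are all assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class StoreEntry:
    addr: int   # effective address computed by the AGU
    value: int  # data waiting to be written to the d-cache


class LoadStoreQueue:
    """Toy LSQ (hypothetical): pending stores are kept in program order."""

    def __init__(self, dcache):
        self.dcache = dcache  # dict used as a stand-in for the d-cache
        self.stores = []      # oldest store first

    def store(self, addr, value):
        # The store waits in the queue; the d-cache is not written yet.
        self.stores.append(StoreEntry(addr, value))

    def load(self, addr):
        # Search from the youngest pending store to the oldest.
        for entry in reversed(self.stores):
            if entry.addr == addr:
                return entry.value, "forwarded"
        # No conflicting store is pending, so read the d-cache.
        return self.dcache.get(addr, 0), "d-cache"

    def commit_oldest_store(self):
        # At commit time the oldest store finally updates the d-cache.
        entry = self.stores.pop(0)
        self.dcache[entry.addr] = entry.value


dcache = {110: 7}
lsq = LoadStoreQueue(dcache)
lsq.store(110, 42)              # st to address 110, still pending
forwarded = lsq.load(110)       # conflicting address: value comes from the LSQ
from_cache = lsq.load(200)      # no pending store: value comes from the d-cache
```

A real LSQ also has to deal with stores whose addresses are not yet known, which is where load-store speculation comes in; the sketch assumes every address is already resolved.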

The basic summary table from ACA concept :





## Micro - Architecture : Intel





### Speculative Execution:

As covered in theory, aggressive speculation comes in several forms :

- Load-Store
- Address
- Latency
- Value

Roughly 20% of instructions are branch instructions. A lot of time is lost here: whenever a branch is mispredicted, the pipeline must be flushed. This is a costly operation and can cause both latency and correctness issues.
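A rough way to see why the flush is costly is to fold it into the average cycles per instruction. The sketch below uses the 20% branch fraction from above; the base CPI, the 10% misprediction rate and the 15-cycle flush penalty are assumed numbers chosen only for illustration, and `effective_cpi` is a hypothetical helper.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty):
    """Average CPI once misprediction flushes are included.

    Each mispredicted branch adds `flush_penalty` wasted cycles, and a
    fraction `branch_fraction * mispredict_rate` of all instructions
    are mispredicted branches.
    """
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty


# 20% branches (from the notes); the other three values are assumptions.
cpi = effective_cpi(base_cpi=1.0, branch_fraction=0.20,
                    mispredict_rate=0.10, flush_penalty=15)
slowdown = cpi / 1.0  # relative to a pipeline that never mispredicts
print(f"effective CPI ~ {cpi:.2f}, slowdown ~ {slowdown:.2f}x")
```

With these assumed values the pipeline runs at about 1.3 cycles per instruction instead of 1.0, so even a modest misprediction rate costs a noticeable share of performance.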







## Intel x86 : Multi Core CPU

Similar to an ARM cluster : multiple CPU cores grouped into one unit.



Register Renaming : 2 to 4 Stages

- RAT
- Dependency Check Logic
- Free List or Free Queue
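How the three pieces listed above cooperate can be sketched as follows. This is a simplified model under stated assumptions: the class name `Renamer`, the 8 architectural and 16 physical registers, and the initial mapping X*i* -> P*i* are all invented for the example. Renaming one instruction at a time stands in for the dependency check logic, since each instruction sees the mappings written by the ones before it.

```python
from collections import deque


class Renamer:
    """Toy register renamer (hypothetical): a RAT plus a free list."""

    def __init__(self, num_arch, num_phys):
        # RAT: architectural register -> current physical register.
        self.rat = {f"X{i}": f"P{i}" for i in range(num_arch)}
        # Free list: physical registers not holding any live value.
        self.free_list = deque(f"P{i}" for i in range(num_arch, num_phys))

    def rename(self, dest, srcs):
        # Read the source mappings BEFORE updating the destination,
        # so "add X1, X1, X2" reads the old X1.
        phys_srcs = [self.rat[s] for s in srcs]
        if not self.free_list:
            raise RuntimeError("no free physical register: rename must stall")
        new_dest = self.free_list.popleft()
        old_dest = self.rat[dest]  # can be freed once this instruction commits
        self.rat[dest] = new_dest
        return new_dest, phys_srcs, old_dest


renamer = Renamer(num_arch=8, num_phys=16)
first = renamer.rename("X1", ["X2", "X3"])   # e.g. mul X1, X2, X3
second = renamer.rename("X4", ["X1", "X0"])  # reads the renamed X1
```

The second instruction picks up the new physical register for X1, which is how the true (RAW) dependency is preserved while WAR and WAW hazards on X1 disappear.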

The uop is written to the ROB. Allocation happens after decode and removal happens at commit time (both happen in order).
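The in-order allocate and in-order commit behaviour can be sketched with a simple queue. This is a minimal model, assuming a hypothetical `ReorderBuffer` class with a single finished flag per entry; real ROB entries carry more state (destination register, exception bits, and so on).

```python
from collections import deque


class ReorderBuffer:
    """Toy ROB (hypothetical): in-order allocation, in-order commit."""

    def __init__(self, size):
        self.size = size
        self.entries = deque()  # oldest uop at the left

    def allocate(self, uop):
        # Called after decode/rename, in program order.
        if len(self.entries) >= self.size:
            raise RuntimeError("ROB full: front end must stall")
        entry = {"uop": uop, "finished": False}
        self.entries.append(entry)
        return entry

    def commit(self):
        # Only the oldest entries may leave, and only once finished.
        committed = []
        while self.entries and self.entries[0]["finished"]:
            committed.append(self.entries.popleft()["uop"])
        return committed


rob = ReorderBuffer(size=8)
mul = rob.allocate("mul P13, P12, P0")
add = rob.allocate("add P14, P1, #4")

add["finished"] = True           # add finishes first (out of order)
blocked = rob.commit()           # nothing commits: mul is still the oldest
mul["finished"] = True
in_order = rob.commit()          # both commit now, in program order
```

Execution may finish out of order, but the add cannot leave the buffer until the older mul has finished, which is what keeps the architectural state precise.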







Dual Processor :





Multiple Functional Units :



Intel, 1990-2000 : Innovation. Maximum pipeline and resource utilization.

Within a single clock cycle, multiple instructions from different programs are issued and executed.

Simultaneous Multi-Threading (SMT) :



Resource Utilization :



## Multi-Pipeline Stages :



## Multi Core : (Each core with Multiple Pipelines).



- Each thread has its own PC, ROB and RRF.
- Each instruction packet, rename table entry, LSQ entry and physical register is tagged with a thread ID.
- HT : Dynamic issue slot split.
- SMT : Static partitioning (50-50).
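The thread-ID tagging described above is what lets two threads share one structure while still being handled separately. The sketch below is an assumption-laden illustration, not a description of any real design: `SharedIssueQueue` and its methods are hypothetical names, and the queue holds plain strings instead of real uops.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    thread_id: int  # tag identifying the owning hardware thread
    uop: str


class SharedIssueQueue:
    """Toy issue queue (hypothetical) shared by two hardware threads."""

    def __init__(self):
        self.entries = []

    def insert(self, thread_id, uop):
        self.entries.append(Entry(thread_id, uop))

    def flush_thread(self, thread_id):
        # E.g. on a branch misprediction in one thread: drop only that
        # thread's entries and leave the other thread untouched.
        before = len(self.entries)
        self.entries = [e for e in self.entries if e.thread_id != thread_id]
        return before - len(self.entries)


queue = SharedIssueQueue()
queue.insert(0, "mul P13, P12, P0")
queue.insert(1, "add P14, P1, #4")
queue.insert(0, "ldr P15, [P14]")

flushed = queue.flush_thread(0)              # thread 0 mispredicted
remaining = [e.uop for e in queue.entries]   # thread 1's work survives
```

Without the tag, a flush caused by one thread would have to discard the other thread's instructions as well.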

### Hyperthreading (Single core 2 threads):



Figure 6: Out-of-order execution engine detailed pipeline
