



# Inside the 6<sup>th</sup> Gen Intel® Skylake Core –

*Past, Present, and Future of a new microarchitecture*

Peter Hu (zh369)

Magdalene College

@ Advanced Topics in Computer Architecture, 3<sup>rd</sup> Feb 2026



## Client Devices



## Scalable Client Architecture



# Requirements

Various applications motivate a **scalable** design.

**client:** SoC, tablets, PC (Laptop, Desktop), etc.

**server:** database, web, network, AI workstations, remote renderer, etc.

Increased range for power (4.5-95 W)/performance.  
Higher memory bandwidth.

## Goals

(1) low **power**: active/idle power reduction, DVFS, measurement, etc.

(2) high **performance**: extract parallelism, monitoring unit, etc.
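
The DVFS item in goal (1) rests on the usual first-order model of CMOS dynamic power. This is textbook material rather than something stated in the slides, and the symbols $\alpha$ (activity factor), $C$ (switched capacitance), $V$ (supply voltage) and $f$ (clock frequency) are introduced here as assumptions:

$$
P_{\text{dyn}} = \alpha \, C \, V^{2} f
$$

If the voltage can be lowered roughly in proportion to the frequency, scaling both by a factor $s$ gives $P_{\text{dyn}} \propto s^{3}$. Under this idealised model, a 20% slowdown ($s = 0.8$) would cut dynamic power to about $0.8^{3} \approx 0.51$ of its original value; leakage power and voltage floors are ignored.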

# Power management



|                                                    | <b>SpeedStep®</b>  | <b>Speed Shift</b>                    |
|----------------------------------------------------|--------------------|---------------------------------------|
| Managed by                                         | software           | <u>hardware</u>                       |
| P-state from                                       | OS (ACPI governor) | CPU hardware                          |
| Latency                                            | ms-scale           | <b>μs-scale</b>                       |
| <b>Responsiveness</b>                              | moderate           | <b>high</b>                           |
| Runtime behaviour and<br>Microarch. state (memory) | limited visibility | <b>more information<br/>available</b> |
| Power efficiency                                   | governor-dependent | <b>better</b>                         |

**Future:** (1) more Boost heuristics (Panther Lake), (2) useful hints from OS, (3) workload prediction with AI.
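
To make the latency row of the table concrete, here is a minimal sketch of why reaction time matters for short bursts. It is a toy model, not Intel's algorithm: the function name, the two frequencies, the burst size, and the 10 ms / 100 µs reaction latencies are all hypothetical values chosen only to match the "ms-scale" and "μs-scale" orders of magnitude.

```python
def burst_completion_time(work_cycles, f_low_hz, f_high_hz, ramp_latency_s):
    """Time to finish a burst that starts at f_low and is only raised to
    f_high after the P-state controller has reacted (ramp_latency_s).
    Toy model: a single step change, no intermediate P-states."""
    cycles_before_ramp = f_low_hz * ramp_latency_s
    if work_cycles <= cycles_before_ramp:
        # The burst ends before the controller reacts at all.
        return work_cycles / f_low_hz
    return ramp_latency_s + (work_cycles - cycles_before_ramp) / f_high_hz


# Hypothetical workload and platform numbers (assumptions, not measurements).
WORK = 30e6                  # 30 million cycles of work in the burst
F_LOW, F_HIGH = 1e9, 3e9     # 1 GHz idle frequency, 3 GHz boosted frequency

os_time = burst_completion_time(WORK, F_LOW, F_HIGH, ramp_latency_s=10e-3)
hw_time = burst_completion_time(WORK, F_LOW, F_HIGH, ramp_latency_s=100e-6)

print(f"OS-managed (10 ms reaction):  {os_time * 1e3:.2f} ms")
print(f"HW-managed (100 us reaction): {hw_time * 1e3:.2f} ms")
```

With these assumed numbers the slow controller spends its whole reaction window at the low frequency, so the burst takes noticeably longer; for bursts shorter than the reaction latency the boost never arrives at all.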

# 4<sup>TH</sup> Gen Micro Arch. Skylake

| Feature (unit)         | Nehalem                                                                     | Sandy Bridge | Haswell        | <u>Skylake</u>   |
|------------------------|-----------------------------------------------------------------------------|--------------|----------------|------------------|
| BR predictor           | BTB + two-level predictor for history: 32B<br>Global Buffer + Pattern Table |              |                |                  |
| Decoders (/cycle)      | 4                                                                           | 4            | 4              | <b>5</b>         |
| Queue (per thread)     | 28                                                                          | 28           | 56 total       | <b>64</b>        |
| Reorder buffer (ops)   | 128                                                                         | 168          | 192            | <b>224</b>       |
| Integer/FP Rename      | In ROB                                                                      | 160 / 144    | 168 / 168      | <b>180 / 168</b> |
| ld/st buffer (entries) | 48 / 32                                                                     | 64 / 36      | <b>72 / 42</b> | <b>72 / 56</b>   |
| Scheduler (entries)    | 36                                                                          | 54           | 60             | <b>97</b>        |
| Issue width            | 4                                                                           | 6            | <b>8</b>       | <b>8</b>         |
| Arithmetic units       | throughput increases, latency decreases                                     |              |                |                  |
| Load                   | <b>reduced</b> store-to-load forwarding and split-load cost                 |              |                |                  |
| Store                  | deeper buffers, earlier request to L2 on an L1 miss                         |              |                |                  |
| Cache                  | bandwidth <b>move</b> from L2 to shared L3                                  |              |                |                  |

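The generation-to-generation growth in the table is easier to compare as percentages. The short sketch below uses only the Haswell and Skylake entry counts from the table; the dictionary and function names are illustrative.

```python
# (Haswell, Skylake) entry counts copied from the table above.
STRUCTURES = {
    "Reorder buffer": (192, 224),
    "Scheduler": (60, 97),
    "Load buffer": (72, 72),
    "Store buffer": (42, 56),
    "Integer rename registers": (168, 180),
}


def growth_percent(old, new):
    """Relative growth from old to new, in percent."""
    return 100.0 * (new - old) / old


growth = {name: growth_percent(h, s) for name, (h, s) in STRUCTURES.items()}

# Print the structures from largest to smallest relative growth.
for name, pct in sorted(growth.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:26s} {pct:5.1f} %")
```

By this measure the scheduler grows the most (about 62%), while the load buffer is unchanged between the two generations.
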


# Pros & Cons

- overall enhancements over Intel legacy **superscalar** pipeline.
  - **wider frontend + deeper backend** and **issue queue**.
- **out-of-order issue** provides **higher MLP, ILP**.
- **Better arithmetic.**
  - SIMD exploits the **DLP**.
- **Improved memory subsystem.**
  - Larger L3 (LLC), eDRAM, etc.
- diminishing returns due to ILP limits, BR mispredict, memory stalls.
- power increases, to be tuned by 7<sup>th</sup> Gen (Kaby Lake).
- **speculative exec.** enables the Spectre attack.
  - solution: clear the affected cache state when speculation fails, **trading** performance for security.
- incremental novelty, no fundamentally different microarchitecture design.
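
The link between MLP and memory performance in the list above can be sketched with Little's law: sustained bandwidth is roughly the number of bytes in flight divided by the memory latency. The 64-byte line size matches x86 cache lines; the 80 ns latency and the miss counts below are hypothetical values for illustration, not Skylake measurements.

```python
def sustained_bandwidth_gb_s(outstanding_misses, latency_ns, line_bytes=64):
    """Little's law estimate of per-core bandwidth.

    bytes in flight / latency; bytes per nanosecond equals GB/s.
    """
    return outstanding_misses * line_bytes / latency_ns


# Assumed 80 ns memory latency; sweep the number of overlapping misses.
for misses in (1, 4, 10):
    bw = sustained_bandwidth_gb_s(misses, latency_ns=80.0)
    print(f"{misses:2d} outstanding misses -> {bw:.1f} GB/s")
```

Under this simple model a core that can only keep one miss in flight is limited to under 1 GB/s regardless of how fast the DRAM is, which is why deeper buffers and a larger instruction window matter even when peak bandwidth is plentiful.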

# Fabric, Cache, and Memory

**four rings:** request, snoop, data, acknowledge.  
high bw, low latency/power, modular, less control logic.

scalable fabric, shared LLC,  
larger bandwidth, more coherency.



**Figure 15.1.1: Sandy Bridge block diagram.**



**Weakness:** poor scalability, contention for multi-core, clocking.

For large server dies, a **mesh** replaced the ring from 2017 onward.
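
One way to see the ring's scalability problem is to compare average hop counts. The sketch below is a simplification under stated assumptions: the ring is treated as bidirectional with shortest-path routing, the mesh is a square grid with Manhattan routing, and hop count stands in for latency. Contention, clocking, and the real stop layout are ignored, and the function names are illustrative.

```python
from itertools import product


def ring_avg_hops(n):
    """Mean shortest-path hops between distinct stops on a bidirectional ring."""
    total = sum(min(d, n - d) for d in range(1, n))
    return total / (n - 1)


def mesh_avg_hops(rows, cols):
    """Mean Manhattan distance between distinct nodes on a rows x cols mesh."""
    nodes = list(product(range(rows), range(cols)))
    total = sum(
        abs(r1 - r2) + abs(c1 - c2)
        for (r1, c1), (r2, c2) in product(nodes, repeat=2)
    )
    return total / (len(nodes) * (len(nodes) - 1))


for side in (2, 4, 6):
    n = side * side
    print(f"{n:2d} stops: ring {ring_avg_hops(n):.2f} hops, "
          f"mesh {mesh_avg_hops(side, side):.2f} hops")
```

For a handful of stops the two topologies are equivalent in this model, but the ring's average distance grows linearly with the number of stops while the mesh's grows with the side length of the grid.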

# Memory security & integrity

Software Guard Extensions (SGX): a new ISA extension

protects enclave code and data in DRAM against **memory bus snooping** and **cold boot attacks**.  
usage: key management, password vaults, secure analytics, confidential ML inference.



**Weakness:** limited size, performance overhead (sys call), side-channel attacks (cache).

# Graphics

- Lossless render compression,
- Low-power media, video quality,
- Slice vs. Unslice,
  - different clock domains, power-gate.
  - pure media, higher throughput.
- Idle management,
  - Voltage-Frequency (V-F) curves,
  - C-state: C0 (fully active), C1 (halt), C2 (clocks off), etc.





Intel®'s longest-serving  
CPU architecture: Skylake.

- *Incremental* novelty in
  - power management,
  - superscalar microarchitecture,
  - fabric, interconnect and memory system.
- *Evolutionary* novelty in
  - memory encryption (SGX),
  - decoupling of slice and unslice graphics unit.
- **Results:** meet the thermal/power constraints, with
  - better computational & 3D performance,
  - longer battery life,
  - wider cooling capability.

## Efficiency-first design choices

shallower pipelines, **reduce freq.** for energy (10<sup>th</sup> Ice Lake / 12<sup>th</sup> Alder Lake).

- **Front-end width:** wider fetch/decode, higher µOP delivery rate to reduce bubbles.
- **Instruction fetch:** larger BTBs, deeper history; smarter prefetching.
- **Instruction window:** ROB and scheduler size increased → more MLP & ILP.
- **Execution back-end:** more ports, AGUs, load/store bandwidth.
- **Larger private L2 caches:** reduced L3 traffic, better core-locality.

## Heterogeneous cores

**P+E:** High IPC cores for **perf.** + small cores for **energy** (12<sup>th</sup> Alder Lake+).

- **HW-DVFS (Speed Shift):** quicker frequency changes, aggressive.

## HW-OS co-DVFS

**Thread Director** guides workload-aware freq. steering (12<sup>th</sup> Alder Lake).

- **Block- & tile-based gating:** independent power/thermal control **per block** (Meteor Lake+, 2023).

# Post-Skylake changes



### Performance-cores

- Optimized for single- and lightly-threaded performance
- Enhancing gaming and productivity workloads



### Efficient-cores

- Optimized for scaling highly-threaded workloads
- Minimizing interruptions from background task management



# Thank you and welcome your thoughts!

## *Reference*

- Inside 6th-Generation Intel® Core: New Microarchitecture Code-Named Skylake, Jack D. et al., [IEEE Micro](#), 2017.
- Intel® SGX – [Key Management](#) on the 3rd Generation Xeon® Scalable Processor, 2021.
- Gen9 - Microarchitectures – Intel®, [Wiki-Chip](#).
- Skylake case study, Prof Robert Mullins, [Cambridge Uni Advanced Computer Arch](#).
- Computer arch Intel® Skylake, Prof Christopher Batten, [Cornell University ECE 4750](#).
- Spectre Attacks: Exploiting Speculative Execution, Paul K. et al., [2019 IEEE Symposium on Security and Privacy \(SP\)](#).
- A Fully Integrated Multi-C/GPU and Memory Controller 32nm Processor, Marcelo Y. et al., [Intel Sandy Bridge](#), 2011.
- The 1<sup>st</sup> to latest Gen. Intel® architectural design details, from various documentations, reviews, videos, blogs, etc.