

# Comp 590-184: Hardware Security and Side-Channels

## Lecture 9: Transient Execution Attacks

February 12, 2026  
Andrew Kwong



The University of North Carolina at Chapel Hill

Slides adapted from  
Mengjia Yan  
(shd.mit.edu)

# Outline

---



- What are transient execution attacks?
- How does Meltdown work?
  - Background on out-of-order processors
  - We will connect the dots between a hardware optimization and a software optimization.
- How do Spectre and its variations work?
  - Background on speculative execution
  - Let's try to see through these variations and understand the fundamental problem.

# Impact




[Figure: benchmark chart, "Geometric Mean Of All Test Results" (result composite)]



Paul Kocher, Jann Horn, Anders Fogh, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael Schwarz, and Yuval Yarom:

**Spectre Attacks: Exploiting Speculative Execution.**

*IEEE Symposium on Security and Privacy (S&P), 2019*

2731 citations on Google Scholar | 3286% above the yearly average | Last visited: Jan 2024


Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas Prescher, Werner Haas, Anders Fogh, Jann Horn, Stefan Mangard, Paul Kocher, Daniel Genkin, Yuval Yarom, and Mike Hamburg: **Meltdown: Reading Kernel Memory from User Space.**

*USENIX Security Symposium, 2018*

1644 citations on Google Scholar | 1413% above the yearly average | Last visited: Jan 2024


# Meltdown

---






## Meltdown Root Causes

---

- Caused by the combination of a hardware optimization and a software optimization:
  - Hardware: out-of-order execution
  - Software: mapping kernel memory into user space

# Recap: 5-stage Pipeline

---




- In-order execution:
  - Execute instructions according to the program order
  - What is the ideal instruction throughput, measured in instructions per cycle (IPC)?

| <i>time</i>  | t0              | t1              | t2              | t3              | t4              | t5              | t6              | t7              | ...             |
|--------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|
| instruction1 | IF <sub>1</sub> | ID <sub>1</sub> | EX <sub>1</sub> | MA <sub>1</sub> | WB <sub>1</sub> |                 |                 |                 |                 |
| instruction2 |                 | IF <sub>2</sub> | ID <sub>2</sub> | EX <sub>2</sub> | MA <sub>2</sub> | WB <sub>2</sub> |                 |                 |                 |
| instruction3 |                 |                 | IF <sub>3</sub> | ID <sub>3</sub> | EX <sub>3</sub> | MA <sub>3</sub> | WB <sub>3</sub> |                 |                 |
| instruction4 |                 |                 |                 | IF <sub>4</sub> | ID <sub>4</sub> | EX <sub>4</sub> | MA <sub>4</sub> | WB <sub>4</sub> |                 |
| instruction5 |                 |                 |                 |                 | IF <sub>5</sub> | ID <sub>5</sub> | EX <sub>5</sub> | MA <sub>5</sub> | WB <sub>5</sub> |
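The throughput question can be answered by counting cycles in the diagram above. The sketch below assumes an ideal pipeline with no stalls; the function names are made up for illustration. With 5 stages, N instructions take N + 4 cycles, so the IPC approaches 1 as N grows.

```python
def pipeline_cycles(num_instructions: int, num_stages: int = 5) -> int:
    """Cycles needed by an ideal in-order pipeline with no stalls."""
    if num_instructions < 1:
        raise ValueError("need at least one instruction")
    # The first instruction takes num_stages cycles to drain; every later
    # instruction finishes exactly one cycle after its predecessor.
    return num_stages + (num_instructions - 1)


def ipc(num_instructions: int, num_stages: int = 5) -> float:
    """Instructions per cycle achieved by the ideal pipeline."""
    return num_instructions / pipeline_cycles(num_instructions, num_stages)


if __name__ == "__main__":
    for n in (5, 100, 10_000):
        print(f"{n} instructions: {pipeline_cycles(n)} cycles, IPC = {ipc(n):.4f}")
```

For the five instructions in the table this gives 9 cycles (t0 through the last column), which matches the diagram.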

# Build High-Performance Processors

---

Example #1:

```
FMUL f1, f2, f3 ; 10 cycles  
ADD r4, r4, r1 ; 1 cycle -> repeat  
.....
```



Instruction-Level Parallelism (ILP)

Example #2:

```
LD r3, 0(r2) ; 1-100 cycles  
ADD r4, r4, r1 ; 1 cycle -> repeat 10 times  
.....
```

ILP is available when there is no data dependency or control-flow dependency between instructions.
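To see what overlapping independent instructions buys, here is a rough cycle count for the two examples. It is a sketch under simplifying assumptions that are not from the slides: one instruction issues per cycle, there are enough functional units, and the load takes its worst-case 100 cycles.

```python
from typing import Sequence


def serial_cycles(latencies: Sequence[int]) -> int:
    """Each instruction starts only after the previous one has finished."""
    return sum(latencies)


def overlapped_cycles(latencies: Sequence[int]) -> int:
    """Independent instructions issue one per cycle and run in parallel."""
    # Instruction i issues in cycle i and finishes `latency` cycles later;
    # the program is done when the last of them finishes.
    return max(i + latency for i, latency in enumerate(latencies))


example1 = [10, 1]              # FMUL (10 cycles), then one independent ADD
example2 = [100] + [1] * 10     # LD (worst case), then ADD repeated 10 times

if __name__ == "__main__":
    for name, program in (("Example #1", example1), ("Example #2", example2)):
        print(name, serial_cycles(program), overlapped_cycles(program))
```

In both examples the short ADDs hide entirely behind the long-latency instruction, but only because they do not depend on its result.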

## Technique #1: Add More Functional Units




## Technique #2: Scoreboard

---

| Functional Unit | Busy? | Dest Reg | Src1 Reg | Src2 Reg |
|-----------------|-------|----------|----------|----------|
| Int ALU         |       |          |          |          |
| Mem             |       |          |          |          |
| Fadd            |       |          |          |          |
| Fmul            |       |          |          |          |
| Fdiv            |       |          |          |          |

1: FMUL f1, f2, f3

2: ADD r4, r4, r1

## Technique #2: Scoreboard

---

| Functional Unit | Busy? | Dest Reg | Src1 Reg | Src2 Reg |
|-----------------|-------|----------|----------|----------|
| Int ALU         |       |          |          |          |
| Mem             |       |          |          |          |
| Fadd            |       |          |          |          |
| Fmul            | Y     | f1       | f2       | f3       |
| Fdiv            |       |          |          |          |

1: FMUL f1, f2, f3

2: ADD r4, r4, r1

## Technique #2: Scoreboard

---

| Functional Unit | Busy? | Dest Reg | Src1 Reg | Src2 Reg |
|-----------------|-------|----------|----------|----------|
| Int ALU         | Y     | r4       | r4       | r1       |
| Mem             |       |          |          |          |
| Fadd            |       |          |          |          |
| Fmul            | Y     | f1       | f2       | f3       |
| Fdiv            |       |          |          |          |

1: FMUL f1, f2, f3

2: ADD r4, r4, r1

## Technique #2: Scoreboard

---

| Functional Unit | Busy? | Dest Reg | Src1 Reg | Src2 Reg |
|-----------------|-------|----------|----------|----------|
| Int ALU         |       |          |          |          |
| Mem             |       |          |          |          |
| Fadd            |       |          |          |          |
| Fmul            |       |          |          |          |
| Fdiv            |       |          |          |          |

1: FMUL f1, f2, f3

2: FDIV f5, f1, f4

## Technique #2: Scoreboard

---

| Functional Unit | Busy? | Dest Reg | Src1 Reg | Src2 Reg |
|-----------------|-------|----------|----------|----------|
| Int ALU         |       |          |          |          |
| Mem             |       |          |          |          |
| Fadd            |       |          |          |          |
| Fmul            | Y     | f1       | f2       | f3       |
| Fdiv            |       |          |          |          |

1: FMUL f1, f2, f3

2: FDIV f5, f1, f4

## Technique #2: Scoreboard

---

| Functional Unit | Busy? | Dest Reg | Src1 Reg | Src2 Reg |
|-----------------|-------|----------|----------|----------|
| Int ALU         |       |          |          |          |
| Mem             |       |          |          |          |
| Fadd            |       |          |          |          |
| Fmul            | Y     | f1       | f2       | f3       |
| Fdiv            | Y     | f5       | f1       | f4       |

1: FMUL f1, f2, f3

2: FDIV f5, f1, f4

Read-after-write (RAW)

## Technique #2: Scoreboard

---

| Functional Unit | Busy? | Dest Reg | Src1 Reg | Src2 Reg |
|-----------------|-------|----------|----------|----------|
| Int ALU         |       |          |          |          |
| Mem             |       |          |          |          |
| Fadd            |       |          |          |          |
| Fmul            |       |          |          |          |
| Fdiv            |       |          |          |          |

Write-after-write (WAW)

1: FMUL f1, f2, f3 ; 10 cycles  
2: FADD f1, f4, f5 ; 4 cycles

## Technique #2: Scoreboard

---

- Upon issue of an instruction, check:
  1. Whether any in-flight instruction will write one of its source registers (RAW hazard)
  2. Whether any in-flight instruction will write its destination register (WAW hazard)
- We call such a processor **in-order issue, out-of-order completion**.
- A problem: how to handle interrupts/exceptions?
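The issue-time checks above can be sketched as a small scoreboard model. This is an illustrative toy, not a description of any real design: the class names and the opcode-to-unit mapping are assumptions, and the "structural" check is simply what the `Busy?` column in the tables implies.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Instr:
    op: str
    dest: str
    src1: str
    src2: str


# Hypothetical mapping from opcode to the functional units in the table.
UNIT_FOR_OP = {"ADD": "Int ALU", "LD": "Mem", "FADD": "Fadd",
               "FMUL": "Fmul", "FDIV": "Fdiv"}


class Scoreboard:
    def __init__(self) -> None:
        self.busy: Dict[str, Instr] = {}   # unit name -> in-flight instruction

    def stall_reason(self, instr: Instr) -> Optional[str]:
        """Return why instr cannot issue yet, or None if it can."""
        if UNIT_FOR_OP[instr.op] in self.busy:
            return "structural"            # the unit it needs is busy
        for inflight in self.busy.values():
            if inflight.dest in (instr.src1, instr.src2):
                return "RAW"               # a source is still being produced
            if inflight.dest == instr.dest:
                return "WAW"               # an older write to dest is pending
        return None

    def issue(self, instr: Instr) -> bool:
        if self.stall_reason(instr) is not None:
            return False
        self.busy[UNIT_FOR_OP[instr.op]] = instr
        return True

    def complete(self, instr: Instr) -> None:
        del self.busy[UNIT_FOR_OP[instr.op]]
```

Replaying the slides: after `FMUL f1, f2, f3` issues, `ADD r4, r4, r1` can issue right away, `FDIV f5, f1, f4` stalls on RAW, and `FADD f1, f4, f5` stalls on WAW until the FMUL completes.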

## Another Way to Draw It

---



## Exception in OoO Processors: Example #1

```
1: LD  r3, 0(r2) ; Exception in 3 cycles  
2: ADD r4, r4, r1 ; 1 cycle
```

Need to delay WB

|        | 1  | 2  | 3     | 4     | 5   | 6   | 7   | 8         |
|--------|----|----|-------|-------|-----|-----|-----|-----------|
| 1: LD  | IF | ID | Issue | ALU   | Mem | Mem | Mem | Exception |
| 2: ADD |    | IF | ID    | Issue | ALU | WB  |     |           |

## Exception in OoO Processors: Example #2

```
1: FMUL f1, f2, f3 ; 10 cycles  
2: LD r3, 0(r2) ; Exception in 1 cycle
```

Need to delay the exception

|         | 1  | 2  | 3     | 4     | 5    | 6    | 7         | 8   |
|---------|----|----|-------|-------|------|------|-----------|-----|
| 1: FMUL | IF | ID | Issue | FMUL  | FMUL | FMUL | FMUL      | ... |
| 2: LD   |    | IF | ID    | Issue | ALU  | Mem  | Exception |     |

## Technique #3: In-order Commit



## Re-examine Examples With In-order Commit

---

```
1: LD  r3, 0(r2)    ; Exception in 3 cycles  
2: ADD r4, r4, r1    ; 1 cycle
```

```
1: FMUL f1, f2, f3    ; 10 cycles  
2: LD  r3, 0(r2)    ; Exception in 1 cycle
```
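A minimal sketch of how in-order commit handles both examples, assuming a reorder-buffer-style queue that retires at most one instruction per cycle. The cycle numbers are read off the earlier timing tables where available; the FMUL finish cycle of 13 is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Entry:
    name: str
    done_cycle: int        # cycle in which execution finishes
    faults: bool = False   # True if execution ends in an exception


def run_in_order_commit(
    program: List[Entry],
) -> Tuple[List[Tuple[str, int]], Optional[Tuple[str, int]]]:
    """Return ([(name, commit cycle)], (faulting name, cycle) or None)."""
    committed: List[Tuple[str, int]] = []
    cycle = 0
    for entry in program:                  # commit order == program order
        # The head commits once it has finished, one commit per cycle.
        cycle = max(cycle + 1, entry.done_cycle)
        if entry.faults:
            # The exception is taken only when the faulting instruction is
            # the oldest one; everything younger is squashed.
            return committed, (entry.name, cycle)
        committed.append((entry.name, cycle))
    return committed, None


# Example #1: the ADD finishes first, but must not commit before the LD.
example1 = [Entry("LD", 8, faults=True), Entry("ADD", 6)]
# Example #2: the LD faults early, but the exception waits for the FMUL.
example2 = [Entry("FMUL", 13), Entry("LD", 7, faults=True)]
```

In Example #1 nothing commits, so the ADD's early result is discarded. In Example #2 the FMUL commits first and only then is the LD's exception raised. Note that in both cases the younger instruction still executed before the squash.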

# Recap: Page Mapping



# Mapping Kernel Pages



# Jumping Between User and Kernel Space

---



- Context switch overhead:
  - Page table changes introduce performance overhead, e.g., flushing the TLB on some processors
  - And sometimes we enter the kernel only to do something simple, e.g., `getpid()`
- Performance optimization:
  - Map kernel addresses into user space in a **secure** way, so there is no need to swap page tables
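One common way to make such a shared mapping safe is a per-page permission bit, such as the x86 user/supervisor bit, checked on every access. The toy model below assumes that mechanism; the names `PageTableEntry`, `translate`, and `PageFault` are invented for this sketch.

```python
from dataclasses import dataclass
from typing import Dict

PAGE_SIZE = 4096


class PageFault(Exception):
    """Raised when a translation is missing or not permitted."""


@dataclass(frozen=True)
class PageTableEntry:
    frame: int               # physical frame number
    user_accessible: bool    # models a user/supervisor permission bit


def translate(page_table: Dict[int, PageTableEntry], vaddr: int,
              in_kernel_mode: bool) -> int:
    """Translate a virtual address, enforcing the permission bit."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    pte = page_table.get(vpn)
    if pte is None:
        raise PageFault("page not mapped")
    if not pte.user_accessible and not in_kernel_mode:
        raise PageFault("kernel-only page accessed from user mode")
    return pte.frame * PAGE_SIZE + offset


# One page table serves both modes: page 0 is a user page, page 1 is kernel-only.
shared_table = {0: PageTableEntry(frame=7, user_accessible=True),
                1: PageTableEntry(frame=9, user_accessible=False)}
```

Because the same table is valid in both modes, a system call does not need to switch page tables. Architecturally, user code that touches page 1 only ever sees a fault.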

# Map Kernel Pages Into User Space **Securely**

---



# Virtual Memory

## Virtual memory (x86\_64 Linux)

0x0000...0000



user function:

```
void hello_world(){ printf("hello world!"); }
```

libc function:

```
void printf(){ ... write(); ... }
```

system call:

```
void sys_write(){ ... }
```

0xffff8000...0000

0xfffff...ffff
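The split of the address space can be expressed as a small classifier. This sketch assumes 48-bit virtual addresses (4-level paging), where the lower canonical half holds user mappings and the upper half holds the kernel. The function name is made up for illustration.

```python
USER_MAX = 0x0000_7FFF_FFFF_FFFF     # top of the lower (user) half
KERNEL_MIN = 0xFFFF_8000_0000_0000   # bottom of the upper (kernel) half
ADDR_MAX = 0xFFFF_FFFF_FFFF_FFFF


def address_half(vaddr: int) -> str:
    """Classify a 64-bit virtual address under 48-bit canonical addressing."""
    if not 0 <= vaddr <= ADDR_MAX:
        raise ValueError("not a 64-bit address")
    if vaddr <= USER_MAX:
        return "user"
    if vaddr >= KERNEL_MIN:
        return "kernel"
    return "non-canonical"           # the unusable hole between the halves


if __name__ == "__main__":
    for addr in (0x0000_0000_0040_0000, 0xFFFF_8000_0000_1000,
                 0x0000_8000_0000_0000):
        print(hex(addr), address_half(addr))
```

Both halves live in the same page table, which is why `hello_world`, `printf`, and `sys_write` can all be reached without changing address spaces.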




# Meltdown

---

- Putting the two optimizations together gives us Meltdown
  - Hardware optimization: out-of-order execution
    - Deferred exception handling
  - Software optimization: mapping kernel addresses into user space
- Attack outcome: user space applications can read arbitrary kernel data

```
.....
Ld1: uint8_t secret = *kernel_address;
Ld2: uint8_t dummy = probe_array[secret*64];
```



The 2<sup>nd</sup> line of code (Ld2) can execute transiently before the exception from Ld1 is raised!
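The following is a pure software model of why the transient Ld2 leaks the secret. It is not an exploit. It assumes a Flush+Reload-style receiver, and the `ToyCpu` class, the kernel address, and the secret value are all hypothetical. The point it illustrates is that the squash discards architectural state but leaves the cache fill behind.

```python
CACHE_LINE = 64                             # stride in probe_array[secret*64]
KERNEL_ADDRESS = 0xFFFF_8000_0000_1000      # hypothetical kernel address
KERNEL_MEMORY = {KERNEL_ADDRESS: 0x2A}      # hypothetical secret byte


class ToyCpu:
    def __init__(self) -> None:
        self.cached_lines = set()   # microarchitectural state (probe_array lines)
        self.registers = {}         # architectural state visible to the program

    def flush_probe_array(self) -> None:
        self.cached_lines.clear()

    def faulting_sequence(self, kernel_address: int) -> None:
        # Ld1 executes transiently: its value is forwarded to dependents.
        secret = KERNEL_MEMORY[kernel_address]
        # Ld2 executes transiently: probe_array[secret*64] enters the cache.
        self.cached_lines.add(secret * CACHE_LINE // CACHE_LINE)
        # Ld1 reaches commit and raises the exception. Nothing was written
        # to self.registers, but the cache fill is not rolled back.

    def reload_is_fast(self, line: int) -> bool:
        # Stand-in for timing a load of probe_array[line * 64].
        return line in self.cached_lines


def recover_byte(cpu: ToyCpu, kernel_address: int):
    cpu.flush_probe_array()                  # Flush
    cpu.faulting_sequence(kernel_address)    # transient Ld1 + Ld2
    hits = [line for line in range(256) if cpu.reload_is_fast(line)]  # Reload
    return hits[0] if len(hits) == 1 else None
```

In a real attack the "fast or slow" decision comes from timing each reload, and the attacker must also survive or suppress the exception. Both are abstracted away here.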




# Meltdown Timing

```
.....
Ld1: uint8_t secret = *kernel_address;
Ld2: uint8_t dummy = probe_array[secret*64];
```

**Case 1: Fail.** Ld2 is squashed before the corresponding memory access is issued.



**Case 2: Attack works.** Ld2's request is sent out before the instruction is squashed.
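The two cases amount to a race between Ld2 issuing and the squash. A toy model of that race is below. The cycle numbers are invented, and the one-cycle forwarding delay between Ld1's data arriving and Ld2 issuing is an assumption of this sketch.

```python
def ld2_request_sent(ld1_data_cycle: int, squash_cycle: int) -> bool:
    """True if Ld2's memory request leaves the core before the squash."""
    # Assumed: Ld2 can issue one cycle after Ld1's data is forwarded to it.
    ld2_issue_cycle = ld1_data_cycle + 1
    return ld2_issue_cycle < squash_cycle


def meltdown_case(ld1_data_cycle: int, squash_cycle: int) -> str:
    if ld2_request_sent(ld1_data_cycle, squash_cycle):
        return "Case 2: attack works"
    return "Case 1: fail"


if __name__ == "__main__":
    squash = 20   # hypothetical cycle at which the faulting Ld1 is squashed
    for ld1_latency in (4, 19, 200):   # e.g. cache hit, borderline, DRAM miss
        print(ld1_latency, meltdown_case(ld1_latency, squash))
```

Under this model the attacker wins when Ld1's data arrives quickly (for example, when the secret is already cached) or when the squash is delayed.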

