

# Recitation 2: Cache Attacks

Mengjia Yan

Spring 2026



# Agenda

- Tour of the unicorn/dobby machines
- Explanation for the TLB-flush demo

# Recall the Flush+Reload Demo



# Core and Private Caches



Simultaneous Multithreading (**SMT**) = Hyperthreading

- Running more than one thread on one physical core

**Inclusive** cache:

- Any L1 cache block has a duplicated copy in the L2

Cache operations:

- Lookup
  - A private-cache miss occurs when the block is in neither L1 nor L2
- Insertion
  - **Q:** To service a cache miss, do we insert into L1, L2, or both?
- Eviction
  - **Q:** What happens on an L1-D conflict?
  - **Q:** What happens on an L2 conflict?
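To experiment with the questions above, here is a minimal toy model of one core's private caches. It is only a sketch under stated assumptions: both levels are fully associative with LRU replacement, L1 hits do not refresh the L2's LRU state, and the class name `InclusiveHierarchy` and its sizes are made up for illustration. Real L1/L2 caches are set-associative and use other replacement policies.

```python
from collections import OrderedDict


class InclusiveHierarchy:
    """Toy L1 + inclusive L2 (hypothetical model, not real hardware)."""

    def __init__(self, l1_lines=2, l2_lines=3):
        self.l1 = OrderedDict()  # oldest entry first (LRU order)
        self.l2 = OrderedDict()
        self.l1_lines = l1_lines
        self.l2_lines = l2_lines

    def access(self, addr):
        """Return where the block was found: 'L1', 'L2', or 'miss'."""
        if addr in self.l1:
            self.l1.move_to_end(addr)  # assumption: L2 LRU is not updated
            return "L1"
        if addr in self.l2:
            self.l2.move_to_end(addr)
            self._fill_l1(addr)
            return "L2"
        # Miss in both levels: fill L2 and L1 so inclusion holds.
        self._fill_l2(addr)
        self._fill_l1(addr)
        return "miss"

    def _fill_l1(self, addr):
        if len(self.l1) >= self.l1_lines:
            self.l1.popitem(last=False)  # L1 victim keeps its copy in L2
        self.l1[addr] = True

    def _fill_l2(self, addr):
        if len(self.l2) >= self.l2_lines:
            victim, _ = self.l2.popitem(last=False)
            self.l1.pop(victim, None)  # back-invalidate to preserve inclusion
        self.l2[addr] = True


h = InclusiveHierarchy(l1_lines=2, l2_lines=3)
for a in "ABACD":
    print(a, h.access(a))
# After D is inserted, A has been evicted from L2 and therefore from L1 too,
# even though A was recently used in L1.
print("A in L1:", "A" in h.l1, "| A in L2:", "A" in h.l2)
```

In this model an L1 conflict only removes the L1 copy, while an L2 conflict also removes the block from L1. That second effect is what inclusion requires.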

# Network-on-Chip and Shared Caches




# Shared Cache Operations



**Non-inclusive** cache:

- An L2 cache block **may or may not** have a duplicated copy in the L3

Cache operations:

- Lookup
  - **Q:** Do we have a cache miss if we have an L3 miss?
- Insertion
  - **Q:** To service a cache miss, do we insert into L1, L2, L3, or all of them?
- Eviction
  - **Q:** What happens on an L2 conflict?
  - **Q:** What happens on an L3 conflict?
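The "may or may not" property can be illustrated with a toy model. The sketch below assumes one possible policy, chosen only for illustration: misses fill the L2 alone, L2 victims are installed in the L3, and an L3 hit keeps its L3 copy. The class name `NonInclusiveL3` and all sizes are hypothetical, and the real machines may behave differently.

```python
from collections import OrderedDict


class NonInclusiveL3:
    """Toy L2 + non-inclusive L3 with LRU (assumed policy, not real hardware)."""

    def __init__(self, l2_lines=2, l3_lines=4):
        self.l2 = OrderedDict()  # oldest entry first (LRU order)
        self.l3 = OrderedDict()
        self.l2_lines = l2_lines
        self.l3_lines = l3_lines

    def access(self, addr):
        """Return where the block was found: 'L2', 'L3', or 'miss'."""
        if addr in self.l2:
            self.l2.move_to_end(addr)
            return "L2"
        if addr in self.l3:
            self.l3.move_to_end(addr)  # assumption: the L3 copy is kept
            self._fill_l2(addr)
            return "L3"
        self._fill_l2(addr)  # assumption: a miss fills L2 only, not L3
        return "miss"

    def _fill_l2(self, addr):
        if len(self.l2) >= self.l2_lines:
            victim, _ = self.l2.popitem(last=False)
            self._fill_l3(victim)  # the L2 victim moves down to L3
        self.l2[addr] = True

    def _fill_l3(self, addr):
        if addr in self.l3:
            self.l3.move_to_end(addr)
            return
        if len(self.l3) >= self.l3_lines:
            self.l3.popitem(last=False)  # L3 victim: L2 copies are untouched
        self.l3[addr] = True


c = NonInclusiveL3()
for a in "ABC":
    c.access(a)
print("B in L2 only:", "B" in c.l2 and "B" not in c.l3)
c.access("A")  # A was evicted to L3 earlier and is now copied back into L2
print("A in both:", "A" in c.l2 and "A" in c.l3)
```

Under these assumptions, some L2 blocks have an L3 copy and others do not, and an L3 eviction does not force anything out of the L2.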

# Composing A Chip



# Multi-socket Machine



**NUMA:** non-uniform memory access

- Accesses to different memory banks have different latencies

**NUCA:** non-uniform cache access

- Accesses to different LLC banks have different latencies

Now let's explain the flush+reload attack.



*Pictures from WikiChip*



Unicorn and dobby both have 2 sockets.  
In theory, we have $28 \times 2 \times 2 = 112$ logical cores (28 cores per socket, 2 SMT threads per core, 2 sockets).

Why is the cap 96?

| <b>(0, 0)</b>                      | <b>(0, 1)</b>                      | <b>(0, 2)</b>                       | <b>(0, 3)</b>                       | <b>(0, 4)</b>                       | <b>(0, 5)</b>                       |
|------------------------------------|------------------------------------|-------------------------------------|-------------------------------------|-------------------------------------|-------------------------------------|
| cpu 0<br>slice 0                   | cpu 1<br>slice 4                   | cpu 15<br>slice 9                   | cpu 16<br>slice 13                  | cpu 17<br>slice 17                  | cpu 12<br>slice 22                  |
| <b>(1, 0)</b><br>IMC 0             | <b>(1, 1)</b><br>cpu 14<br>slice 5 | <b>(1, 2)</b><br>cpu 9<br>slice 10  | <b>(1, 3)</b><br>cpu 10<br>slice 14 | <b>(1, 4)</b><br>cpu 11<br>slice 18 | <b>(1, 5)</b><br>IMC 1              |
| <b>(2, 0)</b><br>cpu 13<br>slice 1 | <b>(2, 1)</b><br>cpu 8<br>slice 6  | <b>(2, 2)</b><br>cpu 20<br>slice 11 | <b>(2, 3)</b><br>cpu 21<br>slice 15 | <b>(2, 4)</b><br>cpu 22<br>slice 19 | <b>(2, 5)</b><br>cpu 23<br>slice 23 |
| <b>(3, 0)</b><br>cpu 7<br>slice 2  | <b>(3, 1)</b><br>cpu 19<br>slice 7 | <b>(3, 2)</b><br>cpu 3<br>slice 12  | <b>(3, 3)</b><br>X                  | <b>(3, 4)</b><br>cpu 5<br>slice 20  | <b>(3, 5)</b><br>cpu 6<br>slice 24  |
| <b>(4, 0)</b><br>slice 3           | <b>(4, 1)</b><br>cpu 2<br>slice 8  | <b>(4, 2)</b><br>X                  | <b>(4, 3)</b><br>cpu 4<br>slice 16  | <b>(4, 4)</b><br>cpu 18<br>slice 21 | <b>(4, 5)</b><br>slice 25           |

Figure 2: An example tile layout of an Intel Cascade Lake processor with 24 active cores and 26 active LLC slices.
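One way to reconcile 112 with the cap of 96 is to redo the arithmetic with the active core count from Figure 2. The sketch below assumes that both sockets use a 24-active-core layout like the example in the figure and that each core runs 2 SMT threads. The function name `logical_cpus` is made up for illustration.

```python
def logical_cpus(cores_per_socket, threads_per_core=2, sockets=2):
    """Number of logical CPUs the OS would see under these assumptions."""
    return cores_per_socket * threads_per_core * sockets


print(logical_cpus(28))  # all 28 core tiles on the die enabled: 112
print(logical_cpus(24))  # 24 active cores per socket, as in Figure 2: 96
```

On a Linux machine, the standard-library call `os.cpu_count()` reports the number of logical CPUs and can be compared against this estimate.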

# Next: Transient Execution Attacks



# The Heartbeat Demo from Lecture 2



- Sender: send a heartbeat every 5 seconds

```
while(1) {  
    allocate a buffer;  
    sleep(5);  
    free the buffer;  
}
```

- Receiver: sample system status every 1 second

```
allocate a buffer;  
while(1) {  
    latency = time(access the buffer);  
    report latency;  
    sleep(1);  
}
```