

# Timing Anomaly through Branch Prediction

Andrei Ilin

Université Grenoble Alpes

23 June 2025

*Supervised by Lionel Rieg, Florian Brandner and Mihail Asavoae*



## ① Introduction

## ② Hardware

## ③ Timing Anomalies

## ④ Contribution

## ⑤ Conclusion

# Critical Real-Time Systems

Can be found in:

- Cars
- Planes
- Life-supporting equipment



Strict timing requirements for the programs.

## Example

Car break system that must respond in 5 ms.

# WCET Analysis

## Worst Case Execution Time Analysis:

- Hardware + Software
- Upper bound for execution time?

### Worst-Case Execution Time



# WCET Analysis

## Worst Case Execution Time Analysis:

- Hardware + Software
- Upper bound for execution time?

### Worst-Case Execution Time



Analysis is split into multiple stages and uses abstractions.

# Timing Anomalies

## Timing Anomaly (TA)

When a local speedup leads to a global slowdown.

### Example

Faster completion of A leads to a slowdown of the whole trace.

LD      r1, o(r2) ; A  
  ↓ ADD    r3, r1, r4 ; B  
  ↓ ADD    r5, r6, r7 ; C  
  ↓ LD     r8, (o)r5 ; D



## ① Introduction

## ② Hardware

## ③ Timing Anomalies

## ④ Contribution

## ⑤ Conclusion

# OoO Multiscalar Pipeline

Processor:

- **Pipelined:** Divided into consecutive stages
- **Superscalar:** Fetch multiple instruction in the same time
- **Out-of-Order:** Execution order dictated by scheduling policy



# OoO Multiscalar Pipeline

Processor:

- **Pipelined:** Divided into consecutive stages
- **Superscalar:** Fetch multiple instruction in the same time
- **Out-of-Order:** Execution order dictated by scheduling policy



|                                | 1  | 2  | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  |
|--------------------------------|----|----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| Fetch 2 instructions per cycle |    |    |     |     |     |     |     |     |     |     |     |
| A                              | IF | ID | FU1 | FU1 | FU1 | COM |     |     |     |     |     |
| B                              | IF | ID | rs2 | rs2 | rs2 | rs2 | FU2 | FU2 | FU2 | COM |     |
| C                              |    | IF | ID  | FU2 | FU2 | FU2 | rob | rob | rob | COM |     |
| D                              |    |    | rs1 | rs1 | rs1 | rs1 | FU1 | FU1 | FU1 | rob | COM |

# OoO Multiscalar Pipeline

Processor:

- **Pipelined:** Divided into consecutive stages
- **Superscalar:** Fetch multiple instruction in the same time
- **Out-of-Order:** Execution order dictated by scheduling policy



| Execute out-of-order |    |    |     |     |     |     |     |     |     |     |  |
|----------------------|----|----|-----|-----|-----|-----|-----|-----|-----|-----|--|
| 1                    | 2  | 3  | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  |  |
| A                    | IF | ID | FU1 | FU1 | FU1 | COM |     |     |     |     |  |
| B                    | IF | ID | rs2 | rs2 | rs2 | FU2 | FU2 | FU2 | COM |     |  |
| C                    | .  | ID | FU2 | FU2 | FU2 | rob | rob | rob | COM |     |  |
| D                    | .  | ID | rs1 | rs1 | rs1 | FU1 | FU1 | FU1 | rob | COM |  |

# OoO Multiscalar Pipeline

Processor:

- **Pipelined:** Divided into consecutive stages
- **Superscalar:** Fetch multiple instruction in the same time
- **Out-of-Order:** Execution order dictated by scheduling policy



|   | Commit in-order |    |     |     |     |     |     |     |     |     |     |
|---|-----------------|----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
|   | 1               | 2  | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  |
| A | IF              | ID | FU1 | FU1 | FU1 | COM |     |     |     |     |     |
| B | IF              | ID | rs2 | rs2 | rs2 | rs2 | FU2 | FU2 | FU2 | COM |     |
| C | .               | IF | ID  | FU2 | FU2 | FU2 | rob | rob | rob | COM |     |
| D | .               | IF | ID  | rs1 | rs1 | rs1 | FU1 | FU1 | FU1 | rob | COM |

## Branch Prediction

- Branch target unknown until branch is resolved.



# Branch Prediction

- Branch target unknown until branch is resolved.
- Predict next address and execute speculatively.



# Branch Prediction

- Branch target unknown until branch is resolved.
- Predict next address and execute speculatively.
- Misprediction detected → start fetching the correct branch



## ① Introduction

## ② Hardware

## ③ Timing Anomalies

## ④ Contribution

## ⑤ Conclusion

# Formal Definition?

## How do we formally define a TA?

### Many existing definitions

- Step Heights (by Gebhard et al.)
- Step-functions Intersections (by Cassez et al.)
- Component Occupation (by Kirner et al.)
- Instruction Locality (by Reineke et al.)
- Causality Dependency Graph (by Binder et al.)

# Formal Definition?

How do we formally define a TA?

Many existing definitions

- Step Heights (by Gebhard et al.)
- Step-functions Intersections (by Cassez et al.)
- Component Occupation (by Kirner et al.)
- Instruction Locality (by Reineke et al.)
- Causality Dependency Graph (by Binder et al.)

# Step Heights by Gebhard et al.

## Definition

TA = instruction latency (compared to previous instr.) smaller, global time greater

|          | 1 | 2  | 3  | 4               | 5               | 6                          | 7               | 8                         | 9               | 10              | 11              | 12              | 13  |
|----------|---|----|----|-----------------|-----------------|----------------------------|-----------------|---------------------------|-----------------|-----------------|-----------------|-----------------|-----|
| $\alpha$ | A | IF | ID | FU <sub>1</sub> | COM             | $\downarrow \Delta_\alpha$ |                 |                           |                 |                 |                 |                 |     |
|          | B | IF | ID | RS <sub>2</sub> | FU <sub>2</sub> | FU <sub>2</sub>            | FU <sub>2</sub> | COM                       |                 |                 |                 |                 |     |
|          | C | IF | ID | RS <sub>2</sub> | RS <sub>2</sub> | RS <sub>2</sub>            | FU <sub>2</sub> | FU <sub>2</sub>           | FU <sub>2</sub> | COM             |                 |                 |     |
|          | D | IF | ID | RS <sub>1</sub> | RS <sub>1</sub> | RS <sub>1</sub>            | RS <sub>1</sub> | RS <sub>1</sub>           | RS <sub>1</sub> | FU <sub>1</sub> | FU <sub>1</sub> | FU <sub>1</sub> | COM |
| $\beta$  | A | IF | ID | FU <sub>1</sub> | FU <sub>1</sub> | FU <sub>1</sub>            | COM             | $\downarrow \Delta_\beta$ |                 |                 |                 |                 |     |
|          | B | IF | ID | RS <sub>2</sub> | RS <sub>2</sub> | RS <sub>2</sub>            | RS <sub>2</sub> | FU <sub>2</sub>           | FU <sub>2</sub> | FU <sub>2</sub> | COM             |                 |     |
|          | C | IF | ID | FU <sub>2</sub> | FU <sub>2</sub> | FU <sub>2</sub>            | ROB             | ROB                       | ROB             | COM             |                 |                 |     |
|          | D | IF | ID | RS <sub>1</sub> | RS <sub>1</sub> | RS <sub>1</sub>            | FU <sub>1</sub> | FU <sub>1</sub>           | FU <sub>1</sub> | ROB             | COM             |                 |     |

# Step Heights by Gebhard et al.

## Definition

TA = instruction latency (compared to previous instr.) smaller,  
global time greater

## Counterexample!

|          | 1 | 2     | 3               | 4               | 5               | 6               | 7               | 8               | 9                  | 10  |
|----------|---|-------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|--------------------|-----|
| $\alpha$ | A | IF ID | FU <sub>1</sub> | COM             |                 |                 |                 |                 |                    |     |
|          | B | IF ID | RS <sub>2</sub> | FU <sub>2</sub> | COM             |                 |                 |                 |                    |     |
|          | C | IF    | ID              | RS <sub>2</sub> | FU <sub>2</sub> | COM             | $\downarrow$    | $\Delta_\alpha$ |                    |     |
|          | D | IF    | ID              | RS <sub>1</sub> | RS <sub>1</sub> | FU <sub>1</sub> | FU <sub>1</sub> | FU <sub>1</sub> | COM                |     |
| $\beta$  | A | IF ID | FU <sub>1</sub> | FU <sub>1</sub> | FU <sub>1</sub> | COM             |                 |                 |                    |     |
|          | B | IF ID | RS <sub>2</sub> | RS <sub>2</sub> | RS <sub>2</sub> | FU <sub>2</sub> | COM             | $\downarrow$    | $\Delta_\beta = 0$ |     |
|          | C | IF    | ID              | FU <sub>2</sub> | ROB             | ROB             | COM             |                 |                    |     |
|          | D | IF    | ID              | RS <sub>1</sub> | RS <sub>1</sub> | RS <sub>1</sub> | FU <sub>1</sub> | FU <sub>1</sub> | FU <sub>1</sub>    | COM |

# Step Function Intersections by Cassez et al.

## Definition

TA = some instruction finishes earlier in trace  $\alpha$  than in  $\beta$ ,  
subsequent instruction finishes later in  $\alpha$  than in  $\beta$

|          | 1 | 2  | 3  | 4               | 5               | 6                          | 7               | 8                         | 9               | 10              | 11              | 12              | 13   |
|----------|---|----|----|-----------------|-----------------|----------------------------|-----------------|---------------------------|-----------------|-----------------|-----------------|-----------------|------|
| $\alpha$ | A | IF | ID | FU <sub>1</sub> | COM             | $\downarrow \Delta_\alpha$ |                 |                           |                 |                 |                 |                 |      |
|          | B | IF | ID | RS <sub>2</sub> | FU <sub>2</sub> | FU <sub>2</sub>            | FU <sub>2</sub> | COM                       |                 |                 |                 |                 |      |
|          | C | IF | ID | RS <sub>2</sub> | RS <sub>2</sub> | RS <sub>2</sub>            | FU <sub>2</sub> | FU <sub>2</sub>           | FU <sub>2</sub> | COM             |                 |                 |      |
|          | D | IF | ID | RS <sub>1</sub> | RS <sub>1</sub> | RS <sub>1</sub>            | RS <sub>1</sub> | RS <sub>1</sub>           | RS <sub>1</sub> | FU <sub>1</sub> | FU <sub>1</sub> | FU <sub>1</sub> | COM• |
| $\beta$  | A | IF | ID | FU <sub>1</sub> | FU <sub>1</sub> | FU <sub>1</sub>            | COM             | $\downarrow \Delta_\beta$ |                 |                 |                 |                 |      |
|          | B | IF | ID | RS <sub>2</sub> | RS <sub>2</sub> | RS <sub>2</sub>            | RS <sub>2</sub> | FU <sub>2</sub>           | FU <sub>2</sub> | FU <sub>2</sub> | COM             |                 |      |
|          | C | IF | ID | FU <sub>2</sub> | FU <sub>2</sub> | FU <sub>2</sub>            | ROB             | ROB                       | ROB             | COM             |                 |                 |      |
|          | D | IF | ID | RS <sub>1</sub> | RS <sub>1</sub> | RS <sub>1</sub>            | RS <sub>1</sub> | FU <sub>1</sub>           | FU <sub>1</sub> | FU <sub>1</sub> | ROB             | COM•            |      |

Step Function Intersections by Cassez et al.

## Definition

TA = some instruction finishes earlier in trace  $\alpha$  than in  $\beta$ ,  
 subsequent instruction finishes later in  $\alpha$  than in  $\beta$

## Counterexample!



# Causality Graph by Binder et al.

## 3 Components

- ① Variation
- ② Slowdown
- ③ Causality

| $\alpha$        | 1  | 2  | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  | 12  | 13  |
|-----------------|----|----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| $\neg A$        | IF | ID | FU1 | COM |     |     |     |     |     |     |     |     |     |
| $\rightarrow B$ | IF | ID | rs  | FU2 | FU2 | FU2 | COM |     |     |     |     |     |     |
| $\neg C$        | .  | IF | ID  | rs  | rs  | rs  | FU2 | FU2 | FU2 | COM |     |     |     |
| $\rightarrow D$ | .  | IF | ID  | rs  | rs  | rs  | rs  | rs  | rs  | FU1 | FU1 | FU1 | COM |

| $\beta$         | 1  | 2  | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  |  |
|-----------------|----|----|-----|-----|-----|-----|-----|-----|-----|-----|-----|--|
| $\neg A$        | IF | ID | FU1 | FU1 | FU1 | COM |     |     |     |     |     |  |
| $\rightarrow B$ | IF | ID | rs  | rs  | rs  | rs  | FU2 | FU2 | FU2 | COM |     |  |
| $\neg C$        | .  | IF | ID  | FU2 | FU2 | FU2 | rob | rob | rob | COM |     |  |
| $\rightarrow D$ | .  | IF | ID  | rs  | rs  | rs  | FU1 | FU1 | FU1 | rob | COM |  |

# Causality Graph by Binder et al.

## 3 Components

- ① Variation in latency
- ② Slowdown
- ③ Causality

| $\alpha$             | 1  | 2  | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  | 12  | 13  |
|----------------------|----|----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| $\curvearrowleft A$  | IF | ID | FU1 | COM |     |     |     |     |     |     |     |     |     |
| $\curvearrowright B$ | IF | ID | rs  | FU2 | FU2 | FU2 | COM |     |     |     |     |     |     |
| $\curvearrowleft C$  | .  | IF | ID  | rs  | rs  | rs  | FU2 | FU2 | FU2 | COM |     |     |     |
| $\curvearrowright D$ | .  | IF | ID  | rs  | rs  | rs  | rs  | rs  | rs  | FU1 | FU1 | FU1 | COM |

| $\beta$              | 1  | 2  | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  |
|----------------------|----|----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| $\curvearrowleft A$  | IF | ID | FU1 | FU1 | FU1 | COM |     |     |     |     |     |
| $\curvearrowright B$ | IF | ID | rs  | rs  | rs  | rs  | FU2 | FU2 | FU2 | COM |     |
| $\curvearrowleft C$  | .  | IF | ID  | FU2 | FU2 | FU2 | rob | rob | rob | COM |     |
| $\curvearrowright D$ | .  | IF | ID  | rs  | rs  | rs  | FU1 | FU1 | FU1 | rob | COM |



# Causality Graph by Binder et al.

## 3 Components

- ① Variation in latency
- ② Slowdown between end of the variation and a later event
- ③ Causality



# Causality Graph by Binder et al.

## 3 Components

- ① Variation in latency
- ② Slowdown between end of the variation and a later event
- ③ Causality link = event X cannot be earlier because of Y



## Goal

Develop a consistent TA definition applicable for branch prediction.

- **Question 1** Can timing anomaly be caused by branch prediction?
- **Question 2** Can we extend the existing Binder's definition?
- **Question 3** Do we cover all the aspects of branch prediction?

## ① Introduction

## ② Hardware

## ③ Timing Anomalies

## ④ Contribution

## ⑤ Conclusion

# Research Plan

- ① Create a tool for automatic example generation.
- ② Generate some candidate scenarios.
- ③ Try to adapt definition on them.
- ④ Find new scenarios to verify the definition.

# Tool Implementation

## Existing TLA<sup>+</sup> framework

- ✗ Slow: 2-3 seconds for single example
- ✗ No branch and speculation support
- ✗ Lengthy input format
- ✓ Expressive property specification using temporal logic

# Tool Implementation

## Existing TLA<sup>+</sup> framework

- ✗ Slow: 2-3 seconds for single example
- ✓ No branch and speculation support **Added**
- ✓ Lengthy input format  
**Generate TLA<sup>+</sup> from pretty input**
- ✓ Expressive property specification using temporal logic

# Tool Implementation

## Existing TLA<sup>+</sup> framework

- ✗ Slow: 2-3 seconds for single example
- ✓ Added branch and speculation support
- ✓ Generate TLA<sup>+</sup> from pretty input
- ✓ Expressive property specification using temporal logic

## Our C++ framework

- ✓ Fast: a few ms per example
- ✓ Traces with speculation
- ✓ Consise input format
- ✗ Less expressive properties
- ✓ 3 operation modes:
  - ① Interactive
  - ② Randomized
  - ③ State Exploration

# Input Format

- Each branch instruction is followed by **misprediction region** – sequence of instruction representing the wrong branch.
- A pair of traces is generated from a single input.

|           | Res | Dep. | FU | Lat |
|-----------|-----|------|----|-----|
| <i>A</i>  | FU1 |      |    | 4   |
| <i>*B</i> | FU2 | {A}  |    | 1   |
| <i>C</i>  | FU2 |      |    | 4   |
| <i>D</i>  | FU2 |      |    | 4   |
| <i>E</i>  | FU2 |      |    | 4   |

| $\alpha$ | 1  | 2  | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10 |  |
|----------|----|----|-----|-----|-----|-----|-----|-----|-----|----|--|
| A        | IF | ID | FU1 | FU1 | FU1 | FU1 | COM |     |     |    |  |
| B        | .  | IF | ID  | FU2 | rob | rob | rob | COM |     |    |  |
| C        |    |    |     |     |     |     |     |     |     |    |  |
| D        |    |    |     |     |     |     |     |     |     |    |  |
| E        | .  | .  | IF  | ID  | FU2 | FU2 | FU2 | FU2 | COM |    |  |

  

| $\beta$ | 1  | 2  | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  | 12 |
|---------|----|----|-----|-----|-----|-----|-----|-----|-----|-----|-----|----|
| A       | IF | ID | FU1 | FU1 | FU1 | FU1 | COM |     |     |     |     |    |
| B       | .  | IF | ID  | FU2 | rob | rob | rob | COM |     |     |     |    |
| C       | .  | .  | IF  | ID  | X   |     |     |     |     |     |     |    |
| D       | .  | .  | .   | IF  | X   |     |     |     |     |     |     |    |
| E       | .  | .  | .   | .   | IF  | ID  | FU2 | FU2 | FU2 | FU2 | COM |    |

# Applying the definition of Binder et al.

**TA example:** Correct prediction leads to a longer execution time.

| $\alpha$ | 1  | 2  | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  | 12  | 13  | 14  | 15  | 16  |
|----------|----|----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| A        | IF | ID | FU1 | FU1 | FU1 | FU1 | COM |     |     |     |     |     |     |     |     |     |
| B        | -  | IF | ID  | rs  | rs  | rs  | rs  | rs  |     | FU2 | FU2 | FU2 | FU2 | COM |     |     |
| C        | -  | -  | IF  | ID  | FU2 | rob | COM |     |
| D        | -  | -  | -   | -   | -   | -   | -   | -   | -   | -   | -   | -   | -   | -   | -   |     |
| E        | -  | -  | -   | -   | -   | -   | -   | -   | -   | -   | -   | -   | -   | -   | -   |     |
| F        | -  | -  | -   | -   | IF  | ID  | FU2 | FU2 | FU2 | FU2 | rob | rob | rob | rob | rob | COM |

| $\beta$ | 1  | 2  | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  | 12  | 13  | 14  | 15  |
|---------|----|----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| A       | IF | ID | FU1 | FU1 | FU1 | FU1 | COM |     |     |     |     |     |     |     |     |
| B       | -  | IF | ID  | rs  | rs  | rs  | FU2 | FU2 | FU2 | FU2 | COM |     |     |     |     |
| C       | -  | -  | IF  | ID  | FU2 | rob | rob | rob | rob | rob | rob | COM |     |     |     |
| D       | -  | -  | -   | IF  | ID  | X   |     |     |     |     |     |     |     |     |     |
| E       | -  | -  | -   | -   | IF  | X   |     |     |     |     |     |     |     |     |     |
| F       | -  | -  | -   | -   | -   | IF  | ID  | rs  | rs  | rs  | FU2 | FU2 | FU2 | FU2 | COM |

|    | Res.       | Dep. | Lat. |
|----|------------|------|------|
| A  | FU1        |      | 4    |
| B  | FU2        | {A}  | 4    |
| *C | <b>FU2</b> |      | 1    |
| D  | FU1        |      | 4    |
| E  | FU1        |      | 4    |
| F  | FU2        |      | 4    |

- ① Variation in latency?
- ② Slowdown?
- ③ Causality?

# Applying the definition of Binder et al.

**TA example:** Correct prediction leads to a longer execution time.

|   | 1  | 2  | 3   | 4    | 5       | 6    | 7   | 8   | 9   | 10  | 11  | 12  | 13  | 14  | 15  | 16  |
|---|----|----|-----|------|---------|------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| A | IF | ID | FU1 | FU1  | FU1     | FU1  | COM |     |     |     |     |     |     |     |     |     |
| B | -  | IF | ID  | rs   | rs      | rs   | rs  | rs  | rs  | FU2 | FU2 | FU2 | FU2 | COM |     |     |
| C | -  | .  | IF  | ● ID | FU2     | rob  | rob | rob | rob | rob | rob | rob | rob | rob | COM |     |
| D | -  | .  | .   | IF   | ID      | rob  | rob | rob | rob | rob | rob | rob | rob | rob | COM |     |
| E | -  | .  | .   | .    | lat = 0 |      |     |     |     |     |     |     |     |     |     |     |
| F | -  | .  | .   | .    | .       | ● IF | ID  | FU2 | FU2 | FU2 | FU2 | rob | rob | rob | rob | COM |

- Branch Prediction
- Correct Branch Taken

↔ Latency

|   | 1  | 2  | 3   | 4    | 5   | 6    | 7   | 8   | 9   | 10  | 11  | 12  | 13  | 14  | 15  |  |
|---|----|----|-----|------|-----|------|-----|-----|-----|-----|-----|-----|-----|-----|-----|--|
| A | IF | ID | FU1 | FU1  | FU1 | FU1  | COM |     |     |     |     |     |     |     |     |  |
| B | -  | IF | ID  | rs   | rs  | rs   | FU2 | FU2 | FU2 | FU2 | COM |     |     |     |     |  |
| C | -  | .  | IF  | ● ID | FU2 | rob  | rob | rob | rob | rob | rob | COM |     |     |     |  |
| D | -  | .  | .   | IF   | ID  | X    |     |     |     |     |     |     |     |     |     |  |
| E | -  | .  | .   | .    | IF  | X    |     |     |     |     |     |     |     |     |     |  |
| F | -  | .  | .   | .    | .   | ● IF | ID  | rs  | rs  | rs  | FU2 | FU2 | FU2 | FU2 | COM |  |

|    | Res. | Dep. | Lat. |
|----|------|------|------|
| A  | FU1  |      | 4    |
| B  | FU2  | {A}  | 4    |
| *C | FU2  |      | 1    |
| D  | FU1  |      | 4    |
| E  | FU1  |      | 4    |
| F  | FU2  |      | 4    |

- ① Variation in latency between "branch prediction" and "correct branch taken" events.
- ② Slowdown?
- ③ Causality?

# Applying the definition of Binder et al.

**TA example:** Correct prediction leads to a longer execution time.



- ① Variation in latency between "branch prediction" and "correct branch taken" events.
- ② Slowdown as in the original definition.
- ③ Causality?

# Applying the definition of Binder et al.

**TA example:** Correct prediction leads to a longer execution time.



- ① Variation in latency between "branch prediction" and "correct branch taken" events.
- ② Slowdown as in the original definition.
- ③ Causality as in the original definition.

# FU release by FU acquisition

Example: when a new rule is required.

| $\alpha$ | 1  | 2  | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  | 12  | 13  | 14  |
|----------|----|----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| A        | IF | ID | FU1 | FU1 | FU1 | FU1 | COM |     |     |     |     |     |     |     |
| B        | -  | IF | ID  | rs2 | rs2 | rs2 | rs2 | rs2 | FU2 | FU2 | FU2 | FU2 | COM |     |
| C        | -  | -  | IF  | ID  | rs1 | rs1 | rs1 | FU1 | rob | rob | rob | rob | rob | COM |
| D        | -  | -  | -   | IF  | ID  | FU2 | rob | rob |     |     |     |     |     |     |
| E        |    |    |     |     |     |     |     |     |     |     |     |     |     |     |
| F        |    |    |     |     |     |     |     |     |     |     |     |     |     |     |
| G        | -  | -  | -   | -   | IF  | ID  | FU2 | FU2 |     |     |     |     |     |     |

- Branch Prediction
- Correct Branch Taken
- Causality Arc
- ↔ Latency
- Δ= Slowdown

| $\beta$ | 1  | 2  | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  | 12  | 13  |     |
|---------|----|----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| A       | IF | ID | FU1 | FU1 | FU1 | FU1 | COM |     |     |     |     |     |     |     |
| B       | -  | IF | ID  | rs2 | rs2 | rs2 | rs2 | rs2 | FU2 | FU2 | FU2 | FU2 | COM |     |
| C       | -  | -  | IF  | ID  | rs1 | rs1 | rs1 | FU1 | rob | rob | rob | rob | rob | COM |
| D       | -  | -  | -   | IF  | ID  | FU2 | rob | rob |     |     |     |     |     |     |
| E       | -  | -  | -   | -   | IF  | ID  |     |     |     |     |     |     |     |     |
| F       | -  | -  | -   | -   | -   | IF  |     |     |     |     |     |     |     |     |
| G       | -  | -  | -   | -   | -   | -   | IF  | ID  |     |     |     |     |     |     |

|    | Res. | Dep. | Lat. |
|----|------|------|------|
| A  | FU1  |      | 5    |
| B  | FU2  | {A}  | 4    |
| C  | FU1  |      | 1    |
| *D | FU2  |      | 1    |
| E  | FU2  |      | 4    |
| F  | FU2  |      | 4    |
| G  | FU2  |      | 4    |

# FU release by FU acquisition

Example: when a new rule is required.



# FU release by FU acquisition

Example: when a new rule is required.



# FU release by FU acquisition

Example: when a new rule is required.



# FU release by FU acquisition

Example: when a new rule is required.



- Branch Prediction
- Correct Branch Taken
- Causality Arc
- ↔ Latency
- ← Δ=... → Slowdown



|   | Res. | Dep. | Lat. |
|---|------|------|------|
| A | FU1  |      | 5    |
| B | FU2  | {A}  | 4    |
| C | FU1  |      | 1    |
| D | FU2  |      | 1    |
| E | FU2  |      | 4    |
| F | FU2  |      | 4    |
| G | FU2  |      | 4    |

Binder's rule does not work

"4 cycles must be spent inside FU to establish causality."

# FU release by FU acquisition

Example: when a new rule is required.



- Branch Prediction
- Correct Branch Taken
- Causality Arc
- Latency
- $\Delta=$  Slowdown



|   | Res. | Dep. | Lat. |
|---|------|------|------|
| A | FU1  |      | 5    |
| B | FU2  | {A}  | 4    |
| C | FU1  |      | 1    |
| D | FU2  |      | 1    |
| E | FU2  |      | 4    |
| F | FU2  |      | 4    |
| G | FU2  |      | 4    |

## "Acquisition" Rule

The FU acquisition is always causal to the respective FU release.

# FU release by squashing

The problem with the "Acquisition" rule.



# FU release by squashing

The problem with the "Acquisition" rule.



# FU release by squashing

The problem with the "Acquisition" rule.



"Acquisition" rule does not work

"The FU acquisition is always causal to the respective FU release."

# FU release by squashing

The problem with the "Acquisition" rule.



## "Squashing" Rule

If FU is released as the result of squashing, the FU release is causal to branch resolve.

# Results

- The definition of Binder et al. can be applied in branch prediction context with minimal adjustments.
- It is not clear how to modify the rule set to stay consistent.

# Gap Problem

Even the original definition by Binder et al. has a problem.



# Gap Problem

Even the original definition by Binder et al. has a problem.



## Gap Problem

Even the original definition by Binder et al. has a problem.



To conclude:

The problem is in the causality notion itself.

## ① Introduction

## ② Hardware

## ③ Timing Anomalies

## ④ Contribution

## ⑤ Conclusion

# Conclusion

## What was Done:

- Analyzed existing timing anomaly (TA) definitions; adopted Binder's causality-based approach.
- Developed a tool to systematically generate and analyze branch prediction-induced TAs.

## Results:

- Binder's definition is adaptable, but controversial cases and a "gap problem" remain.

## Future Work:

- A new causality definition based on event constraints to address these issues (work in progress).
- Study the impact of branch predictor state.

## ⑥ Backup Slides

# Branch Prediction: possible implementations

## Additional Hardware

- Pattern History Table (PHT)
- Branch Target Buffer (BTB)

## Example: 2-bit counter

For each branch store a 4-state automaton in PHT

## Strategies

**Static:** always take, never take, take backwards.

## Dynamic:

- 1 or 2-Bit Counter
- Global or Local History
- Global share

and more...



## Binder's definition: details

7 events for each instruction

- ①  $\uparrow$  IF
- ②  $\downarrow$  IF
- ③  $\uparrow$  ID
- ④  $\downarrow$  ID
- ⑤  $\uparrow$  FU
- ⑥  $\downarrow$  FU
- ⑦ COM

### Event Time Dependency Graph

Each  $(X \xrightarrow{t} Y)$  arc means at least  $t$  cycles must pass between event  $X$  and  $Y$

### 5 Rules for ETDG arcs

- ① **Order of Pipeline Stages** (acq. of next stage after rel. of previous)
- ② **Resource Use** (cycles inside res.)
- ③ **Instruction Order** (for IF, ID, COM)
- ④ **Data Dependencies**
- ⑤ **Resource Contention** (using the same FU)

**Causality Graph** = ETDG with the most relevant arcs left.

## Latency Impact on Branch Variation



## Misprediction region from CFG

