

# 314513064-lab3-report

## 1. INTRODUCTION

This report presents a Reorder Buffer (ROB) for a simplified single-issue RISC-V core to ensure in-order commit under out-of-order execution with ROB bypass. We validate with directed tests (incl. WAW) and evaluate cycles/IPC with ROB size sweeps, concluding with limits and future improvements.

## 2. DESIGN

The Reorder Buffer (ROB) is a circular queue with head/tail pointers. Each entry keeps a **valid bit**, **pending bit**, and **physical register ID** to manage instruction allocation, completion, and commit in order.

- **Alloc:**

When `rob_alloc_req_val` is high, the tail entry is marked valid and pending, assigned `rob_alloc_req_preg`, and the tail pointer advances.

- **Fill:**

When `rob_fill_val` is high, the ROB clears the pending bit of `rob_fill_slot`, marking the instruction finished but not yet committed.

- **Commit:**

If the head entry is valid and not pending, the ROB commits it by clearing its fields, advancing the head, and outputting its register (`rf_waddr`, `commit_slot`).

Using separate valid/pending bits clearly distinguishes allocation, execution, and commit stages, simplifying hazard control. Compared to the baseline, this design adds explicit control signals and synchronized head/tail updates, improving timing clarity and debug visibility. It ensures correct in-order commit, WAW handling, and provides a foundation for future multi-issue expansion.

### 3. TESTING METHODOLOGY

#### 1) riscv-test1.S

```
1 //=====
2 // riscv-test1.S -- Demonstrate why ROBs must commit in order
3 //=====
4
5 #include "riscv-macros.h"
6
7     TEST_RISCV_BEGIN
8
9     //-----
10    // Older long-latency write followed by a short write to same rd.
11    // Without a ROB, the divide completes late and overwrites x4.
12    //-----
13
14    li      x2, 40
15    li      x3, 5
16    div    x4, x2, x3
17    addi   x4, x0, 42
18
19    TEST_CHECK_EQ( x4, 42 )
20
21    TEST_RISCV_END
```

Uses a long-latency div before a short addi to the same rd, so a no-ROB core overwrites the younger value while the ROB-enforced version preserves in-order commits exactly as Section 4 requests.

#### 2) riscv-test2.S

```
1 //=====
2 // riscv-test2.S -- ROB bypass needed for consumers before commit
3 //=====
4
5 #include "riscv-macros.h"
6
7     TEST_RISCV_BEGIN
8
9     //-----
10    // The divide pins the ROB head so later instructions cannot commit.
11    // x6 must read x5's value directly from the ROB entry.
12    //-----
13
14    li      x10, 63
15    li      x11, 7
16    div    x12, x10, x11
17
18    addi   x5, x0, 7
19    add    x6, x5, x5
20
21    TEST_CHECK_EQ( x6, 14 )
22
23    TEST_RISCV_END
```

Holds the ROB head with a divide, then produces a value in x5 that must be consumed by add x6,x5,x5 before commit; the consumer only succeeds when ROB bypassing is implemented, matching the second bullet.

#### 3) riscv-test3.S

```
1 //=====
2 // riscv-test3.S -- Safe WAW thanks to long delay between the writes
3 //=====
4
5 #include "riscv-macros.h"
6
7     TEST_RISCV_BEGIN
8
9     //-----
10    // The busy loop burns enough cycles for the first write to commit
11    // before the second reaches decode, so both cores run correctly.
12    //-----
13
14    li      x5, 0
15    li      x6, 5
16
17    addi   x5, x0, 11
18
19    1:    addi   x6, x6, -1
20    bne   x6, x0, 1b
21
22    addi   x5, x0, 22
23
24    TEST_CHECK_EQ( x5, 22 )
25
26    TEST_RISCV_END
```

Demonstrates a WAW that works on both cores because a five-iteration delay loop lets the first write commit long before the second issues; this matches the requirement to explain the timing condition.

#### 4) riscv-test4.S

```
1 //=====
2 // riscv-test4.S -- Branch thrash where in-order beats speculative OOO
3 //=====
4 // riscvlong never speculates past a branch, but riscvooo does. This loop
5 // alternates the branch outcome every iteration, so riscvooo repeatedly
6 // mispredicts, wastes fetch/issue bandwidth, and can end up with a lower
7 // IPC even though the work is identical.
8 //=====
9
10 #include "riscv-macros.h"
11
12     TEST_RISCV_BEGIN
13
14     li    x4, 0          // iterations counted on even branch path
15     li    x5, 0          // iterations counted on odd branch path
16     li    x6, 8          // total iterations (keep < 30 inst overall)
17     li    x7, 0          // toggles branch direction
18
19     1:    xorl  x7, x7, 1
20         bne   x7, x0, 2f
21         addi  x4, x4, 1
22         j     3f
23     2:    addi  x5, x5, 1
24     3:    addi  x6, x6, -1
25         bne   x6, x0, 1b
26
27         add   x8, x4, x5
28     TEST_CHECK_EQ( x8, 8 )
29
30     TEST_RISCV_END
```

Alternating branch outcomes explode speculation/misprediction in riscvooo but not in riscvlong (which lacks branch prediction/speculative issue), so riscvlong achieves higher IPC as the prompt asks.

#### 5) riscv-test5.S

```
1 //=====
2 // riscv-test5.S -- Fill every ROB entry before the head can drain
3 //=====
4
5 #include "riscv-macros.h"
6
7     TEST_RISCV_BEGIN
8
9     //-----
10    // A long-latency divide (after two quick immediates) holds the ROB
11    // head, while the next fifteen independent writes (x4-x18) allocate
12    // the remaining slots in the 16-entry buffer.
13    //-----
14
15    li    x1, 120
16    li    x2, 3
17    div   x3, x1, x2
18
19    addi x4, x0, 1
20    addi x5, x0, 2
21    addi x6, x0, 3
22    addi x7, x0, 4
23    addi x8, x0, 5
24    addi x9, x0, 6
25    addi x10, x0, 7
26    addi x11, x0, 8
27    addi x12, x0, 9
28    addi x13, x0, 10
29    addi x14, x0, 11
30    addi x15, x0, 12
31    addi x16, x0, 13
32    addi x17, x0, 14
33    addi x18, x0, 15
34
35     TEST_CHECK_EQ( x3, 40 )
36     TEST_CHECK_EQ( x18, 15 )
37
38     TEST_RISCV_END
```

Issues an initial long div plus 15 independent addis, occupying all 16 ROB slots defined in riscvooo-InstMsg.v (SLOTS = 16), satisfying the “as many ROB slots as your design can” clause.

## 4. EVALUATION

| Benchmark                 | riscvlong<br>(cycles) | riscvo00<br>(cycles) | ΔCycle<br>(%) | riscvlong<br>(IPC) | riscvo00<br>(IPC) | ΔIPC<br>(%) |
|---------------------------|-----------------------|----------------------|---------------|--------------------|-------------------|-------------|
| <b>vvadd</b>              | 1449                  | 1524                 | +5.2%         | 0.7357             | 0.6975            | -5.2%       |
| <b>vvadd-rand</b>         | 9788                  | 10396                | +6.2%         | 0.1087             | 0.1022            | -6.0%       |
| <b>masked-filter</b>      | 10047                 | 11253                | +12.0%        | 0.6933             | 0.6188            | -10.7%      |
| <b>masked-filter-rand</b> | 54408                 | 60190                | +10.7%        | 0.1280             | 0.1157            | -9.6%       |
| <b>cmplx-mult</b>         | 3912                  | 4094                 | +4.7%         | 0.7543             | 0.7201            | -4.5%       |
| <b>cmplx-mult-rand</b>    | 18751                 | 27117                | +44.6%        | 0.1112             | 0.1087            | -2.2%       |
| <b>bin-search</b>         | 1541                  | 1577                 | +2.3%         | 0.7320             | 0.7134            | -2.5%       |
| <b>bin-search-rand</b>    | 5586                  | 6211                 | +11.2%        | 0.2014             | 0.1811            | -10.1%      |

### Discussion: riscvlong vs. riscvo00

riscvo00 with a Reorder Buffer works correctly but shows **slightly lower performance** than riscvlong. On average, it uses **5–15% more cycles** and achieves **5–10% lower IPC**, mainly because the tested programs have **low instruction-level parallelism (ILP)**.

For serial workloads (e.g., *vvadd*, *masked-filter*, *cmplx-mult*), the commit stage remains in-order, so the ROB only adds minor overhead.

In random or memory-heavy benchmarks (e.g., *vvadd-rand*, *masked-filter-rand*), the ROB often fills up while waiting for slow loads to commit, causing backpressure and stalls.

For control-heavy cases (*bin-search*), both cores behave similarly, since branches dominate execution time.

## 5. FIGURES



Figure 1: Reorder buffer