

- Please maintain academic integrity.
- No questions will be answered during the quiz. If needed, make your own assumptions and state them clearly on your answer sheet.

---

1. [5 marks] In this question, assume the basic five-stage pipeline discussed in class, and consider the following instruction sequence.

I1: lw \$r1, 40(\$r6)  
I2: add \$r6, \$r2, \$r2  
I3: sw \$r6, 50(\$r1)

- (a) [2 marks] Identify and list all read-after-write (RAW), read-after-read (RAR), write-after-read (WAR), and write-after-write (WAW) dependencies among the three instructions.
- (b) [1 mark] Assuming no data forwarding, identify the hazards (if any) in this code and indicate the bubbles/stalls that must be added to avoid them.
- (c) [2 marks] Assuming data forwarding, identify the hazards (if any) in the given code and indicate the bubbles/stalls that must be added to avoid them.
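
As a reminder of the four dependency definitions used in part (a), here is a minimal Python sketch that classifies register dependencies between pairs of instructions. The function name `find_dependencies`, the tuple layout, and the three-instruction example are illustrative assumptions and are deliberately not the quiz code. The sketch compares every earlier/later pair and ignores whether an intervening instruction overwrites the register, so it is a simplification of a full analysis.

```python
def find_dependencies(instrs):
    """Classify register dependencies between every earlier/later pair.

    instrs: list of (name, writes, reads), where writes and reads are sets
    of register names. A store such as sw reads both of its registers and
    writes none. Simplification: intervening writes are not considered.
    """
    deps = []
    for i, (name_a, writes_a, reads_a) in enumerate(instrs):
        for name_b, writes_b, reads_b in instrs[i + 1:]:
            for reg in sorted(writes_a & reads_b):
                deps.append(("RAW", reg, name_a, name_b))  # later reads what earlier wrote
            for reg in sorted(reads_a & reads_b):
                deps.append(("RAR", reg, name_a, name_b))  # both read the same register
            for reg in sorted(reads_a & writes_b):
                deps.append(("WAR", reg, name_a, name_b))  # later writes what earlier read
            for reg in sorted(writes_a & writes_b):
                deps.append(("WAW", reg, name_a, name_b))  # both write the same register
    return deps


# Hypothetical example (not the quiz code):
#   A: add $t0, $t1, $t2
#   B: sub $t3, $t0, $t1
#   C: or  $t1, $t3, $t0
example = [
    ("A", {"$t0"}, {"$t1", "$t2"}),
    ("B", {"$t3"}, {"$t0", "$t1"}),
    ("C", {"$t1"}, {"$t3", "$t0"}),
]

for kind, reg, first, second in find_dependencies(example):
    print(f"{kind} on {reg}: {first} -> {second}")
```

Only RAW dependencies can cause data hazards in an in-order five-stage pipeline; the other three kinds are listed for completeness.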

2. [8 marks] Consider a processor with the following specification:

- Standard five-stage pipeline (stages denoted F, D, E, M, W, short for IF, ID, EX, MEM, WB).
- No data forwarding.
- Stalls on all data and control hazards.
- Non-delayed branches (no branch delay slot).
- Branch comparison occurs during the second stage.
- Separate Instruction and Data memory (cache).
- The same register can be read and written on the same clock cycle.

- (a) [4 marks] Count how many cycles will be needed to execute the code below and write out each instruction's progress through the pipeline by making a table similar to Table 1 with the stages (F, D, E, M, W). This is a standard pipelined implementation with no enhancements or modifications.

```
      xor   $r1, $r1, $r1
      addi  $r1, $zero, 1024
loop: lw    $r2, 1023($r1)
      sw    $r2, 2047($r1)
      subi $r1, $r1, 1
      bne   $r1, $zero, loop
```

- (b) [4 marks] Consider the following changes to **part (a)** and the same code.

- Data forwarding enabled.
- Branch delay slot.

How many clock cycles will be needed now? Can you modify the given code to reduce the number of cycles while maintaining its original functionality, specifically the value of \$r1 at the end?

NOTE: Please include the pipeline diagram and indicate forwarding if any, and add appropriate explanation where needed.
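
For the cycle-count bookkeeping, a small Python sketch of the usual relation may help: in a k-stage pipeline, the first instruction needs k cycles, each further instruction adds one cycle, and every stall (bubble) adds one more. The function name `pipeline_cycles` and its parameters are illustrative assumptions; determining how many stalls the quiz code incurs is left to the pipeline diagram.

```python
def pipeline_cycles(num_instructions, num_stages=5, stall_cycles=0):
    """Total cycles for instructions issued back to back in a simple pipeline.

    Assumes one instruction enters the pipeline per cycle unless stalled.
    stall_cycles is the total number of bubbles inserted for all hazards.
    """
    if num_instructions <= 0:
        return 0
    return num_stages + (num_instructions - 1) + stall_cycles


# Hypothetical example: 3 dynamic instructions in a five-stage pipeline.
print(pipeline_cycles(3))                  # 7 cycles with no stalls
print(pipeline_cycles(3, stall_cycles=2))  # 9 cycles with two bubbles
```

Note that `num_instructions` counts dynamic (executed) instructions, so a loop body contributes once per iteration.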

The single-cycle pipelined datapath that we discussed in class is shown in Figure 1 for reference.

Table 1: 2(a) Non-optimized pipeline.

| Cycle → | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---------|---|---|---|---|---|---|---|---|---|----|----|----|----|----|----|----|----|----|----|----|
| Inst1   |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |
| Inst2   |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |
| Inst3   |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |
| Inst4   |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |
| Inst5   |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |
| Inst6   |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |
| Inst7   |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |

Figure 1: Pipelined datapath. (Adapted from *Computer Architecture* by Behrooz Parhami.)

3. [7 marks] A certain processor has a 64 KB cache, and its memory is byte-addressable. Assume that the block size is 64 bytes, the size of the physical memory is 2 GB, and the cache is 4-way set associative.

- (a) [2 marks] How many bits are needed for each of the cache address fields? (Show your work.)

| Tag | Index | Offset |
|-----|-------|--------|
|     |       |        |

- (b) [5 marks] For each of the following changes to the initial conditions above, indicate how the width of each field changes. For example, if a field stays the same, write “0”; if a field increases by 5 bits, write “+5”; if a field decreases by 1 bit, write “-1”.

| Change                                         | Tag | Index | Offset |
|------------------------------------------------|-----|-------|--------|
| Double the cache size                          |     |       |        |
| Change cache organization to direct-mapped     |     |       |        |
| Change the associativity to full associativity |     |       |        |
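
The relations between cache parameters and field widths can be sketched in Python as below. The function name `cache_fields` and the example configuration (32 KB cache, 32-byte blocks, 32-bit addresses) are illustrative assumptions and intentionally differ from the values in this question. The sketch assumes all sizes are powers of two.

```python
def log2_exact(n):
    """Return log2(n) for a positive power of two, else raise ValueError."""
    if n <= 0 or n & (n - 1) != 0:
        raise ValueError(f"{n} is not a positive power of two")
    return n.bit_length() - 1


def cache_fields(cache_bytes, block_bytes, ways, addr_bits):
    """Return (tag, index, offset) widths in bits for a set-associative cache."""
    num_blocks = cache_bytes // block_bytes
    num_sets = num_blocks // ways          # a fully associative cache has 1 set
    offset = log2_exact(block_bytes)       # selects a byte within a block
    index = log2_exact(num_sets)           # selects a set
    tag = addr_bits - index - offset       # remaining address bits
    return tag, index, offset


# Hypothetical configuration (not the quiz values).
KB = 1024
print(cache_fields(32 * KB, 32, ways=2, addr_bits=32))     # (18, 9, 5)
print(cache_fields(32 * KB, 32, ways=1, addr_bits=32))     # direct-mapped: (17, 10, 5)
print(cache_fields(32 * KB, 32, ways=1024, addr_bits=32))  # fully associative: (27, 0, 5)
```

The number of address bits follows from the physical memory size: a byte-addressable memory of 2^n bytes needs n address bits.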