

2 March 2012 -- Computer Architectures -- part 2/2

*Surname, Name, Matricola*

## Question 1

Considering the MIPS64 architecture presented in the following:

- Integer ALU: 1 clock cycle
  - Data memory: 1 clock cycle
  - FP multiplier unit: pipelined 8 stages
  - FP arithmetic unit: pipelined 4 stages
  - FP divider unit: not pipelined unit that requires 10 clock cycles
  - branch delay slot: 1 clock cycle, and the branch delay slot is not enable
  - forwarding is enabled
  - it is possible to complete instruction EXE stage in an out-of-order fashion.

○ and using the following code fragment, show the timing of the presented loop-based program and compute how many cycles does this program take to execute?

- and using the following code fragment, show the timing of the presented loop-based program and compute how many cycles does this program take to execute?

```
; **** * MIPS64 **** *
;     for (i = 0; i < 100; i++) {
;         v5[i] = v1[i]/v2[i];
;         v6[i] = v2[i]*v3[i];
;         v7[i] = v6[i]/v4[i] + v3[i];
;     }
```

```

    .data
V1: .double "100 values"
...
V4: .double "100 values"
V5: .double "100 zeros"
...
V7: .double "100 zeros"

    .text
main: daddui r1,r0,0
      daddui r2,r0,100
loop: l.d f1,v1(r1)
      l.d f2,v2(r1)
      div.d f5,f1,f2
      s.d f5,v5(r1)
      l.d f3,v3(r1)
      mul.d f6,f2,f3
      l.d f4,v4(r1)
      div.d f7,f6,f4
      add.d f7,f7,f3
      s.d f6,v6(r1)
      s.d f7,v7(r1)
      daddi r2,r2,-1
      daddui r1,r1,8
      bnez r2,loop
      Halt

```

| comments               | Clock cycles          |
|------------------------|-----------------------|
| r1 ← pointer           | 5                     |
| r2 <= 100              | 1                     |
| f1 <= v1[i]            | 1                     |
| f2 <= v2[i]            | 1                     |
| f5 ← v1[i]/v2[i]       | 11                    |
| v5[i] ← f5             | 1                     |
| f3 <= v3[i]            | 1                     |
| f6 ← v2[i]*v3[i]       | 9                     |
| f4 <= v4[i]            | 0                     |
| f7 ← v6[i]/v4[i]       | 10                    |
| f7 ← v6[i]/v4[i]+v3[i] | 4                     |
| v6[i] ← f6             | 0                     |
| v7[i] ← f7             | 1                     |
| r2 ← r2 - 1            | 1                     |
| r1 ← r1 + 8            | 1                     |
|                        | 1                     |
|                        | 1                     |
| Total                  | (43 · 100) + 6 = 4306 |

## Exam 2 March 2012

... . . . . 45 26

$$\begin{array}{r} 45 \ 26 \\ \times 11 \\ \hline 46 \end{array}$$

w

$$\begin{array}{r} 46 \\ \times 11 \\ \hline 46 \end{array}$$

m w

$$\begin{array}{r} 16 \\ \times 11 \\ \hline 16 \end{array}$$

DE M W

$$\begin{array}{r} 22 \\ \times 11 \\ \hline 22 \end{array}$$

F S D E M W

$$\begin{array}{r} 4 \\ \times 11 \\ \hline 4 \end{array}$$

F x x x x

Calculation:

$$(46 \cdot 100) + 6 = 4606$$

4606

1 6 | 1

60 . . 23

4306



w  
n w  
E M w  
x x x x

|        |                                                      |                   |           |                |           |                |
|--------|------------------------------------------------------|-------------------|-----------|----------------|-----------|----------------|
|        | 1 . . . . .                                          | 6   1 - . . . . . | . . . . . | 15 - . . . . . | . . . . . | 30 - . . . . . |
| daddui | F D E M W                                            |                   |           |                |           |                |
| daddui | F D E M W                                            |                   |           |                |           |                |
| ld     | F D E M W                                            |                   |           |                |           |                |
| ld     | F D E M W                                            |                   |           |                |           |                |
| div    | F D (S d d d d d d d d d) M W                        |                   |           |                |           |                |
| sd     | F (D S)(S S S S S S S S E) M W                       |                   |           |                |           |                |
| ld     | (F S)(D S) r S S S S S S E M W                       |                   |           |                |           |                |
| mul    | (F ' S S S S S S S S) D (S m m m m m m m m m m) M UW |                   |           |                |           |                |
| ld     | F D E M W                                            |                   |           |                |           |                |
| div    | F D (S S S S S S S d d d d d d d d d d) M W          |                   |           |                |           |                |
| add    | F (S S S r S S S D)(S S S S S S S S S Q Q Q Q) M W   |                   |           |                |           |                |
| sd     | (F S S S S S S) D E M W                              |                   |           |                |           |                |
| sd     | F (D S S S S S S)(S S r S E) M W                     |                   |           |                |           |                |
| daddi  | (F S S S S S S)(D S S S S S) E M                     |                   |           |                |           |                |
| daddui | (F S S S S S) D E                                    |                   |           |                |           |                |
| bnez   | F O                                                  |                   |           |                |           |                |
| halt   | F                                                    |                   |           |                |           |                |

... 43 ... 46 |

5

|

|

11

|

|

9

0

10

4

0

|

w

|

M w

|

E M w

|

D E M w

|

4406 ✓

## 2 March 2012 -- Computer Architectures -- part 2/2

Surname, Name, Matricola

---

### Question 2

Considering the same loop-based program, and assuming the following processor architecture for a superscalar MIPS64 processor implemented with multiple-issue and speculation:

- issue 2 instructions per clock cycle
- jump instructions require 1 issue
- handle 2 instructions commit per clock cycle
- timing facts for the following separate functional units:
  - i. 1 Memory address 1 clock cycle
  - ii. 1 Integer ALU 1 clock cycle
  - iii. 1 Jump unit 1 clock cycle
  - iv. 1 FP multiplier unit, which is pipelined: 8 stages
  - v. 1 FP divider unit, which is not pipelined: 10 clock cycles
  - vi. 1 FP Arithmetic unit, which is pipelined: 4 stages
- Branch prediction is always correct
- There are no cache misses
- There are 2 CDB (Common Data Bus).

- Complete the table reported below showing the processor behavior for the 2 initial iterations.

| # iteration |                | Issue | EXE       | MEM | CDB x2 | COMMIT x2 |
|-------------|----------------|-------|-----------|-----|--------|-----------|
| 1           | l.d f1,v1(r1)  | 1     | 2m        | 3   | 4      | 5         |
| 1           | l.d f2,v2(r1)  | 1     | 3m        | 4   | 5      | 6         |
| 1           | div.d f5,f1,f2 | 2     | 6d        |     | 16     | 17        |
| 1           | s.d f5,v5(r1)  | 2     | 4m        |     |        | 17        |
| 1           | l.d f3,v3(r1)  | 3     | 5m        | 6   | 7      | 18        |
| 1           | mul.d f6,f2,f3 | 3     | 8x        |     | 16     | 18        |
| 1           | l.d f4,v4(r1)  | 4     | 6m        | 7   | 8      | 19        |
| 1           | div.d f7,f6,f4 | 4     | 17d 26d 6 |     | 27 36  | 28 37     |
| 1           | add.d f7,f7,f3 | 5     | 28x 37x   |     | 32 41  | 33 42     |
| 1           | s.d f6,v6(r1)  | 5     | 7m        |     |        | 33 42     |
| 1           | s.d f7,v7(r1)  | 6     | 8m        |     |        | 34 43     |
| 1           | daddi r2,r2,-1 | 6     | 7i        |     | 8      | 35 43     |
| 1           | daddui r1,r1,8 | 7     | 8i        |     | 9      | 35 44     |
| 1           | bneq r2,loop   | 8     | 9j        |     |        | 36 44     |
| 2           | l.d f1,v1(r1)  | 9     | 10m       | 11  | 12     | 37 45     |
| 2           | l.d f2,v2(r1)  | 9     | 11m       | 12  | 13     | 37 45     |
| 2           | div.d f5,f1,f2 | 10    | 16d       |     | 26     | 46        |
| 2           | s.d f5,v5(r1)  | 10    | 12m       |     |        | 46        |
| 2           | l.d f3,v3(r1)  | 11    | 13m       | 14  | 15     | 47        |
| 2           | mul.d f6,f2,f3 | 11    | 16x       |     | 24     | 47        |
| 2           | l.d f4,v4(r1)  | 12    | 14m       | 15  | 16     | 48        |
| 2           | div.d f7,f6,f4 | 12    | 36d       |     | 46     | 48        |
| 2           | add.d f7,f7,f3 | 13    | 47x       |     | 51     | 52        |
| 2           | s.d f6,v6(r1)  | 13    | 15m       |     |        | 52        |
| 2           | s.d f7,v7(r1)  | 14    | 16m       |     |        | 53        |
| 2           | daddi r2,r2,-1 | 14    | 15i       |     | 16     | 53        |
| 2           | daddui r1,r1,8 | 15    | 16i       |     | 17     | 54        |
| 2           | bneq r2,loop   | 16    | 17j       |     |        | 54        |