

## *Computer Architecture - Lab Assignment 3*

# **Performance evaluation of pipelined processors**

The main objective of this lab assignment is to evaluate the pipelined processors Nios V/m and Nios V/g. Additionally, we analyze how the software technique of instruction reordering affects performance. Finally, we propose an exercise in which you theoretically evaluate a modification of the Nios V microarchitecture using the data obtained during the hands-on activity.

This assignment is divided into the following four parts:

Part 1. Analysis of the usage of instruction types in a benchmark program and the CPI of Nios V/{m,g} processors.

You will analyze the instructions executed by a benchmark program to find out the percentage of each instruction type: ALU, memory, jump/branch.

Part 2. You will analyze the performance of the pipelined processors Nios V/m and Nios V/g to determine under which circumstances the execution of a benchmark program is limited by memory accesses or by ALU operations. Additionally, you will analyze the performance loss that branch and jump instructions cause in these processors.

Part 3. You will compare the effect that the software technique of instruction reordering has on the performance of the pipelined processors Nios V/m and Nios V/g.

Part 4. You will evaluate a new design of the pipelined processor.

The material for this lab assignment includes the following files:

- benchNIOSV2024\_dotProduct
- benchNIOSV2024\_Part2
- benchNIOSV2024\_Part3
- NVgdCache512B-4bytes

You will use the Terasic DE0-Nano board available in the Computer Architecture laboratory at the School of Computer Science of the ULPGC university (see Figure 1). Connect this board to a desktop computer, then launch the Nios V Command Shell, which allows you to interact with the board (see Figure 2). Using this command shell, you will configure the board several times with different SoC microarchitectures. Figure 3 shows the microarchitecture of two SoCs called *DE0-Nano Nios V/m Basic Computer* and *DE0-Nano Nios V/g Basic Computer*. Each configuration implements a different computer on the same DE0-Nano board.

## **Part I. Analysis of the usage of instruction types in a benchmark program and the CPI of Nios V/{m,g} processors**

General description. You will use a synthetic benchmark named *benchNIOSV2024\_dotProduct* to analyze the mix of RISC-V instructions for the 32-bit Nios V soft processor. This program implements the dot product of two vectors repeatedly, as indicated by the constant `ITER_BENCH` (see the main program in the file called `benchNIOSV2024_dotProduct.s`).

Figure 1: DE0-Nano board.

```
Administrator: Nios V Command Shell (Quartus Prime 23.1std)
Entering Nios V shell
Microsoft Windows [Versión 10.0.19045.5131]
(c) Microsoft Corporation. Todos los derechos reservados.

[niosv>shell] C:\altera\23.1std>
```

Figure 2: Nios V Command Shell (integrated into Intel/Altera Quartus Prime Standard 23.1).

Figure 3: Microarchitectures of the soft System-on-Chips named *DE0-Nano Nios V/m Basic Computer* and *DE0-Nano Nios V/g Basic Computer*.
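As a point of reference, the repeated dot product can be modeled in a few lines of Python. This is only an illustrative sketch, not the lab's assembly code: the vector contents `VEC_A` and `VEC_B` are hypothetical, and the repetition count and vector length are taken from the kernel description given later in this part.

```python
# Reference model of the benchmark kernel (illustrative only).
ITER_BENCH = 5000  # repetition count taken from the kernel description


def dot_product(a, b):
    """Return the sum of the element-wise products of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    total = 0
    for x, y in zip(a, b):
        total += x * y
    return total


def run_benchmark(a, b, iterations=ITER_BENCH):
    """Repeat the dot product `iterations` times and return the last result."""
    result = 0
    for _ in range(iterations):
        result = dot_product(a, b)
    return result


# Hypothetical six-component vectors; the real data lives in the assembly sources.
VEC_A = [1, 2, 3, 4, 5, 6]
VEC_B = [6, 5, 4, 3, 2, 1]
```

Each pass over `zip(a, b)` corresponds to one multiply-accumulate step, which is where the loads, ALU operations, and loop branch counted in this part come from.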

Objective 1. Classify the instructions of the subroutine `PRODUCTO_ESCALAR`, in the source file `productoEscalar.s`, by counting the number of times each one executes. Figure 4 shows the flowchart of the main activities that the benchmark performs.
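The classification can be automated once you have written down the mnemonics observed while stepping. The following is a minimal sketch under stated assumptions: the mnemonic sets are a partial list of RV32IM instructions and common disassembler aliases, `nop` is placed in OTHER to match Table 1, and the function names `classify` and `count_by_type` are hypothetical helpers.

```python
from collections import Counter

# Partial mnemonic sets; extend them with any other mnemonic found in your listing.
ALU = {"add", "addi", "sub", "and", "andi", "or", "ori", "xor", "xori",
       "sll", "slli", "srl", "srli", "sra", "srai", "slt", "slti",
       "sltu", "sltiu", "lui", "auipc", "mul", "mulh", "div", "rem",
       "li", "mv"}
MEMORY = {"lb", "lh", "lw", "lbu", "lhu", "sb", "sh", "sw"}
BRANCH_JUMP = {"beq", "bne", "blt", "bge", "bltu", "bgeu", "beqz", "bnez",
               "jal", "jalr", "j", "jr", "ret"}


def classify(mnemonic):
    """Map one mnemonic to the column of Table 1 it belongs to."""
    m = mnemonic.strip().lower()
    if m in ALU:
        return "ALU"
    if m in MEMORY:
        return "MEMORY"
    if m in BRANCH_JUMP:
        return "BRANCH & JUMP"
    return "OTHER"  # includes nop, as in Table 1


def count_by_type(trace):
    """Count executed instructions per category from a list of mnemonics."""
    return Counter(classify(m) for m in trace)
```

For example, `count_by_type(["lw", "lw", "mul", "add", "addi", "bne", "nop"])` yields 3 ALU, 2 MEMORY, 1 BRANCH & JUMP, and 1 OTHER instruction.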

Objective 2. Calculate the total number of executed instructions and the percentage of each instruction type (ALU, MEMORY, BRANCH & JUMP, and OTHER), and fill in Table 1.
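The percentages follow directly from the per-type counts. A short sketch, where the counts in the usage note are made-up placeholders and `instruction_mix` is a hypothetical helper name:

```python
def instruction_mix(counts):
    """Return (N, percentages) for a mapping of instruction type -> executions."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no executed instructions were counted")
    percentages = {kind: 100.0 * n / total for kind, n in counts.items()}
    return total, percentages
```

With placeholder counts `{"ALU": 50, "MEMORY": 30, "BRANCH & JUMP": 15, "OTHER": 5}`, the function returns `N = 100` and percentages of 50.0, 30.0, 15.0, and 5.0 respectively.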

Objective 3. Record the total number of clock cycles that the benchmark takes to execute on both processors, Nios V/m and Nios V/g, and calculate the CPI of the program for each processor. **Important:** Count only the executed instructions that belong to the compute kernel, and assume that the remaining instructions do not significantly affect the instruction count used to obtain the CPI.
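The measured CPI is the cycle count divided by the number of executed kernel instructions. The sketch below assumes that the kernel instruction count equals the instructions executed in one repetition multiplied by the number of repetitions; both helper names and the numbers in the usage note are hypothetical.

```python
def kernel_instruction_count(instructions_per_iteration, iterations):
    """Total kernel instructions, assuming every repetition executes the same count."""
    return instructions_per_iteration * iterations


def measured_cpi(total_cycles, executed_instructions):
    """Cycles per instruction from a measured cycle count."""
    if executed_instructions <= 0:
        raise ValueError("executed_instructions must be positive")
    return total_cycles / executed_instructions
```

For instance, 40 instructions per repetition over 5,000 repetitions give 200,000 instructions; if the timer reported 300,000 cycles, the measured CPI would be 1.5.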

Hands-on method for the Nios V/m pipelined processor:

1. Create a new directory, for example `part1`, and change into it. Execute the following commands in the Nios V Command Shell window.

```
$ mkdir part1
$ cd part1
$ sh
```

2. Copy the files `benchNIOSV2024_dotProduct.s`, `productoEscalar.s`, `escribir_jtag.s`, `DIV.s`, `BCD.s`, and `Makefile` into this directory.
3. Assemble the benchmark program `benchNIOSV2024_dotProduct`.

```
$ riscv32-unknown-elf-as.exe benchNIOSV2024_dotProduct.s -alsg -o benchNIOSV2024_dotProduct.s.obj > benchNIOSV2024_dotProduct.s.log
$ riscv32-unknown-elf-as.exe productoEscalar.s -alsg -o productoEscalar.s.obj > productoEscalar.s.log
$ riscv32-unknown-elf-as.exe escribir_jtag.s -alsg -o escribir_jtag.s.obj > escribir_jtag.s.log
$ riscv32-unknown-elf-as.exe DIV.s -alsg -o DIV.s.obj > DIV.s.log
$ riscv32-unknown-elf-as.exe BCD.s -alsg -o BCD.s.obj > BCD.s.log
```

4. Link the benchmark program: `benchNIOSV2024_dotProduct`.

```
$ riscv32-unknown-elf-ld.exe -g -T linker_SDRAM.x -nostdlib -e _start -u _start \
    --defsym __alt_stack_pointer=0x08001F00 --defsym __alt_stack_base=0x08002000 \
    --defsym __alt_heap_limit=0x8002000 --defsym __alt_heap_start=0x8002000 \
    -o benchNIOSV2024_dotProduct.elf benchNIOSV2024_dotProduct.s.obj \
    productoEscalar.s.obj escribir_jtag.s.obj BCD.s.obj DIV.s.obj
$ niosv-stack-report.exe -p riscv32-unknown-elf- benchNIOSV2024_dotProduct.elf
```

5. Configure the FPGA on the DE0-Nano board using the file `DE0_NanoBasic_Computer_22jul24.sof`.

```
$ quartus_pgm.exe -c 1 -m JTAG -o "p;DE0_NanoBasic_Computer_22jul24.sof@1"
```

6. Download the file `benchNIOSV2024_dotProduct.elf` to the DE0-Nano board.

```
$ niosv-download.exe -g benchNIOSV2024_dotProduct.elf
```

7. Execute the program step by step to count the instructions of each type that execute in the compute kernel identified in Figure 4, using breakpoints in the corresponding region of the executable program.

For this step, use the *OpenOCD* and *GDB* tools: open three Nios V Command Shell terminals and execute the following commands.

#### Terminal 1

```
$ cd part1
$ sh
$ openocd-cfg-gen ./niosv.cfg
$ openocd -f ./niosv.cfg    # the cursor in this terminal remains blinking
```

#### Terminal 2

```
$ cd part1
$ sh
$ riscv32-unknown-elf-gdb -ex "set arch riscv:rv32" \
    -ex "target extended-remote localhost:3333" \
    -ex "file benchNIOSV2024_dotProduct.elf" -ex "load"
Are you sure you want to change the file? (y or n) y --> press "y"
(gdb) x 0x080000e4
(gdb) continue
(gdb) stepi
```

#### Terminal 3

```
$ cd part1
$ sh
$ juart-terminal.exe
```

Hands-on method for the Nios V/g pipelined processor:

Follow the same method used for Nios V/m, but use the *DE0\_NanoBasic\_Computer\_23jul24.sof* configuration file for the DE0-Nano board.

#### Question 1.

What type of program is *benchNIOSV2024\_dotProduct.elf*: arithmetic, memory, or branch/jump? Justify and argue your answer.

For Question 1, analyze the benchmark program for Part I by counting the number of instructions of each type listed in Table 1. This benchmark is divided into three sections (see Figure 4).

1. **Preamble.** This section performs three actions.

- (a) The control registers of the Nios V processor are initialized to activate the interrupt controller.
- (b) The program waits for the user to press the ‘a’ key on the keyboard.
- (c) The Timer controller of the soft SoC on the DE0-Nano board is configured: its interrupt signal is enabled and an initial count is saved in the *Counter Start Value (low, high)* timer registers.

2. **Kernel.** This section repeats the dot product of two six-component vectors 5,000 times. Beforehand, a snapshot of the timer controller provides the initial time value.

3. **Epilogue.** This section takes another snapshot to obtain the final time value. The final and initial values are then subtracted to calculate the elapsed time interval. Finally, this hexadecimal value is converted to BCD format and shown on the terminal.
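The binary-to-BCD conversion performed in the epilogue can be sketched with repeated division by 10. This is an illustration of the general technique only; it is not claimed to match how `DIV.s` and `BCD.s` are written, and the function names are hypothetical.

```python
def to_bcd_digits(value):
    """Decimal digits of a non-negative integer, most significant digit first."""
    if value < 0:
        raise ValueError("value must be non-negative")
    digits = []
    while True:
        value, remainder = divmod(value, 10)  # peel off the least significant digit
        digits.append(remainder)
        if value == 0:
            break
    return digits[::-1]


def pack_bcd(value):
    """Pack the decimal digits into one integer, four bits per digit."""
    packed = 0
    for position, digit in enumerate(reversed(to_bcd_digits(value))):
        packed |= digit << (4 * position)
    return packed
```

For example, the hexadecimal value `0x4D2` (decimal 1234) has the digits `[1, 2, 3, 4]`, and its packed BCD form reads `0x1234`.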



Figure 4: Flowchart of the benchmark program, whose main program is in the `benchNIOSV2024_dotProduct.s` file.

Table 1: Percentage of instructions executed by Nios V processors for the **PRODUCTO\_ESCALAR** subroutine, which is coded in the `productoEscalar.s` file.

| **ALU instructions** | **Number of executions** | **MEMORY instructions** | **Number of executions** | **BRANCH & JUMP instructions** | **Number of executions** | **OTHER instructions** | **Number of executions** |
|----------------------|--------------------------|-------------------------|--------------------------|--------------------------------|--------------------------|------------------------|--------------------------|
| addi | | lw | | beq | | nop | |
| ... | | ... | | ... | | ... | |
| ... | | ... | | ... | | ... | |
| **Total ALU instructions** | | **Total MEMORY instructions** | | **Total BRANCH & JUMP instructions** | | **Total OTHER instructions** | |
| **N (total number of executed instructions)** | | | | | | | |
| **% ALU** | | **% MEMORY** | | **% BRANCH & JUMP** | | **% OTHER** | |
| **Cycles, Nios V/m** | | | | **Cycles, Nios V/g** | | | |
| **Total CPI of program, Nios V/m** | | | | **Total CPI of program, Nios V/g** | | | |

#### Question 2.

Assume the following average number of cycles per instruction for the Nios V/m and Nios V/g soft processors: 1 cycle (ALU ops), 1 cycle (memory ops), and 2 cycles (jump and branch ops). Calculate the theoretical CPI of the program when each processor executes the benchmark program. Justify and argue your answer.
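The theoretical CPI is the average of the per-type cycle counts, weighted by how often each type executes. A minimal sketch follows; the question gives no cycle count for OTHER instructions, so the `default_cycles=1` value is an assumption, and the counts in the usage note are placeholders.

```python
# Cycle counts stated in Question 2.
CYCLES_PER_TYPE = {"ALU": 1, "MEMORY": 1, "BRANCH & JUMP": 2}


def theoretical_cpi(counts, cycles_per_type=CYCLES_PER_TYPE, default_cycles=1):
    """Weighted-average CPI for a mapping of instruction type -> executions.

    Types missing from cycles_per_type (for example OTHER) are charged
    default_cycles, which is an assumption and not part of the assignment.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no executed instructions were counted")
    weighted = sum(n * cycles_per_type.get(kind, default_cycles)
                   for kind, n in counts.items())
    return weighted / total
```

With placeholder counts of 50 ALU, 30 MEMORY, and 20 BRANCH & JUMP instructions, the result is (50·1 + 30·1 + 20·2) / 100 = 1.2 cycles per instruction.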

#### Question 3.

What differences did you find between the CPI values obtained for the Nios V/m and Nios V/g processors? What causes these differences? Justify and argue your answer.
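Two simple quantities can help frame this comparison. The sketch below assumes that both processors execute the same instruction count and run at the same clock frequency, which the assignment does not state; the helper names and the numbers in the usage note are hypothetical.

```python
def cycle_speedup(cycles_m, cycles_g):
    """Ratio of Nios V/m cycles to Nios V/g cycles for the same program."""
    if cycles_g <= 0:
        raise ValueError("cycles_g must be positive")
    return cycles_m / cycles_g


def extra_cycles_per_instruction(measured_cpi, theoretical_cpi):
    """Cycles per instruction not explained by the theoretical model."""
    return measured_cpi - theoretical_cpi
```

For example, cycle counts of 300 and 200 give a ratio of 1.5, and a measured CPI of 2.5 against a theoretical CPI of 1.25 leaves 1.25 cycles per instruction to be explained by effects outside the simple model.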

## Part II

## Part III

## Part IV

## References