

---

# Computer Architecture

## Dynamic scheduling with Scoreboard Algorithm

14348008, Mingxin Chen, Network Engineering - December 26, 2016

---

## Abstraction

This report is about a program for simulating the dynamic scheduling of instruction in the processor with the algorithm Scoreboard which is usually implied by the hardware in the processor of computers. The program is able to check the dependence between the instructions and try to dispatch it, issue it, execute it and write the result back out of order. I will summarise the algorithm supporting first and then introduce the design briefly, the last part is the usage of the program.

## Algorithm

In order to promote the level of instruction-level parallelism (ILP), there are several ways of pipeline design and instruction scheduling. This first theory is to divide the execution of instruction into several stages, which can keep all components in the processor busy in an ideal condition with pipeline technology. However, the execution depending may not be as useful as the designers' expectation even if there are bypasses in the pipeline. There are other hazards from which the processor cannot escape, for example, structure hazard( which does not occur in the five stages pipeline) and WAR hazard which will limit the performance. Designers naturally considered the possibility of Out of Order Scheduling in order to utilise instructions executable to full up the bubbles which generate because of stalling.

In this project, the algorithm for Out of Order Scheduling is named Scoreboard. Scoreboard has no bypass or renaming function, which is just a simple implement for the dynamic scheduling inside of the superscalar processor. However, we are already able to learn the concept of Out of Order from it.

There are four kinds of hazards which need to be discovered by the processor as the table below. If the hazard was occurred before the instruction entered the stage, the instruction would be being stalled until the hazard disappears.

| Stage  | D                       | S   | X | W   |
|--------|-------------------------|-----|---|-----|
| Hazard | WAW<br>Structure hazard | RAW | - | WAR |

The fetch stage had not been put in the table because scoreboard algorithm does not involve the fetch stage which is in order. Although it cannot be denied that structure hazard for instruction buffer may occur if the buffer ran out of the space, we will not consider that

---

in the program as the limitation in the number of function unit have more negative influence on the performance than the lack of instruction buffer space. So we assume that the instruction buffer space is infinite.

As I have mentioned above, scoreboard will still take the action of stalling when it came to hazard. The method it increases the instruction level parallelism(ILP) is to execute the instructions with no dependence first to burst the bubbles. According to the design pattern, only the structure hazard and RAW hazard can really limit the order handled.

The approach is not complex so it will not be explained in detail.

## Design

The design of the simulator can be divided into three parts such as the interaction part that communicates the user and the program, hardware part that copies the elements in the processor and data structure that represents some of the register structures for data transmission.

### Interaction Part

Window. The window displays the status and data inside of the processor and listens to the commands from users. After the user loaded a set of instructions or entered them on the table, the window would try to create a new task if the “START” button was clicked. Then, the user can choose to take one step forward or finish the execution.

Task. The task represents one mission about scheduling a set of instructions. This class is the intermediary between the window which is to interact with users and the processor which is the kernel of the program. After one object of task was created, the user can try to move the processor ahead by one clock or just finish this task. The purpose of setting the class Task is to separate the algorithm logic and the interface design.

### Hardware Part

Processor. Processor class contained the objects of the following classes. It can schedule the instruction set in a better order which is the only ambition of the scoreboard algorithm.

Instruction Buffer. Instruction buffer stores the instructions and the information about scheduling such as whether the instruction was finished, the current stage of the instruction, the time of dispatching, issuing, executing and writing back. Limited by the interface design, the maximum of the instructions is seventeen but it can be guaranteed

that the correctness is unbeatable no matter how many the instructions are needed to be executed.

**Function Unit Table.** Function unit table contains the information of the function units. There are five function units in this case, two for floating point number and one for scalar number, one for loading and one for storing.

**Register Status Table.** Register status table records the willing to write back for each register according to the information in function unit table. The status is the name of one of function units.

## Data Structure

**Instruction.** Instruction is a class utilised to store the option, source registers, destination register and some execution information of the instruction. It is designed to assist the programming.

**Note.** Note class contains a static String which records every movement of the processor, which provide an entrance for the user to know the process of dynamic scheduling.

| Class                   | Features                                                                                                  | Main Function                                                                                                                                                                                              |
|-------------------------|-----------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Window                  | Task<br>Screen                                                                                            |                                                                                                                                                                                                            |
| Task                    | Processor                                                                                                 |                                                                                                                                                                                                            |
| Process                 | Instruction Buffer Set<br>Instruction Set Size<br>Function Unit Table<br>Register Status<br>Current Clock | D Check(is it able to be dispatched)<br>S Check(is it able to be issued)<br>X Check(is it able to be executed)<br>W Check(is it able to be written back)<br>Finish Check(is it able to take one more step) |
| Instruction Buffer Unit | Instruction<br>D Cycle<br>S Cycle<br>X Cycle<br>W Cycle<br>Finish                                         |                                                                                                                                                                                                            |
| Instruction             | Instruction Option<br>R<br>R1<br>R2<br>Clock Needed in X stage<br>Finish                                  | Function Unit Needed Check(find out the function unit that which is needed by the certain instruction)                                                                                                     |

| Class                      | Features                                                               | Main Function                                    |
|----------------------------|------------------------------------------------------------------------|--------------------------------------------------|
| Function Unit Table        | Function Unit Set<br>Function Unit Set Size                            | Structure Hazard Check<br>WAR Check<br>RAW Check |
| Function Unit Table Unit   | Function Unit Name<br>Option<br>R<br>R1<br>R2<br>T1<br>T2<br>Available |                                                  |
| Register Status Table      | Register Status Unit Set<br>Register Status Unit Set Sum               | WAW Check                                        |
| Register Status Table Unit | Register Name<br>Status                                                |                                                  |
| Note                       | Note String                                                            |                                                  |

## Usage

Here are some screen slots taken while the program is running. I would like to use them to explain the usage of my program. There are four examples available for the fundamental tests but the user can input the instruction set randomly.

It is essential to point out that the order of the button to click is unchangeable because the programmer did not consider too many situations for the user's operations. The right order is "ExampleN/Input user-defined instructions -> START -> CLOCK/ FINISH -> CLEAN -> ExampleN/Input user-defined instructions -> ...".

## Operation

## Window

Open the jar file



Click Example 4



## Operation

## Window

Click START

The screenshot shows a processor simulation interface. On the left, there is a table of instructions with columns D, S, X, and W. On the right, there is a scoreboard table with columns FU, Busy, OP, R, R1, R2, T1, and T2. A large digital timer on the right displays '00'. Below the scoreboard are four buttons: START, CLOCK, FINISH, and CLEAN, each with an associated example label (Example1 through Example4). A text box on the right provides instructions for using the simulator.

| Instruction     | D | S | X | W |
|-----------------|---|---|---|---|
| addi r0,1400,r0 | - | - | - | - |
| sub r1,r2,r2    | - | - | - | - |
| ldf z(r0),f1    | - | - | - | - |
| st r2,z(r0)     | - | - | - | - |
| addf f1,f0,f2   | - | - | - | - |
| ldf z(r0),f1    | - | - | - | - |
| mulf f1,f2,f0   | - | - | - | - |
| addi r0,4,r0    | - | - | - | - |
| ld z(r1),f2     | - | - | - | - |
| div r1,r0,r2    | - | - | - | - |
| mulf f1,f2,f2   | - | - | - | - |
| stf f2,z(r2)    | - | - | - | - |
| sub r1,r1,r1    | - | - | - | - |
| subf f0,f1,f2   | - | - | - | - |
| sub r0,r0,r0    | - | - | - | - |
| st r2,z(r1)     | - | - | - | - |
| addi r0,10,r1   | - | - | - | - |

| FU  | Busy | OP | R | R1 | R2 | T1 | T2 |
|-----|------|----|---|----|----|----|----|
| ALU | -    | -  | - | -  | -  | -  | -  |
| LD  | -    | -  | - | -  | -  | -  | -  |
| ST  | -    | -  | - | -  | -  | -  | -  |
| FP1 | -    | -  | - | -  | -  | -  | -  |
| FP2 | -    | -  | - | -  | -  | -  | -  |

**Scoreboard**

@Chen Mingxin

00

START    CLOCK    FINISH    CLEAN

Example1    Example2    Example3    Example4

This program is aimed at simulating the order of the execution of instructions in the processor with Scoreboard. The set of options supported is (st, ld, add, addi, sub, subi, div, divi, mul, muli, stf, ldf, addf, subf, divf, mulf). You can enter the set by yourself or pick up one of the examples on the right.  
Version 4 Dec 18, 2016

Example/Enter -> START -> CLOCK/FINISH -> CLEAN -> ...  
eg. 'ld z(r0),f0'

Click CLOCK for 11 times

The screenshot shows the processor simulation after 11 clock cycles. The instruction table now includes columns C1 through C11. The scoreboard shows various operations being issued and completed. The digital timer on the right displays '11'. The text box on the right provides a detailed log of the executed instructions and their effects.

| Instruction     | D | S | X | W | C1 | C2  | C3  | C4  | C5  | C6 | C7 | C8 | C9 | C10 | C11 |
|-----------------|---|---|---|---|----|-----|-----|-----|-----|----|----|----|----|-----|-----|
| addi r0,1400,r0 | - | - | - | - | C1 | C2  | C3  | C4  | -   | -  | -  | -  | -  | -   | -   |
| sub r1,r2,r2    | - | - | - | - | -  | C4  | C5  | C6  | C7  | -  | -  | -  | -  | -   | -   |
| ldf z(r0),f1    | - | - | - | - | -  | C5  | C6  | C7  | C8  | -  | -  | -  | -  | -   | -   |
| st r2,z(r0)     | - | - | - | - | -  | C6  | C7  | C8  | C9  | -  | -  | -  | -  | -   | -   |
| addf f1,f0,f2   | - | - | - | - | -  | C7  | C8  | C9  | -   | -  | -  | -  | -  | -   | -   |
| ldf z(r0),f1    | - | - | - | - | -  | C8  | C9  | C10 | C11 | -  | -  | -  | -  | -   | -   |
| mulf f1,f2,f0   | - | - | - | - | -  | C9  | -   | -   | -   | -  | -  | -  | -  | -   | -   |
| addi r0,4,r0    | - | - | - | - | -  | C10 | C11 | -   | -   | -  | -  | -  | -  | -   | -   |
| ld z(r1),f2     | - | - | - | - | -  | -   | -   | -   | -   | -  | -  | -  | -  | -   | -   |
| div r1,r0,r2    | - | - | - | - | -  | -   | -   | -   | -   | -  | -  | -  | -  | -   | -   |
| mulf f1,f2,f2   | - | - | - | - | -  | -   | -   | -   | -   | -  | -  | -  | -  | -   | -   |
| stf f2,z(r2)    | - | - | - | - | -  | -   | -   | -   | -   | -  | -  | -  | -  | -   | -   |
| sub r1,r1,r1    | - | - | - | - | -  | -   | -   | -   | -   | -  | -  | -  | -  | -   | -   |
| subf f0,f1,f2   | - | - | - | - | -  | -   | -   | -   | -   | -  | -  | -  | -  | -   | -   |
| sub r0,r0,r0    | - | - | - | - | -  | -   | -   | -   | -   | -  | -  | -  | -  | -   | -   |
| st r2,z(r1)     | - | - | - | - | -  | -   | -   | -   | -   | -  | -  | -  | -  | -   | -   |
| addi r0,10,r1   | - | - | - | - | -  | -   | -   | -   | -   | -  | -  | -  | -  | -   | -   |

| FU  | Busy | OP   | R  | R1 | R2 | T1 | T2  |
|-----|------|------|----|----|----|----|-----|
| ALU | x    | addi | r0 | r0 | -  | -  | -   |
| LD  | ✓    | -    | -  | -  | -  | -  | -   |
| ST  | ✓    | -    | -  | -  | -  | -  | -   |
| FP1 | x    | addf | f2 | f1 | f0 | -  | -   |
| FP2 | x    | mulf | f0 | f1 | f2 | -  | FP1 |

**Scoreboard**

@Chen Mingxin

11

START    CLOCK    FINISH    CLEAN

Example1    Example2    Example3    Example4

ldf z(r0),f1 Dispatch  
st r2,z(r0) Write back  
addf f1,f0,f2 starts execution  
ldf z(r0),f1 issue  
mulf f1,f2,f0 Dispatch  
addf f1,f0,f2 continues execution  
ldf z(r0),f1 starts execution  
mulf f1,f2,f0 RAW -> cannot issue  
addi r0,4,r0 Dispatch  
addf f1,f0,f2 continues execution  
ldf z(r0),f1 Write back  
mulf f1,f2,f0 RAW -> cannot issue  
addi r0,4,r0 issue  
ld z(r1),f2 RAW -> cannot Dispatch

## Operation

## Window

Click FINISH

**Scoreboard**

| FU  | Busy | OP | R | R1 | R2 | T1 | T2 |
|-----|------|----|---|----|----|----|----|
| ALU | ✓    | -  | - | -  | -  | -  | -  |
| LD  | ✓    | -  | - | -  | -  | -  | -  |
| ST  | ✓    | -  | - | -  | -  | -  | -  |
| FP1 | ✓    | -  | - | -  | -  | -  | -  |
| FP2 | ✓    | -  | - | -  | -  | -  | -  |

**Scoreboard**  
@Chen Mingxin

**Reg** **Status**

|    |   |
|----|---|
| F0 | - |
| F1 | - |
| F2 | - |
| R0 | - |
| R1 | - |
| R2 | - |

addi r0,10,r1 Structure hazard -> cannot Dispatch  
 subf f0,f1,f2 continues eXecution  
 sub r0,r0,r0 Write back  
 st r2,z(r1) starts eXecution  
 addi r0,10,r1 Dispatch  
 subf f0,f1,f2 Write back  
 st r2,z(r1) Write back  
 addi r0,10,r1 iSSue  
 addi r0,10,r1 starts eXecution  
 addi r0,10,r1 Write back

Finish  
Finish

End!

Example1

Example2

Example3

Example4

START  
CLOCK  
FINISH  
CLEAN

Click CLEAN

**Scoreboard**

| FU  | Busy | OP | R | R1 | R2 | T1 | T2 |
|-----|------|----|---|----|----|----|----|
| ALU |      |    |   |    |    |    |    |
| LD  |      |    |   |    |    |    |    |
| ST  |      |    |   |    |    |    |    |
| FP1 |      |    |   |    |    |    |    |
| FP2 |      |    |   |    |    |    |    |

**Scoreboard**  
@Chen Mingxin

**Reg** **Status**

|    |   |
|----|---|
| F0 | - |
| F1 | - |
| F2 | - |
| R0 | - |
| R1 | - |
| R2 | - |

addi r0,10,r1 Structure hazard -> cannot Dispatch  
 subf f0,f1,f2 continues eXecution  
 sub r0,r0,r0 Write back  
 st r2,z(r1) starts eXecution  
 addi r0,10,r1 Dispatch  
 subf f0,f1,f2 Write back  
 st r2,z(r1) Write back  
 addi r0,10,r1 iSSue  
 addi r0,10,r1 starts eXecution  
 addi r0,10,r1 Write back

Finish  
Finish

End!

Example1

Example2

Example3

Example4

START  
CLOCK  
FINISH  
CLEAN

| Operation                                                   |  | Window                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |  |  |  |  |  |
|-------------------------------------------------------------|--|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|--|--|--|--|
| Input the user-defined set of instructions then click START |  | <p>The screenshot shows a window titled "Scoreboard" with a timestamp "00". On the left, there is an "Instruction" table with columns D, S, X, W. On the right, there is a "Scoreboard" table with columns FU, Busy, OP, R, R1, R2, T1, T2. Below the scoreboard are buttons for "START", "CLOCK", "Example1", "Example2", "Example3", "Example4", "FINISH", and "CLEAN". To the right of the scoreboard is a log window displaying assembly code and its execution details.</p> |  |  |  |  |  |
|                                                             |  |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |  |  |  |  |  |

## Conclusion

It is a pity that scoreboard did not contain bypass, which may be caused by the complexity of superscalar design but, in fact, it is not a fundamental problem. To eliminate the hazard as much as possible, the renaming technology and bypass are indispensable, which is implied in the Tomasulo's Scheduling algorithm. Scoreboard algorithm is the first try of dynamic scheduling by IBM 360 and it proved the possibility of out of order execution with stable ISA semantic.

The history of computer architecture has come a long way. Although it is heard that there was no landmark breakthrough after the golden era in 1960s, it is still worthwhile to be expectant to some subversive concepts, for example, the quantum computer. It is meaningful to learn about the computer architecture especially when we need to measure every slight improvement of the performance of our programs or the outcomes of highly parallel methods. The effort I have made for presenting what I had learned is inappreciable but the men who behind those theories are so respected that their contributions created the life of modern citizens. What I need to do is to understand more valuable ideas about computer and make myself a better programmer.