

# ELE 206/COS 306 Lab 6: PUnC – A Microprocessor

**Checkpoint 1 Due Date: December 8th, 2017**

**Checkpoint 2 Due Date: December 15th, 2017**

**Final Submission Due Date: January 16th, 2018 at 5PM**

**Final Demonstration Required.**

**Note:** This lab should be completed with a partner.

## Introduction

This lab is a large-scale project intended to bring together all of your knowledge on RTL design, datapath construction, and modular testing. This project is significantly more involved than previous labs, so it's best to get started early. Note that there are weekly due dates for portions of the lab, although the only demonstration is for the final product.

In this lab, you will design and build the Princeton University Computer (PUnC), a 16-bit processor with a simple but versatile instruction set. This processor is Turing complete and is a full-fledged stored program computer, so you will be able to compile code and run it on your design!

As always, you will test your circuit thoroughly via simulation. In addition, you will synthesize your Verilog design and program an FPGA so that you can physically test your processor.

To begin the lab, download and unzip *punc.zip* from Blackboard, and follow the instructions in this document. All of the source code for this lab, including testbenches, is stored in the */punc* folder. Additional documents are also found on Blackboard.

## Introduction to PUnC

PUnC is a 16-bit stored-program computer. In PUnC, all data and instructions are aligned 16-bit words, and both programs and data reside in the same memory unit. PUnC has several required hardware modules, which are described in detail in the following section.

### Hardware Modules

#### Memory

PUnC memory addresses are 16 bits long, and each unique address points to a full 16-bit data word. For example, the address 0x0000 points to the first 16-bit entry in memory, while the address 0x0001 points to the next 16-bit entry in memory.

Ideally, PUnC's memory would contain  $2^{16}$  16-bit entries, but simulation and synthesis with that many entries takes a long time. For efficiency, our version only implements the first 128 entries, leaving the rest disconnected. If you find that you need more memory elements, please ask a lab TA for help expanding the memory.

*Memory.v* contains an implementation of PUnC's memory module. It has already been wired into your datapath for use in simulation.

Here's an overview of the Memory module's ports:

| Port                  | I/O | Width | Use                         |
|-----------------------|-----|-------|-----------------------------|
| <code>clk</code>      | I   | 1     | Clock                       |
| <code>rst</code>      | I   | 1     | Synchronous Reset           |
| <code>r_addr_0</code> | I   | 16    | Async Read Address          |
| <code>r_addr_1</code> | I   | 16    | External Debug Read Address |
| <code>w_addr</code>   | I   | 16    | Synchronous Write Address   |
| <code>w_data</code>   | I   | 16    | Synchronous Write Data      |
| <code>w_en</code>     | I   | 1     | Synchronous Write Enable    |
| <code>r_data_0</code> | O   | 16    | Async Read Data             |
| <code>r_data_1</code> | O   | 16    | External Debug Read Data    |

This memory has asynchronous reads and synchronous writes. As soon as an address appears on one of the read address ports, the corresponding data will appear on the matching read data port. On the other hand, the data on `w_data` will be written to memory on positive clock edges and only whenever

whenever `w_en` is high. Resetting the memory will restore whatever program and data was initially present in the memory.

The second read port on the memory is used to allow external inspection of memory. It is already wired up for you, and it will be useful for debugging in both simulation and on the FPGA.

### Register File

PUnC has eight general-purpose 16-bit registers, addressed from `0x0` to `0x7`. These registers are used as scratch space for arithmetic operations and are used in nearly every instruction. None of the registers have a special meaning, but register 7 (`0x7`) is automatically used to save the program counter during Jump to Subroutine calls and is used to load a new program counter during the Return instruction.

*RegisterFile.v* contains an implementation of PUnC’s register file. It has already been wired into your datapath for use in simulation and synthesis.

Here’s an overview of the register file’s ports:

| Port                  | I/O | Width | Use                         |
|-----------------------|-----|-------|-----------------------------|
| <code>clk</code>      | I   | 1     | Clock                       |
| <code>rst</code>      | I   | 1     | Synchronous Reset           |
| <code>r_addr_0</code> | I   | 3     | Async Read Address          |
| <code>r_addr_1</code> | I   | 3     | Async Read Address          |
| <code>r_addr_2</code> | I   | 3     | External Debug Read Address |
| <code>w_addr</code>   | I   | 3     | Synchronous Write Address   |
| <code>w_data</code>   | I   | 16    | Synchronous Write Data      |
| <code>w_en</code>     | I   | 1     | Synchronous Write Enable    |
| <code>r_data_0</code> | O   | 16    | Async Read Data             |
| <code>r_data_1</code> | O   | 16    | Async Read Data             |
| <code>r_data_2</code> | O   | 16    | External Debug Read Data    |

Much like the memory, this register file has asynchronous reads and synchronous writes. Resetting the register file sets all registers to zero. The first two read ports on the register file are for you to use when constructing your processor – the third is used for external inspection of the register file. It is already wired up for you, and it will be useful for debugging in both simulation and on the FPGA.

## Condition Codes

PUnC also has three 1-bit condition code registers: N (Negative), Z (Zero), and P (Positive). These condition codes are set by arithmetic and load operations based on the value of the data being saved to the register file. PUnC treats that data as a two's complement number, and sets N to 1 if the number is negative, Z to 1 if the number is exactly zero, and P to 1 if the number is positive. All codes that are not set to 1 are set to zero, so only one of N, Z, and P can be 1 at any given time.

Resetting the processor should set all condition code registers to zero.

## Program Counter

In PUnC, all programs are stored in memory as a series of 16-bit binary instructions. To keep track of which instruction we should currently be executing, PUnC has a 16-bit program counter (PC) register that is used to address memory.

In ordinary operation, the PC increments by 1 immediately after fetching and decoding an instruction, so the program counter actually points to the *next* instruction to be fetched while the current instruction executes. The program counter can also be modified by control flow instructions, which will be described in detail later.

Note that the program counter's value is tied to a signal (`pc_debug_data`) that goes all the way to the top of the PUnC hierarchy in the given source code. This signal enables external inspection of the program counter, so keep it in place for simulation and synthesis. Resetting the processor should set the PC to zero.

## Instruction Register

Finally, while not architecturally required, it will be very useful to have one 16-bit instruction register (IR) that stores the currently-executing instruction. Saving fetched instructions in the IR will make it much easier to set control signals during execution.

## Other

There may be other hardware modules you need in your datapath. Feel free to include any as you see fit.

## Ports

PUnC has only seven external ports. They are listed in the table below:

| Port                        | I/O | Width | Use                        |
|-----------------------------|-----|-------|----------------------------|
| <code>clk</code>            | I   | 1     | Clock                      |
| <code>rst</code>            | I   | 1     | Synchronous Reset          |
| <code>mem_debug_addr</code> | I   | 16    | Memory Inspection Address  |
| <code>rf_debug_addr</code>  | I   | 16    | RegFile Inspection Address |
| <code>mem_debug_data</code> | O   | 16    | Memory Inspection Data     |
| <code>rf_debug_data</code>  | O   | 16    | RegFile Inspection Address |
| <code>pc_debug_data</code>  | O   | 16    | Current Program Counter    |

The memory and register file inspection addresses can be set to view the contents of those modules, and the PC value is always available for viewing on the `pc_debug_data` line.

## The LC3 Instruction Set

PUnC uses the LC3 instruction set, an educational instruction-set architecture (ISA) that was created by Professors Yale N. Patt and Sanjay J. Patel. The assignment files contain *LC3ISA.pdf*, a full specification of the LC3 architecture.

For this project, we will only be implementing a subset of the LC3 ISA. Specifically, you will implement the instructions shown in Figure 6.1. Each instruction is 16 bits long and begins with an opcode, which is a 4-bit code that indicates what type of instruction it is. Some instructions share an opcode – specifically, there are two types of ADD, two types of AND, and two types of JSR (and RET is just a special version of the JMP instruction). In the event that two instructions share an opcode, some other bit is present that distinguishes the two instructions – for example, the two ADD instructions are differentiated by the bit in position 5.

In addition to the opcode, there are several other pieces of information in each instruction. Here's a list of each named field and what it means:

- **DR** – The 3-bit address of the destination register. The value generated by the instruction (whether through an arithmetic operation or through a load from memory) is written to `regfile[DR]`.
- **SR/SR1/SR2/BaseR** – The 3-bit address of a source register. Source registers are used as operands for arithmetic operations, targets for branching instructions, or sources for stores to memory.

- **imm5/offset6/PCoffset9/PCoffset11** – Immediate values. These fields are bit sequences that are stored as part of the instruction and denote a constant value. They are used in arithmetic operations, such as calculating memory addresses for the PC or for loads and stores. Before they are used, they are sign-extended to 16 bits. This means that the highest-order bit in the immediate is copied to pad the immediate to 16 bits. For example, an immediate of  $5'b10001$  would be extended to  $16'b11111111110001$ , while an immediate of  $5'b00011$  would become  $16'b0000000000000011$ .

All of the instructions listed are described in detail in *LC3ISA.pdf* with the exception of HALT. HALT should simply stop the processor – the program counter will be left pointing to the instruction after HALT, and the only way to get the processor to continue executing is to reset it.

Refer to *LC3ISA.pdf* (pages 524-540) as you implement your PUnC design. This document can be found on Blackboard. There are a couple of key things to be aware of as you read.

First, all arithmetic operations that involve the PC are performed with the *incremented* PC. For example, when calculating the target address for the BR instruction, we add the sign-extended offset to the PC. This PC should be pointing to the instruction directly *after* the BR instruction during the computation. This should actually require almost no work to implement, as the PC should be incremented immediately after decoding but before executing an instruction.

Secondly, all instructions that are marked with a “+” sign in Figure 6.1 set the condition code registers based on whatever value is being stored into DR. No other instructions should modify the condition codes.

Finally, do not worry about any topics covered in *LC3ISA.pdf* that do not involve the instructions shown in Figure 6.1. We’re just implementing a simple subset of LC3’s functionality, so don’t worry about interrupts or LC3’s other available instructions.

|                  | 15   | 14 | 13 | 12   | 11 | 10    | 9 | 8 | 7  | 6 | 5          | 4 | 3 | 2 | 1 | 0 |
|------------------|------|----|----|------|----|-------|---|---|----|---|------------|---|---|---|---|---|
| ADD <sup>+</sup> | 0001 |    |    | DR   |    | SR1   |   | 0 | 00 |   | SR2        |   |   |   |   |   |
| ADD <sup>+</sup> | 0001 |    |    | DR   |    | SR1   |   | 1 |    |   | imm5       |   |   |   |   |   |
| AND <sup>+</sup> | 0101 |    |    | DR   |    | SR1   |   | 0 | 00 |   | SR2        |   |   |   |   |   |
| AND <sup>+</sup> | 0101 |    |    | DR   |    | SR1   |   | 1 |    |   | imm5       |   |   |   |   |   |
| BR               | 0000 |    | n  | z    | p  |       |   |   |    |   | PCoffset9  |   |   |   |   |   |
| JMP              | 1100 |    |    | 000  |    | BaseR |   |   |    |   | 000000     |   |   |   |   |   |
| JSR              | 0100 |    | 1  |      |    |       |   |   |    |   | PCoffset11 |   |   |   |   |   |
| JSRR             | 0100 |    | 0  | 00   |    | BaseR |   |   |    |   | 000000     |   |   |   |   |   |
| LD <sup>+</sup>  | 0010 |    |    | DR   |    |       |   |   |    |   | PCoffset9  |   |   |   |   |   |
| LDI <sup>+</sup> | 1010 |    |    | DR   |    |       |   |   |    |   | PCoffset9  |   |   |   |   |   |
| LDR <sup>+</sup> | 0110 |    |    | DR   |    | BaseR |   |   |    |   | offset6    |   |   |   |   |   |
| LEA <sup>+</sup> | 1110 |    |    | DR   |    |       |   |   |    |   | PCoffset9  |   |   |   |   |   |
| NOT <sup>+</sup> | 1001 |    |    | DR   |    | SR    |   |   |    |   | 111111     |   |   |   |   |   |
| RET              | 1100 |    |    | 000  |    | 111   |   |   |    |   | 000000     |   |   |   |   |   |
| ST               | 0011 |    |    | SR   |    |       |   |   |    |   | PCoffset9  |   |   |   |   |   |
| STI              | 1011 |    |    | SR   |    |       |   |   |    |   | PCoffset9  |   |   |   |   |   |
| STR              | 0111 |    |    | SR   |    | BaseR |   |   |    |   | offset6    |   |   |   |   |   |
| HALT             | 1111 |    |    | 0000 |    |       |   |   |    |   |            |   |   |   |   |   |

Figure 6.1: LC3 Instruction Set

## Instruction Stages

PUnC works in a continuous cycle of three main stages, each of which is described in detail below. PUnC is an unpipelined processor, which means that only one instruction is in flight at any time – an instruction must fully complete its execution before the next instruction can be fetched. Here are the three main phases in the life of an instruction:

1. **Fetch** – An instruction is loaded from memory into the IR ( $IR \Leftarrow \text{Mem}[PC]$ ). This phase takes one cycle.
2. **Decode** – The instruction is now in the IR. In this stage, we simply increment the PC. On a pipelined processor, we would also prepare the control signals for executing the instruction, but since PUnC is unpipelined, we'll actually do that in the execute phase. This phase takes one cycle.
3. **Execute** – The control unit sets control signals to manipulate the data-path into executing the decoded instruction. This could involve reading from memory, executing an arithmetic operation, or even modifying the PC. This phase may take as many cycles as you need, although most instructions can be executed in a single cycle. Some instructions, such as LDI, may require at least two cycles.

As PUnC processes the program stored in memory, it will cycle through these three stages for every executed instruction. The Fetch-Decode-Execute paradigm will serve as an essential starting point for your PUnC design.

## Week 1: Datapath Design

**Checkpoint 1 Due Date: December 8th, 2017**

The first step in tackling PUnC is creating a datapath. This datapath should contain all of the hardware modules listed in the description of PUnC, including:

- A Memory Unit
- A Register File
- A Program Counter
- Status Registers
- An Instruction Register
- Any other items you need, like ALU's, multiplexers, and extra registers.

### Designing Your Datapath

Sitting down to design your datapath may seem like a daunting task, but it can be broken down into a simple series of tasks:

1. Draw your memory unit, register file, program counter, status registers, and instruction register. Attach write-enable signals to all the registers, but leave the other connections blank.
2. Pick an unconnected port on one of your datapath modules (remember that the PC, status registers, and IR all have an implicit input port that holds a new value to be loaded). Then, iterate through all of the LC3 instructions (in addition to the universal fetch and decode stages) and note the possible values that port might need to take on (for input ports) or the possible places the port's value might be used (for output ports). For example, the register file's `w_data` port can take on values from memory's `r_data_0` port, from the PC, or from the result of an arithmetic operation.
3. Once you've compiled a list of the possible inputs to each port and the places the outputs must be routed to, add hardware modules as needed to enable the requisite behavior. For example, I would add a multiplexer that selects between the three possible input values to the register file's `w_data` port.

4. If you find that you need additional local registers, add them.
5. Once you've iterated through all ports and all instructions and are sure you have the elements required to support all possible operations, your datapath is ready to go!

## Drawing Your Datapath

Draw your datapath as you see fit, but please follow these guidelines:

1. Feel free to use high-level modules like adders, comparators, multi-function registers, sign-extenders, and the like. We do **not** want a gate-level description of all of these components – just make assumptions about the control signals each module needs and mark your modules so that it's clear what they do. When in doubt, keep it simple!
2. Clearly label the bit-width of every wire in your design.
3. Clearly indicate the inputs and the outputs of your datapath, and provide names for each. Signal names should be simple but descriptive enough to explain what they do. For example, a control signal that resets a counting register to zero could be labelled with `count_rst`.

Once you have a datapath design that you believe is complete, save a copy of it. If you drew it by hand, take a picture or scan it in.

## Control Signal Table

As a final step, you will set up a table that has a column for every possible phase in your processor – specifically, one column for the fetch phase, one column for the decode phase, and a column for every possible execute phase. You should have an execute phase column for every possible instruction type. If an instruction's execute phase takes multiple cycles, add an execute column for each cycle.

The rows of this table should be the datapath control signals you specified in your diagram. For each phase, fill in the slots for the control signals that matter in that phase. Leaving a cell blank for a phase indicates that the corresponding control signal is zero (for write-enable signals) or that we don't care what its value is for that phase.

For example, here's part of an example output table:

| CTRL SIG                 | Fetch | Decode | ADDR | ... |
|--------------------------|-------|--------|------|-----|
| <code>ir_w_en</code>     | 1     |        |      | ... |
| <code>status_w_en</code> |       |        | 1    | ... |
| ...                      | ...   | ...    | ...  | ... |

Feel free to use named values for multiplexer select signals instead of simple numerical values. For example, if I'm designing a control signal for a multiplexer that selects between the possible inputs to the register file's `w_data` port, then I might name the possible values of that signal `MEM_DATA`, `PC_DATA`, or `ALU_DATA` (instead of just 0, 1, or 2). These names for the multiplexer select signal paint a clearer picture of exactly what each value does to the datapath, and you can incorporate them using define macros in your code for readability.

## Writeup and Submission

Submit your datapath design on Blackboard as a single PDF – if you hand-drew it, take a picture of it and save it as a PDF using a document editor of your choice. Name your PDF `netid1_netid2_datapath.pdf`, with `netid1` and `netid2` replaced with your Princeton NetIDs. Only one student should submit on Blackboard – we will use the NetIDs on the writeup to match that submission to both students.

Save your table in an Excel or .CSV file, and name it `netid1_netid2_signals.xlsx` or `netid1_netid2_signals.csv`. To aid *anonymous* review and grading of this write-up, please don't include your name/netid inside the datapath design or signal table. Submit the signal table on Blackboard with your datapath design PDF.

Congratulations! You've finished work for week one. It's a good idea to start the next part early, as it may take longer to complete!

## Week 2: Datapath and Controller Implementation

**Checkpoint 2 Due Date: December 15th, 2017**

Now that you have a design hammered out for your datapath and the control signals necessary for each instruction, it's time to implement the datapath and controller for PUnC.

### Datapath Implementation

Let's begin with the datapath – open up *PUnCDatapath.v*, which has been started already for you. You'll notice that the register file and memory have already been instantiated. Feel free to connect wires to their ports as needed, but don't modify the signals that are already connected to their debug ports.

Several datapath module ports have already been declared as well. Do not modify these, as they'll be needed for debugging later. However, you may (and should) add your own ports to the datapath module declaration.

Finally, note that the PC and IR are already instantiated for your convenience. If you modify their names, be sure to ensure that the PC remains connected to the `pc_debug_data` port via an `assign` statement.

Write your full datapath using synthesizable Verilog. If you wish to name the possible values for multiplexer select signals, use define statements and place them in *Defines.v* so that they can be used in the controller as well. For example, I might have this set of definitions for the multiplexer select that controls which value is written to the register file:

```

1 `define RF_W_DATA_SEL_ALU    2'b00
2 `define RF_W_DATA_SEL_PC     2'b01
3 `define RF_W_DATA_SEL_MEM   2'b10

```

### Controller Implementation

Once your datapath is completed, you should implement your controller.

In this case, your finite state machine should be very simple, proceeding from Fetch to Decode to a variable number of Execute stages before cycling back to the Fetch stage. The complex part is setting the datapath control signals correctly for each phase based on the instruction being executed. Your signal table from week 1 should be helpful in this regard.

The code for the controller should be written in *PUnCCControl.v*. Once again, some code has been supplied for you to help you get started. Feel free

to modify it as you wish, adding ports to produce the control signals needed by the datapath and to pull in the data predicates generated by the datapath.

## Connection and Testing

Once both your datapath and controller have been completed, it's time to test them out.

Open *PUnC.v*, the top-level module for PUnC. This module simply serves to tie the datapath and controller together, and the provided file already has instantiated them.

Add your ports to the datapath and controller instances, wiring them together. Do not modify the port signals provided by default in *PUnC.v* – they will be used for debugging.

Once you've connected the datapath and controller, it's time to test out your design. We've included a full-fledged suite that tests individual instructions one-by-one. Additionally, the suite includes a program that uses a variety of instructions to calculate the greatest common denominator of two values (see next week's notes for more information on this test). My advice is to get the instructions working in the order that they appear in the testing output, as there are some dependencies between test cases. For example, almost all of the test cases depend on the ADDI instruction working, as it is used to load values into the register file. Finally, note that your HALT instruction must work for the testbench to function. The tests run a simple program to completion, and if the program doesn't complete by halting, then neither will the simulation.

Compile this testbench with the following command:

```
1 iverilog -g2005 -Wall -Wno-timescale -DSIM=1 -o PUnCTest PUnC.t.v
```

Fix any compiler errors or warnings, and then execute the suite with the following command:

```
1 vvp PUnCTest
```

Ensure that there are no failures. You'll see some warnings that look like this when you compile the datapath tests:

```
1 VCD warning: array word PUnCTest.punc.dpath.mem.mem[0] will conflict with an
escaped identifier.
```

Just ignore those type of warnings throughout the rest of the lab – they're caused by dumping out array contents (in this case, your memory and register file contents) to a VCD waveform file.

If you need waveforms for debugging, they will be dumped out as *PUnCTest.vcd*. You can open them with this command:

```
1 gtkwave PUnCTest.vcd
```

## A Note About the Testbench

During your debugging, it may be useful to understand what the testbench is doing and how it works. A test case looks roughly like this:

```
1 `START_TEST("addi");
2 `WAIT_PC_FREEZE;
3 `ASSERT_REG_EQ(0, 16'd3);
```

The first macro, `START_TEST`, takes as an argument the name of a memory image file from the `/images` folder. For example, the provided code will use `/images/addi.vmh`. These memory image files are simply a list of 128 sixteen-bit values that are specified in hexadecimal. They may contain comments, and each value must be on its own line. The `START_TEST` macro loads these files into memory, with the first value in the image file corresponding to the first value in memory (`mem[0]`) and so on. Then, it resets the processor.

Next, the `WAIT_PC_FREEZE` macro waits for the PC to remain unchanged for at least 10 clock cycles. Once this happens, we know that the processor has hit a `HALT` instruction. Every memory image file included in our test cases ends with a `HALT` instruction to terminate the program.

Finally, once the program runs to completion, we examine memory and the register file to ensure that the program ran successfully. The `ASSERT_REG_EQ` and `ASSERT_MEM_EQ` macros take two arguments – an address and a value – and ensure that the corresponding memory element or register holds the given value at the provided address.

All of our memory images are commented with assembly code, which is the low-level language that maps one-to-one to binary machine instructions. The *LC3ISA.pdf* document provides assembly-code examples for each instruction – read through these to understand what each assembly instruction means. Then, you can use the assembly comments to figure out what each program is doing if you find yourself mysteriously failing a test case.

## Write-up and Submission

For this week's writeup, you'll simply be submitting an updated version of your datapath and signal table, plus your current code. To aid *anonymous*

review and grading of this write-up, please don't put your name/netid inside the diagram or signal table.

Submit your updated datapath design on Blackboard as a single PDF – if you hand-drew it, take a picture of it and save it as a PDF using a document editor of your choice. Name your PDF *netid1\_netid2\_datapath.pdf*, with *netid1* and *netid2* replaced with your Princeton NetIDs. Only one student should submit on Blackboard – we will use the NetIDs on the writeup to match that submission to both students.

Save your updated signals table in an Excel or .CSV file, and name it *netid1\_netid2\_signals.xlsx* or *netid1\_netid2\_signals.csv*. Submit it on Blackboard with your datapath design PDF.

Additionally, you should submit your code for PUnC on Blackboard. It's okay if your code is not yet fully debugged – we just expect some major progress on it, but you won't lose points if it doesn't fully work.

Submit the following files on Blackboard in *netid1\_netid2\_code.zip*.

- *PUnCControl.v*
- *PUnCDatapath.v*
- *PUnC.v*

Finally, submit the following to Blackboard:

- *netid1\_netid2\_code.zip*
- *netid1\_netid2\_datapath.pdf*
- *netid1\_netid2\_signals.xlsx* or *netid1\_netid2\_signals.csv*.

Only one student should submit on Blackboard – we will use the NetIDs on the writeup to match submissions to both students.

Congratulations! You've finished the work for week 2. It's a good idea to start the next part early, as there will be many people in the EE lab during Reading Period. The next part will have to be completed in the EE lab!

## Week 3: Creating Your Own Program

**Final Due Date: January 16th, 2018 at 5 PM**

Now that your code passes our basic functional tests, you will create your own test for PUnC. For this part, you will create your own LC3 program that uses at least ten of the provided instructions to perform some sort of interesting computation.

### Our Example Program

For example, our provided program (saved in */images/gcd.vmh*) computes the greatest common denominator of two integers.

In C, the code for this program would look something like this:

```

1 int main(void) {
2     int a = 15;
3     int b = 13;
4     int gcd = 0;
5
6     while (a != b) {
7         if (a > b) {
8             a = a - b;
9         } else {
10            b = b - a;
11        }
12    }
13
14    gcd = a;
15 }
```

Once the program runs to completion, the GCD of A and B will be stored in both **a** and **b**, as well as in **gcd**.

We can translate this to LC3 assembly fairly easily, which produces this program:

```

1 // INITIAL BLOCK:
2 /*0:*/ LD R0, #23           // Load A into R0 from mem[24]
3 /*1:*/ LD R1, #23           // Load B into R1 from mem[25]
4 /*2:*/ NOT R2, R1          // Load -A into R2
5 /*3:*/ ADD R2, R2, #1       // Load -B into R3
6 /*4:*/ NOT R3, R0          // Load -B into R3
7 /*5:*/ ADD R3, R3, #1       // Load A - B into R4
8 /*6:*/ ADD R4, R0, R2       // Load A - B into R4
9 /*7:*/ BRz #13              // If A - B == 0, goto A_EQUAL_TO_B
10 /*8:*/ BRn #6               // IF A - B < 0, goto A_LESS_THAN_B
```

```

11 // A_GREATER_THAN_B:
12 /*9: */ ADD R0, R0, R2          // Load A - B into R0 (A = A - B)
13 /*10: */ NOT R3, R0             // Load -A into R3
14 /*11: */ ADD R3, R3, #1         //
15 /*12: */ ADD R4, R0, R2          // Load A - B into R4
16 /*13: */ BRp #-5               // If A - B > 0, goto A_GREATER_THAN_B
17 /*14: */ BRz #6                // If A - B == 0, goto A_EQUAL_TO_B
18
19
20 // A_LESS_THAN_B:
21 /*15: */ ADD R1, R1, R3          // Store B - A into R1 (B = B - A)
22 /*16: */ NOT R2, R1             // Store -B into R2
23 /*17: */ ADD R2, R2, #1         //
24 /*18: */ ADD R5, R1, R3          // Store B - A into R5
25 /*19: */ BRn #-11              // If B - A < 0, goto A_GREATER_THAN_B
26 /*20: */ BRp #-6                // If B - A > 0, goto A_LESS_THAN_B
27
28 // A_EQUAL_TO_B:
29 /*21: */ ST R0, #1              // Store A into mem[23] (GCD)
30 /*22: */ HALT                 // HALT -- GCD of A and B will be in R0 and R1
31
32 // DATA:
33 /*23: */ 0000                  // GCD
34 /*24: */ 000C                  // A
35 /*25: */ 000F                  // B

```

Note that the numbers at the beginning of each line are the memory addresses for each instruction and are just there for your convenience. They are **not** part of the assembly code, which is why they're commented out.

The initial numbers that we wish to take the GCD of are stored in memory at addresses 24 and 25 respectively. In this case, A is 12 and B is 15, and their GCD is 3. This program generally keeps the current value of A in R0, B in R1, -A in R2, and -B in R3. Once the HALT instruction has been hit, the GCD of A and B is stored in R0, in R1, and in memory at address 23.

Note that data can be written in assembly simply as a 16-bit hexadecimal number instead of an instruction. Hard-coding initial values in memory is a useful way to make a program easy to edit and test for different datasets.

This assembly code produces the following memory image file:

```

1 2017 // LD R0, #23
2 2217 // LD R1, #23
3 947F // NOT R2, R1
4 14A1 // ADD R2, R2, #1
5 963F // NOT R3, R0
6 16E1 // ADD R3, R3, #1
7 1802 // ADD R4, R0, R2
8 040D // BRz #13

```

```

9 0806 // BRn #6
10 1002 // ADD R0, R0, R2
11 963F // NOT R3, R0
12 16E1 // ADD R3, R3, #1
13 1802 // ADD R4, R0, R2
14 03FB // BRp #-5
15 0406 // BRz #6
16 1243 // ADD R1, R1, R3
17 947F // NOT R2, R1
18 14A1 // ADD R2, R2, #1
19 1A43 // ADD R5, R1, R3
20 09F5 // BRn #-11
21 03FA // BRp #-6
22 3001 // ST R0, #1
23 F000 // HALT
24 0000 // 0000
25 000C // 000C
26 000F // 000F
27 0000 // Zeros from here on
28 //.....

```

This file is padded with zeros until memory address 127 is reached. The memory image for the GCD program has been saved as *images/gcd.vmh*, and it was included in the TA testbench from last week's section.

## Using the Assembler

Write your own assembly program using ELE 206's custom LC3 assembler, which can be found on Blackboard. In the left-hand window, simply write your LC3 assembly code (without any comments). Once you're done, click on the "To Mem Image" button to produce a commented memory image file. Type a filename into the "Output Filename" box and click "Download Result" to download the memory image file. For consistency, save the file with a *.vmh* extension.

Once you're sure your program works as you wish, download your assembly code using the "Download ASM" button. For consistency, save your file with an *.asm* extension.

Remember that **your program must use ten unique instructions** from the provided set, and it must produce some sort of meaningful result. It doesn't need to be anything fancy, but don't just place random instructions!

## Running Your Program

Move your *.vhf* file into */images*, and then open up *PUnC.t.v*. Find the end of the provided tests (the last one is our GCD program), and add your own using the provided macros. These macros are described in detail in last week's notes on the TA testbench, but specifically, you can run your test by inserting the following sequence of commands:

```
1 `START_TEST("mytest");
2 `WAIT_PC_FREEZE;
```

This code will load your program into memory and then run it to completion. Ensure that your program produces the correct results by checking the end state of memory and the register file. Use the **ASSERT\_REG\_EQ** and **ASSERT\_MEM\_EQ** macros at your convenience to do this.

Once your program works correctly, save your assembly code – you'll need it later during synthesis, and you'll be required to turn in a commented copy of your program's assembly code as part of your final writeup. Then, move along to the final task for this project – synthesizing your processor on an FPGA.

## Week 3: Synthesizing for an FPGA

In this section, you will be programming an FPGA with a synthesized version of your PUnC code. Unfortunately, the software to synthesize your Verilog and program your FPGAs is only available on the computers in the Undergraduate Lab, so you'll have to go there to work.

### Adding Your Program to Memory

The first step in synthesizing your code is adding your program to *Memory.v*. Return to the assembler webpage, and paste your assembly code in the left-hand menu. This time, instead of clicking on “To Mem Image”, click on “To Verilog”. This will produce a block of Verilog code in the output pane. Copy this, and then paste it in *Memory.v* in between the two block comments. For example, the relevant section of our *Memory.v* looks like this:

```

1 //-----
2 // BEGIN MEMORY INITIALIZATION BLOCK
3 //   - Paste the code you generate for memory initialization in synthesis
4 //     here, deleting the current code.
5 //   - Use the assembler found on Blackboard to generate your Verilog
6 //-----
7 localparam PROGRAM_LENGTH = 26;
8 wire [DATA_WIDTH-1:0] mem_init[PROGRAM_LENGTH-1:0];
9
10 assign mem_init[0] = 16'h2017;    // LD R0, #23
11 assign mem_init[1] = 16'h2217;    // LD R1, #23
12 assign mem_init[2] = 16'h947F;    // NOT R2, R1
13 assign mem_init[3] = 16'h14A1;    // ADD R2, R2, #1
14 assign mem_init[4] = 16'h963F;    // NOT R3, R0
15 assign mem_init[5] = 16'h16E1;    // ADD R3, R3, #1
16 assign mem_init[6] = 16'h1802;    // ADD R4, R0, R2
17 assign mem_init[7] = 16'h040D;    // BRz #13
18 assign mem_init[8] = 16'h0806;    // BRn #6
19 assign mem_init[9] = 16'h1002;    // ADD R0, R0, R2
20 assign mem_init[10] = 16'h963F;   // NOT R3, R0
21 assign mem_init[11] = 16'h16E1;   // ADD R3, R3, #1
22 assign mem_init[12] = 16'h1802;   // ADD R4, R0, R2
23 assign mem_init[13] = 16'h03FB;   // BRp #-5
24 assign mem_init[14] = 16'h0406;   // BRz #6
25 assign mem_init[15] = 16'h1243;   // ADD R1, R1, R3
26 assign mem_init[16] = 16'h947F;   // NOT R2, R1
27 assign mem_init[17] = 16'h14A1;   // ADD R2, R2, #1
28 assign mem_init[18] = 16'h1A43;   // ADD R5, R1, R3
29 assign mem_init[19] = 16'h09F5;   // BRn #-11
30 assign mem_init[20] = 16'h03FA;   // BRp #-6
31 assign mem_init[21] = 16'h3001;   // ST R0, #1

```

```

32 assign mem_init[22] = 16'hF000; // HALT
33 assign mem_init[23] = 16'h0000; // 0000
34 assign mem_init[24] = 16'h000C; // 000C
35 assign mem_init[25] = 16'h000F; // 000F
36
37 //-----
38 // END MEMORY INITIALIZATION BLOCK
39 //-----

```

This causes the memory to reset to the provided initial program on reset in synthesis, but not in simulation. If you find yourself needing to go back and debug in simulation, you don't have to modify *Memory.v* again.

## The Top Module

We've included a top module for synthesis in *TopModule.v*. It's primary job is to provide ports for you to debug the processor by viewing the contents of the PC, memory, and the register file. You should not have to modify this file, but here is a list of the top module's ports and a description of each:

- **rst** – Synchronous reset for the processor.
- **pclk** – A user-activated button clock. This is one of the two clocks available for PUnC.
- **sysclk** – A 50MHz system clock. This is the other clocking option for PUnC.
- **clk\_sel** – Allows the user to select between the two clocks for the processor. When **clk\_sel** is 0, the clock will be **pclk**, and when **clk\_sel** is 1, the clock will be a slowed-down version of **sysclk**.
- **mem\_debug\_addr** – An eight-bit signal that indicates the data we wish to read from memory.
- **rf\_debug\_addr** – A three-bit signal that indicates the data we wish to read from the register file.
- **disp\_sel** – Allows the user to select between data from the register file and data from the memory to be displayed on the 4-digit seven-segment display. When **disp\_sel** is 0, the display will show the data stored in the register file at address **rf\_debug\_addr** in hexadecimal, and when **disp\_sel** is 1, the display will show the data stored in memory at address **mem\_debug\_addr** in hexadecimal.

- `pc_led` – The program counter.
- `SSEG_CA` – An 8-bit signal used to control the seven-segment display cathodes.
- `SSEG_AN` – A 4-bit signal used to control the seven-segment display anodes.

These ports are linked to physical modules on the FPGA. This will allow us to examine the PC, the register file, and memory for debugging.

## FPGA Orientation

The FPGA we'll be using is a Xilinx Artix-7 chip, which is mounted on a Digilent Basys3 board.



Figure 6.2: Full FPGA Board. 1) Power Switch 2) Xilinx FPGA IC 3) USB Programming Port 4) Switches and LED's 5) Buttons

We'll be using special software to convert your Verilog code into gate specifications to program the Xilinx FPGA. Additionally, we have included a pin specification (XDC) file that will link the inputs and outputs of the top module to physical components on the Basys3 board.

Figure 6.3 shows a closeup of the button array. The buttons that we're going to connect to `pclk` and `rst` are labeled.



Figure 6.3: FPGA Buttons

Figure 6.4 shows a closeup of the switch and LED array that we'll be using to interface with the board.



Figure 6.4: FPGA Switches, LEDs, and 4-digit Seven Segment Display

Since our memory only accommodates 128 entries, we have limited memory debug address selection to 8 bits.

## Programming Your FPGA

Next, you will synthesize your code and program your Xilinx FPGA! In order to do this, log onto one of the lab computers and open up the Xilinx Vivado Design Suite (search for “Vivado” in the Windows start menu and the program is called “Vivado 2016.2 or 2016.3”). Please verify you are using a lab computer that has a Basys3 board attached to it.

You should be presented with a main menu that offers the option to create a new project:



Figure 6.5: Xilinx Vivado Main Menu

Simply click the “Create New Project” button to start your PUnC project. A New Project Wizard will appear. Click “Next” to begin creating the project. A menu will appear that looks like this:



Figure 6.6: New Project Menu

Name your project “PUnC” and pick a location to save your project. We recommend using your H:/ drive since computer availability is not guaranteed. Then, click “Next”.

The next menu will ask you to specify the “Project Type”. Select “RTL Project” and select the “Do not specify sources at this time” option. Then, click “Next”.



Figure 6.7: New Project Part Menu

The next menu will allow you to associate the Basys board's FPGA with your project. Use the following information to locate your board in the list:

- **Family** – Artix-7
- **Package** – cpg236
- **Speed Grade** – -1

Once you have selected the appropriate board, click “Next”. Verify that the new “New Project Summary” contains the following details and then click “Finish”:

- **Default Part:** xc7a35tcpg236-1
- **Product:** Artix-7

Now, you'll see the "Project Manager" screen. Pull your source code in /punc over to the lab computer via flash drive, your H:/ drive, or a Dropbox-like service. In the left-hand "Flow Navigator" menu, shown in Figure 6.8, click the "Add Sources" button.



Figure 6.8: Flow Navigator

Select “Add or create design sources”. Then, click “Next”. You will be presented with a blank file list. Here, select the “Scan and add RTL include files into project” and ‘Copy sources into project” options, as shown in Figure 6.9 and add following file to the project:

- *TopModule.v*



Figure 6.9: Add Sources Menu

Since we have ‘`include`’ directives linking the files of our project, Vivado should scan and add all of these files automatically. This should work without error as long as the directives have been unchanged and the files are all located in the same directory. Click “Finish” once you’ve added the file.

Now, we will add the constraints file by, once again, selecting the “Add Sources” button, but this time choosing “Add or create constraints”. Now, you will add the provided *TopModule.xdc* file. Do not forget to select the “Copy constraints file into project” option!

After these steps, the sources hierarchy should look similar to Figure 6.10. If you find that a file is missing, you may need to add it manually. Alternatively, you can add all of the files manually, but when synthesizing, you may be presented with a warning – “overwriting previous definition of module..” – you can safely ignore this.



Figure 6.10: Sources Hierarchy

Finally, we're going to synthesize your code into a bitfile that will be used to program the FPGA. The *TopModule.xdc* file specifies how to connect ports in the PUnC *TopModule* file to physical LEDs, 4-digit display, and switches on the FPGA.

To generate your bitfile, complete the following steps. Each step should pass without errors:

1. **Synthesis:** Select “Run Synthesis” in the “Flow Navigator”, shown in Figure 6.8. If there are any major errors causing failure, resolve them and try again. Note: There may be a few warnings about disconnected ports, but you shouldn’t have to worry about those. If Synthesis is successful, a window will appear. From here, you will choose to “Run Implementation”.
2. **Implementation:** You can also select “Run Implementation” from the “Flow Navigator”. A window will appear if implementation is successful. From here, you will choose to “Generate Bitstream”.
3. **Generate Bitstream:** You can also select “Generate Bitstream” from the “Flow Navigator”. Once again, a window will appear if the bitstream was successfully generated.

Next, you will program the FPGA with the generated bitfile. Make sure to power on the Basys board at the lab computer before proceeding. You should see at least a few LEDs light up on the board.

1. Under “Hardware Manager” in the “Program and Debug” section of the “Flow Navigator”, select “Open Target” then “Auto-Connect”. Now the Hardware Manager should open with a “xc7a35t” device listed, as seen in Figure 6.11.
2. Click “Program Device” in the “Flow Navigator”. You can also right-click device and click “Program Device”. The *TopModule.bit* file should automatically populate the bitstream file field. If not, manually locate the bitstream file. Then, select “Program”. You can ignore any warnings about the missing debug core.
3. Programming should take only a few seconds. The Done LED on the Basys board will turn on when complete.
4. Test out your program and celebrate your success! Use the switches to explore memory and debug any issues. Do not unplug the FPGA from its power supply, as you’ll need to re-program it if it loses power.
5. **Please power off Basys3 board when leaving the lab!**



Figure 6.11: Hardware Manager Window

## Demonstration, Final Write-Up, and Submission

### Demonstration

For your demo, show your assembly code to a TA and demonstrate how it works on an FPGA. You should step through some of the program using `pclk`, explaining the various phases of your processor's execution before flipping to the fast system clock to finish off execution. Both students are expected to be present at the demo; otherwise, each partner can demo at different times, but will only receive credit at that time.

### Write-Up

For your final write-up, create a PDF that includes your revised datapath drawing, an annotated, commented copy of your program's assembly code, and an explanation of what your assembly program does. Additionally, please leave a bit of feedback regarding this lab. How much time did it take, how difficult was it, did you get stuck anywhere? There are no wrong answers here – we'll be using the feedback to adjust this lab for the future.

Feel free to create the write-up in whatever program you want to use, but save and submit it as a PDF. Name your PDF `netid1_netid2_punc_writeup.pdf`, with `netid1` and `netid2` replaced with your Princeton NetIDs.

In addition to your write-up, update your Excel spreadsheet that contains your control signal table and prepare it for final submission. Save your table in an Excel or .CSV file, and name it `netid1_netid2_signals.xlsx` or `netid1_netid2_signals.csv`.

### Submission

Delete `PUnCTest` and any `.vcd` waveform files from the `/punc` directory. You can do this via your file browser or with one of the following commands.

On OS X and Linux:

```
1 rm -f *.vcd PUnCTest
```

On Windows:

```
1 del *.vcd PUnCTest
```

Move your write-up PDF and your control signal Excel document into the `/punc` directory, and then create a `.zip` archive of that directory.

On OS X and Linux, simply change directories in your terminal to the directory that contains the `/punc` directory and then execute this command:

```
1 zip -r netid1_netid2_punc.zip ./punc
```

On Windows, simply open up File Explorer and locate the */punc* folder. From Command Prompt, you can open your current directory's parent in File Explorer by issuing this command:

```
1 explorer.exe ..
```

Right-click on the */punc* folder and select “Send to”, then “Compressed (zipped) folder”. This will create a *.zip* file – rename it as *netid1\_netid2\_punc.zip*, with *netid1* and *netid2* replaced with your Princeton NetIDs.

Submit the *.zip* file that you created to Blackboard. Only one student should submit on Blackboard – we will use the NetIDs on the writeup and zip folder to match that submission to both students.

For reference, here is a checklist of files that should be in your *.zip*:

- *Defines.v*
- *Memory.v*
- *PUnCDatapath.v*
- *PUnCControl.v*
- *PUnC.v*
- *PUnC.t.v*
- */images*, with your program *.vhm* file inside.
- *netid1\_netid2\_punc\_writeup.pdf*
- *netid1\_netid2\_signals.xlsx* or *netid1\_netid2\_signals.csv*.

Finally, submit the following to Blackboard:

- *netid1\_netid2\_punc.zip*