

# Hardware Synthesis Laboratory

1<sup>st</sup> semester

Academic Year 2022

```
1 module fulladder (cout,s,a,b,cin);
2     output cout;
3     output s;
4     input a,b,cin;
5
6 //assign {cout,s}=a+b+cin;
7 reg cout, s;
8 always @(a or b or cin)
9 begin
10     {cout,s}=a+b+cin;
11 end
12
13 endmodule
```



Krerk Piromsopa, Ph. D.

This document is a part of the 2110363 Hardware Synthesis Lab I,  
Department of Computer Engineering, Chulalongkorn University.  
All rights reserved.

## Preface

Since the exploding complexity of digital electronic circuits in the 1970s, Hardware Descriptive Language (HDL) has evolved. With the rise of FPGA, Hardware Synthesis (together with HDL) has become a fundamental skill in Computer Engineering.

The department of Computer Engineering, Chulalongkorn University is among the first universities (in Thailand) to teach VerilogHDL and Hardware Synthesis Lab. (We have done this for more than 20 years.) This course has evolved from an in-house FPGA board to the standardized developer boards. Generations of students have enjoyed (and weeped) for this class. The success of this class is evidenced by a number of students that have graduated and worked in semiconductor industries (Intel, AMD, IBM, etc.)

As a hardware design enthusiasm, I recreated this Lab book to ease the study of Hardware Descriptive Language (HDL) and Hardware Synthesis Lab. My personal belief is the more you understand hardware is the better you are as a programmer.

For those that may want to further their study in digital design, this is one chance for you to show your skill. Note that we have 2-3 persons per academic year that are natural at hardware descriptive language. Most (if not all) of them usually got a Ph.D. in Hardware-related fields and work in the industries. For those of you that may not be comfortable with hardware design, the minimum requirement for this class is to understand and use the digital tools.

Hopefully, students will learn something useful for their future from this class

Enjoy,

Krerk Piromsopa, Ph.D.

Associate Professor

Department of Computer Engineering

Chulalongkorn University

(Updated: August 9, 2022)

## Table of Contents

|                                                                      |    |
|----------------------------------------------------------------------|----|
| Laboratory 1: Introduction to VerilogHDL and Digital Simulation..... | 5  |
| Objectives.....                                                      | 5  |
| Background.....                                                      | 5  |
| Exercises.....                                                       | 5  |
| Laboratory 2: Time-Division Multiplexing and Clock Divider.....      | 9  |
| Objectives.....                                                      | 9  |
| Background.....                                                      | 9  |
| TDM.....                                                             | 9  |
| Clock Division.....                                                  | 10 |
| Language Templates.....                                              | 10 |
| Exercises.....                                                       | 11 |
| Laboratory 3: Counter and Switch (Debounce).....                     | 12 |
| Objectives.....                                                      | 12 |
| Background.....                                                      | 12 |
| Exercises.....                                                       | 13 |
| Laboratory 4: Memory.....                                            | 15 |
| Objectives.....                                                      | 15 |
| Background.....                                                      | 15 |
| ROM.....                                                             | 15 |
| RAM.....                                                             | 17 |
| Block RAM.....                                                       | 18 |
| Exercises.....                                                       | 20 |
| Laboratory 5: Simple CPU and Memory Mapped I/O.....                  | 21 |
| Objectives.....                                                      | 21 |
| Background.....                                                      | 21 |
| Memory Mapped I/O and Port-Mapped I/O.....                           | 21 |
| Exercises.....                                                       | 22 |

# Hardware Synthesis Laboratory I

4

|                                 |    |
|---------------------------------|----|
| Laboratory 6: VGA and UART..... | 26 |
| Objectives.....                 | 26 |
| Background.....                 | 26 |
| VGA.....                        | 26 |
| UART.....                       | 26 |
| Exercises.....                  | 27 |

## Laboratory 1: Introduction to VerilogHDL and Digital Simulation

### Objectives

1. Get students to familiar with the simulation tool (Vivado)
2. Demonstrate the basic of Verilog simulation and waveform output
3. Able to explain structural model and behavioral model *blocking*
4. Able to explain the differences between blocking and non-blocking assignments

### Background

In this lab, you will learn the fundamentals of VerilogHDL and Digital Simulation using the Vivado Design suite. Firstly, please download the free (webpack) version of Vivado Design Suite from Xilinx Web site<sup>1</sup>. The whole download is about 20GB. Alternatively, you may download it from the department server<sup>2</sup>. Should you have trouble finding a machine for installing the software, please contact the instructors. A (virtual) machine can be provided for you to remotely work with the tool. However, you may still have to install the Lab Edition to download the design to the FPGA board. For more information about the installation and Vivado IDE, please what the Xilinx tutorials' videos<sup>3</sup>.

*continuous assignment*

Please watch the demonstration video on how to use the simulation.

### Exercises

1. Complete the following 1-bit full adder and a test bench to validate such design by simulating all possible inputs. Use the Vivado tool to simulate and validate the design.

| BEHAVIORAL MODEL IN VERILOG                                                    | STRUCTURAL MODEL IN VERILOG                                                                                                          |
|--------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------|
| A way of describing the function of a design as a set of concurrent algorithms | A way of describing functions defined using basic components such as inverters, multiplexers, adders, decoders and basic logic gates |
| Called black box modeling                                                      | Called glass box modeling                                                                                                            |
| Focuses on showing the relationships between inputs and outputs                | Focuses on constructing the design using logic gates and predefined modules                                                          |

Visit [www.PEDIAA.com](http://www.PEDIAA.com)

1 <https://www.xilinx.com/products/design-tools/vivado.html>

2 <https://mis.cp.eng.chula.ac.th/krerk/teaching/2018s2-HWSynLab/>  
Username: student, password: HWSynLab

3 <https://www.xilinx.com/products/design-tools/vivado.html#video>

# Hardware Synthesis Laboratory I

6

```
module fullAdder(cout, s, a, b, cin);
    output cout;
    output s;
    input a;
    input b;
    input cin;
    reg cout, s;

    always @ (a, b, cin)
    begin
        {cout, s} = a + b + cin;
    end
endmodule

`timescale 1ns/1ns
module tester;
```

*Assign wire = wire*

*นี่คือการกำหนดชื่อ*

*Simulate first*

```
reg a, b, cin;
wire cout, s;

fullAdder a1(cout, s, a, b, cin);

initial
begin
    // $dumpfile("time.dump");
    // $dumpvars(2, a1);
    $monitor("time %t: {%b %b} <- %d %d %d", $time, cout, s, a, b, cin);
    #0; // delay
    a=0;
    b=0;
    cin=0;
    // .....
    $finish;
end
endmodule
```

*\* ให้ตัว test case ที่!*

2. What would happen if we replace the always block of the full adder in question 1 "ที่มีอยู่" with the following module? Would it give the same result? Please run the test bench and provide your analysis.

```
module fullAdder(cout, s, a, b, cin);
    output cout;
    output s;
    input a;
    input b;
    input cin;

    assign {cout, s} = a + b + cin;
endmodule
```

```
module fullAdder(
    output cout,
    output s,
    input a,
    input b,
    input cin
);
```

3. Please modify the following latch to be a (positive edge triggering) flip flop with asynchronous reset. Please also modify the test bench to validate your design.

```
`timescale 1ns/1ns

module DFlipFlop(q, clock, nreset, d);
    output q;
    input clock, nreset, d;

    reg q;
    posedge clock, posedge nreset
    always @ (clock)
    begin
        if (nreset==1)
```

```
    q=d;
    else
        q=0;
end
endmodule

module testDFlipFlop();
    reg clock, nreset, d;
    DFlipFlop D1(q, clock, nreset, d);
    always
```



Krerk Piromsopa, Ph.D.

# Hardware Synthesis Laboratory I



```
#10    clock=~clock;

initial
begin
//$/dumpfile("testDFlipFlop.dump");
//$/dumpvars(1,D1);
#0 d=0;
clock=0;
nreset=0;
#50 nreset=1;
#1000 $finish;
end
always
#8 d=~d;
endmodule
```



4. What are the differences between the 2 provided designs? Please write a test bench to show your analysis.

ពិនិត្យវិវាទនៃ blocking ឬ non blocking

**Bus**

```
module shiftA(q,clock,d);
output [1:0] q;
input clock,d;

reg [1:0] q;

always @ (posedge clock)
begin
    q[0]=d;
    q[1]=q[0];
end
endmodule

module shiftB(q,clock,d);
output [1:0] q;
input clock,d;

reg [1:0] q;

always @ (posedge clock)
begin
    q[0]<=d;
    q[1]<=q[0];
end
endmodule
```

\* និមួយនានានឹងរឿងសម្រាប់បានប្រើប្រាស់នៅក្នុងការសម្រេចការណ៍។

shift A → shift B

clock (input) d

q1, q2 (output)

Shift Register

Non-blocking assignment (តាក់បិន្តិភាគ អំពីការរំនួរការណ៍)

Timing diagram showing the state transitions of two shift registers, q1 and q2, over time. The clock signal has rising edges at 10.000 ns, 50.000 ns, 100.000 ns, and 150.000 ns. The input d is asserted during the rising edge of the clock. The states of q1 and q2 are shown as binary values. Handwritten note: 'មិនធ្វើការសម្រេចការណ៍ តាមលទ្ធផល 3'.

5. Please answer the following questions and submit (in PDF format) to CourseVille on Friday before 23:59 (midnight).

1. Please draw a schematic representing the logical blocks of both shiftA and shiftB in exercise 4.
2. What is the difference between blocking and non-blocking assignments?

5. Please answer the following questions and submit (in PDF format) to CourseVille on Friday before 23:59 (midnight).

1. Please draw a schematic representing the logical blocks of both shiftA and shiftB in exercise 4.
2. What is the difference between blocking and non-blocking assignments?

Soln1) Shift A

= Output

```

module shiftA(q,clock,d);
output [1:0] q;
input clock,d;

reg [1:0] q;

always @ (posedge clock)
begin
    q[0]=d;
    q[1]=q[0];
end
endmodule

module shiftB(q,clock,d);
output [1:0] q;
input clock,d;

reg [1:0] q;

always @ (posedge clock)
begin
    q[0]<=d;
    q[1]<=q[0];
end
endmodule

```

Shift B

- 2) blocking = ការដំឡើងដែលបានបញ្ចប់  
nonblocking = ការដំឡើងដែលបានបញ្ចប់

3. Is it possible to apply parameters to the design in exercise 4 to create shiftRegister with any number of bits? If Yes, please explain how.

ອ່ານວ່າ ແມ່ນມີກຳນົດຕາງຫຼາຍຫຼາຍເພື່ອ output ທີ່ຈະມີກຳນົດຕາງຫຼາຍຫຼາຍເພື່ອ Design ມີ Shift B

## Laboratory 2: Time-Division Multiplexing and Clock Divider

### Objectives

1. Get students to familiar with the synthesis tool (Vivado)
2. Synthesis FPGA
3. Able to explain time-division multiplexing
4. Able to design clock divider
5. Able to use a language template.

### Background

#### TDM

To drive a seven-segment display (whether it is common anode or common cathode), each digit would require 9 wires (a to g, dot, common ground or common vcc). With several digits of seven-segment display, the number of wires would intuitively multiplied. For example, 4 digits of seven-segment displays may require up to 33 wires (a to g and dot for each digit with a sharing common wire). It is not practical to have so many wires. To share (reduce) physical wires, Time-Division Multiplexor is introduced.

With time-division multiplexing, we can share 8 wires (a to g and dot) among the displays. Only a digit will be active at a time. If the segments turn on and off at the appropriate rate (I.e. 15 frames per second or more), the observer would see it as if all segments are on at the same time.--hence the term time-division multiplexing. This way, four digits of seven-segment displays can be connected with only 12 wires (8 from a to g and dot + 4 for activating digits).



For more details about time-division multiplexing, please watch the demonstration video.

## Clock Division

There are several ways to divide high frequency clocks into slower clocks. A simple solution is to cascade D flip flops (or even T flip flops) together by feeding  $\sim Q_0$  to D0 and feed Q0 as a clock for D1 (and so on). Nonetheless, this is just one implementation of the clock division. You may use a counter to set and clear a bit as a clock division as well.



## Language Templates

Vivado IDE tool comes bundled with language templates. Language templates are basically code snippets for HDL. You may access the language templates from **menu > Tools > Language Templates**. A language template that might be useful for this lab is 7-segment encoding.



### Wire:-

Wires are used for connecting different elements. They can be treated as physical wires. They can be read or assigned. No values get stored in them. They need to be driven by either continuous assign statement or from a port of a module.

### Reg:-

Contrary to their name, regs don't necessarily correspond to physical registers. They represent data storage elements in Verilog/SystemVerilog. They retain their value till next value is assigned to them (not through assign statement). They can be synthesized to FF, latch or combinatorial circuit. (They might not be synthesizable !!!)

Wires and Regs are present from Verilog timeframe. SystemVerilog added a new data type called logic to them. So the next question is what is this logic data type and how it is different from our good old wire/reg.

# Hardware Synthesis Laboratory I

## Exercises

1. Use your knowledge from **clock division** and **time-division multiplexor** to **display a 4-digit hexadecimal number (0x1234)** to the seven-segment display of the BASYS 3 board. Your design should be modularized (You can save the component for reuse later). There should be at least 3 modules: **clock divider**, **hex (or bcd) to 7-segment encoder** and **7-segment TDM**.



- (A Single LED)  
 หน้าจอแสดงผล ชนิด Anode เดียว
2. Please answer the following questions and submit (in PDF format) to CourseVille on Friday before 23:59 (midnight).

- a. Is the 4-digit seven-segment display on the BASYS 3 board a common anode for common cathode? Please explain.
- b. From the wiring of the board, which logic do you have to assign to the 7-segment pins (a to g and dot) to turn the LED on.
- c. Given that the clock of the BASYS3 is around 10ns, how many bits do you have to divide the clock with to get the appropriate clock for the TDM. Please provide your analysis (calculation).

Hint

1. Use the BASYS 3 XDC<sup>4</sup> file as a base constraint file.
2. Read the datasheet to determine the interconnection in the board.
3. Use the language templates for 7-segment encoder

| Number | Common Anode | a   | b    | c    | d    | e    | f    | g    | dp   | Notes      |
|--------|--------------|-----|------|------|------|------|------|------|------|------------|
| 0      | High         | Low | Low  | Low  | Low  | Low  | Low  | High | High | 0x00000001 |
| 1      | High         | Low | Low  | Low  | Low  | Low  | High | High | High | 0x00000002 |
| 2      | High         | Low | Low  | Low  | Low  | High | High | High | High | 0x00000003 |
| 3      | High         | Low | Low  | Low  | High | High | High | High | High | 0x00000004 |
| 4      | High         | Low | Low  | High | High | High | Low  | Low  | Low  | 0x00000010 |
| 5      | High         | Low | High | Low  | High | High | Low  | Low  | Low  | 0x00000011 |
| 6      | High         | Low | High | Low  | High | High | Low  | Low  | Low  | 0x00000012 |
| 7      | High         | Low | Low  | Low  | High | High | High | High | High | 0x00000013 |
| 8      | High         | Low | Low  | Low  | Low  | Low  | Low  | Low  | Low  | 0x00000000 |
| 9      | High         | Low | Low  | Low  | High | Low  | Low  | Low  | Low  | 0x00000010 |

[fpga4student.com](http://fpga4student.com)



ต้องการ clock 100 MHz  $\rightarrow$  ต้องมี 15 Hz

```
genvar index;
generate for(index = 0; index<18; index = index+1)
begin
    ClockDivision ClockDivision(tmpClock[index+1], tmpClock[index]);
end
endgenerate

ClockDivision ClockDivision(CorrectClock, tmpClock[18]);
```

4<https://mis.cp.eng.chula.ac.th/krerk/teaching/2018s2-HaWSynLab/downloads/Basys-3-Master.xdc>

2. Please answer the following questions and submit (in PDF format) to CourseVille

on Friday before 23:59 (midnight).

a. Is the 4-digit seven-segment display on the BASYS 3 board a common anode

or common cathode? Please explain.

Ans In each digit, it shares a **common anode** because seven anodes (A to G) of the seven segments in each digit are connected together. As show in the picture below



b. From the wiring of the board, which logic do you have to assign to the **7-segment pins** (a to g and dot) to turn the LED on.

Ans Common anode must be activate low each of the seven-segment LED display. For cathodes, they can be low (to illuminated segment LED display) or high (to un-illuminated segment LED display) As show in the table below

| Number | Common-Anode | CA   | CB   | CC   | CD   | CE   | CF   | CG   | Cathode(8-0) |
|--------|--------------|------|------|------|------|------|------|------|--------------|
| 0      | high         | low  | low  | low  | low  | low  | low  | high | 1b0000001    |
| 1      | high         | low  | low  | high | high | high | high | high | 1b1001111    |
| 2      | low          | low  | high | low  | low  | high | low  | high | 1b0001010    |
| 3      | low          | low  | low  | low  | high | high | low  | high | 1b0000110    |
| 4      | high         | low  | low  | high | high | low  | high | low  | 1b1001100    |
| 5      | low          | high | low  | low  | high | low  | low  | low  | 1b0010010    |
| 6      | low          | high | low  | low  | low  | high | low  | low  | 1b0010000    |
| 7      | low          | low  | low  | high | high | high | high | high | 1b0001111    |
| 8      | low          | low  | low  | low  | low  | low  | low  | low  | 1b0000000    |
| 9      | low          | low  | low  | low  | high | low  | low  | low  | 1b0000000    |



c. Given that the **clock** of the BASYS3 is around 10ns, how many bits do you

have to divide the clock with to get the appropriate clock for the TDM.

Please provide your analysis (calculation).

Ans the 4-digit seven-segment display should be continuously refreshed at about 1KHz to 60Hz or (1ms to 16ms). Because human eyes can see between 30 and 60 frames per second.

The clock divider that I write will divide the clock by 2,

Let x is the frame per second that we want.

n is the how many times that we must divide the clock

If the clock of the BASYS3 is around 10ns → it has  $\frac{1}{10 \times 10^{-9}} = 10^8 \text{ Hz}$

$$\text{So, } \frac{10^8}{2^n} * \frac{1}{4} = x$$

Because there is 4 digits, we must divide by 4 to get frame per second of each digit.

$$\frac{10^8}{2^n} * \frac{1}{4} = 15$$

∴ 15 Frame/sec

เวลาต่อ 4 ชั่วโมง  
□ □ □ □

ดังนั้น n มีค่าประมาณ 20 บิต





| A | B | C | D | a | b | c | d | e | f | g |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 |
| 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 |
| 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 |
| 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
| 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 |
| 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
| 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |

#for a:

| AB\CD | 00 | 01 | 11 | 10 |
|-------|----|----|----|----|
| 00    | 1  | 0  | 1  | 1  |
| 01    | 0  | 1  | 1  | 1  |
| 11    | X  | X  | X  | X  |
| 10    | 1  | 1  | X  | X  |

F(ABCD)=  $\neg B \neg D + BD + A$

#for c:

| AB\CD | 00 | 01 | 11 | 10 |
|-------|----|----|----|----|
| 00    | 1  | 1  | 1  | 0  |
| 01    | 1  | 1  | 1  | 1  |
| 11    | X  | X  | X  | X  |
| 10    | 1  | 1  | X  | X  |

F(ABCD)=  $\neg C + B$

#for e:

| AB\CD | 00 | 01 | 11 | 10 |
|-------|----|----|----|----|
| 00    | 1  | 0  | 0  | 1  |
| 01    | 0  | 0  | 0  | 1  |
| 11    | X  | X  | X  | X  |
| 10    | 1  | 0  | X  | X  |

F(ABCD)=  $\neg B \neg D + C \neg D$

#for g:

| AB\CD | 00 | 01 | 11 | 10 |
|-------|----|----|----|----|
| 00    | 0  | 0  | 1  | 1  |
| 01    | 1  | 1  | 0  | 1  |
| 11    | X  | X  | X  | X  |
| 10    | 1  | 1  | X  | X  |

F(ABCD)=  $\neg BC + A + B \neg D$

#for b:

| AB\CD | 00 | 01 | 11 | 10 |
|-------|----|----|----|----|
| 00    | 1  | 1  | 1  | 1  |
| 01    | 1  | 0  | 1  | 0  |
| 11    | X  | X  | X  | X  |
| 10    | 1  | 1  | X  | X  |

F(ABCD)=  $\neg B + CD + \neg C \neg D$

#for d:

| AB\CD | 00 | 01 | 11 | 10 |
|-------|----|----|----|----|
| 00    | 1  | 0  | 1  | 1  |
| 01    | 0  | 1  | 0  | 1  |
| 11    | X  | X  | X  | X  |
| 10    | 1  | 1  | X  | X  |

F(ABCD)=  $\neg B \neg D + \neg BC + B \neg CD + C \neg D + A$

#for f:

| AB\CD | 00 | 01 | 11 | 10 |
|-------|----|----|----|----|
| 00    | 1  | 0  | 0  | 0  |
| 01    | 1  | 1  | 0  | 1  |
| 11    | X  | X  | X  | X  |
| 10    | 1  | 1  | X  | X  |

F(ABCD)=  $\neg C \neg D + B \neg C + B \neg D + A$

```
## Clock signal
set_property -dict { PACKAGE_PIN W5 IOSTANDARD LVCOS33 } [get_ports clock]
create_clock -add -name sys_clk_pin -period 10.00 -waveform {0 5} [get_ports clock]
    ) ที่ toggle 1 หน่วยเวลา 5
    ) ที่ toggle 1 หน่วยเวลา 5
```

Screenshot of Xilinx Vivado IDE showing the RTL Analysis and Synthesis environment.

**RTL ANALYSIS**

- Open Elaborate
- Report Methodology
- Report DRC
- Report Noise
- Schematic

**SYNTHESIS**

- Run Synthesis
- Open Synthesized

**IMPLEMENTATION**

- Run Implementation
- Open Implementer

**PROGRAM AND DEBUG**

- Generate Bitstream
- Open Hardware Monitor

**Source File Probe** for `clockDiv.v`:

```

13 // Dependencies:
14 // Revision:
15 // Additional Comments:
16 // Revision 0.01 - File Created
17 // Additional Comments:
18 // 
19 //////////////////////////////////////////////////////////////////
20
21
22 module clockDiv(
23     output clkDiv,
24     input clk
25 );
26
27 reg clkDiv;
28
29 initial begin
30     clkDiv=0;
31 end
32
33 always @ (posedge clk)
34 begin
35     clkDiv=~clkDiv;
36 end
37
38 endmodule
39
40
41

```

**Tcl Console** and **Design Runs** tables:

| Name    | Constraints | Status                    | WNS   | TNS   | WHS   | THS   | TPWS  | Total Power | Failed Routes | LUT | FF   | BRAMs | URAM | DSP | Sta |
|---------|-------------|---------------------------|-------|-------|-------|-------|-------|-------------|---------------|-----|------|-------|------|-----|-----|
| synth_1 | constrs_1   | synth_design Complete!    |       |       |       |       |       |             | 25            | 21  | 0.00 | 0     | 0    | 1   |     |
| impl_1  | constrs_1   | write_bitstream Complete! | 8.526 | 0.000 | 0.389 | 0.000 | 0.000 | 0.102       | 0             | 25  | 21   | 0.00  | 0    | 0   | 1/  |

**quadSevenSeg.v** code with annotations:

```

module quadSevenSeg(
    output [6:0] seg,
    output dp,
    output an0,
    output an1,
    output an2,
    output an3,
    input [3:0] num0,
    input [3:0] num1,
    input [3:0] num2,
    input [3:0] num3,
    input clk
);

reg [1:0] ns;
reg [1:0] ps;
reg [3:0] dispEn;

reg [3:0] hexIn;
wire [6:0] segments;
assign seg=segments;
hexTo7Segment hexDecode(segments,hexIn);
assign dp=0;
assign {an3,an2,an1,an0} = ~dispEn;

always @(posedge clk)
begin
    ps=ns;
end

always @(ps)
begin
    ns=ps+1;
end

always @(ps)
begin
    case(ps)
        2'b00: dispEn=4'b0001;
        2'b01: dispEn=4'b0010;
        2'b10: dispEn=4'b0100;
        2'b11: dispEn=4'b1000;
    endcase
end

always @(ps)
begin
    case(ps)
        2'b00: hexIn=num0;
        2'b01: hexIn=num1;
        2'b10: hexIn=num2;
        2'b11: hexIn=num3;
    endcase
end
endmodule

```

Annotations:

- pink box highlights `seg` declaration.
- blue bracket groups `reg [1:0] ns;`, `reg [1:0] ps;`, and `reg [3:0] dispEn;`.
- red bracket groups `reg [3:0] hexIn;` and `wire [6:0] segments;`.
- green bracket groups `assign seg=segments;` and `hexTo7Segment hexDecode(segments,hexIn);`.
- red bracket groups `assign dp=0;` and `assign {an3,an2,an1,an0} = ~dispEn;`.
- red bracket groups `always @(posedge clk)` and `begin ps=ns; end`.
- red bracket groups `always @(ps)` and `begin ns=ps+1; end`.
- red bracket groups `always @(ps)` and `begin case(ps) ... endcase end`.
- red bracket groups `always @(ps)` and `begin case(ps) ... endcase end`.
- blue bracket groups `2'b00: hexIn=num0;` through `2'b11: hexIn=num3;`.



## Laboratory 3: Counter and Switch (Debounce)

### Objectives

1. Synthesis FPGA
2. Able to design debounce switch and input
3. Able to design up and down Counter

### Background

Debouncing is a programming pattern or a technique to restrict the calling of a time-consuming function frequently, by delaying the execution of the function until a specified time to avoid unnecessary CPU cycles, and API calls and improve performance.

### Switch and Bounce

Mechanic switches and relays have a common issue called contact bounce (aka. chatter). Switch and relay contacts are usually made of metals. When the contacts strike together, their momentum and elasticity act together to cause them to bounce apart one or more times before making steady contact. (Imagine a ball falling on a fall, it would bounce several times before coming to a complete stop.)



There are several ways to debounce. Debouncing methods include using capacitor, SR Latch, or Low-pass filtered schmitt trigger. However, those methods usually required special hardware.

To debounce without using special hardware, we can use software methods by

<sup>5</sup> Image taken from [https://upload.wikimedia.org/wikipedia/commons/thumb/a/ac/Bouncy\\_Switch.png/400px-Bouncy\\_Switch.png](https://upload.wikimedia.org/wikipedia/commons/thumb/a/ac/Bouncy_Switch.png/400px-Bouncy_Switch.png)



# Hardware Synthesis Laboratory I

13

resampling for input several times.



**Abstract.** The single pulser is a clocked sequential device which generates a unit-time pulse on its output for every pulse on its input. This paper explores how a single-pulser implementation is verified by various formal reasoning tools, including the PVS theorem prover for higher-order logic, the SMV model checker for computation tree logic, the DDD design derivation system, and the Oct Tools design environment. By using a single-pulse example, the study attempts to contrast how the underlying formalisms influence one's perspective on design and verification.

**KEYWORDS AND PHRASES:** Formal methods, hardware verification, formal verification, theorem prover, higher order logic, model checker, design derivation system, and the Oct Tools design environment.

**debounce** Please also note that there is a **metastable issue**. To avoid this, it is generally **advised that** **two D flip flops** be placed **between the input and the digital circuit**.

## Exercises

counting 0-1-2-3... -9-0-1-2.... ✓

1. Create an up / down 1-digit BCD counter with 4-bit outputs (DCBA) and 1 overflow output (cout), 1 borrow (bout) and 6 inputs (up, down, set9, set0,clock). Write a simulator to show that the counter functions correctly.



2. Create a **single pulser** with one input, clock, and one output. Write a simulator to show that the single pulser works correctly.



3. Use 4 counters from exercise 1 to create 4 digits BCD counters. Connect all displays



to 4 digits seven-segment displays. (Use the display components from Laboratory II.) Use BTNU for set9 (set the number to 9999). Use BTNC for reset (set the number to 0000). Use SW0 for countdown by 1. Use SW1 for count up by 1. Use SW2 for countdown by 10. Use SW3 for count up by 10. Use SW4 for countdown by 100. Use SW5 for count up by 100. Use SW6 for countdown by 1000. Use SW7 for count up by 1000. If the number is at 0000, a countdown would not decrease the number. If the number is at 9999, a count up would not increase the number. Do not worry about the bounce at the moment. We will fix it in the next exercise.

\* မျက်နှာအတွက် cout ဆုံးဖြတ်ပေါ်မှုများ သော်လည်း

bout မျက်နှာအတွက်များ သော်လည်း

4. Correct the bounce in exercise 2 by implementing a debounce component for each input.

5. Please answer the following questions and submit (in PDF format) to CourseVille on Friday before 23:59 (midnight).

- From the circuit diagram, the BTNx is active High or active Low? Please provide your analysis.
- What is a bounce? How do you programmatically debounce the input?

|      |
|------|
| 0000 |
| 0001 |
| 0002 |
| :    |
| 0009 |
| 0010 |
| 0011 |
| 0012 |
| 0013 |
| :    |
| 0019 |
| 0020 |

# Hardware Synthesis Laboratory I

14

Please provide your analysis.

- c. Please show your method for implementing a single pulser. (e.g. draw a state diagram, or verilogHDL code)

5. Please answer the following questions and submit (in PDF format) to CourseVille

on Friday before 23:59 (midnight).

a. From the circuit diagram, the BTNx is active High or active Low? Please provide your analysis.

ตอบ เป็น Active High เพราะ มีขาที่ทำงานดอนที่มีการกดปุ่ม หรือ กีตีดอนที่ค่าเป็น 1 นั้นเอง และจาก

Circuit Diagram



จะเห็นว่า ถ้า BTNx มีค่าเป็น 1 จริงจะเกิดการทำงาน ดังนั้น จึงเป็น Active High

- b. What is a bounce? How do you programmatically debounce the input? Please provide your analysis.

ตอบ bounce = When two metal contacts to generate multiple signals as the contacts close or open, i.e., when switch is pressed, it can cause 'bounce' that makes multiple signal.

การแก้ปัญหาการ bounce หรือที่เรียกว่า debounce นั้นทำได้โดยการใช้เทียบไมค์ให้มีการนำอินพุตไปใส่ใน DFlipFlop ก่อน เพื่อท้ากรองสัญญาณ ดังรูปด้านล่างที่ได้ทำการวิเคราะห์ไว้



จะเห็นว่าส่วนสีเขียว ซึ่งกีดีอส่วนที่เกิดการ bounce "ได้ถูกกำจัดออกไป โดย D FlipFlop

- c. Please show your method for implementing a single pulser. (e.g. draw a state diagram, or verilogHDL code)

ตอบ verilogHDL code:

```
module SinglePulse(
    output reg answer,
    input d;
    input clock
);
    wire tmp1, tmp2; //must be wire
    DFlipFlop DFlipFlop1(tmp1, clock, 1, d);
    DFlipFlop DFlipFlop2(tmp2, clock, 1, tmp1); //for module AllServerSegment

    wire dd;
    assign dd = tmp2;
    //assign dd = d;
    reg usedSel = 0;
    reg ansver = 0;
    always @(posedge clock)
    begin
        if (dd == 1 && usedSel == 0) begin
            answer = 1;
            usedSel = 1;
        end
        else if (dd == 1 && usedSel == 1) begin
            answer = 0;
            usedSel = 0;
        end
        else if (dd == 0 && usedSel == 1) begin
            answer = 0;
            usedSel = 0; //Reset
        end
        else begin
            answer = 0;
            usedSel = 0;
        end
    end
endmodule
```

มีสेटที่เก็บไว้ผลลัพธ์เคยเป็น 1 หรือยัง ถ้ามี input = 1 เข้ามาแต่ผลลัพธ์เคยเป็น 1 มาแล้ว output จะไม่เป็น 1 แต่จะเป็น 0 แทน และในทางกลับกันถ้า input เป็น 1 แต่สेटที่นี่เป็น 0 ผลลัพธ์ถึงจะแสดง 1 ออกมา แต่ถ้า input เป็น 0 ไม่ว่าอย่างไรก็ตาม output จะเป็น 0 เสมอ

1. Create an up/down 1-digit BCD counter with 4-bit outputs (DCBA) and 1 overflow output (cout), 1 borrow (bout) and 6 inputs (up, down, set9, set0, clock). Write a simulator to show that the counter functions correctly.



0 000  
1 001  
2 010  
3 011  
4 0100  
5 0101  
6 0110  
7 0111  
8 1000  
9 1001



register รีจิสเตอร์ คืออะไร  
hexTo7Segment แปลง hex เป็น 7 segment  
BCDCounter บีซีಡี คืออะไร  
fullAdderTester ทดสอบ adder  
reg [6:0] segments;

```

module BCDCOUNTER(
    output reg [3:0] outputs,
    output reg cout,
    output reg bout,
    input set9,
    input set0,
    input inc,
    input dec,
    input clk
);

module BCDCounter(
    output reg [3:0] outputs,
    output reg cout,
    output reg bout,
    input set9,
    input set0,
    input inc,
    input dec,
    input clk
);

module fullAdderTester(
    reg a,b,cin;
    wire cout,s;
    fullAdder a1(cout,s,a,b,cin);
);

```

$$\begin{array}{cccc} A & B & C & D \\ 0-9 & & & \end{array}$$

$$\begin{aligned} D, C_1 &= D+1 \\ C, C_2 &= C+C_1 \\ B, C_3 &= B+C_2 \end{aligned}$$

$$\begin{aligned} bcd(\text{up}) &\text{return cout, } n \\ \frac{ABCD}{A = bcd()} & c = D(1) \\ B &= \underline{\hspace{2cm}} \\ C &= \underline{\hspace{2cm}} \\ D &= \underline{\hspace{2cm}} \end{aligned}$$

↑  
cout  
c  
cout

2. Create a single pulser with one input, clock, and one output. Write a simulator to show that the single pulser works correctly

Moore  $\rightarrow$  មិនកីរិយាល័យ

Mealy  $\rightarrow$  កីរិយាល័យ នៃ input



## Laboratory 4: Memory

### Objectives

1. Able to implement memory in HDL
2. Able to instantiate internal FPGA memory

### Background

We would be soon implementing our first processor in the next Lab!. Now, you should have an understanding of how to implement a FSM with Verilog. What you are missing is how to implement a memory model on Verilog such that you can use it on your very first processor. We will be looking at read-only-memory (ROM), random-access-memory (RAM), and first-in-first-out (FIFO)

### ROM

The first kind of memory you are going to implement is read-only-memory (ROM). So far, you have been using only Verilog for synthesizing registers (D-Flip Flop). However, it's costly to implement memory using purely registers. Field-programmable gate array (FPGA) manufacturers often include blocks of memory inside the FPGA such that you can use.

Typical ROM instantiation looks like the following

```
module rom case(
    (* synthesis , rom block = "ROM CELLXYZ01" *)
    output reg [3:0] z ,
    input wire [2:0] a); // address- 8 deep memory

always@* begin // @ (a)
    case (a)
        3'b000: z = 4'b1011;
        3'b001: z = 4'b0001;
        3'b100: z = 4'b0011;
        3'b110: z = 4'b0010;
        3'b111: z = 4'b1110;
        default : z = 4'b0000 ;
    endcase
end
endmodule // rom case
```

The code above would generate ROM with 8 addresses, each address is 4 bits.

You might notice the synthesis suggestion keyword (\* synthesis, rom\_block = "ROM\_CELLXYZ01" \*). This tells the synthesis tools to try to use the dedicated ROM inside the FPGA instead of implementing it as a block of registers. The keywords may differ from one FPGA vendor from another. Note that most synthesis tools nowadays are smart enough to detect the access pattern that you can remove that out.

Second, the ROM in is actually asynchronous. You could make it a synchronous ROM by adding a clock, i.e.

```
module rom case(
    (* synthesis , rom block = "ROM CELLXYZ01" *) input clk ,
    output reg [3:0] z ,
    input wire [2:0] a); // address- 8 deep memory

always@(posedge clk)
begin
    case (a)
        3'b000: z = 4'b1011;
        3'b001: z = 4'b0001;
        3'b100: z = 4'b0011;
        3'b110: z = 4'b0010;
        3'b111: z = 4'b1110;
        default : z = 4'b0000 ;
    endcase
end
endmodule // rom case
```

Having the data inside your program is extremely inconvenient especially if you have a large file set of data. Verilog allows you to “read” data from a file.

```
// Verilog-2001 style
// ROM module using two dimensional arrays with
// memory defined in text file with $readmemb or $readmemh
// NOTE: This style can lead to simulation/synthesis mismatch
// if the content of data file changes after synthesis
module rom_2dimarray_initial_readmem (
    output wire [3:0] z,
    input  wire [2:0] a);
    // declares a memory rom of 8 4-bit registers.
    //The indices are 0 to 7
    (* synthesis, rom_block = "ROM_CELL XYZ01" *)
    reg      [3:0] rom[0:7];
    // NOTE: To infer combinational logic instead of a ROM, use
    // (* synthesis, logic_block *)
    initial \$readmemb("rom.data", rom);
    assign z = rom[a];
endmodule
```

*e binary      d = decimal  
h = hex*

The rom.data would look like this

```
1011 // addr=0
1000 // addr=1
0000 // addr=2
1000 // addr=3
0010 // addr=4
0101 // addr=5
1111 // addr=6
1001 // addr=7
```

*ມີກຳນົດຕອງເປັນເກສດແບບທີ່ໄດ້*

## RAM

Another useful primitive for the FPGA is random-access-memory (RAM). Again, as in the ROM case, we could have implemented RAM as a set of registers, but it's expensive and costly to do so. Typical FPGAs have dedicated areas for RAMs, (BlockRAM for Xilinx, Memory Block for Altera, etc.) The following code will generate RAM with 128x8 bits.

```

module SinglePortRAM ( inout [7:0] d, // Data In and Out
inout wire [6:0] addr, // Address
input wire oe, // Output Enable
input wire clk, we);
(* synthesis , ram block *)

reg [7:0] mem [127:0];

always @ (posedge clk)
if (we)
    mem[addr] <= d;
    assign d = oe ? mem[addr] : 8'bZ;
endmodule

```

enable: ចាប់ពីលក្ខណនា

d & oe → មាន high Impedance

You may see the inout port in the example. The idea is that the port can be used as both input and output (at different times). To read, you have to assign Z to the wire before reading the data. To write, just connect the register to the wire. This line “assign d = oe ? mem[addr] : 8'bZ; ” explains such a connection.

## Block RAM

As you can see, we can write a HDL code to generate registers, and we can potentially implement a memory using it. However, a FPGA has a small number of these CLB, and it is a bit overkill since these logic can do much greater things than being just memory. So, most FPGA vendors have specialized memory units that we can use on these FPGAs. Each vendor has a different name, but for Xilinx, we call it Block RAMs. Typically, each bRAMs has a size of 10-20Kbit depending on the FPGA, and each FPGA may have from ten to thousands of these bRAMs.

There are several ways to initiate bRAMs on Xilinx, but in general Xilinx Synthesis step can recognize if you are going to generate a bRAMs from a following pattern

```
parameter RAM WIDTH = <ram width>;
parameter RAM ADDR BITS = <ram addr bits>;
reg [RAMWIDTH-1:0] <ram name> [(2**RAMADDRBITS)-1:0]; reg [RAM WIDTH-1:0]
<output data>;
<reg or wire> [RAMADDRBITS-1:0] <address>;
<reg or wire> [RAMWIDTH-1:0] <input data>; always @(posedge <clock>)
if (<ram enable>) begin
if (<write enable>) begin
<ram name>[<address>] <= <input data>; <output data> <= <input data>;
end else
<output data> <= <ram name>[<address>]; end
```

Note that bRAMs is not the only type of construct that the Synthesizer can recognize. There are many other types of construct. Some of these are even more complicated. So most FPGA vendors has a so called language template. For Xilinx, you can find these out in the menu by going to Tools > Language Templates. For memory, it is in Verilog > Synthesis Constructs > Coding Examples > RAM > BlockRAM. We recommend you explore these constructs.

Most FPGAs have what is called **Distributed RAM**. This essentially **uses the logic gate and registers to implement the memory**. Usually, Distributed RAM is faster than bRAMs, but it is smaller.

1

Note that there is also another method for instantiating the bRAMs. This is through the use of Block Design. We recommend you take a look at this document [https://www.xilinx.com/support/documentation/university/Vivado-Teaching/Digital-Design/2014x/docs-pdf/Vivado\\_tutorial.pdf](https://www.xilinx.com/support/documentation/university/Vivado-Teaching/Digital-Design/2014x/docs-pdf/Vivado_tutorial.pdf) for more information about how to use IP Integrator and Block Design.

## Exercises

1. Design and build a circuit to work as a Stack (LIFO). The user can use two push buttons in order to PUSH (BTNU) or POP (BTNC). Use 8 switches on the board as a value. When a user hits a PUSH button, it will store the value from the switches to the stack. When the user hits the POP switch, it will display the value from the top of the stack in the two hex displays on the left. The other hex displays are used to display the number of elements currently in the stack. The stack can keep up to 256 elements. If the stack is full, hitting the PUSH button should not do anything.

We recommend using a PUSH button as a reset (BTND).

2. Read an input from 5 binary switches. (You are welcome to use any switch. If you have no preference, use SW[4..0].) This should give you a number ranging from 0 to 31. Use distributed memory or bRAMs as the ROM for converting binary to 2 BCD for displaying on a seven-segment display. Use 5 bit binary as an address. The output can be either 2x4 BCD for applying to BCDtoSevenSegment (as used in the previous lab) or 2x8 for feeding directly to a seven-segment display.

Note: You may want to initialize the memory from data. Please see the language template for more information

3. Use Block Design to create a simple calculator 4-bit calculator. You will assign 4 switches as the 4-bits input for A and another 4 switches for B. You will assign 4 push buttons to do 4 different operations.
  - a. When BTNU is pushed, you will display the result of A+B in base 10 using the three 7-segment displays.
  - b. When BTNL is pushed, you will display the result of A-B in base 10 using the three 7-segment displays.
  - c. When BTND is pushed, you will display the result of A\*B in base 10 using the three 7-segment displays.
  - d. When BTNR is pushed, you will display the result of A/B in base 10 using the three 7-segment displays.
4. Please answer the following questions and submit (in PDF format) to CourseVille on Friday before 23:59 (midnight).
  - a. Explain your ROM for mapping 5-bit binary to 2-digit BCDs (or 2x8 bits seven segment displays depending on your design in Exercise.2).

4. Please answer the following questions and submit (in PDF format) to CourseVille on Friday before 23:59 (midnight).

a. Explain your ROM for mapping 5-bit binary to 2-digit BCDs (or 2x8 bits seven segment displays depending on your design in Exercise.2).

ตอบ ROM นั้น จะมี Look up table ซึ่งเปรียบเสมือนตารางที่มีคำตอบที่เกิดจาก input ที่เป็นไปได้ ทุกรูปแบบ เก็บไว้ ถ้า input แบบหนึ่งเข้ามา ก็จะสามารถส่งคำตอบออกไปได้เลยทันที โดยที่ไม่ต้องคำนวณใหม่ เช่น ถ้า เลข 5 บิตที่เข้ามาเป็น 10100 ก็จะส่งค่าเออาทพุตออกไปเป็น 0010 กับ 0000 ทันที โดยที่ไม่ต้องมีการคำนวณ ใหม่ ซึ่งไฟล์คำตอบทั้งหมดนั้นจะอยู่ในไฟล์ชื่อว่า romData.data และมีการใช้คำสั่งชื่อว่า \$readmemb ในการ map ตัวไฟล์คำตอบ เข้ากับ array ขนาด  $32 \times 8$  (ซึ่งจะมองเสมือนว่าเป็นตัวจำค่าต่างๆ) และเพราคำตอบที่ ออกมากจาก ROM นั้นต้องมีขนาด 8 บิต และ เนื่องจากอินพุตนั้นมีจำนวน 5 บิต แสดงว่ามีคำตอบที่เป็นไปได้ อยู่ทั้งหมด  $2^5 = 32$  รูปแบบนั้นเอง

- Design and build a circuit to work as a Stack (LIFO). The user can use two push buttons in order to PUSH (BTNU) or POP (BTNC). Use 8 switches on the board as a value. When a user hits a PUSH button, it will store the value from the switches to the stack. When the user hits the POP switch, it will display the value from the top of the stack in the two hex displays on the left. The other hex displays are used to display the number of elements currently in the stack. The stack can keep up to 256 elements. If the stack is full, hitting the PUSH button should not do anything.

We recommend using a PUSH button as a reset (BTND).



2. Read an input from 5 binary switches. (You are welcome to use any switch. If you have no preference, use SW[4..0].) This should give you a number ranging from 0 to 31. Use distributed memory or bRAMs as the ROM for converting binary to 2 BCD for displaying on a seven-segment display. Use 5 bit binary as an address. The output can be either 2x4 BCD for applying to BCDtoSevenSegment (as used in the previous lab) or 2x8 for feeding directly to a seven-segment display.

\*\* ຕົວຢ່ານ

Note: You may want to initialize the memory from data. Please see the language template for more information



0 0 0 0    0 0 0 0 | 0 0 0 0 0 0 0 0  
⋮            ⋮            ⋮  
0 1 0 1    0 1 0 0 | 0 0 0 1 0 1 0 0  
5            4            20

ເພື່ອ 5x4 = 20 (ຈຳກຳປະເລຸງ)



3. Use Block Design to create a simple calculator 4-bit calculator. You will assign 4 switches as the 4-bits input for A and another 4 switches for B. You will assign 4 push buttons to do 4 different operations.

- When BTNU is pushed, you will display the result of  $A+B$  in base 10 using the three 7-segment displays.
- When BTNL is pushed, you will display the result of  $A-B$  in base 10 using the three 7-segment displays.
- When BTND is pushed, you will display the result of  $A*B$  in base 10 using the three 7-segment displays.
- When BTNR is pushed, you will display the result of  $A/B$  in base 10 using the three 7-segment displays.



$A+B$   $\sum_{i=0}^3 a_i b_i$   
 $A-B$   $\sum_{i=0}^3 a_i - b_i$   
 $A \times B$

$A+B; 0+0$   
 $0+1$   
 $1+0$



$$\begin{array}{r} 1111 \\ \times 111 \\ \hline 255 \end{array}$$

$8+4+2+1$   
 $\hookrightarrow 15$

$$\begin{aligned} A+B &= 0 \\ A-B &= 1 \\ A \times B &= 2 \\ A/B &= 3 \end{aligned}$$

$$\square \quad \square \quad \square \quad \square + \square \quad \square \quad \square \quad \square$$

$2 \times 2 \times 2 \times 2 \quad 2 \times 2 \times 2 \times 2$

$$\square \quad \square \quad \square \quad \square - \square \quad \square \quad \square \quad \square$$

$2 \times 2 \times 2 \times 2 \quad 2 \times 2 \times 2 \times 2$

$$\square \quad \square \quad \square \quad \square * \square \quad \square \quad \square \quad \square$$

$2 \times 2 \times 2 \times 2 \quad 2 \times 2 \times 2 \times 2$

$$\square \quad \square \quad \square \quad \square \div \square \quad \square \quad \square \quad \square$$

$2 \times 2 \times 2 \times 2 \quad 2 \times 2 \times 2 \times 2$

$4(2^4 \times 2^4) = 1024$

## Laboratory 5: Simple CPU and Memory Mapped I/O

### Objectives

1. Able to implement Simple CPU in VerilogHDL
2. Able to implement the memory mapped I/O

### Background

I/O ที่เรียกต่อสืบ กัน 7-segment ให้ switch

แบบที่ 3 คือ I/O Processor (มินิบ๊อ)

### Memory Mapped I/O and Port-Mapped I/O

Memory Mapped I/O and Port-Mapped I/O are two different methods for implementing input/output between CPU and peripheral devices in a computer.

ร่วมกับ memory address หนึ่ง ซึ่งถูกพ่วงเข้ากับ I/O

**1** \*\* Memory Mapped I/O uses the same address to address both memory and I/O devices. The memory and registers of devices are associated with physical addresses. Each I/O device monitors the CPU's address bus and responds to any CPU access of an address assigned to that device, connecting the data bus to the desired device's hardware register. This type of I/O allows software to interact with I/O directly using standard load and store instructions. **แต่ I/O ไม่เป็นหนึ่งเดียว แต่ Memory**

(Port)

**2** Unlike the Memory Mapped I/O, Port-mapped I/O uses special CPU instructions for performing I/O, such as the inp and outp. The benefit of Port-mapped I/O is the separation of address spaces between I/O and memory.

\*  
memory ไม่เหมือนกับ 500 แต่ I/O ทำแทนที่ 500 มีคนละชื่อ

To connect a device to either memory mapped I/O or port-mapped I/O, a decoder must be added to the address. When an address is matched, accessing (reading and writing) to the data bus will be relayed to a device. In certain cases (eg. microcontroller), control registers (and buffers) will be added to related addresses in order to act as an interface between processor and devices.

(For more information, see the video clips.)

I/O



สำหรับ Port Mapped I/O แล้วนี่  
จะค่าบัน 0 → ให้ Addr หัวเข้าไปเมื่อ 7-Seg  
II → II MEM



$nWR = 0 \& 0x0000 FFFF$   
ตัวจังหวะที่หนึ่ง DCBA[3] จะติด!

— = สำคัญ



## Exercises

Please use nanoLADA CPU code<sup>6</sup> as a base.

- Based on the given ALU, extend the ALU to support the following operations.

| ALU OP | Operations                          | $A=6, B=5, Cin=1$                                                     |
|--------|-------------------------------------|-----------------------------------------------------------------------|
| 000    | {Cout,S}=A+B+Cin;                   | $\begin{array}{r} 110 \\ 011 \\ + 001 \\ \hline 110 \end{array}$      |
| 001    | {Cout,S}=A-B;                       | $101 \rightarrow 5$<br>$110 \rightarrow 6$                            |
| 010    | S=A   B; Cout=0; (or)               | $\begin{array}{r} 110 \\ 101 \\ \hline 111 \end{array} \rightarrow 7$ |
| 011    | S=A & B; Cout=0; (and)              | $\begin{array}{r} 110 \\ 101 \\ \hline 100 \end{array} \rightarrow 4$ |
| 100    | S=A ^ B; Cout=0; (xor)<br>ต่อสัญญาณ | $\begin{array}{r} 110 \\ 101 \\ \hline 011 \end{array} \rightarrow 3$ |
| 101    | S=-A; Cout=0; (2's complement)      | $\begin{array}{r} 110 \\ 001 \\ \hline 010 \end{array} \rightarrow a$ |
| 110    | S=~A; Cout=0; (not)                 | $\begin{array}{r} 110 \\ 001 \\ \hline 011 \end{array}$               |
| 111    | S=~B; Cout=0; (not)                 | $\begin{array}{r} 101 \\ 010 \\ \hline 011 \end{array}$               |

แก้ไข: Run simulator ให้ถูกต้อง

Use the simulator to validate your ALU.

- Extend the R-type instruction format to support the following operations.



<sup>6</sup>Available at <https://www.cp.eng.chula.ac.th/~krerk/books/Computer%20Architecture/nanoLADA/>

# Hardware Synthesis Laboratory I

$p\_address = pc$   
 $p\_data \rightarrow \text{instruction}$   
 $d\_address = data\_S = \text{address of data}$   
 $d\_data \rightarrow data\_M = \text{data}$

23

Original instruction

|                                                     | Opcode<br>(6 bits) | RS<br>(5 bits) | RT<br>(5 bits) | RD<br>(5 bits) | Reserved<br>(11 bits) |
|-----------------------------------------------------|--------------------|----------------|----------------|----------------|-----------------------|
| ADD rd, rs, rt<br>$R[rd] \leftarrow R[rs] + R[rt];$ | 000001             |                |                |                |                       |

New instructions.

In Reserve 11 bits and

|                                                          | Opcode<br>(6 bits) | RS<br>(5 bits) | RT<br>(5 bits) | RD<br>(5 bits) | ALU Operation<br>(11 bits) |
|----------------------------------------------------------|--------------------|----------------|----------------|----------------|----------------------------|
| ADD rd, rs, rt<br>$R[rd] \leftarrow R[rs] + R[rt];$      | 000001             |                |                |                | 00000000_000 ✓             |
| SUB rd, rs, rt<br>$R[rd] \leftarrow R[rs] - R[rt];$      | 000001             |                |                |                | 00000000_001 ✓             |
| OR rd, rs, rt<br>$R[rd] \leftarrow R[rs] \mid R[rt];$    | 000001             |                |                |                | 00000000_010 ✓             |
| AND rd, rs, rt<br>$R[rd] \leftarrow R[rs] \& R[rt];$     | 000001             |                |                |                | 00000000_011 ✓             |
| XOR rd, rs, rt<br>$R[rd] \leftarrow R[rs] \wedge R[rt];$ | 000001             |                |                |                | 00000000_100 ✓             |
| COM rd, rs, xx<br>$R[rd] \leftarrow -R[rt];$             | 000001             |                |                |                | 00000000_101 ✓             |
| NOT rd, rs, xx<br>$R[rd] \leftarrow \sim R[rs]$          | 000001             |                |                |                | 00000000_110 ✓             |

M พื้น ROM M พื้น prog ที่เก็บคำสั่งห้าม

กอ2



001  
010  
011

New instructions.

|                                          | Opcode (6 bits) | RS (5 bits) | RT (5 bits) | RD (5 bits) | ALU Operation (11 bits) |
|------------------------------------------|-----------------|-------------|-------------|-------------|-------------------------|
| ADD rd, rs, rt<br>R[rd] ← R[rs] + R[rt]; | 000001          |             |             |             | 00000000_000 ✓          |
| SUB rd, rs, rt<br>R[rd] ← R[rs] - R[rt]; | 000001          |             |             |             | 00000000_001 ✓          |
| OR rd, rs, rt<br>R[rd] ← R[rs]   R[rt];  | 000001          |             |             |             | 00000000_010 ✓          |
| AND rd, rs, rt<br>R[rd] ← R[rs] & R[rt]; | 000001          |             |             |             | 00000000_011 ✓          |
| XOR rd, rs, rt<br>R[rd] ← R[rs] ^ R[rt]; | 000001          |             |             |             | 00000000_100 ✓          |
| COM rd, rs, xx<br>R[rd] ← -R[rs];        | 000001          |             |             |             | 00000000_101 ✓          |
| NOT rd, rs, xx<br>R[rd] ← -R[rs];        | 000001          |             |             |             | 00000000_110 ✓          |

LW 1 #10 p\_data = 32'b011000\_00100\_00101\_00000\_000000000000; // LW \$r5, 0(\$r4) = 1  
LW 2 #10 p\_data = 32'b011000\_00101\_00100\_00000\_000000000000; // LW \$r4, 0(\$r5) = 2

#10 p\_data = 32'b000001\_00100\_00101\_00110\_000000000000; // ADD \$r6, \$r5, \$r4  
#10 p\_data = 32'b000001\_00100\_00101\_00110\_000000000001; // SUB \$r6, \$r5, \$r4  
#10 p\_data = 32'b000001\_00100\_00101\_00110\_000000000010; // OR \$r6, \$r5, \$r4  
#10 p\_data = 32'b000001\_00100\_00101\_00110\_000000000011; // AND \$r6, \$r5, \$r4  
#10 p\_data = 32'b000001\_00100\_00101\_00110\_000000000100; // XOR \$r6, \$r5, \$r4  
#10 p\_data = 32'b000001\_00100\_00101\_00110\_000000000101; // -A \$r6, \$r5, \$r4  
#10 p\_data = 32'b000001\_00100\_00101\_00110\_000000000110; // NOT A \$r6, \$r5, \$r4  
#10 p\_data = 32'b000001\_00100\_00101\_00110\_000000000111; // NOT B \$r6, \$r5, \$r4

หมายเหตุ

```

# run 1000ns
0 - mem[zzzxxxx] - xxxxxxxx

1 - A(REG[ 0]) - 00000000, B(REG[ 0]) - 00000000

5 - mem[zzz0000] - xxxxxxxx

2 - A(REG[ 0]) - 00000000, B(REG[ 0]) - 00000000

3 - A(REG[ 4]) - 00000000, B(REG[ 5]) - 00000000
3 - REG[ 5] <- 00000001
30 - mem[zzz0001] - 00000001
ดูใน list ที่ 0 นะ
ดูใน list ที่ 0 นะ
memory address

4 - A(REG[ 5]) - 00000001, B(REG[ 4]) - 00000000
4 - REG[ 4] <- 00000002
40 - mem[zzz0002] - 00000002

40 - mem[zzz0003] - 00000003

5 - A(REG[ 4]) - 00000002, B(REG[ 5]) - 00000001
5 - REG[ 6] <- 00000003
50 - mem[zzz0001] - 00000004

6 - REG[ 6] <- 00000001
60 - mem[zzz0003] - 00000002

7 - A(REG[ 4]) - 00000002, B(REG[ 5]) - 00000001
7 - REG[ 6] <- 00000003
70 - mem[zzz0000] - 00000004

8 - A(REG[ 4]) - 00000002, B(REG[ 5]) - 00000001
8 - REG[ 6] <- 00000000
80 - mem[zzz0003] - 00000001

9 - A(REG[ 4]) - 00000002, B(REG[ 5]) - 00000001
9 - REG[ 6] <- 00000003
90 - mem[zzzffffe] - 00000004

10 - A(REG[ 4]) - 00000002, B(REG[ 5]) - 00000001
10 - REG[ 6] <- fffffffe
100 - mem[zzzffffd] - xxxxxxxx

11 - A(REG[ 4]) - 00000002, B(REG[ 5]) - 00000001
11 - REG[ 6] <- fffffffd
110 - mem[zzzffffe] - xxxxxxxx

12 - A(REG[ 4]) - 00000002, B(REG[ 5]) - 00000001
12 - REG[ 6] <- fffffffe

```

| opcode                                                                            | RS | RT | RD | Reserved |
|-----------------------------------------------------------------------------------|----|----|----|----------|
| #10 p_data = 32'b011000_00100_00101_00000_000000000000; // LW \$r5, 0(\$r4) = 1   |    |    |    |          |
| #10 p_data = 32'b011000_00101_00100_00000_000000000000; // LW \$r4, 0(\$r5) = 2   |    |    |    |          |
| #10 p_data = 32'b000001_00100_00101_00110_000000000000; // ADD \$r6, \$r5, \$r4   |    |    |    |          |
| #10 p_data = 32'b000001_00100_00101_00110_000000000001; // SUB \$r6, \$r5, \$r4   |    |    |    |          |
| #10 p_data = 32'b000001_00100_00101_00110_000000000010; // OR \$r6, \$r5, \$r4    |    |    |    |          |
| #10 p_data = 32'b000001_00100_00101_00110_000000000011; // AND \$r6, \$r5, \$r4   |    |    |    |          |
| #10 p_data = 32'b000001_00100_00101_00110_000000000100; // XOR \$r6, \$r5, \$r4   |    |    |    |          |
| #10 p_data = 32'b000001_00100_00101_00110_000000000101; // -A \$r6, \$r5, \$r4    |    |    |    |          |
| #10 p_data = 32'b000001_00100_00101_00110_000000000110; // NOT A \$r6, \$r5, \$r4 |    |    |    |          |
| #10 p_data = 32'b000001_00100_00101_00110_000000000111; // NOT B \$r6, \$r5, \$r4 |    |    |    |          |

ดูใน list ที่ 0

## Simulator

Use the simulator to validate your extended CPU.

3. Use the knowledge from memory-mapped I/O to map the following devices to associated addresses. (You have to add a few registers and buffers to the associated address.) Note that you have to take away memory from those addresses.

แสดงการต่อเข้าชิปจ่ายรูปใน address พากศ์ บันทีปะงุห์ที่ 7 Seg

| Address | Device                                                    | Note            |
|---------|-----------------------------------------------------------|-----------------|
| 0xFFFF0 | 4-bit register for driving Seven-segment display digit 0. | Use last 4 bits |
| 0xFFFF4 | 4-bit register for driving Seven-segment display digit 1. | Use last 4 bits |
| 0xFFFF8 | 4-bit register for driving Seven-segment display digit 2. | Use last 4 bits |
| 0xFFFFC | 4-bit register for driving Seven-segment display digit 3. | Use last 4 bits |
| 0xFFE0  | 4-bit buffer for reading Switch [3..0] (A)                | Use last 4 bits |
| 0xFFE4  | 4-bit buffer for reading Switch [7..4] (B)                | Use last 4 bits |
| 0xFFE8  | 4-bit buffer for reading Switch [11..8] (Op)              | Use last 4 bits |



Write a simple software to your ROM to repeatedly read from switches. Display the value of switch [3..0] to seven-segment display digit 0. Display the value of switch [7..4] to seven-segment display digit 1. For seven segment display digits [3..2], show the result from the following operations.



|                       |                                             |
|-----------------------|---------------------------------------------|
| Switches [10..8] (Op) | Seven segment display [3..2] $A = 2, B = 1$ |
| 000                   | Show A+B<br>3                               |

# Hardware Synthesis Laboratory I

$$A=2, B=1$$

25

|     |          |                 |                         |
|-----|----------|-----------------|-------------------------|
| 001 |          | Show A-B        | 1                       |
| 010 | 10<br>01 | Show A B        | 3                       |
| 011 |          | Show A&B        | 0                       |
| 100 | 10<br>01 | Show A^B        | 011 3                   |
| 101 |          | Show ~A (not A) | 11111101 FD             |
| 110 |          | Show -A         | 00000010<br>11111110 FE |
| 111 |          | Show -B         | 11111111 FF             |

Demonstrate your design in our BASYS 3 FPGA board.

$$\begin{matrix} C \rightarrow 12 \\ 0 \rightarrow 13 \end{matrix}$$

$$\begin{array}{r} 0000 | 0001 \\ 1111 | 1110 \\ 0000 | 0001 \\ \hline 00000010 \end{array}$$

3 2 1 0  
g+4+z = 14

FD

|                  |
|------------------|
| adder.v          |
| alu.v            |
| clockDiv.v       |
| control.v        |
| dataList         |
| extender.v       |
| hexTo7Segment.v  |
| memory.v         |
| mux.v            |
| nano_sc_system.v |
| nanopu.v         |
| oldmemory.v      |
| prog.list        |
| prog7.list       |
| prog2.list       |
| prog3.list       |
| quadSevenSeg.v   |
| regfile.v        |
| rom.v            |

## Laboratory 6: VGA and UART

(This lab is partly taken from Pitchaya Sitti-Amorn, Ph.D.)

### Objectives

1. Understand the asynchronous serial communication
2. Understand how to use FPGA to interface with external devices

### Background

One use of the FPGA is to interface with external devices that require precise timing. In this lab, we will be looking at two devices: VGA (Display), and a serial communication (UART).

### VGA

VGA protocol is designed when computer monitors still used cathode ray tube (CRT) devices. In those times, most of the controls are actually analog. While FPGA cannot output the analog signal to control the VGA port directly, it can be used with a resistor ladder to create a simple digital to analog signal (DAC) ( See VGA Port section of the BASYS3 reference<sup>7</sup> ).

And due to the analog design of the VGA signal, it will also require precise timing. In this lab, you will be interfacing with the VGA and make some adjustments from the given code. You can also get code and more information from embedded thoughts<sup>8</sup>.

Note. Please make sure that you understand H-Sync, V-Sync, and related signals. You may find it in a quiz.

### UART

Universal Asynchronous Receiver/transmitter or UART is a computer protocol that enables data transfer between two devices. UARTs are commonly used with the electrical layer standard such as TIA, RS-232, RS-422 or RS-485. The board you have in the lab has a UART port through the USB.

<sup>7</sup><https://reference.digilentinc.com/basys3/refmanual>

<sup>8</sup><https://embeddedthoughts.com/2016/07/29/driving-a-vga-monitor-using-an-fpga/>

The minimal communication requirement by UART uses only two wires, TX and RX. Figure 1 shows a typical UART communication.

In order for a device to send the data out, both devices must use the same clock rate (baud), data bit size, parity and stop bits.

<http://www.unm.edu/~zbaker/ece238/slides/UART.pdf> provides good details on the protocol.



In order to communicate with the UART on the Basys 3 board, you will need to install some serial communication software such as Putty, Tera Term, etc. You can also find more information about the USB-UART Bridge on the Basys 3 in the reference manual.

## Exercises

1. VGA: Either modify the example code or rewrite your own code so that the board will display gradients between two colors set by the switches. The gradient can be changed back and forth between horizontal to vertical via a push-button.
2. UART: You will implement a simple program that receives UART inputs, add one to each input character, then returns to the UART. If you test this with a console. For example, if you type “*abcde*” in the console, you should receive “*bcd ef*”.

You may choose any baud rate you would like to use. I recommend testing with 9600bps or 115200bps. You may also want to test the loopback, i.e. wiring TX and RX together.

Note: You might need terminal software to connect your computer to BASYS3. On Linux (and Mac OS X), try minicom or screen. On Windows, try RealTerm<sup>9</sup>, TerraTerm, or PuTTY.

<sup>9</sup><https://sourceforge.net/projects/realterm/>