

## Computer-Aided VLSI System Design

### Homework 1: Arithmetic Logic Unit

TA: 曾柏豪 r13943123@ntu.edu.tw

TA: 陳柏任 d13943013@ntu.edu.tw

Due: Tuesday, Sep. 30<sup>th</sup>, 13:59

### Data Preparation

1. Unpack 1141\_hw1.tar with the following command:

```
tar -xvf 1141_hw1.tar
```

| Folder             | File        | Description                          |
|--------------------|-------------|--------------------------------------|
| 00_TESTBED         | testbench.v | File to test your design             |
| 00_TESTBED/pattern | INST*_I.dat | Input instruction patterns           |
|                    | INST*_O.dat | Output golden patterns               |
| 01_RTL             | alu.v       | Your design                          |
|                    | rtl.f       | File list for RTL simulation         |
|                    | 01_run      | VCS command for simulation           |
|                    | 99_clean    | Command for cleaning temporary files |

### Introduction

An arithmetic logic unit (ALU) is one of the components of a computer processor. It performs arithmetic and bit-level logical operations in a computer. In this homework, you are going to design an ALU with some special instructions.

## Block Diagram



## Specifications

1. Top module name: alu
2. Input/output description:

| Signal Name        | I/O | Width | Simple Description                                                                                                                                                                               |
|--------------------|-----|-------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| <b>i_clk</b>       | I   | 1     | Clock signal in the system                                                                                                                                                                       |
| <b>i_rst_n</b>     | I   | 1     | Active <b>low</b> asynchronous reset                                                                                                                                                             |
| <b>i_in_valid</b>  | I   | 1     | The signal is <b>high</b> if input data is ready                                                                                                                                                 |
| <b>o_busy</b>      | O   | 1     | Set <b>low</b> if ready for next input data.<br>Set <b>high</b> to pause input sequence.                                                                                                         |
| <b>i_inst</b>      | I   | 4     | Instruction for ALU to perform                                                                                                                                                                   |
| <b>i_data_a</b>    | I   | 16    | Signed input data with 2's complement representation<br>1. For instructions 0000~0011, fixed-point number<br>(6-bit signed integer + 10-bit fraction)<br>2. For instructions 0100~1001, integer (for<br>instruction 0100, it is an unsigned integer!)            |
| <b>i_data_b</b>    | I   | 16    | Same format as <b>i_data_a</b>                                                                                                                                                                   |
| <b>o_out_valid</b> | O   | 1     | Set <b>high</b> if ready to output result                                                                                                                                                        |
| <b>o_data</b>      | O   | 16    | Signed output data with 2's complement representation<br>1. For instructions 0000~0011, fixed-point number<br>(6-bit signed integer + 10-bit fraction)<br>2. For instructions 0100~1001, integer |

3. Active low asynchronous reset is used only once.

4. All inputs are synchronized with the **negative** clock edge.
5. All outputs should be synchronized with the **positive** clock edge. That is, flip-flops should be added before all outputs.
6. A new pattern (*i\_inst*, *i\_data\_a* and *i\_data\_b*) is ready only when *i\_in\_valid* is high.
7. At each negative clock edge, *i\_in\_valid* will be randomly pulled high only if *o\_busy* is low.
8. *o\_out\_valid* should be pulled high for only one cycle for each *o\_data*.
9. The testbench will sample *o\_data* at **negative** clock edge if *o\_out\_valid* is high.
10. You can raise *o\_out\_valid* at any moment.
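To make the fixed-point format of *i\_data\_a*, *i\_data\_b* and *o\_data* concrete, the following Python sketch decodes and encodes 16-bit patterns in the 6-bit signed integer + 10-bit fraction format. It is only a software aid for checking patterns by hand, not part of the RTL. The helper names (`to_signed`, `q6_10_to_float`, `float_to_q6_10`) are illustrative assumptions and do not appear in the provided files.

```python
import math

FRAC_BITS = 10                       # fractional bits of the fixed-point format
WORD_BITS = 16                       # width of i_data_a, i_data_b and o_data
WORD_MASK = (1 << WORD_BITS) - 1


def to_signed(word: int) -> int:
    """Interpret a 16-bit pattern as a 2's complement signed integer."""
    word &= WORD_MASK
    if word & (1 << (WORD_BITS - 1)):
        return word - (1 << WORD_BITS)
    return word


def q6_10_to_float(word: int) -> float:
    """Decode a 16-bit fixed-point pattern into the real value it represents."""
    return to_signed(word) / (1 << FRAC_BITS)


def float_to_q6_10(value: float) -> int:
    """Encode a real value as a 16-bit pattern.

    Rounds half up to the nearest representable step and does NOT saturate;
    the caller is assumed to pass a value within [-32.0, 32.0).
    """
    return math.floor(value * (1 << FRAC_BITS) + 0.5) & WORD_MASK
```

For example, the pattern `0x0400` decodes to 1.0 and `0xFC00` to -1.0. The largest and smallest representable values are 31.9990234375 (`0x7FFF`) and -32.0 (`0x8000`).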

## Design Description

1. The following table shows the operations you need to implement on your ALU:

| <b>i_inst[3:0]</b> | <b>Operation</b>                       | <b>Description</b>                                                  |
|--------------------|----------------------------------------|---------------------------------------------------------------------|
| 4'b0000            | Signed Addition                        | $o\_data = i\_data\_a + i\_data\_b$                                 |
| 4'b0001            | Signed Subtraction                     | $o\_data = i\_data\_a - i\_data\_b$                                 |
| 4'b0010            | Signed Multiplication and Accumulation | $o\_data = i\_data\_a * i\_data\_b + data\_acc_{old}$               |
| 4'b0011            | Taylor Expansion of Sin Function       | $o\_data = \sum_{n=0}^2 \frac{(-1)^n}{(2n+1)!} (i\_data\_a)^{2n+1}$ |
| 4'b0100            | Binary to Gray Code                    | Encode the gray code result                                         |
| 4'b0101            | LRCW                                   | Encode the CPOP result                                              |
| 4'b0110            | Right Rotation                         | Rotate <i>i_data_a</i> right by <i>i_data_b</i> bits                |
| 4'b0111            | Count Leading Zeros                    | Count leading 0's in <i>i_data_a</i>                                |
| 4'b1000            | Reverse Match4                         | Custom bit-level operation                                          |
| 4'b1001            | Matrix Transpose                       | Transpose an 8*8 matrix                                             |

2. Specifications on number representation:
  - a. For instructions 0000~0011, saturation must be applied to the final result. That is, if the final result exceeds the maximum (minimum) representable value of 16-bit representation (6-bit integer + 10-bit fraction), use the maximum (minimum) value as output.
  - b. For instructions 0010 and 0011, rounding must be applied to the final result before saturation. The rounding mode used is **rounding to the nearest[2]**. That is, if the value of the remaining bits under LSB is greater than or equal to half the value representable by LSB, it should be rounded up.
3. For instruction 0010, the accumulator, *data\_acc*, is cleared to 0 when the active-low reset *i\_rst\_n* is applied. The value in *data\_acc* is first clamped (saturated) to its maximum or minimum threshold (16-bit integer, 20-bit fraction) before the next accumulation cycle begins. Note that **no intermediate value or value to be accumulated may be rounded**.

4. For instruction 0011, please use the following Taylor expansion to compute sin [3], a non-linear function. *i\_data\_a* is guaranteed to be between -1.0 and 1.0. **It is not allowed to use the division operator (/) in Verilog code.**

$$\sin(i\_data\_a) \cong \sum_{n=0}^2 \frac{(-1)^n}{(2n+1)!} (i\_data\_a)^{2n+1}$$

5. For instruction 0110, *i\_data\_b* is guaranteed to be from 0 to 16 (inclusive).
6. For instructions 0111, 1000 and 1001, it is recommended to use for loops.
7. For all instructions, there cannot be any combinational loop. Otherwise, the instruction will not be scored.
8. For instruction 1000, implement the following custom bit-level operation.
$$o\_data[i] = \begin{cases} (i\_data\_a[i + 3 : i] == i\_data\_b[15 - i : 12 - i]), & i = 0 \sim 12 \\ 0, & i = 13 \sim 15 \end{cases}$$
9. For instruction 1001, a matrix transpose operation requires collecting 8 cycles of valid input data. These cycles may arrive non-consecutively, but the entire 8-cycle transfer is guaranteed to complete before the next instruction is issued.
10. For instruction 1001, the output data is transmitted over 8 cycles. The *o\_out\_valid* signal must be asserted during each cycle that contains valid output data. In addition, all valid results for the current matrix transpose operation must be output before the results from the **subsequent instruction** are transmitted.
11. You **CANNOT** implement any operation by look up tables (the scaling factors of reciprocal in instruction 0011 are allowed).
12. You are **NOT** allowed to use DesignWare.
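The saturation, rounding and accumulation rules for instructions 0000~0011 can be sketched as a bit-accurate software model. The Python below is a reference sketch under stated assumptions, not the golden model used to generate the patterns: the `Mac` class and all function names are hypothetical, the output of instruction 0010 is assumed to be taken from the saturated accumulator, and the reciprocal constants for 1/3! and 1/5! are assumed to carry 16 fractional bits. If this model and the released `INST*_O.dat` patterns disagree, the patterns are authoritative.

```python
FRAC = 10                                      # fractional bits of the 16-bit I/O format
OUT_MIN, OUT_MAX = -(1 << 15), (1 << 15) - 1   # 16-bit output range
ACC_MIN, ACC_MAX = -(1 << 35), (1 << 35) - 1   # 36-bit accumulator (16 int + 20 frac)


def signed16(word):
    """Interpret a 16-bit pattern as a 2's complement integer."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def saturate(value, lo, hi):
    """Clamp value into the closed range [lo, hi]."""
    return max(lo, min(hi, value))


def round_shift(value, shift):
    """Drop `shift` LSBs with round-to-nearest (a remainder of half or more rounds up)."""
    return (value + (1 << (shift - 1))) >> shift


def to_word(value):
    """Saturate to 16 bits and return the raw output pattern."""
    return saturate(value, OUT_MIN, OUT_MAX) & 0xFFFF


def alu_add(a, b):                             # instruction 0000
    return to_word(signed16(a) + signed16(b))


def alu_sub(a, b):                             # instruction 0001
    return to_word(signed16(a) - signed16(b))


class Mac:                                     # instruction 0010
    def __init__(self):
        self.acc = 0                           # data_acc, cleared by reset

    def step(self, a, b):
        # The 16x16 product has 20 fractional bits and is accumulated unrounded.
        full = self.acc + signed16(a) * signed16(b)
        self.acc = saturate(full, ACC_MIN, ACC_MAX)
        # Round only the final result, then saturate to 16 bits.
        return to_word(round_shift(self.acc, FRAC))


# Assumed reciprocal scaling factors with 16 fractional bits.
INV_3_FACT = 10923                             # round(2**16 / 6)
INV_5_FACT = 546                               # round(2**16 / 120)


def alu_sin(a):                                # instruction 0011
    x = signed16(a)
    # Align all three terms to 66 fractional bits before summing:
    # x has 10, x**3 * INV_3_FACT has 46, x**5 * INV_5_FACT has 66.
    total = (x << 56) - ((x ** 3 * INV_3_FACT) << 20) + x ** 5 * INV_5_FACT
    return to_word(round_shift(total, 56))
```

With these assumptions, `alu_sin(0x0400)` (an input of 1.0) returns 862, which is 1 - 1/6 + 1/120 rounded to the nearest 1/1024. The multiplications by `INV_3_FACT` and `INV_5_FACT` replace the two divisions, which is the reciprocal-multiplication technique described in reference 4.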

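The bit-level instructions 0100, 0110, 0111 and 1000 can likewise be checked against a small software model. The Python sketch below makes several assumptions: the function names are hypothetical, the Gray code is taken of *i\_data\_a*, a rotation by 16 is treated the same as a rotation by 0, and an all-zero input to Count Leading Zeros yields 16. Instructions 0101 and 1001 are not modeled because the handout does not specify their bit layout in text.

```python
MASK16 = 0xFFFF


def gray_encode(a):
    """Instruction 0100: binary to Gray code of the unsigned 16-bit input."""
    a &= MASK16
    return a ^ (a >> 1)


def rotate_right(a, n):
    """Instruction 0110: rotate a 16-bit word right by n bits (0 <= n <= 16)."""
    a &= MASK16
    n %= 16                      # rotating by 16 leaves the word unchanged
    return ((a >> n) | (a << (16 - n))) & MASK16


def count_leading_zeros(a):
    """Instruction 0111: number of 0 bits above the most significant 1."""
    a &= MASK16
    count = 0
    for bit in range(15, -1, -1):    # scan from MSB to LSB, as a for loop in RTL would
        if (a >> bit) & 1:
            break
        count += 1
    return count


def match4(a, b):
    """Instruction 1000: o_data[i] = (a[i+3:i] == b[15-i:12-i]) for i = 0..12."""
    result = 0
    for i in range(13):              # bits 13..15 of the result stay 0
        nibble_a = (a >> i) & 0xF            # a[i+3 : i]
        nibble_b = (b >> (12 - i)) & 0xF     # b[15-i : 12-i]
        if nibble_a == nibble_b:
            result |= 1 << i
    return result
```

For instance, `rotate_right(0x1234, 4)` gives `0x4123`, and `match4(0x000F, 0xF000)` gives `0x1FF1`: bit 0 matches on the nibble `0xF`, and bits 4 to 12 match because both 4-bit windows are zero.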
### Sample Waveform



## Instruction 1001 Waveform



## Submission

1. Create a folder named **studentID\_hw1** and follow the hierarchy below.

```
r13943000_hw1
└── 01_RTL
    ├── alu.v
    ├── xxx.v (other verilog files you wrote)
    └── rtl.f
```

Note: Use **lowercase** for all the letters. (e.g. r13943000\_hw1)

2. Pack the folder **studentID\_hw1** into a **tar** file named **studentID\_hw1\_vk.tar** (**k** is the version number, **k = 1,2,...**). The TA will only check the latest version.

```
tar -cvf studentID_hw1_vk.tar studentID_hw1
```

Note:

  - a. Use **lowercase** for all the letters. (e.g. r13943000\_hw1\_v1.tar)
  - b. Pack the folder on the IC Design LAB server to avoid OS-related problems.

3. Submit to NTU Cool.

## Grading Policy

1. The TA will run your code with a command of the following form. Make sure this command runs with no error messages on the IC Design LAB server.

```
vcs -full64 -R -f rtl.f +v2k -sverilog -debug_access+all +define+$1
```

2. Pass all the instruction tests to get full score.
  - Released patterns: **70%**

| i_inst[3:0] | Operation                           | Score |
|-------------|-------------------------------------|-------|
| 4'b0000     | Signed Addition                     | 5%    |
| 4'b0001     | Signed Subtraction                  | 5%    |
| 4'b0010     | Signed MAC                          | 10%   |
| 4'b0011     | Taylor Expansion<br>of Sin Function | 10%   |
| 4'b0100     | Binary to Gray Code                 | 5%    |
| 4'b0101     | LRCW                                | 5%    |
| 4'b0110     | Right Rotation                      | 5%    |
| 4'b0111     | Count Leading Zeros                 | 8%    |
| 4'b1000     | Reverse Match4                      | 8%    |
| 4'b1001     | Matrix Transpose                    | 9%    |

  - Hidden patterns: **30%**
    - Mixture of all instructions
3. SpyGlass check (goal: lint rtl and lint rtl enhanced) reporting an **error: -20%**
4. Lose **5 points** for any incorrect naming or format.
  - It is your responsibility to ensure that the files can be correctly unpacked and executed on the IC Design LAB server.
5. No late submissions
  - A late submission receives 0 points for this homework.
6. No plagiarism
  - Plagiarism in any form, including copying from online sources, is strictly prohibited.

## References

1. Reference for fixed-point representation  
<https://www.allaboutcircuits.com/technical-articles/fixed-point-representation-the-q-format-and-addition-examples/>
2. Reference for rounding to the nearest  
<https://www.mathworks.com/help/fixedpoint/ug/rounding.html>
3. Reference for Taylor Expansion function  
<https://en.wikipedia.org/wiki/Taylor_series>
4. Reference for reciprocal multiplication  
<https://homepage.divms.uiowa.edu/~jones/bcd/divide.html>
5. Reference for LRCW  
"Comparing fast implementations of bit permutation instructions"