

# EE599 Assignment 1

Xuening Zhao

5741894054

GitHub Link:

[https://github.com/Olorin7/EE599\_XueningZhao\_5741894054/tree/master/Assignment1](https://github.com/Olorin7/EE599_XueningZhao_5741894054/tree/master/Assignment1)

## 1. Odd-Even Transposition Sort

### 1.1 16 elements sort

Testbench result:



Because Verilog does not support 2-D arrays as module ports, I concatenate the sixteen 8-bit elements into a single $16 \times 8 = 128$-bit input port. The input data is {3c, 4d, 5a, 1a, 6f, 31, 14, 7b, 3e, 01, 6e, 7b, 11, 11, 33, 7a} (hexadecimal). After 16 clock cycles, Dout shows the sorted result.
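To make the behavior above easier to check, here is a Python behavioral sketch (not the author's Verilog) of odd-even transposition sort on the same 16 inputs. Assumptions not stated in the report: the result is in ascending order, one compare-exchange phase runs per clock, and element 0 sits in the most significant byte of the flat bus (Verilog concatenation {a, b, ...} places its first operand in the MSBs). The names pack, unpack and odd_even_transposition_sort are hypothetical.

```python
# Behavioral sketch (not the author's Verilog) of odd-even transposition sort.
# Assumptions: ascending order, one compare-exchange phase per clock,
# element 0 stored in the most significant byte of the flat bus.

def pack(values, width=8):
    """Concatenate values MSB-first, like the Verilog concatenation {v0, v1, ...}."""
    bus = 0
    for v in values:
        bus = (bus << width) | (v & ((1 << width) - 1))
    return bus

def unpack(bus, n, width=8):
    """Split a flat bus back into n elements, MSB-first."""
    mask = (1 << width) - 1
    return [(bus >> (width * (n - 1 - i))) & mask for i in range(n)]

def odd_even_transposition_sort(values):
    """Sort with n phases alternating even pairs (0,1),(2,3),... and odd pairs (1,2),(3,4),..."""
    a = list(values)
    n = len(a)
    for phase in range(n):
        for i in range(phase % 2, n - 1, 2):   # pairs within one phase are disjoint
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a

din = [0x3C, 0x4D, 0x5A, 0x1A, 0x6F, 0x31, 0x14, 0x7B,
       0x3E, 0x01, 0x6E, 0x7B, 0x11, 0x11, 0x33, 0x7A]
bus = pack(din)                                 # 128-bit value, like the input port
dout = odd_even_transposition_sort(unpack(bus, len(din)))
print(" ".join(f"{v:02x}" for v in dout))       # 01 11 11 14 1a 31 33 3c 3e 4d 5a 6e 6f 7a 7b 7b
```

Because the compare-exchange pairs within one phase never overlap, hardware can evaluate all of them in the same clock, which is why $n$ phases (here 16 clocks) are enough.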

Elaborated design and synthesized design are attached at the end of the report.

#### Time Report:

| Setup                                | Hold                             | Pulse Width                                       |
|--------------------------------------|----------------------------------|---------------------------------------------------|
| Worst Negative Slack (WNS): 1.644 ns | Worst Hold Slack (WHS): 0.175 ns | Worst Pulse Width Slack (WPWS): 2.000 ns          |
| Total Negative Slack (TNS): 0.000 ns | Total Hold Slack (THS): 0.000 ns | Total Pulse Width Negative Slack (TPWS): 0.000 ns |
| Number of Failing Endpoints: 0       | Number of Failing Endpoints: 0   | Number of Failing Endpoints: 0                    |
| Total Number of Endpoints: 257       | Total Number of Endpoints: 257   | Total Number of Endpoints: 130                    |

All user specified timing constraints are met.

#### Resource Report:

| Resource | Utilization | Available | Utilization % |
|----------|-------------|-----------|---------------|
| LUT      | 197         | 14400     | 1.37          |
| FF       | 129         | 28800     | 0.45          |
| IO       | 258         | 54        | 477.78        |

### 1.2 32 elements sort

Elaborated design and synthesized design are attached at the end of the report.

#### Time Report:

| Setup                                       | Hold                                    | Pulse Width                                              |
|---------------------------------------------|-----------------------------------------|----------------------------------------------------------|
| Worst Negative Slack (WNS): <b>1.644 ns</b> | Worst Hold Slack (WHS): <b>0.171 ns</b> | Worst Pulse Width Slack (WPWS): <b>2.000 ns</b>          |
| Total Negative Slack (TNS): <b>0.000 ns</b> | Total Hold Slack (THS): <b>0.000 ns</b> | Total Pulse Width Negative Slack (TPWS): <b>0.000 ns</b> |
| Number of Failing Endpoints: <b>0</b>       | Number of Failing Endpoints: <b>0</b>   | Number of Failing Endpoints: <b>0</b>                    |
| Total Number of Endpoints: <b>515</b>       | Total Number of Endpoints: <b>515</b>   | Total Number of Endpoints: <b>260</b>                    |

All user specified timing constraints are met.

#### Resource Report:

| Resource | Utilization | Available | Utilization % |
|----------|-------------|-----------|---------------|
| LUT      | 407         | 14400     | 2.83          |
| FF       | 259         | 28800     | 0.90          |
| IO       | 514         | 54        | 951.85        |

### 1.3 64 elements sort

Elaborated design and synthesized design are attached at the end of the report.

#### Time Report:

##### Design Timing Summary

| Setup                                       | Hold                                    | Pulse Width                                              |
|---------------------------------------------|-----------------------------------------|----------------------------------------------------------|
| Worst Negative Slack (WNS): <b>1.644 ns</b> | Worst Hold Slack (WHS): <b>0.173 ns</b> | Worst Pulse Width Slack (WPWS): <b>2.000 ns</b>          |
| Total Negative Slack (TNS): <b>0.000 ns</b> | Total Hold Slack (THS): <b>0.000 ns</b> | Total Pulse Width Negative Slack (TPWS): <b>0.000 ns</b> |
| Number of Failing Endpoints: <b>0</b>       | Number of Failing Endpoints: <b>0</b>   | Number of Failing Endpoints: <b>0</b>                    |
| Total Number of Endpoints: <b>1029</b>      | Total Number of Endpoints: <b>1029</b>  | Total Number of Endpoints: <b>518</b>                    |

All user specified timing constraints are met.

#### Resource Report:

| Resource | Utilization | Available | Utilization % |
|----------|-------------|-----------|---------------|
| LUT      | 825         | 14400     | 5.73          |
| FF       | 517         | 28800     | 1.80          |
| IO       | 1026        | 54        | 1900.00       |

### 1.4 128 elements sort

Elaborated design and synthesized design are attached at the end of the report.

#### Time Report:

##### Design Timing Summary

| Setup                                       | Hold                                    | Pulse Width                                              |
|---------------------------------------------|-----------------------------------------|----------------------------------------------------------|
| Worst Negative Slack (WNS): <b>1.644 ns</b> | Worst Hold Slack (WHS): <b>0.175 ns</b> | Worst Pulse Width Slack (WPWS): <b>2.000 ns</b>          |
| Total Negative Slack (TNS): <b>0.000 ns</b> | Total Hold Slack (THS): <b>0.000 ns</b> | Total Pulse Width Negative Slack (TPWS): <b>0.000 ns</b> |
| Number of Failing Endpoints: <b>0</b>       | Number of Failing Endpoints: <b>0</b>   | Number of Failing Endpoints: <b>0</b>                    |
| Total Number of Endpoints: <b>2057</b>      | Total Number of Endpoints: <b>2057</b>  | Total Number of Endpoints: <b>1034</b>                   |

All user specified timing constraints are met.

#### Resource Report:

| Resource | Utilization | Available | Utilization % |
|----------|-------------|-----------|---------------|
| LUT      | 1661        | 14400     | 11.53         |
| FF       | 1033        | 28800     | 3.59          |
| IO       | 2050        | 54        | 3796.30       |
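The IO counts in the four sorter resource reports follow a simple pattern. The Python sketch below assumes (this is not stated in the report) one flat $n \times 8$-bit input bus, one flat $n \times 8$-bit output bus, and 2 extra pins such as clock and reset; under that assumption it reproduces the reported IO counts and percentages. The helper name sorter_io is hypothetical.

```python
# Sketch: reproduce the IO counts of the sorter resource reports.
# Assumptions (not from the report): 8-bit elements, one n*8-bit input bus,
# one n*8-bit output bus, and 2 extra pins (for example clock and reset).
AVAILABLE_IO = 54

def sorter_io(n, width=8, extra=2):
    """Pin count of a hypothetical n-element sorter with flat input/output buses."""
    return 2 * n * width + extra

for n, reported in [(16, 258), (32, 514), (64, 1026), (128, 2050)]:
    io = sorter_io(n)
    print(f"n={n:3d}: IO={io} (reported {reported}), {100 * io / AVAILABLE_IO:.2f}% of {AVAILABLE_IO}")
```

Every size needs far more than the 54 available pins, which is why the IO utilization is well above 100% in all four reports.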

## 2. Dense Matrix-Matrix Multiplication

### 2.1 Design Problems

2.1.1  $n$  units

2.1.2  $\frac{n}{2^r}$  modules

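2.1.3 Let $r = \log_2 n$ (this definition is also used in the answers below). With $k$-bit elements, each product has $2k$ bits, and summing $n = 2^r$ products adds $r$ bits, so the result is $2k + r$ bits.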

2.1.4  $n-1$

2.1.5  $r+1$  clocks

2.1.6 Parallel design ($n \times n$ trees working at once): $r+1$ clocks; series design (one tree reused for all $n \times n$ outputs): $n \times n \times (r+1)$ clocks.
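The answers above can be checked with a small behavioral model of one multiply-and-adder tree. This Python sketch assumes $k$-bit unsigned elements, $n$ a power of two, one register stage after the multipliers plus one per adder level, and no pipelining between consecutive dot products; the function name adder_tree_dot is hypothetical.

```python
import math

def adder_tree_dot(row, col, k=8):
    """Dot product through a binary multiply-and-adder tree.

    Returns (result, number_of_adders, latency_in_clocks), assuming one
    register stage for the multipliers and one per adder level.
    """
    n = len(row)
    assert n == len(col) and n > 0 and n & (n - 1) == 0, "n must be a power of two"
    assert all(0 <= v < 2 ** k for v in row + col), "elements must be k-bit unsigned"
    level = [a * b for a, b in zip(row, col)]  # n multipliers, 2k-bit products
    adders = 0
    levels = 0
    while len(level) > 1:                       # each level halves the operand count
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        adders += len(level)
        levels += 1
    r = int(math.log2(n))
    assert levels == r and adders == n - 1
    assert level[0] < 2 ** (2 * k + r)          # result fits in 2k + r bits
    return level[0], adders, levels + 1         # +1 clock for the multipliers

n = 8
result, adders, latency = adder_tree_dot([255] * n, [255] * n)
print(result, adders, latency)                                 # 520200 7 4
print("serial clocks for an n x n product:", n * n * latency)  # 256
```

The all-255 input is the worst case for 8-bit elements, so it also shows that $2k + r = 19$ bits are enough for $n = 8$.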

### 2.2 4\*4 matrices

Testbench result:



Because Verilog does not support 2-D arrays as module ports, I concatenate the 8-bit elements into two $4 \times 8$-bit input ports: one carries the 4 elements of a row and the other carries the 4 elements of a column. The input matrices are {3c, 4d, 5a, 1a; 6f, 31, 14, 7b; 3e, 01, 6e, 7b; 11, 11, 33, 7a} and {01, 5d, 30, 6b; 6e, f4, 0a, b1; 7b, a2, c7, 76; 1a, 22, 81, 41} (hexadecimal). I then generate 16 parallel multiply-and-adder trees to perform the multiplication. After 2 clocks, Dout shows the result.
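A software reference can help check the 4\*4 testbench output. This Python sketch assumes that both brace lists above are written row by row (rows separated by ";") and that the column port carries a column of the second matrix; the function name matmul_by_dot_products is hypothetical.

```python
# Input matrices from the testbench, assumed to be written row by row.
A = [[0x3C, 0x4D, 0x5A, 0x1A],
     [0x6F, 0x31, 0x14, 0x7B],
     [0x3E, 0x01, 0x6E, 0x7B],
     [0x11, 0x11, 0x33, 0x7A]]
B = [[0x01, 0x5D, 0x30, 0x6B],
     [0x6E, 0xF4, 0x0A, 0xB1],
     [0x7B, 0xA2, 0xC7, 0x76],
     [0x1A, 0x22, 0x81, 0x41]]

def matmul_by_dot_products(A, B):
    """Compute C = A x B as n*n independent row-by-column dot products."""
    n = len(A)
    cols = [[B[k][j] for k in range(n)] for j in range(n)]  # what a column port would carry
    return [[sum(a * b for a, b in zip(A[i], cols[j])) for j in range(n)]
            for i in range(n)]

C = matmul_by_dot_products(A, B)
for row in C:
    print(" ".join(f"{v:05x}" for v in row))  # 18-bit results need 5 hex digits
```

Each of the 16 entries is one independent dot product, which is what allows one multiply-and-adder tree per output entry.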

Elaborated design and synthesized design are attached at the end of the report.

#### Time Report:

##### Design Timing Summary

| Setup                                | Hold                             | Pulse Width                                       |
|--------------------------------------|----------------------------------|---------------------------------------------------|
| Worst Negative Slack (WNS): 2.886 ns | Worst Hold Slack (WHS): 0.170 ns | Worst Pulse Width Slack (WPWS): 2.000 ns          |
| Total Negative Slack (TNS): 0.000 ns | Total Hold Slack (THS): 0.000 ns | Total Pulse Width Negative Slack (TPWS): 0.000 ns |
| Number of Failing Endpoints: 0       | Number of Failing Endpoints: 0   | Number of Failing Endpoints: 0                    |
| Total Number of Endpoints: 52        | Total Number of Endpoints: 52    | Total Number of Endpoints: 117                    |

All user specified timing constraints are met.

#### Resource Report:

| Resource | Utilization | Available | Utilization % |
|----------|-------------|-----------|---------------|
| LUT      | 334         | 14400     | 2.32          |
| FF       | 116         | 28800     | 0.40          |
| IO       | 84          | 54        | 155.56        |

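Since LUTs are the most heavily used logic resource (334 of 14400 per tree), about $\lfloor 14400 / 334 \rfloor = 43$ multiply-and-adder trees of this size would fit on the device, counting logic resources only (the IO count already exceeds the 54 available pins).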
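The capacity estimate can be reproduced by dividing each available logic resource by the per-tree usage and taking the smallest quotient. This sketch only considers the LUT and FF rows of the table above.

```python
# Per-tree usage from the 4*4 resource report and the device totals.
used = {"LUT": 334, "FF": 116}
available = {"LUT": 14400, "FF": 28800}

# How many copies fit if each resource were the only constraint.
fit = {res: available[res] // used[res] for res in used}
print(fit)                # {'LUT': 43, 'FF': 248}
print(min(fit.values()))  # 43, so LUTs are the limiting resource
```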

#### Power Report:

##### Summary

Power estimation from Synthesized netlist. Activity derived from constraints files, simulation files or vectorless analysis. Note: these early estimates can change after implementation.

**Total On-Chip Power:** 0.138 W  
**Design Power Budget:** Not Specified  
**Power Budget Margin:** N/A  
**Junction Temperature:** 26.6°C  
**Thermal Margin:** 73.4°C (6.1 W)  
**Effective θ<sub>JA</sub>:** 11.5°C/W  
**Power supplied to off-chip devices:** 0 W  
**Confidence level:** Low
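The reported junction temperature is consistent with the first-order thermal estimate $T_J \approx T_A + \theta_{JA} P$, assuming an ambient temperature $T_A = 25$°C (the ambient temperature is not stated in the report):

$$
T_J \approx 25\,^{\circ}\mathrm{C} + 11.5\,^{\circ}\mathrm{C/W} \times 0.138\,\mathrm{W} \approx 26.6\,^{\circ}\mathrm{C}
$$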




### 2.3 8\*8 matrices

Elaborated design and synthesized design are attached at the end of the report.

#### Time Report:

##### Design Timing Summary

| Setup                                       | Hold                                    | Pulse Width                                              |
|---------------------------------------------|-----------------------------------------|----------------------------------------------------------|
| Worst Negative Slack (WNS): <b>2.867 ns</b> | Worst Hold Slack (WHS): <b>0.170 ns</b> | Worst Pulse Width Slack (WPWS): <b>2.000 ns</b>          |
| Total Negative Slack (TNS): <b>0.000 ns</b> | Total Hold Slack (THS): <b>0.000 ns</b> | Total Pulse Width Negative Slack (TPWS): <b>0.000 ns</b> |
| Number of Failing Endpoints: <b>0</b>       | Number of Failing Endpoints: <b>0</b>   | Number of Failing Endpoints: <b>0</b>                    |
| Total Number of Endpoints: <b>123</b>       | Total Number of Endpoints: <b>123</b>   | Total Number of Endpoints: <b>252</b>                    |

All user specified timing constraints are met.

#### Resource Report:

| Resource | Utilization | Available | Utilization % |
|----------|-------------|-----------|---------------|
| LUT      | 685         | 14400     | 4.76          |
| FF       | 251         | 28800     | 0.87          |
| IO       | 149         | 54        | 275.93        |

#### Power Report:

##### Summary

Power estimation from Synthesized netlist. Activity derived from constraints files, simulation files or vectorless analysis. Note: these early estimates can change after implementation.

**Total On-Chip Power:** 0.152 W  
**Design Power Budget:** Not Specified  
**Power Budget Margin:** N/A  
**Junction Temperature:** 26.8°C  
**Thermal Margin:** 73.2°C (6.1 W)  
**Effective θ<sub>JA</sub>:** 11.5°C/W  
**Power supplied to off-chip devices:** 0 W  
**Confidence level:** Low




### 2.4 16\*16 matrices

Elaborated design and synthesized design are attached at the end of the report.

#### Time Report:

##### Design Timing Summary

| Setup                                       | Hold                                    | Pulse Width                                              |
|---------------------------------------------|-----------------------------------------|----------------------------------------------------------|
| Worst Negative Slack (WNS): <b>2.836 ns</b> | Worst Hold Slack (WHS): <b>0.170 ns</b> | Worst Pulse Width Slack (WPWS): <b>2.000 ns</b>          |
| Total Negative Slack (TNS): <b>0.000 ns</b> | Total Hold Slack (THS): <b>0.000 ns</b> | Total Pulse Width Negative Slack (TPWS): <b>0.000 ns</b> |
| Number of Failing Endpoints: <b>0</b>       | Number of Failing Endpoints: <b>0</b>   | Number of Failing Endpoints: <b>0</b>                    |
| Total Number of Endpoints: <b>266</b>       | Total Number of Endpoints: <b>266</b>   | Total Number of Endpoints: <b>523</b>                    |

All user specified timing constraints are met.

#### Resource Report:

| Resource | Utilization | Available | Utilization % |
|----------|-------------|-----------|---------------|
| LUT      | 1388        | 14400     | 9.64          |
| FF       | 522         | 28800     | 1.81          |
| IO       | 278         | 54        | 514.81        |

#### Power Report:

##### Summary

Power estimation from Synthesized netlist. Activity derived from constraints files, simulation files or vectorless analysis. Note: these early estimates can change after implementation.

**Total On-Chip Power:** 0.176 W  
**Design Power Budget:** Not Specified  
**Power Budget Margin:** N/A  
**Junction Temperature:** 27.0°C  
**Thermal Margin:** 73.0°C (6.1 W)  
**Effective θ<sub>JA</sub>:** 11.5°C/W  
**Power supplied to off-chip devices:** 0 W  
**Confidence level:** Low





### 2.5 32\*32 matrices

Elaborated design and synthesized design are attached at the end of the report.

#### Time Report:

##### Design Timing Summary

| Setup                                | Hold                             | Pulse Width                                       |
|--------------------------------------|----------------------------------|---------------------------------------------------|
| Worst Negative Slack (WNS): 2.788 ns | Worst Hold Slack (WHS): 0.170 ns | Worst Pulse Width Slack (WPWS): 2.000 ns          |
| Total Negative Slack (TNS): 0.000 ns | Total Hold Slack (THS): 0.000 ns | Total Pulse Width Negative Slack (TPWS): 0.000 ns |
| Number of Failing Endpoints: 0       | Number of Failing Endpoints: 0   | Number of Failing Endpoints: 0                    |
| Total Number of Endpoints: 553       | Total Number of Endpoints: 553   | Total Number of Endpoints: 1066                   |

All user specified timing constraints are met.

#### Resource Report:

| Resource | Utilization | Available | Utilization % |
|----------|-------------|-----------|---------------|
| LUT      | 2795        | 14400     | 19.41         |
| FF       | 1065        | 28800     | 3.70          |
| IO       | 535         | 54        | 990.74        |
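The IO counts of the four matrix designs can be reproduced in the same way. This Python sketch assumes (not stated in the report) 8-bit elements, one row port and one column port of $n \times 8$ bits each, a single $(2k + r)$-bit result port as derived in 2.1.3, and 2 extra pins such as clock and reset; the helper name matmul_io is hypothetical.

```python
# Sketch: reproduce the IO counts of the matrix-multiplication resource reports.
# Assumptions (not from the report): k-bit elements, one row port and one
# column port of n*k bits, one (2k + r)-bit result, and 2 extra pins.
AVAILABLE_IO = 54

def matmul_io(n, k=8, extra=2):
    r = n.bit_length() - 1   # r = log2(n) for power-of-two n
    inputs = 2 * n * k       # row port + column port
    output = 2 * k + r       # width of one dot-product result
    return inputs + output + extra

for n, reported in [(4, 84), (8, 149), (16, 278), (32, 535)]:
    io = matmul_io(n)
    print(f"n={n:2d}: IO={io} (reported {reported}), {100 * io / AVAILABLE_IO:.2f}% of {AVAILABLE_IO}")
```

Under these assumptions the output width grows by only one bit each time $n$ doubles, while the input width doubles, so the input ports dominate the pin count.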

#### Power Report:

##### Summary

Power estimation from Synthesized netlist. Activity derived from constraints files, simulation files or vectorless analysis. Note: these early estimates can change after implementation.

**Total On-Chip Power:** 0.23 W  
**Design Power Budget:** Not Specified  
**Power Budget Margin:** N/A  
**Junction Temperature:** 27.7°C  
**Thermal Margin:** 72.3°C (6.1 W)  
**Effective θ<sub>JA</sub>:** 11.5°C/W  
**Power supplied to off-chip devices:** 0 W  
**Confidence level:** Low




## 3. Design Schematics

### 16 elements sort

elaborated design:



synthesized design:



### 32 elements sort

elaborated design:

synthesized design:





### 64 elements sort

elaborated design:

synthesized design:



















### 128 elements sort

elaborated design:

synthesized design:



































### 4\*4 matrices multiplication

elaborated design:



synthesized design:



### 8\*8 matrices multiplication

elaborated design:



synthesized design:



### 16\*16 matrices multiplication

elaborated design:



synthesized design:



### 32\*32 matrices multiplication

elaborated design:

synthesized design:



