

# EE 599: Assignment 1

Suraj Chakravarthi Raja | [surajcha@usc.edu](mailto:surajcha@usc.edu) | USC ID: 1389-2888-44

GitHub Repo: [github.com/solaremporer/EE-599_SurajChakravarthiRaja_1389288844](https://github.com/solaremporer/EE-599_SurajChakravarthiRaja_1389288844)

# Problem 1: Odd-Even Transposition Sort



Odd-even transposition sort is a parallel sorting algorithm. This design sorts a sequence of $n$ elements in $n$ clock cycles, provided that $n$ is even. It uses $n/2$ compare-exchange processing elements that operate in parallel during each clock cycle.

Here are the specifications of the sorting engine for a sequence of $n$ elements:

- Number of compare-exchange processing elements: $n/2$
- Number of clocks to produce a result: $n$ clock cycles
- This design is not pipelined
- The number of elements $n$ must be even for this approach to work
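As a point of reference, the behaviour specified above can be sketched as a small software model. This is a Python illustration, not the RTL in `sorter.v`; the function name `odd_even_transposition_sort` and the choice to begin with the even-indexed pairs are assumptions made for the sketch.

```python
def odd_even_transposition_sort(seq):
    """Software model of odd-even transposition sort (ascending order).

    Runs exactly n phases, one per clock cycle in the hardware analogy.
    Within a phase every compare-exchange touches a disjoint pair, so all
    of them could execute in parallel.
    """
    data = list(seq)
    n = len(data)
    if n % 2 != 0:
        raise ValueError("this sketch mirrors the design and requires an even n")
    for phase in range(n):
        # Even phases pair (0,1), (2,3), ...; odd phases pair (1,2), (3,4), ...
        first = phase % 2
        for i in range(first, n - 1, 2):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
    return data


if __name__ == "__main__":
    print(odd_even_transposition_sort([7, 3, 9, 0, 4, 4, 1, 8]))
    # -> [0, 1, 3, 4, 4, 7, 8, 9]
```

Each pass of the inner loop corresponds to the work that the parallel compare-exchange elements would complete in a single clock cycle.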

## Design files

The complete set of design files is available in the [GitHub repo](https://github.com/solaremporer/EE-599_SurajChakravarthiRaja_1389288844).

- **sorter.v** : This top-level module contains the complete RTL design of the sorting engine.

```
module sorter #(
    parameter BIT_WIDTH = 8,
    parameter SEQ_WIDTH = 16 // this is 'n'
)
(
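    input    clk,
    input    reset,
    input    start,                              // LOW: load input; HIGH: sort
    input    [(BIT_WIDTH*SEQ_WIDTH)-1 : 0] in,   // flattened input sequence
    output   [(BIT_WIDTH*SEQ_WIDTH)-1 : 0] out,  // flattened sorted sequence
    output   valid                               // HIGH when the result is ready
);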
```

- The parameter `BIT_WIDTH` sets the bit-width of each element. It is set to 8 bits for this report.
- The parameter `SEQ_WIDTH` is used to set the number of elements in the sequence. *NOTE: Remember that this number must always be an even number.*
- Once `reset` is cleared, the input sequence is loaded on every positive clock edge for as long as `start` is held LOW. Sorting begins when `start` goes HIGH.
- When the sorted sequence is ready, the `valid` output goes HIGH.

- **sorter\_tb.v** : Testbench used to simulate, probe and validate the design.

```
module sorter_tb #(
    parameter BIT_WIDTH = 8,
    parameter SEQ_WIDTH = 16 // this is 'n'
)
(
);

```

- The parameter `BIT_WIDTH` sets the bit-width of each element. It is set to 8 bits for this report.
- The parameter `SEQ_WIDTH` (default value = 16) is used to set the number of elements in the sequence. *NOTE: Remember that this number must always be an even number.*
- The testbench randomly generates a sequence of `SEQ_WIDTH` unsigned integer elements for sorting.
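The stimulus generation described above can be mirrored in software to produce a golden reference for the sorter output. The following Python sketch is an illustration only: the helper names `pack`, `unpack` and `random_sequence` are hypothetical, the placement of element 0 in the least significant bits of the flat bus is an assumption, and the value range used by the actual testbench is not stated in this report.

```python
import random

BIT_WIDTH = 8
SEQ_WIDTH = 16


def pack(elements, bit_width=BIT_WIDTH):
    """Pack a list of unsigned integers into one flat bus value.

    Assumption: element 0 occupies the least significant bits.
    """
    bus = 0
    for index, value in enumerate(elements):
        if not 0 <= value < (1 << bit_width):
            raise ValueError("element does not fit in bit_width bits")
        bus |= value << (index * bit_width)
    return bus


def unpack(bus, count=SEQ_WIDTH, bit_width=BIT_WIDTH):
    """Split a flat bus value back into `count` unsigned elements."""
    mask = (1 << bit_width) - 1
    return [(bus >> (index * bit_width)) & mask for index in range(count)]


def random_sequence(count=SEQ_WIDTH, bit_width=BIT_WIDTH, seed=None):
    """Generate `count` random unsigned integers that fit in `bit_width` bits."""
    rng = random.Random(seed)
    return [rng.randrange(1 << bit_width) for _ in range(count)]


if __name__ == "__main__":
    stimulus = random_sequence(seed=1)
    bus = pack(stimulus)
    assert unpack(bus) == stimulus       # the round trip is lossless
    expected = sorted(stimulus)          # golden reference for the sorter output
    hex_digits = SEQ_WIDTH * BIT_WIDTH // 4
    print(f"in       = {bus:#0{2 + hex_digits}x}")
    print("expected =", expected)
```

Comparing the unpacked `out` bus against `expected` once `valid` is HIGH would be one way to make the testbench self-checking.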

## Simulation waveforms (16 element sequence)

The simulation uses a 10 ns clock (clk).

In the simulation waveform, you will see that at the 20 ns mark, the `start` line is set to HIGH. This begins the sorting process. The input sequence (seq) is 1, 5, 0, 2, 9, 5, 9, 2, 2, 1, 6, 3, 7, 7, 7, 1.
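The load, start and `valid` behaviour can be mimicked with a small cycle-level model. The following Python sketch is only an illustration: the class `SorterModel`, its `tick` method, the helper `run_sort`, the internal phase counter and the exact cycle on which `valid` rises are all assumptions, and the real timing is defined by `sorter.v`.

```python
class SorterModel:
    """Hypothetical cycle-level model of the sorter interface."""

    def __init__(self, seq_width=16):
        if seq_width % 2 != 0:
            raise ValueError("SEQ_WIDTH must be even")
        self.n = seq_width
        self.data = [0] * seq_width
        self.phase = 0
        self.valid = False

    def tick(self, reset, start, seq_in=None):
        """Advance the model by one positive clock edge."""
        if reset:
            self.data = [0] * self.n
            self.phase = 0
            self.valid = False
        elif not start:
            # While start is LOW the input sequence is (re)loaded every cycle.
            if seq_in is not None:
                self.data = list(seq_in)
            self.phase = 0
            self.valid = False
        elif self.phase < self.n:
            # One odd-even transposition phase per clock cycle.
            for i in range(self.phase % 2, self.n - 1, 2):
                if self.data[i] > self.data[i + 1]:
                    self.data[i], self.data[i + 1] = self.data[i + 1], self.data[i]
            self.phase += 1
            self.valid = self.phase == self.n
        return self.valid


def run_sort(model, seq):
    """Reset, load `seq`, then clock with start HIGH until valid; return the cycle count."""
    model.tick(reset=True, start=False)
    model.tick(reset=False, start=False, seq_in=seq)
    cycles = 0
    while not model.valid:
        model.tick(reset=False, start=True)
        cycles += 1
    return cycles


if __name__ == "__main__":
    model = SorterModel(seq_width=4)
    print(run_sort(model, [3, 1, 2, 0]), model.data)  # -> 4 [0, 1, 2, 3]
```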

The final sorted result is already available at the 130 ns mark, because not every input sequence needs all 16 clock cycles to complete. Since some sequences do need the full 16 cycles, the engine always lets the sorting run to the end. At the 170 ns mark, the `valid` line goes HIGH (after 16 clock cycles), indicating that sorting is complete. The sorted new sequence (newSeq) is 0, 1, 1, 1, 2, 2, 2, 3, 5, 6, 6, 7, 7, 7, 9, 9.
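The early finish can be reproduced in software by counting how many phases a given input actually needs. In this Python sketch, the function name `phases_until_sorted` and the choice to start with the even-indexed pairs are assumptions; the RTL may order its phases differently, so exact counts for a specific input may not match the waveform.

```python
def phases_until_sorted(seq):
    """Return how many odd-even transposition phases run before `seq` is sorted."""
    data = list(seq)
    n = len(data)
    target = sorted(data)
    phase = 0
    while data != target:
        # Even phases pair (0,1), (2,3), ...; odd phases pair (1,2), (3,4), ...
        for i in range(phase % 2, n - 1, 2):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
        phase += 1
    return phase


if __name__ == "__main__":
    already_sorted = list(range(16))
    reversed_input = list(range(15, -1, -1))
    print(phases_until_sorted(already_sorted))   # -> 0, nothing to do
    print(phases_until_sorted(reversed_input))   # -> 16, every phase is needed
```

A fully reversed input is one of the cases that uses all $n$ phases, which is why the `valid` flag is tied to the full cycle count instead of an early-exit check.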

(The enlarged waveform is on the next page)



## Elaborated Design (16 element sequence)

(Inserted on the next page)



## Synthesized Schematic (16 element sequence)

(Inserted on the next page)



# Synthesis Reports (16 element sequence)

## Resource utilization report (16 element sequence)

Copyright 1986-2019 Xilinx, Inc. All Rights Reserved.

```
----
| Tool Version : Vivado v.2019.2 (win64) Build 2708876 Wed Nov  6 21:40:23 MST 2019
| Date         : Sat Mar  7 17:15:37 2020
| Host         : urumi running 64-bit major release (build 9200)
| Command      : report_utilization -file sorter_utilization_synth.rpt -pb sorter_utilization_synth.pb
| Design       : sorter
| Device       : 7z007sclg225-2
| Design State : Synthesized
----
```

### Utilization Design Information

#### Table of Contents

```
1. Slice Logic
1.1 Summary of Registers by Type
2. Memory
3. DSP
4. IO and GT Specific
5. Clocking
6. Specific Feature
7. Primitives
8. Black Boxes
9. Instantiated Netlists
```

#### 1. Slice Logic

| Site Type             | Used | Fixed | Available | Util% |
|-----------------------|------|-------|-----------|-------|
| Slice LUTs*           | 262  | 0     | 14400     | 1.82  |
| LUT as Logic          | 262  | 0     | 14400     | 1.82  |
| LUT as Memory         | 0    | 0     | 6000      | 0.00  |
| Slice Registers       | 134  | 0     | 28800     | 0.47  |
| Register as Flip Flop | 134  | 0     | 28800     | 0.47  |
| Register as Latch     | 0    | 0     | 28800     | 0.00  |
| F7 Muxes              | 0    | 0     | 8800      | 0.00  |
| F8 Muxes              | 0    | 0     | 4400      | 0.00  |

\* Warning! The Final LUT count, after physical optimizations and full implementation, is typically lower. Run opt\_design after synthesis, if not already completed, for a more realistic count.

#### 1.1 Summary of Registers by Type

| Total | Clock Enable | Synchronous | Asynchronous |
|-------|--------------|-------------|--------------|
| 0     | -            | -           | -            |
| 0     | -            | -           | Set          |
| 0     | -            | -           | Reset        |
| 0     | -            | Set         | -            |
| 0     | -            | Reset       | -            |
| 0     | Yes          | -           | -            |
| 0     | Yes          | -           | Set          |
| 0     | Yes          | -           | Reset        |
| 0     | Yes          | Set         | -            |
| 134   | Yes          | Reset       | -            |

#### 2. Memory

| Site Type      | Used | Fixed | Available | Util% |
|----------------|------|-------|-----------|-------|
| Block RAM Tile | 0    | 0     | 50        | 0.00  |
| RAMB36/FIFO*   | 0    | 0     | 50        | 0.00  |
| RAMB18         | 0    | 0     | 100       | 0.00  |

\* Note: Each Block RAM Tile only has one FIFO logic available and therefore can accommodate only one FIFO36E1 or one FIFO18E1. However, if a FIFO18E1 occupies a Block RAM Tile, that tile can still accommodate a RAMB18E1

#### 3. DSP

| Site Type | Used | Fixed | Available | Util% |
|-----------|------|-------|-----------|-------|
| DSPs      | 0    | 0     | 66        | 0.00  |

#### 4. IO and GT Specific

| Site Type     | Used | Fixed | Available | Util%  |
|---------------|------|-------|-----------|--------|
| Bonded IOB    | 260  | 0     | 54        | 481.48 |
| Bonded IPADs  | 0    | 0     | 2         | 0.00   |
| Bonded IOPADs | 0    | 0     | 130       | 0.00   |
| PHY_CONTROL   | 0    | 0     | 2         | 0.00   |
| PHASER_REF    | 0    | 0     | 2         | 0.00   |
| OUT_FIFO      | 0    | 0     | 8         | 0.00   |
| IN_FIFO       | 0    | 0     | 8         | 0.00   |
| IDELAYCTRL    | 0    | 0     | 2         | 0.00   |
| IBUFDS        | 0    | 0     | 54        | 0.00   |
| PHASER_OUT/PHASER_OUT_PHY   | 0 | 0 | 8   | 0.00 |
| PHASER_IN/PHASER_IN_PHY     | 0 | 0 | 8   | 0.00 |
| IDELAYE2/IDELAYE2_FINEDELAY | 0 | 0 | 100 | 0.00 |
| ILOGIC                      | 0 | 0 | 54  | 0.00 |
| OLOGIC                      | 0 | 0 | 54  | 0.00 |
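The Bonded IOB count follows directly from the module ports: two flattened buses of `BIT_WIDTH*SEQ_WIDTH` bits each, plus the four single-bit ports `clk`, `reset`, `start` and `valid`. The short Python check below reproduces the reported figure; the helper names are hypothetical and the calculation is only a sanity check on the report, not part of the design flow.

```python
def expected_bonded_iob(bit_width, seq_width, control_pins=4):
    """I/O pins implied by the module ports.

    `in` and `out` are each bit_width * seq_width bits wide; the four
    single-bit ports are clk, reset, start and valid.
    """
    return 2 * bit_width * seq_width + control_pins


def utilization_percent(used, available):
    """Utilization as a percentage, rounded to two decimals like the report."""
    return round(100.0 * used / available, 2)


if __name__ == "__main__":
    available = 54  # Bonded IOBs available on the 7z007sclg225-2, from the report
    for n in (16, 32, 64, 128):
        used = expected_bonded_iob(8, n)
        print(n, used, utilization_percent(used, available))
```

For 16 elements this gives 260 pins and 481.48%, matching the table. A utilization above 100% suggests that the flattened buses could not all be mapped to package pins if this module were implemented as the top level on this device.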

#### 5. Clocking

| Site Type  | Used | Fixed | Available | Util% |
|------------|------|-------|-----------|-------|
| BUFGCTRL   | 1    | 0     | 32        | 3.13  |
| BUFIO      | 0    | 0     | 8         | 0.00  |
| MMCME2_ADV | 0    | 0     | 2         | 0.00  |
| PLLE2_ADV  | 0    | 0     | 2         | 0.00  |
| BUFMRCE    | 0    | 0     | 4         | 0.00  |
| BUFHCE     | 0    | 0     | 48        | 0.00  |
| BUFR       | 0    | 0     | 8         | 0.00  |

#### 6. Specific Feature

| Site Type   | Used | Fixed | Available | Util% |
|-------------|------|-------|-----------|-------|
| BSCANE2     | 0    | 0     | 4         | 0.00  |
| CAPTUREE2   | 0    | 0     | 1         | 0.00  |
| DNA_PORT    | 0    | 0     | 1         | 0.00  |
| EFUSE_USR   | 0    | 0     | 1         | 0.00  |
| FRAME_ECCE2 | 0    | 0     | 1         | 0.00  |
| ICAPE2      | 0    | 0     | 2         | 0.00  |
| STARTUPE2   | 0    | 0     | 1         | 0.00  |
| XADC        | 0    | 0     | 1         | 0.00  |

#### 7. Primitives

| Ref Name | Used | Functional Category |
|----------|------|---------------------|
| LUT4     | 242  | LUT                 |
| FDRE     | 134  | Flop & Latch        |
| IBUF     | 131  | IO                  |
| OBUF     | 129  | IO                  |
| LUT5     | 115  | LUT                 |
| CARRY4   | 30   | CarryLogic          |
| LUT3     | 17   | LUT                 |
| LUT6     | 14   | LUT                 |
| LUT2     | 2    | LUT                 |
| LUT1     | 2    | LUT                 |
| BUFG     | 1    | Clock               |

#### 8. Black Boxes

| Ref Name | Used |
|----------|------|

#### 9. Instantiated Netlists

| Ref Name | Used |
|----------|------|

## Timing estimation report (16 element sequence)

### Design Timing Summary

| Setup                                | Hold                             | Pulse Width                                       |
|--------------------------------------|----------------------------------|---------------------------------------------------|
| Worst Negative Slack (WNS): 6.644 ns | Worst Hold Slack (WHS): 0.131 ns | Worst Pulse Width Slack (WPWS): 4.500 ns          |
| Total Negative Slack (TNS): 0.000 ns | Total Hold Slack (THS): 0.000 ns | Total Pulse Width Negative Slack (TPWS): 0.000 ns |
| Number of Failing Endpoints: 0       | Number of Failing Endpoints: 0   | Number of Failing Endpoints: 0                    |
| Total Number of Endpoints: 268       | Total Number of Endpoints: 268   | Total Number of Endpoints: 135                    |

All user specified timing constraints are met.
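The positive setup slack can be turned into a rough estimate of the maximum clock frequency and of the sort latency. The Python sketch below assumes that the timing constraint is the same 10 ns period used in simulation, which this report does not state explicitly; the function names are hypothetical, and the numbers are post-synthesis estimates that may change after implementation.

```python
def fmax_mhz(clock_period_ns, wns_ns):
    """Estimate the maximum clock frequency from the worst setup slack."""
    min_period_ns = clock_period_ns - wns_ns
    return 1000.0 / min_period_ns


def sort_latency_ns(seq_width, clock_period_ns):
    """Latency of one sort: seq_width clock cycles."""
    return seq_width * clock_period_ns


if __name__ == "__main__":
    period = 10.0   # assumed constraint, matching the 10 ns simulation clock
    wns = 6.644     # worst setup slack from the timing summary
    print(f"Fmax ~ {fmax_mhz(period, wns):.1f} MHz")
    print(f"latency at 10 ns clock: {sort_latency_ns(16, period):.0f} ns")
```

Under that assumption the estimate comes out at roughly 298 MHz, and a 16-element sort takes 160 ns at the 10 ns clock.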

## Power estimation report (16 element sequence)

### Summary

Power estimation from Synthesized netlist. Activity derived from constraints files, simulation files or vectorless analysis. Note: these early estimates can change after implementation.

| Parameter                           | Value          |
|-------------------------------------|----------------|
| Total On-Chip Power:                | 0.127 W        |
| Design Power Budget:                | Not Specified  |
| Power Budget Margin:                | N/A            |
| Junction Temperature:               | 26.5°C         |
| Thermal Margin:                     | 73.5°C (6.2 W) |
| Effective θ<sub>JA</sub>:           | 11.5°C/W       |
| Power supplied to off-chip devices: | 0 W            |
| Confidence level:                   | Low            |
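The reported junction temperature is consistent with the usual relation $T_J = T_A + \theta_{JA} P$. The Python sketch below assumes an ambient temperature of 25 °C, which is not stated in the report; the function name `junction_temperature_c` is hypothetical.

```python
def junction_temperature_c(power_w, theta_ja_c_per_w=11.5, ambient_c=25.0):
    """Tj = Ta + theta_JA * P. The 25 C ambient is an assumption."""
    return ambient_c + theta_ja_c_per_w * power_w


if __name__ == "__main__":
    # 0.127 W total on-chip power and 11.5 C/W, both from the summary table
    print(round(junction_temperature_c(0.127), 1))  # -> 26.5
```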

### On-Chip Power



## Elaborated Design (32 element sequence)

(Inserted on the next page)



## Synthesized Schematic (32 element sequence)

(Inserted on the next page)



# Synthesis Reports (32 element sequence)

## Resource utilization report (32 element sequence)

```
----
| Tool Version : Vivado v.2019.2 (win64) Build 2708876 Wed Nov  6 21:40:23 MST 2019
| Date         : Sat Mar  7 17:38:18 2020
| Host         : urumi running 64-bit major release (build 9200)
| Command      : report_utilization -file sorter_utilization_synth.rpt -pb sorter_utilization_synth.pb
| Design       : sorter
| Device       : 7z007sclg225-2
| Design State : Synthesized
----
```

### Utilization Design Information

#### Table of Contents

```
1. Slice Logic
1.1 Summary of Registers by Type
2. Memory
3. DSP
4. IO and GT Specific
5. Clocking
6. Specific Feature
7. Primitives
8. Black Boxes
9. Instantiated Netlists
```

#### 1. Slice Logic

| Site Type             | Used | Fixed | Available | Util% |
|-----------------------|------|-------|-----------|-------|
| Slice LUTs*           | 537  | 0     | 14400     | 3.73  |
| LUT as Logic          | 537  | 0     | 14400     | 3.73  |
| LUT as Memory         | 0    | 0     | 6000      | 0.00  |
| Slice Registers       | 266  | 0     | 28800     | 0.92  |
| Register as Flip Flop | 266  | 0     | 28800     | 0.92  |
| Register as Latch     | 0    | 0     | 28800     | 0.00  |
| F7 Muxes              | 0    | 0     | 8800      | 0.00  |
| F8 Muxes              | 0    | 0     | 4400      | 0.00  |

\* Warning! The Final LUT count, after physical optimizations and full implementation, is typically lower. Run opt\_design after synthesis, if not already completed, for a more realistic count.

#### 1.1 Summary of Registers by Type

| Total | Clock Enable | Synchronous | Asynchronous |
|-------|--------------|-------------|--------------|
| 0     | -            | -           | -            |
| 0     | -            | -           | Set          |
| 0     | -            | -           | Reset        |
| 0     | -            | Set         | -            |
| 0     | -            | Reset       | -            |
| 0     | Yes          | -           | -            |
| 0     | Yes          | -           | Set          |
| 0     | Yes          | -           | Reset        |
| 0     | Yes          | Set         | -            |
| 266   | Yes          | Reset       | -            |

#### 2. Memory

| Site Type      | Used | Fixed | Available | Util% |
|----------------|------|-------|-----------|-------|
| Block RAM Tile | 0    | 0     | 50        | 0.00  |
| RAMB36/FIFO*   | 0    | 0     | 50        | 0.00  |
| RAMB18         | 0    | 0     | 100       | 0.00  |

\* Note: Each Block RAM Tile only has one FIFO logic available and therefore can accommodate only one FIFO36E1 or one FIFO18E1. However, if a FIFO18E1 occupies a Block RAM Tile, that tile can still accommodate a RAMB18E1

#### 3. DSP

| Site Type | Used | Fixed | Available | Util% |
|-----------|------|-------|-----------|-------|
| DSPs      | 0    | 0     | 66        | 0.00  |

#### 4. IO and GT Specific

| Site Type     | Used | Fixed | Available | Util%  |
|---------------|------|-------|-----------|--------|
| Bonded IOB    | 516  | 0     | 54        | 955.56 |
| Bonded IPADs  | 0    | 0     | 2         | 0.00   |
| Bonded IOPADs | 0    | 0     | 130       | 0.00   |
| PHY_CONTROL   | 0    | 0     | 2         | 0.00   |
| PHASER_REF    | 0    | 0     | 2         | 0.00   |
| OUT_FIFO      | 0    | 0     | 8         | 0.00   |
| IN_FIFO       | 0    | 0     | 8         | 0.00   |
| IDELAYCTRL    | 0    | 0     | 2         | 0.00   |
| IBUFDS        | 0    | 0     | 54        | 0.00   |
| PHASER_OUT/PHASER_OUT_PHY   | 0 | 0 | 8   | 0.00 |
| PHASER_IN/PHASER_IN_PHY     | 0 | 0 | 8   | 0.00 |
| IDELAYE2/IDELAYE2_FINEDELAY | 0 | 0 | 100 | 0.00 |
| ILOGIC                      | 0 | 0 | 54  | 0.00 |
| OLOGIC                      | 0 | 0 | 54  | 0.00 |

#### 5. Clocking

| Site Type  | Used | Fixed | Available | Util% |
|------------|------|-------|-----------|-------|
| BUFGCTRL   | 1    | 0     | 32        | 3.13  |
| BUFIO      | 0    | 0     | 8         | 0.00  |
| MMCME2_ADV | 0    | 0     | 2         | 0.00  |
| PLLE2_ADV  | 0    | 0     | 2         | 0.00  |
| BUFMRCE    | 0    | 0     | 4         | 0.00  |
| BUFHCE     | 0    | 0     | 48        | 0.00  |
| BUFR       | 0    | 0     | 8         | 0.00  |

#### 6. Specific Feature

| Site Type   | Used | Fixed | Available | Util% |
|-------------|------|-------|-----------|-------|
| BSCANE2     | 0    | 0     | 4         | 0.00  |
| CAPTUREE2   | 0    | 0     | 1         | 0.00  |
| DNA_PORT    | 0    | 0     | 1         | 0.00  |
| EFUSE_USR   | 0    | 0     | 1         | 0.00  |
| FRAME_ECCE2 | 0    | 0     | 1         | 0.00  |
| ICAPE2      | 0    | 0     | 2         | 0.00  |
| STARTUPE2   | 0    | 0     | 1         | 0.00  |
| XADC        | 0    | 0     | 1         | 0.00  |

#### 7. Primitives

| Ref Name | Used | Functional Category |
|----------|------|---------------------|
| LUT4     | 497  | LUT                 |
| FDRE     | 266  | Flop & Latch        |
| IBUF     | 259  | IO                  |
| OBUF     | 257  | IO                  |
| LUT5     | 244  | LUT                 |
| CARRY4   | 62   | CarryLogic          |
| LUT6     | 31   | LUT                 |
| LUT3     | 17   | LUT                 |
| LUT1     | 5    | LUT                 |
| LUT2     | 2    | LUT                 |
| BUFG     | 1    | Clock               |

#### 8. Black Boxes

| Ref Name | Used |
|----------|------|

#### 9. Instantiated Netlists

| Ref Name | Used |
|----------|------|

## Timing estimation report (32 element sequence)

### Design Timing Summary

| Setup                                | Hold                             | Pulse Width                                       |
|--------------------------------------|----------------------------------|---------------------------------------------------|
| Worst Negative Slack (WNS): 6.644 ns | Worst Hold Slack (WHS): 0.131 ns | Worst Pulse Width Slack (WPWS): 4.500 ns          |
| Total Negative Slack (TNS): 0.000 ns | Total Hold Slack (THS): 0.000 ns | Total Pulse Width Negative Slack (TPWS): 0.000 ns |
| Number of Failing Endpoints: 0       | Number of Failing Endpoints: 0   | Number of Failing Endpoints: 0                    |
| Total Number of Endpoints: 532       | Total Number of Endpoints: 532   | Total Number of Endpoints: 267                    |

All user specified timing constraints are met.

## Power estimation report (32 element sequence)

### Summary

Power estimation from Synthesized netlist. Activity derived from constraints files, simulation files or vectorless analysis. Note: these early estimates can change after implementation.

| Parameter                           | Value          |
|-------------------------------------|----------------|
| Total On-Chip Power:                | 0.164 W        |
| Design Power Budget:                | Not Specified  |
| Power Budget Margin:                | N/A            |
| Junction Temperature:               | 26.9°C         |
| Thermal Margin:                     | 73.1°C (6.1 W) |
| Effective θ<sub>JA</sub>:           | 11.5°C/W       |
| Power supplied to off-chip devices: | 0 W            |
| Confidence level:                   | Low            |

### On-Chip Power



## Elaborated Design (64 element sequence)

(Inserted on the next page)



## Synthesized Schematic (64 element sequence)

(For brevity, only the first page of the schematic is inserted on the next page. The complete 11-page schematic is included in **Appendix 1**.)



# Synthesis Reports (64 element sequence)

## Resource utilization report (64 element sequence)

```
----
| Tool Version : Vivado v.2019.2 (win64) Build 2708876 Wed Nov  6 21:40:23 MST 2019
| Date         : Sat Mar  7 17:50:59 2020
| Host         : urumi running 64-bit major release (build 9200)
| Command      : report_utilization -file sorter_utilization_synth.rpt -pb sorter_utilization_synth.pb
| Design       : sorter
| Device       : 7z007sclg225-2
| Design State : Synthesized
----
```

### Utilization Design Information

#### Table of Contents

```
1. Slice Logic
1.1 Summary of Registers by Type
2. Memory
3. DSP
4. IO and GT Specific
5. Clocking
6. Specific Feature
7. Primitives
8. Black Boxes
9. Instantiated Netlists
```

#### 1. Slice Logic

| Site Type             | Used | Fixed | Available | Util% |
|-----------------------|------|-------|-----------|-------|
| Slice LUTs*           | 1085 | 0     | 14400     | 7.53  |
| LUT as Logic          | 1085 | 0     | 14400     | 7.53  |
| LUT as Memory         | 0    | 0     | 6000      | 0.00  |
| Slice Registers       | 525  | 0     | 28800     | 1.82  |
| Register as Flip Flop | 525  | 0     | 28800     | 1.82  |
| Register as Latch     | 0    | 0     | 28800     | 0.00  |
| F7 Muxes              | 0    | 0     | 8800      | 0.00  |
| F8 Muxes              | 0    | 0     | 4400      | 0.00  |

\* Warning! The Final LUT count, after physical optimizations and full implementation, is typically lower. Run opt\_design after synthesis, if not already completed, for a more realistic count.

#### 1.1 Summary of Registers by Type

| Total | Clock Enable | Synchronous | Asynchronous |
|-------|--------------|-------------|--------------|
| 0     | -            | -           | -            |
| 0     | -            | -           | Set          |
| 0     | -            | -           | Reset        |
| 0     | -            | Set         | -            |
| 0     | -            | Reset       | -            |
| 0     | Yes          | -           | -            |
| 0     | Yes          | -           | Set          |
| 0     | Yes          | -           | Reset        |
| 0     | Yes          | Set         | -            |
| 525   | Yes          | Reset       | -            |

#### 2. Memory

| Site Type      | Used | Fixed | Available | Util% |
|----------------|------|-------|-----------|-------|
| Block RAM Tile | 0    | 0     | 50        | 0.00  |
| RAMB36/FIFO*   | 0    | 0     | 50        | 0.00  |
| RAMB18         | 0    | 0     | 100       | 0.00  |

\* Note: Each Block RAM Tile only has one FIFO logic available and therefore can accommodate only one FIFO36E1 or one FIFO18E1. However, if a FIFO18E1 occupies a Block RAM Tile, that tile can still accommodate a RAMB18E1

#### 3. DSP

| Site Type | Used | Fixed | Available | Util% |
|-----------|------|-------|-----------|-------|
| DSPs      | 0    | 0     | 66        | 0.00  |

#### 4. IO and GT Specific

| Site Type     | Used | Fixed | Available | Util%   |
|---------------|------|-------|-----------|---------|
| Bonded IOB    | 1028 | 0     | 54        | 1903.70 |
| Bonded IPADs  | 0    | 0     | 2         | 0.00    |
| Bonded IOPADs | 0    | 0     | 130       | 0.00    |
| PHY_CONTROL   | 0    | 0     | 2         | 0.00    |
| PHASER_REF    | 0    | 0     | 2         | 0.00    |
| OUT_FIFO      | 0    | 0     | 8         | 0.00    |
| IN_FIFO       | 0    | 0     | 8         | 0.00    |
| IDELAYCTRL    | 0    | 0     | 2         | 0.00    |
| IBUFDS        | 0    | 0     | 54        | 0.00    |
| PHASER_OUT/PHASER_OUT_PHY   | 0 | 0 | 8   | 0.00 |
| PHASER_IN/PHASER_IN_PHY     | 0 | 0 | 8   | 0.00 |
| IDELAYE2/IDELAYE2_FINEDELAY | 0 | 0 | 100 | 0.00 |
| ILOGIC                      | 0 | 0 | 54  | 0.00 |
| OLOGIC                      | 0 | 0 | 54  | 0.00 |

#### 5. Clocking

| Site Type  | Used | Fixed | Available | Util% |
|------------|------|-------|-----------|-------|
| BUFGCTRL   | 1    | 0     | 32        | 3.13  |
| BUFIO      | 0    | 0     | 8         | 0.00  |
| MMCME2_ADV | 0    | 0     | 2         | 0.00  |
| PLLE2_ADV  | 0    | 0     | 2         | 0.00  |
| BUFMRCE    | 0    | 0     | 4         | 0.00  |
| BUFHCE     | 0    | 0     | 48        | 0.00  |
| BUFR       | 0    | 0     | 8         | 0.00  |

#### 6. Specific Feature

| Site Type   | Used | Fixed | Available | Util% |
|-------------|------|-------|-----------|-------|
| BSCANE2     | 0    | 0     | 4         | 0.00  |
| CAPTUREE2   | 0    | 0     | 1         | 0.00  |
| DNA_PORT    | 0    | 0     | 1         | 0.00  |
| EFUSE_USR   | 0    | 0     | 1         | 0.00  |
| FRAME_ECCE2 | 0    | 0     | 1         | 0.00  |
| ICAPE2      | 0    | 0     | 2         | 0.00  |
| STARTUPE2   | 0    | 0     | 1         | 0.00  |
| XADC        | 0    | 0     | 1         | 0.00  |

#### 7. Primitives

| Ref Name | Used | Functional Category |
|----------|------|---------------------|
| LUT4     | 1009 | LUT                 |
| FDRE     | 525  | Flop & Latch        |
| IBUF     | 515  | IO                  |
| OBUF     | 513  | IO                  |
| LUT5     | 500  | LUT                 |
| CARRY4   | 126  | CarryLogic          |
| LUT6     | 64   | LUT                 |
| LUT3     | 17   | LUT                 |
| LUT1     | 7    | LUT                 |
| LUT2     | 3    | LUT                 |
| BUFG     | 1    | Clock               |

#### 8. Black Boxes

| Ref Name | Used |
|----------|------|

#### 9. Instantiated Netlists

| Ref Name | Used |
|----------|------|

## Timing estimation report (64 element sequence)

### Design Timing Summary

| Setup                                | Hold                             | Pulse Width                                       |
|--------------------------------------|----------------------------------|---------------------------------------------------|
| Worst Negative Slack (WNS): 6.644 ns | Worst Hold Slack (WHS): 0.134 ns | Worst Pulse Width Slack (WPWS): 4.500 ns          |
| Total Negative Slack (TNS): 0.000 ns | Total Hold Slack (THS): 0.000 ns | Total Pulse Width Negative Slack (TPWS): 0.000 ns |
| Number of Failing Endpoints: 0       | Number of Failing Endpoints: 0   | Number of Failing Endpoints: 0                    |
| Total Number of Endpoints: 1050      | Total Number of Endpoints: 1050  | Total Number of Endpoints: 526                    |

All user specified timing constraints are met.

## Power estimation report (64 element sequence)

### Summary

Power estimation from Synthesized netlist. Activity derived from constraints files, simulation files or vectorless analysis. Note: these early estimates can change after implementation.

| Parameter                           | Value          |
|-------------------------------------|----------------|
| Total On-Chip Power:                | 0.236 W        |
| Design Power Budget:                | Not Specified  |
| Power Budget Margin:                | N/A            |
| Junction Temperature:               | 27.7°C         |
| Thermal Margin:                     | 72.3°C (6.0 W) |
| Effective θ<sub>JA</sub>:           | 11.5°C/W       |
| Power supplied to off-chip devices: | 0 W            |
| Confidence level:                   | Low            |
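The total on-chip power figures for the 16-, 32- and 64-element designs can be compared to see how the estimate grows with sequence length. The Python sketch below uses only the totals reported in this document; the helper name `incremental_power_mw_per_element` is hypothetical, and because the reports carry a low confidence level, the result should be read as a rough trend only.

```python
# (elements, total on-chip power in W) from the 16-, 32- and 64-element reports
POWER = [(16, 0.127), (32, 0.164), (64, 0.236)]


def incremental_power_mw_per_element(points):
    """Power added per extra element between consecutive design sizes, in mW."""
    steps = []
    for (n0, p0), (n1, p1) in zip(points, points[1:]):
        steps.append((n0, n1, 1000.0 * (p1 - p0) / (n1 - n0)))
    return steps


if __name__ == "__main__":
    for n0, n1, mw in incremental_power_mw_per_element(POWER):
        print(f"{n0:3d} -> {n1:3d} elements: {mw:.2f} mW per added element")
```

Both steps come out at roughly 2.3 mW per added element, which suggests that the estimated power grows approximately linearly with the number of elements on top of a fixed baseline.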

### On-Chip Power



## Elaborated Design (128 element sequence)

(Inserted on the next 4 pages)









## Synthesized Schematic (128 element sequence)

(For brevity, only the first page of the schematic is inserted on the next page. The complete 21-page schematic is included in **Appendix 2**.)



# Synthesis Reports (128 element sequence)

## Resource utilization report (128 element sequence)

```
----
| Tool Version : Vivado v.2019.2 (win64) Build 2708876 Wed Nov  6 21:40:23 MST 2019
| Date         : Sat Mar  7 18:24:11 2020
| Host         : urumi running 64-bit major release (build 9200)
| Command      : report_utilization -file sorter_utilization_synth.rpt -pb sorter_utilization_synth.pb
| Design       : sorter
| Device       : 7z007sclg225-2
| Design State : Synthesized
----
```

### Utilization Design Information

#### Table of Contents

- 1. Slice Logic
- 1.1 Summary of Registers by Type
- 2. Memory
- 3. DSP
- 4. IO and GT Specific
- 5. Clocking
- 6. Specific Feature
- 7. Primitives
- 8. Black Boxes
- 9. Instantiated Netlists

#### 1. Slice Logic

| Site Type             | Used | Fixed | Available | Util% |
|-----------------------|------|-------|-----------|-------|
| Slice LUTs*           | 2179 | 0     | 14400     | 15.13 |
| LUT as Logic          | 2179 | 0     | 14400     | 15.13 |
| LUT as Memory         | 0    | 0     | 6000      | 0.00  |
| Slice Registers       | 1042 | 0     | 28800     | 3.62  |
| Register as Flip Flop | 1042 | 0     | 28800     | 3.62  |
| Register as Latch     | 0    | 0     | 28800     | 0.00  |
| F7 Muxes              | 0    | 0     | 8800      | 0.00  |
| F8 Muxes              | 0    | 0     | 4400      | 0.00  |

\* Warning! The Final LUT count, after physical optimizations and full implementation, is typically lower. Run opt\_design after synthesis, if not already completed, for a more realistic count.
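Comparing the four utilization reports shows how the design scales with sequence length. The Python sketch below, with values copied from the Slice Logic tables in this report, computes LUTs and registers per element; the helper name `per_element` is only for illustration.

```python
# (elements, Slice LUTs, Slice Registers) from the four utilization reports
REPORTS = [
    (16, 262, 134),
    (32, 537, 266),
    (64, 1085, 525),
    (128, 2179, 1042),
]


def per_element(reports):
    """Return (n, LUTs per element, registers per element) rows."""
    return [(n, luts / n, regs / n) for n, luts, regs in reports]


if __name__ == "__main__":
    for n, luts, regs in per_element(REPORTS):
        print(f"n={n:4d}  LUTs/elem={luts:6.2f}  regs/elem={regs:5.2f}")
```

Both ratios stay nearly constant (about 16 to 17 LUTs and a little over 8 registers per element), which suggests that resource usage grows roughly linearly with $n$, as expected for $n/2$ compare-exchange elements and $n$ registers of `BIT_WIDTH` bits.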

#### 1.1 Summary of Registers by Type

| Total | Clock Enable | Synchronous | Asynchronous |
|-------|--------------|-------------|--------------|
| 0     | -            | -           | -            |
| 0     | -            | -           | Set          |
| 0     | -            | -           | Reset        |
| 0     | -            | Set         | -            |
| 0     | -            | Reset       | -            |
| 0     | Yes          | -           | -            |
| 0     | Yes          | -           | Set          |
| 0     | Yes          | -           | Reset        |
| 0     | Yes          | Set         | -            |
| 1042  | Yes          | Reset       | -            |

#### 2. Memory

| Site Type      | Used | Fixed | Available | Util% |
|----------------|------|-------|-----------|-------|
| Block RAM Tile | 0    | 0     | 50        | 0.00  |
| RAMB36/FIFO*   | 0    | 0     | 50        | 0.00  |
| RAMB18         | 0    | 0     | 100       | 0.00  |

\* Note: Each Block RAM Tile only has one FIFO logic available and therefore can accommodate only one FIFO36E1 or one FIFO18E1. However, if a FIFO18E1 occupies a Block RAM Tile, that tile can still accommodate a RAMB18E1

#### 3. DSP

| Site Type | Used | Fixed | Available | Util% |
|-----------|------|-------|-----------|-------|
| DSPs      | 0    | 0     | 66        | 0.00  |

#### 4. IO and GT Specific

| Site Type     | Used | Fixed | Available | Util%   |
|---------------|------|-------|-----------|---------|
| Bonded IOB    | 2052 | 0     | 54        | 3800.00 |
| Bonded IPADs  | 0    | 0     | 2         | 0.00    |
| Bonded IOPADs | 0    | 0     | 130       | 0.00    |
| PHY_CONTROL   | 0    | 0     | 2         | 0.00    |
| PHASER_REF    | 0    | 0     | 2         | 0.00    |
| OUT_FIFO      | 0    | 0     | 8         | 0.00    |
| IN_FIFO       | 0    | 0     | 8         | 0.00    |
| IDELAYCTRL    | 0    | 0     | 2         | 0.00    |
| IBUFDS        | 0    | 0     | 54        | 0.00    |
| PHASER_OUT/PHASER_OUT_PHY   | 0 | 0 | 8   | 0.00 |
| PHASER_IN/PHASER_IN_PHY     | 0 | 0 | 8   | 0.00 |
| IDELAYE2/IDELAYE2_FINEDELAY | 0 | 0 | 100 | 0.00 |
| ILOGIC                      | 0 | 0 | 54  | 0.00 |
| OLOGIC                      | 0 | 0 | 54  | 0.00 |

#### 5. Clocking

| Site Type  | Used | Fixed | Available | Util% |
|------------|------|-------|-----------|-------|
| BUFGCTRL   | 1    | 0     | 32        | 3.13  |
| BUFIO      | 0    | 0     | 8         | 0.00  |
| MMCME2_ADV | 0    | 0     | 2         | 0.00  |
| PLLE2_ADV  | 0    | 0     | 2         | 0.00  |
| BUFMRCE    | 0    | 0     | 4         | 0.00  |
| BUFHCE     | 0    | 0     | 48        | 0.00  |
| BUFR       | 0    | 0     | 8         | 0.00  |

## 6. Specific Feature

| Site Type   | Used | Fixed | Available | Util% |
|-------------|------|-------|-----------|-------|
| BSCANE2     | 0    | 0     | 4         | 0.00  |
| CAPTUREE2   | 0    | 0     | 1         | 0.00  |
| DNA_PORT    | 0    | 0     | 1         | 0.00  |
| EFUSE_USR   | 0    | 0     | 1         | 0.00  |
| FRAME_ECCE2 | 0    | 0     | 1         | 0.00  |
| ICAPE2      | 0    | 0     | 2         | 0.00  |
| STARTUPE2   | 0    | 0     | 1         | 0.00  |
| XADC        | 0    | 0     | 1         | 0.00  |

## 7. Primitives

| Ref Name | Used | Functional Category |
|----------|------|---------------------|
| LUT4     | 2033 | LUT                 |
| FDRE     | 1042 | Flop & Latch        |
| IBUF     | 1027 | IO                  |
| OBUF     | 1025 | IO                  |
| LUT5     | 1011 | LUT                 |
| CARRY4   | 254  | CarryLogic          |
| LUT6     | 129  | LUT                 |
| LUT3     | 18   | LUT                 |
| LUT1     | 11   | LUT                 |
| LUT2 | 4  | LUT   |
| BUFG | 1  | Clock |

---

## 8. Black Boxes

---

| Ref Name | Used |
|----------|------|
|          |      |

## 9. Instantiated Netlists

---

| Ref Name | Used |
|----------|------|
|          |      |

## Timing estimation report (128 element sequence)

### Design Timing Summary

| Setup                                | Hold                             | Pulse Width                                       |
|--------------------------------------|----------------------------------|---------------------------------------------------|
| Worst Negative Slack (WNS): 6.644 ns | Worst Hold Slack (WHS): 0.134 ns | Worst Pulse Width Slack (WPWS): 4.500 ns          |
| Total Negative Slack (TNS): 0.000 ns | Total Hold Slack (THS): 0.000 ns | Total Pulse Width Negative Slack (TPWS): 0.000 ns |
| Number of Failing Endpoints: 0       | Number of Failing Endpoints: 0   | Number of Failing Endpoints: 0                    |
| Total Number of Endpoints: 1050      | Total Number of Endpoints: 1050  | Total Number of Endpoints: 526                    |

All user specified timing constraints are met.

## Power estimation report (128 element sequence)

### Summary

Power estimation from Synthesized netlist. Activity derived from constraints files, simulation files or vectorless analysis. Note: these early estimates can change after implementation.

|                                     |                      |
|-------------------------------------|----------------------|
| <b>Total On-Chip Power:</b>         | <b>0.384 W</b>       |
| <b>Design Power Budget:</b>         | <b>Not Specified</b> |
| <b>Power Budget Margin:</b>         | <b>N/A</b>           |
| <b>Junction Temperature:</b>        | <b>29.4°C</b>        |
| Thermal Margin:                     | 70.6°C (5.9 W)       |
| Effective θ <sub>JA</sub> :         | 11.5°C/W             |
| Power supplied to off-chip devices: | 0 W                  |
| Confidence level:                   | Low                  |


### On-Chip Power



# Problem 2: Dense matrix-matrix multiplication



Given two matrices A (size  $m \times n$ ) and B (size  $n \times p$ ), the product matrix C (size  $m \times p$ ) is denoted as  $C = AB$  such that each element of C is computed as:

$$c_{i,j} = \sum_{k=1}^n a_{i,k}b_{k,j} \quad \forall i \in \{1, 2, \dots, m\}, j \in \{1, 2, \dots, p\}$$

For this problem, both A and B are square matrices of size  $n \times n$ , where  $n = 2^r$ .

We use a multiply-adder tree with levels  $0$  through  $r$  (where  $n = 2^r$ ). The tree accepts two vectors of  $n$  elements each and produces one Multiply-Accumulate (MAC) result. A complete  $n \times n$  matrix C requires  $n^2$  MAC results. Each level (stage) of the tree is pipelined to maximize the throughput and parallelism of the design, so once the pipeline is full, the tree produces one result per clock cycle.

## Design Problems

Let us consider a Multiply and Adder Tree (MulandAddTree) with  $n$  element multiplication.

- This design needs  $n$  Multiply units in total, all in stage 0.
- Stage  $s$  of the tree (for  $s = 1, \dots, r$ , where  $r = \log_2 n$ ) has  $2^{r-s} = n/2^s$  adders.
- So, in total we would need  $\sum_{s=1}^{r} n/2^s = n - 1$  Adder units in the entire tree.
- If we have  $k$ -bit inputs, each multiply unit produces a  $2k$ -bit product. In this design the adder units are also  $2k$  bits wide, so the final result of the Multiply and Adder Tree is  $2k$  bits wide. (In general, a sum of  $n$  such products can need up to  $2k + \log_2 n$  bits, so sums that exceed  $2k$  bits would overflow.)
- The first result appears at the output after  $r$  clock cycles (where  $r = \log_2 n$  is the height of the tree).
- After the first result appears, consecutive results appear in every following clock cycle because the tree is pipelined. Since the full product needs  $n^2$  results, the entire matrix multiplication takes  $r + (n^2 - 1)$  clock cycles.
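The counts in the list above can be checked with a short script. This is only a sketch in Python (not part of the RTL); the function name `tree_resources` is hypothetical, and the cycle count assumes one dot product is issued per clock for all  $n^2$  elements of C, with the latency counted as in the bullet above.

```python
def tree_resources(n: int) -> dict:
    """Count units and cycles for an n-input multiply-add tree (n a power of 2)."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of 2 and at least 2")
    r = n.bit_length() - 1                                # r = log2(n), number of adder stages
    adders_per_stage = [n >> s for s in range(1, r + 1)]  # stage s has n / 2**s adders
    return {
        "multipliers": n,                                 # all in stage 0
        "adders_per_stage": adders_per_stage,
        "total_adders": sum(adders_per_stage),            # equals n - 1
        "latency_cycles": r,                              # cycles until the first result
        "total_cycles": r + n * n - 1,                    # assumed: n*n results, one per cycle
    }


if __name__ == "__main__":
    for size in (4, 8, 16):
        print(size, tree_resources(size))
```

For  $n = 4$  this gives 4 multipliers, 2 + 1 = 3 adders, and 17 cycles in total.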

## Design files

The complete set of design files is available in the [GitHub repo](#).

- **MulandAddTree.v** : This top-level module contains the complete RTL design of Multiply and Adder Tree

```
module MulandAddTree #(
    parameter BIT_WIDTH = 8,
    parameter STAGE_WIDTH = 4
)
(
    input                                  clk,
    input                                  reset,
    input  [(STAGE_WIDTH*BIT_WIDTH)-1 : 0] a,
    input  [(STAGE_WIDTH*BIT_WIDTH)-1 : 0] b,
    output [(2*BIT_WIDTH)-1 : 0]           c,
    output                                 valid
);
```

- The parameter `BIT_WIDTH` sets the bit-width of each input. The output is twice that width.
- The parameter `STAGE_WIDTH` is equal to  $n$ , where the matrices A, B and C are each of size  $n \times n$ . *NOTE: Remember that this number must always be a power of 2.*
- When the `reset` is cleared, the processing begins at the positive edge of the following clock cycle.
- So long as `reset` is cleared, every input at the positive edge of the clock is considered for processing.
- When the result is ready, the `valid` output line goes ACTIVE HIGH.

- **multiply.v** : This contains the RTL design of Multiply unit

```
module Multiply #(
    parameter BIT_WIDTH = 8
)
(
    input                        clk,
    input                        reset,
    input  [  BIT_WIDTH -1 : 0]  in1,
    input  [  BIT_WIDTH -1 : 0]  in2,
    output [(2*BIT_WIDTH)-1 : 0] pdt,
    output                       valid
);
```

- The parameter `BIT_WIDTH` sets the bit-width of each element. It is set to 8 bits for this report.
- When the `reset` is cleared, the product is computed at the positive edge of the following clock cycle.
- So long as `reset` is cleared, every input at the positive edge of the clock is considered for processing.
- When the result is ready, the `valid` output line goes ACTIVE HIGH.

- **adder.v** : This contains the RTL design of Adder unit
 

```

module Adder #(
    parameter BIT_WIDTH = 8
)
(
    input                      clk,
    input                      reset,
    input  [BIT_WIDTH-1 : 0]    a,
    input  [BIT_WIDTH-1 : 0]    b,
    output [BIT_WIDTH-1 : 0]    sum,
    output                      valid
);

```

- The parameter `BIT_WIDTH` sets the bit-width of the adder unit.
- When the `reset` is cleared, the sum is computed at the positive edge of the following clock cycle.
- So long as `reset` is cleared, every input at the positive edge of the clock is considered for processing.
- When the result is ready, the `valid` output line goes ACTIVE HIGH.

- **MulandAddTree\_tb.v** : Testbench used to simulate, probe and validate the design.

```

module MulandAddTree_tb#(
    parameter BIT_WIDTH = 8,
    parameter STAGE_WIDTH = 4
)
(
);

```

- The parameter `BIT_WIDTH` sets the bit-width of each element. It is set to 8 bits for this report.
- The parameter `STAGE_WIDTH` is equal to  $n$ , where the matrices A, B and C are each of size  $n \times n$ . *NOTE: Remember that this number must always be a power of 2.*
- The testbench randomly generates A and B matrices of size `STAGE_WIDTH` x `STAGE_WIDTH` filled with unsigned integer elements.
- It then feeds the appropriate rows and columns of A and B respectively to obtain each of the  $n^2$  elements of the resulting matrix C.
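The interface behaviour described above for `MulandAddTree.v` (inputs sampled on every positive clock edge, `valid` raised once a result has passed through the tree) can be mimicked with a small cycle-level model. This is a sketch in Python, not the RTL itself: the class name `MulAddTreeModel` and its `clock` method are hypothetical, it assumes one register stage for the products and one per adder level, and it ignores `reset` and the fixed output width.

```python
class MulAddTreeModel:
    """Cycle-level sketch of a pipelined multiply-add tree (assumed structure)."""

    def __init__(self, n: int):
        if n < 2 or n & (n - 1):
            raise ValueError("n must be a power of 2 and at least 2")
        self.levels = n.bit_length() - 1           # adder stages, r = log2(n)
        # stages[0] holds the registered products; stages[s] holds stage-s sums.
        self.stages = [None] * (self.levels + 1)

    def clock(self, a=None, b=None):
        """Advance one positive clock edge; return (valid, c) as seen after it."""
        # Update from the last stage backwards so that every register reads the
        # value its predecessor held before this edge.
        for s in range(self.levels, 0, -1):
            prev = self.stages[s - 1]
            if prev is None:
                self.stages[s] = None
            else:
                self.stages[s] = [prev[i] + prev[i + 1] for i in range(0, len(prev), 2)]
        # Stage 0: element-wise products of the two input vectors (if any).
        self.stages[0] = None if a is None else [x * y for x, y in zip(a, b)]
        out = self.stages[self.levels]
        return (out is not None, None if out is None else out[0])


if __name__ == "__main__":
    tree = MulAddTreeModel(4)
    print(tree.clock([3, 2, 0, 2], [0, 6, 7, 0]))  # first input vector pair
    print(tree.clock())                            # no new input
    print(tree.clock())                            # result of the first pair emerges
```

In this model, a vector pair presented on one clock edge yields a valid result two edges later for  $n = 4$ , and back-to-back inputs produce back-to-back results.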

## Simulation waveforms (4x4 matrices)

### **Randomly generated inputs:**

$A = [[3, 2, 0, 2]; [0, 7, 7, 6]; [8, 3, 9, 8]; [9, 7, 5, 0]]$

$B = [[0, 8, 5, 9]; [6, 7, 4, 0]; [7, 1, 9, 5]; [0, 5, 5, 8]]$

### **Expected Result:**

$C = [[12, 48, 33, 43]; [91, 86, 121, 83]; [81, 134, 173, 181]; [77, 126, 118, 106]]$
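The expected result can be reproduced independently of the hardware with a small software reference. This is a minimal sketch in Python; the helper `matmul` is hypothetical and simply applies the element-wise sum from the definition of the matrix product to the A and B matrices listed above.

```python
def matmul(A, B):
    """Reference product C = AB using the element-wise sum from the definition."""
    rows, inner, cols = len(A), len(B), len(B[0])
    return [
        [sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


# Randomly generated inputs taken from this report.
A = [[3, 2, 0, 2], [0, 7, 7, 6], [8, 3, 9, 8], [9, 7, 5, 0]]
B = [[0, 8, 5, 9], [6, 7, 4, 0], [7, 1, 9, 5], [0, 5, 5, 8]]

C = matmul(A, B)
for row in C:
    print(row)
```

The rows printed by this script agree with the expected result matrix given above.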

For the purpose of simulation, buses **AA**, **BB** and **CC** on the testbench represent the full matrices A, B and C respectively.

The buses **a** and **b** are the respective input vectors (a row of A and a column of B, each holding four 8-bit elements) to the mult-add tree.

The bus **c** is the single 16-bit output of the mult-add tree.

The testbench feeds rows and columns from **AA** and **BB** into **a** and **b** respectively. Likewise, it also records output from **c** into **CC**.

The simulation uses a 10 ns clock (**clk**).

In the simulation waveform, you will see that at the 15 ns mark, the **reset** line is cleared. So, at the following positive edge of the clock (at the 20 ns mark), the mult-add tree starts to compute the first output of matrix **CC** ( $C_{0,0}$ ).

After two clock cycles (at the 40 ns mark), the **valid** line goes HIGH to indicate that the result of the first computation is ready. This is visible on bus **c** as the unsigned integer 12, which matches the expected result.

From now on, at every following positive edge of the clock, we will see valid results feed out of the mult-add tree and into the bus **c**.

Valid outputs are marked on the simulation waveform at 40 ns, 50 ns, and so on up to 190 ns, producing results which match the expected results:

12, 48, 33, 43, 91, 86, 121, 83, 81, 134, 173, 181, 77, 126, 118, 106

The testbench uses these results from **c** to populate bus **CC**, which represents the output matrix and becomes valid at the 200 ns mark.

The simulation automatically ends at the 230 ns mark.
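The timestamps quoted in this walkthrough follow a simple pattern, sketched below in Python. The function `output_schedule` is hypothetical; it assumes the 10 ns clock, the first active edge at 20 ns, a latency of  $\log_2 n$  cycles, and results emitted in row-major order of C, all as described in the text above.

```python
def output_schedule(n, clk_ns=10, first_edge_ns=20):
    """Map each (row, col) of C to the time in ns at which it appears on bus c."""
    r = n.bit_length() - 1                       # tree height, log2(n) for n a power of 2
    first_valid_ns = first_edge_ns + r * clk_ns  # time of the first valid result
    return {
        (i, j): first_valid_ns + (i * n + j) * clk_ns  # one result per clock, row-major
        for i in range(n)
        for j in range(n)
    }


schedule = output_schedule(4)
print(schedule[(0, 0)], schedule[(3, 3)])
```

For the 4x4 run this places the first element at 40 ns and the last at 190 ns, in line with the waveform markers.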

**(The enlarged waveform is on the next page)**



## Elaborated Design (4x4 matrices - full mult-add tree)

(Inserted in the next page)



## Elaborated Design (4x4 matrices - multiply unit only)

(Inserted in the next page)



## Elaborated Design (4x4 matrices - adder unit only)

(Inserted in the next page)

Figure label: `ADDER_TREE[1].ADDER_STAGE[0].SUM`



## Synthesized Schematic (4x4 matrices)

(Inserted in the next page - for brevity, only one multiplier module and one adder module expanded)



# Synthesis Reports (4x4 matrices)

## Resource utilization report (4x4 matrices)

Copyright 1986-2019 Xilinx, Inc. All Rights Reserved.

```
-----  
| Tool Version : Vivado v.2019.2 (win64) Build 2708876 Wed Nov  6 21:40:23 MST 2019  
| Date        : Sat Mar  7 21:22:59 2020  
| Host        : urumi running 64-bit major release (build 9200)  
| Command     : report_utilization -file MulandAddTree_utilization_synth.rpt -pb MulandAddTree_utilization_synth.pb  
| Design      : MulandAddTree  
| Device      : 7z007sclg225-2  
| Design State: Synthesized  
-----
```

### Utilization Design Information

#### Table of Contents

```
1. Slice Logic
1.1 Summary of Registers by Type
2. Memory
3. DSP
4. IO and GT Specific
5. Clocking
6. Specific Feature
7. Primitives
8. Black Boxes
9. Instantiated Netlists
```

## 1. Slice Logic

| Site Type              | Used       | Fixed    | Available    | Util%       |
|------------------------|------------|----------|--------------|-------------|
| <b>Slice LUTs*</b>     | <b>334</b> | <b>0</b> | <b>14400</b> | <b>2.32</b> |
| LUT as Logic           | 334        | 0        | 14400        | 2.32        |
| LUT as Memory          | 0          | 0        | 6000         | 0.00        |
| <b>Slice Registers</b> | <b>115</b> | <b>0</b> | <b>28800</b> | <b>0.40</b> |
| Register as Flip Flop  | 115        | 0        | 28800        | 0.40        |
| Register as Latch      | 0          | 0        | 28800        | 0.00        |
| <b>F7 Muxes</b>        | <b>0</b>   | <b>0</b> | <b>8800</b>  | <b>0.00</b> |
| <b>F8 Muxes</b>        | <b>0</b>   | <b>0</b> | <b>4400</b>  | <b>0.00</b> |

\* Warning! The Final LUT count, after physical optimizations and full implementation, is typically lower. Run opt\_design after synthesis, if not already completed, for a more realistic count.

### 1.1 Summary of Registers by Type

| Total | Clock Enable | Synchronous | Asynchronous |
|-------|--------------|-------------|--------------|
| 0     | -            | -           | -            |
| 0     | -            | -           | Set          |
| 0     | -            | -           | Reset        |
| 0     | -            | Set         | -            |
| 0     | -            | Reset       | -            |
| 0     | Yes          | -           | -            |
| 0     | Yes          | -           | Set          |
| 0     | Yes          | -           | Reset        |
| 0     | Yes          | Set         | -            |
| 115   | Yes          | Reset       | -            |

## 2. Memory

| Site Type      | Used | Fixed | Available | Util% |
|----------------|------|-------|-----------|-------|
| Block RAM Tile | 0    | 0     | 50        | 0.00  |
| RAMB36/FIFO*   | 0    | 0     | 50        | 0.00  |
| RAMB18         | 0    | 0     | 100       | 0.00  |

\* Note: Each Block RAM Tile only has one FIFO logic available and therefore can accommodate only one FIFO36E1 or one FIFO18E1. However, if a FIFO18E1 occupies a Block RAM Tile, that tile can still accommodate a RAMB18E1

## 3. DSP

| Site Type | Used | Fixed | Available | Util% |
|-----------|------|-------|-----------|-------|
| DSPs      | 0    | 0     | 66        | 0.00  |

## 4. IO and GT Specific

| Site Type     | Used | Fixed | Available | Util%  |
|---------------|------|-------|-----------|--------|
| Bonded IOB    | 83   | 0     | 54        | 153.70 |
| Bonded IPADs  | 0    | 0     | 2         | 0.00   |
| Bonded IOPADs | 0    | 0     | 130       | 0.00   |
| PHY_CONTROL   | 0    | 0     | 2         | 0.00   |
| PHASER_REF    | 0    | 0     | 2         | 0.00   |
| OUT_FIFO      | 0    | 0     | 8         | 0.00   |
| IN_FIFO       | 0    | 0     | 8         | 0.00   |
| IDELAYCTRL    | 0    | 0     | 2         | 0.00   |
| IBUFDS        | 0    | 0     | 54        | 0.00   |
| PHASER_OUT/PHASER_OUT_PHY   | 0 | 0 | 8   | 0.00 |
| PHASER_IN/PHASER_IN_PHY     | 0 | 0 | 8   | 0.00 |
| IDELAYE2/IDELAYE2_FINEDELAY | 0 | 0 | 100 | 0.00 |
| ILOGIC                      | 0 | 0 | 54  | 0.00 |
| OLOGIC                      | 0 | 0 | 54  | 0.00 |

## 5. Clocking

| Site Type  | Used | Fixed | Available | Util% |
|------------|------|-------|-----------|-------|
| BUFGCTRL   | 1    | 0     | 32        | 3.13  |
| BUFIO      | 0    | 0     | 8         | 0.00  |
| MMCME2_ADV | 0    | 0     | 2         | 0.00  |
| PLLE2_ADV  | 0    | 0     | 2         | 0.00  |
| BUFMRCE    | 0    | 0     | 4         | 0.00  |
| BUFHCE     | 0    | 0     | 48        | 0.00  |
| BUFR       | 0    | 0     | 8         | 0.00  |

## 6. Specific Feature

| Site Type   | Used | Fixed | Available | Util% |
|-------------|------|-------|-----------|-------|
| BSCANE2     | 0    | 0     | 4         | 0.00  |
| CAPTUREE2   | 0    | 0     | 1         | 0.00  |
| DNA_PORT    | 0    | 0     | 1         | 0.00  |
| EFUSE_USR   | 0    | 0     | 1         | 0.00  |
| FRAME_ECCE2 | 0    | 0     | 1         | 0.00  |
| ICAPE2      | 0    | 0     | 2         | 0.00  |
| STARTUPE2   | 0    | 0     | 1         | 0.00  |
| XADC        | 0    | 0     | 1         | 0.00  |

## 7. Primitives

| Ref Name | Used | Functional Category |
|----------|------|---------------------|
| LUT2     | 152  | LUT                 |
| LUT6     | 148  | LUT                 |
| FDRE     | 115  | Flop & Latch        |
| LUT4     | 84   | LUT                 |
| IBUF     | 66   | IO                  |
| CARRY4   | 52   | CarryLogic          |
| OBUF     | 17   | IO                  |
| LUT5     | 12   | LUT                 |
| LUT3     | 12   | LUT                 |
| LUT1     | 2    | LUT                 |
| BUFG     | 1    | Clock               |

---

## 8. Black Boxes

---

| Ref Name | Used |
|----------|------|
|          |      |

## 9. Instantiated Netlists

---

| Ref Name | Used |
|----------|------|
|          |      |

## Timing estimation report (4x4 matrices)

### Design Timing Summary

| Setup                                | Hold                             | Pulse Width                                       |
|--------------------------------------|----------------------------------|---------------------------------------------------|
| Worst Negative Slack (WNS): 7.898 ns | Worst Hold Slack (WHS): 0.170 ns | Worst Pulse Width Slack (WPWS): 4.500 ns          |
| Total Negative Slack (TNS): 0.000 ns | Total Hold Slack (THS): 0.000 ns | Total Pulse Width Negative Slack (TPWS): 0.000 ns |
| Number of Failing Endpoints: 0       | Number of Failing Endpoints: 0   | Number of Failing Endpoints: 0                    |
| Total Number of Endpoints: 98        | Total Number of Endpoints: 98    | Total Number of Endpoints: 116                    |

All user specified timing constraints are met.

## Power estimation report (4x4 matrices)

### Summary

Power estimation from Synthesized netlist. Activity derived from constraints files, simulation files or vectorless analysis. Note: these early estimates can change after implementation.

|                                     |                |
|-------------------------------------|----------------|
| Total On-Chip Power:                | 0.113 W        |
| Design Power Budget:                | Not Specified  |
| Power Budget Margin:                | N/A            |
| Junction Temperature:               | 26.3°C         |
| Thermal Margin:                     | 73.7°C (6.2 W) |
| Effective θJA:                      | 11.5°C/W       |
| Power supplied to off-chip devices: | 0 W            |
| Confidence level:                   | Low            |


### On-Chip Power



## Parallel Multiply-Add Trees (4x4 matrices)

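We can implement approximately **43** multiply-add trees in parallel. This is consistent with the LUT usage reported above: one 4x4 tree uses 334 of the 14400 available slice LUTs, and  $14400 / 334 \approx 43$ .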

The code for this can be found in `ParaMultandAddTree.sv` in the [GitHub repo](#).
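The estimate of 43 parallel trees can be cross-checked against the synthesis numbers for a single 4x4 tree. The sketch below is written in Python; the helper `max_parallel_trees` is hypothetical, and it assumes that only slice LUTs and slice registers limit replication. Bonded IOBs are deliberately left out, since a single tree already reports more IOBs than the device provides.

```python
def max_parallel_trees(available: dict, per_tree: dict) -> int:
    """Largest number of copies that fits within every listed resource."""
    return min(available[name] // used for name, used in per_tree.items() if used > 0)


# Figures from the 4x4 resource utilization report in this document.
available = {"slice_luts": 14400, "slice_registers": 28800}
per_tree = {"slice_luts": 334, "slice_registers": 115}

print(max_parallel_trees(available, per_tree))
```

Under these assumptions the slice LUTs are the limiting resource, which gives 43 copies.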

Here is the **resource utilization report summary** for this:



## Synthesized Schematic (8x8 matrices)

(Inserted in the next page - for brevity, only one multiplier module and one adder module expanded)



# Synthesis Reports (8x8 matrices)

## Resource utilization report (8x8 matrices)

Copyright 1986-2019 Xilinx, Inc. All Rights Reserved.

```
-----  
| Tool Version : Vivado v.2019.2 (win64) Build 2708876 Wed Nov  6 21:40:23 MST 2019  
| Date        : Sat Mar  7 22:11:02 2020  
| Host        : urumi running 64-bit major release (build 9200)  
| Command     : report_utilization -file MulandAddTree_utilization_synth.rpt -pb MulandAddTree_utilization_synth.pb  
| Design      : MulandAddTree  
| Device      : 7z007sclg225-2  
| Design State: Synthesized  
-----
```

### Utilization Design Information

#### Table of Contents

```
1. Slice Logic
1.1 Summary of Registers by Type
2. Memory
3. DSP
4. IO and GT Specific
5. Clocking
6. Specific Feature
7. Primitives
8. Black Boxes
9. Instantiated Netlists
```

## 1. Slice Logic

| Site Type              | Used       | Fixed    | Available    | Util%       |
|------------------------|------------|----------|--------------|-------------|
| <b>Slice LUTs*</b>     | <b>683</b> | <b>0</b> | <b>14400</b> | <b>4.74</b> |
| LUT as Logic           | 683        | 0        | 14400        | 4.74        |
| LUT as Memory          | 0          | 0        | 6000         | 0.00        |
| <b>Slice Registers</b> | <b>244</b> | <b>0</b> | <b>28800</b> | <b>0.85</b> |
| Register as Flip Flop  | 244        | 0        | 28800        | 0.85        |
| Register as Latch      | 0          | 0        | 28800        | 0.00        |
| <b>F7 Muxes</b>        | <b>0</b>   | <b>0</b> | <b>8800</b>  | <b>0.00</b> |
| <b>F8 Muxes</b>        | <b>0</b>   | <b>0</b> | <b>4400</b>  | <b>0.00</b> |

\* Warning! The Final LUT count, after physical optimizations and full implementation, is typically lower. Run opt\_design after synthesis, if not already completed, for a more realistic count.

### 1.1 Summary of Registers by Type

| Total | Clock Enable | Synchronous | Asynchronous |
|-------|--------------|-------------|--------------|
| 0     | -            | -           | -            |
| 0     | -            | -           | Set          |
| 0     | -            | -           | Reset        |
| 0     | -            | Set         | -            |
| 0     | -            | Reset       | -            |
| 0     | Yes          | -           | -            |
| 0     | Yes          | -           | Set          |
| 0     | Yes          | -           | Reset        |
| 0     | Yes          | Set         | -            |
| 244   | Yes          | Reset       | -            |

## 2. Memory

| Site Type      | Used | Fixed | Available | Util% |
|----------------|------|-------|-----------|-------|
| Block RAM Tile | 0    | 0     | 50        | 0.00  |
| RAMB36/FIFO*   | 0    | 0     | 50        | 0.00  |
| RAMB18         | 0    | 0     | 100       | 0.00  |

\* Note: Each Block RAM Tile only has one FIFO logic available and therefore can accommodate only one FIFO36E1 or one FIFO18E1. However, if a FIFO18E1 occupies a Block RAM Tile, that tile can still accommodate a RAMB18E1

## 3. DSP

| Site Type | Used | Fixed | Available | Util% |
|-----------|------|-------|-----------|-------|
| DSPs      | 0    | 0     | 66        | 0.00  |

## 4. IO and GT Specific

| Site Type     | Used | Fixed | Available | Util%  |
|---------------|------|-------|-----------|--------|
| Bonded IOB    | 147  | 0     | 54        | 272.22 |
| Bonded IPADs  | 0    | 0     | 2         | 0.00   |
| Bonded IOPADs | 0    | 0     | 130       | 0.00   |
| PHY_CONTROL   | 0    | 0     | 2         | 0.00   |
| PHASER_REF    | 0    | 0     | 2         | 0.00   |
| OUT_FIFO      | 0    | 0     | 8         | 0.00   |
| IN_FIFO       | 0    | 0     | 8         | 0.00   |
| IDELAYCTRL    | 0    | 0     | 2         | 0.00   |
| IBUFDS        | 0    | 0     | 54        | 0.00   |
| PHASER_OUT/PHASER_OUT_PHY   | 0 | 0 | 8   | 0.00 |
| PHASER_IN/PHASER_IN_PHY     | 0 | 0 | 8   | 0.00 |
| IDELAYE2/IDELAYE2_FINEDELAY | 0 | 0 | 100 | 0.00 |
| ILOGIC                      | 0 | 0 | 54  | 0.00 |
| OLOGIC                      | 0 | 0 | 54  | 0.00 |

## 5. Clocking

| Site Type  | Used | Fixed | Available | Util% |
|------------|------|-------|-----------|-------|
| BUFGCTRL   | 1    | 0     | 32        | 3.13  |
| BUFIO      | 0    | 0     | 8         | 0.00  |
| MMCME2_ADV | 0    | 0     | 2         | 0.00  |
| PLLE2_ADV  | 0    | 0     | 2         | 0.00  |
| BUFMRCE    | 0    | 0     | 4         | 0.00  |
| BUFHCE     | 0    | 0     | 48        | 0.00  |
| BUFR       | 0    | 0     | 8         | 0.00  |

## 6. Specific Feature

| Site Type   | Used | Fixed | Available | Util% |
|-------------|------|-------|-----------|-------|
| BSCANE2     | 0    | 0     | 4         | 0.00  |
| CAPTUREE2   | 0    | 0     | 1         | 0.00  |
| DNA_PORT    | 0    | 0     | 1         | 0.00  |
| EFUSE_USR   | 0    | 0     | 1         | 0.00  |
| FRAME_ECCE2 | 0    | 0     | 1         | 0.00  |
| ICAPE2      | 0    | 0     | 2         | 0.00  |
| STARTUPE2   | 0    | 0     | 1         | 0.00  |
| XADC        | 0    | 0     | 1         | 0.00  |

## 7. Primitives

| Ref Name | Used | Functional Category |
|----------|------|---------------------|
| LUT2     | 320  | LUT                 |
| LUT6     | 296  | LUT                 |
| FDRE     | 244  | Flop & Latch        |
| LUT4     | 168  | LUT                 |
| IBUF     | 130  | IO                  |
| CARRY4   | 108  | CarryLogic          |
| LUT5     | 24   | LUT                 |
| LUT3     | 24   | LUT                 |
| OBUF     | 17   | IO                  |
| LUT1 | 3  | LUT   |
| BUFG | 1  | Clock |

---

## 8. Black Boxes

---

| Ref Name | Used |
|----------|------|
|          |      |

## 9. Instantiated Netlists

---

| Ref Name | Used |
|----------|------|
|          |      |

## Timing estimation report (8x8 matrices)

### Design Timing Summary

| Setup                                | Hold                             | Pulse Width                                       |
|--------------------------------------|----------------------------------|---------------------------------------------------|
| Worst Negative Slack (WNS): 7.881 ns | Worst Hold Slack (WHS): 0.170 ns | Worst Pulse Width Slack (WPWS): 4.500 ns          |
| Total Negative Slack (TNS): 0.000 ns | Total Hold Slack (THS): 0.000 ns | Total Pulse Width Negative Slack (TPWS): 0.000 ns |
| Number of Failing Endpoints: 0       | Number of Failing Endpoints: 0   | Number of Failing Endpoints: 0                    |
| Total Number of Endpoints: 227       | Total Number of Endpoints: 227   | Total Number of Endpoints: 245                    |

All user specified timing constraints are met.

## Power estimation report (8x8 matrices)

### Summary

Power estimation from Synthesized netlist. Activity derived from constraints files, simulation files or vectorless analysis. Note: these early estimates can change after implementation.

|                                     |                |
|-------------------------------------|----------------|
| Total On-Chip Power:                | 0.119 W        |
| Design Power Budget:                | Not Specified  |
| Power Budget Margin:                | N/A            |
| Junction Temperature:               | 26.4°C         |
| Thermal Margin:                     | 73.6°C (6.2 W) |
| Effective θJA:                      | 11.5°C/W       |
| Power supplied to off-chip devices: | 0 W            |
| Confidence level:                   | Low            |


### On-Chip Power



## Synthesized Schematic (16x16 matrices)

(Inserted in the next page - for brevity, only one multiplier module and one adder module expanded)



# Synthesis Reports (16x16 matrices)

## Resource utilization report (16x16 matrices)

Copyright 1986-2019 Xilinx, Inc. All Rights Reserved.

```
-----  
| Tool Version : Vivado v.2019.2 (win64) Build 2708876 Wed Nov  6 21:40:23 MST 2019  
| Date        : Sat Mar  7 22:22:01 2020  
| Host        : urumi running 64-bit major release (build 9200)  
| Command     : report_utilization -file MulandAddTree_utilization_synth.rpt -pb MulandAddTree_utilization_synth.pb  
| Design      : MulandAddTree  
| Device      : 7z007sclg225-2  
| Design State: Synthesized  
-----
```

### Utilization Design Information

#### Table of Contents

```
1. Slice Logic
1.1 Summary of Registers by Type
2. Memory
3. DSP
4. IO and GT Specific
5. Clocking
6. Specific Feature
7. Primitives
8. Black Boxes
9. Instantiated Netlists
```

## 1. Slice Logic

| Site Type              | Used        | Fixed | Available | Util% |
|------------------------|-------------|-------|-----------|-------|
| <b>Slice LUTs*</b>     | <b>1380</b> | 0     | 14400     | 9.58  |
| LUT as Logic           | 1380        | 0     | 14400     | 9.58  |
| LUT as Memory          | 0           | 0     | 6000      | 0.00  |
| <b>Slice Registers</b> | <b>501</b>  | 0     | 28800     | 1.74  |
| Register as Flip Flop  | 501         | 0     | 28800     | 1.74  |
| Register as Latch      | 0           | 0     | 28800     | 0.00  |
| <b>F7 Muxes</b>        | <b>0</b>    | 0     | 8800      | 0.00  |
| <b>F8 Muxes</b>        | <b>0</b>    | 0     | 4400      | 0.00  |

\* Warning! The Final LUT count, after physical optimizations and full implementation, is typically lower. Run opt\_design after synthesis, if not already completed, for a more realistic count.

### 1.1 Summary of Registers by Type

| Total | Clock Enable | Synchronous | Asynchronous |
|-------|--------------|-------------|--------------|
| 0     | -            | -           | -            |
| 0     | -            | -           | Set          |
| 0     | -            | -           | Reset        |
| 0     | -            | Set         | -            |
| 0     | -            | Reset       | -            |
| 0     | Yes          | -           | -            |
| 0     | Yes          | -           | Set          |
| 0     | Yes          | -           | Reset        |
| 0     | Yes          | Set         | -            |
| 501   | Yes          | Reset       | -            |

## 2. Memory

| Site Type      | Used | Fixed | Available | Util% |
|----------------|------|-------|-----------|-------|
| Block RAM Tile | 0    | 0     | 50        | 0.00  |
| RAMB36/FIFO*   | 0    | 0     | 50        | 0.00  |
| RAMB18         | 0    | 0     | 100       | 0.00  |

\* Note: Each Block RAM Tile only has one FIFO logic available and therefore can accommodate only one FIFO36E1 or one FIFO18E1. However, if a FIFO18E1 occupies a Block RAM Tile, that tile can still accommodate a RAMB18E1

## 3. DSP

| Site Type | Used | Fixed | Available | Util% |
|-----------|------|-------|-----------|-------|
| DSPs      | 0    | 0     | 66        | 0.00  |

## 4. IO and GT Specific

| Site Type     | Used | Fixed | Available | Util%  |
|---------------|------|-------|-----------|--------|
| Bonded IOB    | 275  | 0     | 54        | 509.26 |
| Bonded IPADs  | 0    | 0     | 2         | 0.00   |
| Bonded IOPADs | 0    | 0     | 130       | 0.00   |
| PHY_CONTROL   | 0    | 0     | 2         | 0.00   |
| PHASER_REF    | 0    | 0     | 2         | 0.00   |
| OUT_FIFO      | 0    | 0     | 8         | 0.00   |
| IN_FIFO       | 0    | 0     | 8         | 0.00   |
| IDELAYCTRL    | 0    | 0     | 2         | 0.00   |
| IBUFDS        | 0    | 0     | 54        | 0.00   |
| PHASER_OUT/PHASER_OUT_PHY   | 0 | 0 | 8   | 0.00 |
| PHASER_IN/PHASER_IN_PHY     | 0 | 0 | 8   | 0.00 |
| IDELAYE2/IDELAYE2_FINEDELAY | 0 | 0 | 100 | 0.00 |
| ILOGIC                      | 0 | 0 | 54  | 0.00 |
| OLOGIC                      | 0 | 0 | 54  | 0.00 |

## 5. Clocking

| Site Type  | Used | Fixed | Available | Util% |
|------------|------|-------|-----------|-------|
| BUFGCTRL   | 1    | 0     | 32        | 3.13  |
| BUFIO      | 0    | 0     | 8         | 0.00  |
| MMCME2_ADV | 0    | 0     | 2         | 0.00  |
| PLLE2_ADV  | 0    | 0     | 2         | 0.00  |
| BUFMRCE    | 0    | 0     | 4         | 0.00  |
| BUFHCE     | 0    | 0     | 48        | 0.00  |
| BUFR       | 0    | 0     | 8         | 0.00  |

## 6. Specific Feature

| Site Type   | Used | Fixed | Available | Util% |
|-------------|------|-------|-----------|-------|
| BSCANE2     | 0    | 0     | 4         | 0.00  |
| CAPTUREE2   | 0    | 0     | 1         | 0.00  |
| DNA_PORT    | 0    | 0     | 1         | 0.00  |
| EFUSE_USR   | 0    | 0     | 1         | 0.00  |
| FRAME_ECCE2 | 0    | 0     | 1         | 0.00  |
| ICAPE2      | 0    | 0     | 2         | 0.00  |
| STARTUPE2   | 0    | 0     | 1         | 0.00  |
| XADC        | 0    | 0     | 1         | 0.00  |

## 7. Primitives

| Ref Name | Used | Functional Category |
|----------|------|---------------------|
| LUT2     | 656  | LUT                 |
| LUT6     | 592  | LUT                 |
| FDRE     | 501  | Flop & Latch        |
| LUT4     | 336  | LUT                 |
| IBUF     | 258  | IO                  |
| CARRY4   | 220  | CarryLogic          |
| LUT5     | 48   | LUT                 |
| LUT3     | 48   | LUT                 |
| OBUF     | 17   | IO                  |
| LUT1     | 4    | LUT                 |
| BUFG     | 1    | Clock               |

## 8. Black Boxes

| Ref Name | Used |
|----------|------|
|          |      |

## 9. Instantiated Netlists

| Ref Name | Used |
|----------|------|
|          |      |

## Timing estimation report (16x16 matrices)

### Design Timing Summary

| Setup                                | Hold                             | Pulse Width                                       |
|--------------------------------------|----------------------------------|---------------------------------------------------|
| Worst Negative Slack (WNS): 7.864 ns | Worst Hold Slack (WHS): 0.170 ns | Worst Pulse Width Slack (WPWS): 4.500 ns          |
| Total Negative Slack (TNS): 0.000 ns | Total Hold Slack (THS): 0.000 ns | Total Pulse Width Negative Slack (TPWS): 0.000 ns |
| Number of Failing Endpoints: 0       | Number of Failing Endpoints: 0   | Number of Failing Endpoints: 0                    |
| Total Number of Endpoints: 484       | Total Number of Endpoints: 484   | Total Number of Endpoints: 502                    |

All user specified timing constraints are met.

## Power estimation report (16x16 matrices)

### Summary

Power estimation from Synthesized netlist. Activity derived from constraints files, simulation files or vectorless analysis. Note: these early estimates can change after implementation.

| Metric                              | Value          |
|-------------------------------------|----------------|
| Total On-Chip Power:                | 0.129 W        |
| Design Power Budget:                | Not Specified  |
| Power Budget Margin:                | N/A            |
| Junction Temperature:               | 26.5°C         |
| Thermal Margin:                     | 73.5°C (6.2 W) |
| Effective θ<sub>JA</sub>:           | 11.5°C/W       |
| Power supplied to off-chip devices: | 0 W            |
| Confidence level:                   | Low            |


### On-Chip Power



## Synthesized Schematic (32x32 matrices)

(Inserted on the next page. For brevity, only one multiplier module and one adder module are expanded.)



# Synthesis Reports (32x32 matrices)

## Resource utilization report (32x32 matrices)

Copyright 1986-2019 Xilinx, Inc. All Rights Reserved.

```
-----------------------------------------------------------------------------------------
| Tool Version : Vivado v.2019.2 (win64) Build 2708876 Wed Nov  6 21:40:23 MST 2019
| Date         : Sat Mar  7 22:29:17 2020
| Host         : urumi running 64-bit major release (build 9200)
| Command      : report_utilization -file MulandAddTree_utilization_synth.rpt -pb MulandAddTree_utilization_synth.pb
| Design       : MulandAddTree
| Device       : 7z007sclg225-2
| Design State : Synthesized
-----------------------------------------------------------------------------------------
```

### Utilization Design Information

#### Table of Contents

```
1. Slice Logic
1.1 Summary of Registers by Type
2. Memory
3. DSP
4. IO and GT Specific
5. Clocking
6. Specific Feature
7. Primitives
8. Black Boxes
9. Instantiated Netlists
```

## 1. Slice Logic

| Site Type              | Used        | Fixed    | Available    | Util%        |
|------------------------|-------------|----------|--------------|--------------|
| <b>Slice LUTs*</b>     | <b>2773</b> | <b>0</b> | <b>14400</b> | <b>19.26</b> |
| LUT as Logic           | 2773        | 0        | 14400        | 19.26        |
| LUT as Memory          | 0           | 0        | 6000         | 0.00         |
| <b>Slice Registers</b> | <b>1014</b> | <b>0</b> | <b>28800</b> | <b>3.52</b>  |
| Register as Flip Flop  | 1014        | 0        | 28800        | 3.52         |
| Register as Latch      | 0           | 0        | 28800        | 0.00         |
| <b>F7 Muxes</b>        | <b>0</b>    | <b>0</b> | <b>8800</b>  | <b>0.00</b>  |
| <b>F8 Muxes</b>        | <b>0</b>    | <b>0</b> | <b>4400</b>  | <b>0.00</b>  |

\* Warning! The Final LUT count, after physical optimizations and full implementation, is typically lower. Run opt\_design after synthesis, if not already completed, for a more realistic count.

### 1.1 Summary of Registers by Type

| Total | Clock Enable | Synchronous | Asynchronous |
|-------|--------------|-------------|--------------|
| 0     | -            | -           | -            |
| 0     | -            | -           | Set          |
| 0     | -            | -           | Reset        |
| 0     | -            | Set         | -            |
| 0     | -            | Reset       | -            |
| 0     | Yes          | -           | -            |
| 0     | Yes          | -           | Set          |
| 0     | Yes          | -           | Reset        |
| 0     | Yes          | Set         | -            |
| 1014  | Yes          | Reset       | -            |

## 2. Memory

| Site Type      | Used | Fixed | Available | Util% |
|----------------|------|-------|-----------|-------|
| Block RAM Tile | 0    | 0     | 50        | 0.00  |
| RAMB36/FIFO*   | 0    | 0     | 50        | 0.00  |
| RAMB18         | 0    | 0     | 100       | 0.00  |

\* Note: Each Block RAM Tile only has one FIFO logic available and therefore can accommodate only one FIFO36E1 or one FIFO18E1. However, if a FIFO18E1 occupies a Block RAM Tile, that tile can still accommodate a RAMB18E1.

## 3. DSP

| Site Type | Used | Fixed | Available | Util% |
|-----------|------|-------|-----------|-------|
| DSPs      | 0    | 0     | 66        | 0.00  |

## 4. IO and GT Specific

| Site Type                   | Used | Fixed | Available | Util%  |
|-----------------------------|------|-------|-----------|--------|
| Bonded IOB                  | 531  | 0     | 54        | 983.33 |
| Bonded IPADs                | 0    | 0     | 2         | 0.00   |
| Bonded IOPADs               | 0    | 0     | 130       | 0.00   |
| PHY_CONTROL                 | 0    | 0     | 2         | 0.00   |
| PHASER_REF                  | 0    | 0     | 2         | 0.00   |
| OUT_FIFO                    | 0    | 0     | 8         | 0.00   |
| IN_FIFO                     | 0    | 0     | 8         | 0.00   |
| IDELAYCTRL                  | 0    | 0     | 2         | 0.00   |
| IBUFDS                      | 0    | 0     | 54        | 0.00   |
| PHASER_OUT/PHASER_OUT_PHY   | 0    | 0     | 8         | 0.00   |
| PHASER_IN/PHASER_IN_PHY     | 0    | 0     | 8         | 0.00   |
| IDELAYE2/IDELAYE2_FINEDELAY | 0    | 0     | 100       | 0.00   |
| ILOGIC                      | 0    | 0     | 54        | 0.00   |
| OLOGIC                      | 0    | 0     | 54        | 0.00   |

## 5. Clocking

| Site Type  | Used | Fixed | Available | Util% |
|------------|------|-------|-----------|-------|
| BUFGCTRL   | 1    | 0     | 32        | 3.13  |
| BUFIO      | 0    | 0     | 8         | 0.00  |
| MMCME2_ADV | 0    | 0     | 2         | 0.00  |
| PLLE2_ADV  | 0    | 0     | 2         | 0.00  |
| BUFMRCE    | 0    | 0     | 4         | 0.00  |
| BUFHCE     | 0    | 0     | 48        | 0.00  |
| BUFR       | 0    | 0     | 8         | 0.00  |

## 6. Specific Feature

| Site Type   | Used | Fixed | Available | Util% |
|-------------|------|-------|-----------|-------|
| BSCANE2     | 0    | 0     | 4         | 0.00  |
| CAPTUREE2   | 0    | 0     | 1         | 0.00  |
| DNA_PORT    | 0    | 0     | 1         | 0.00  |
| EFUSE_USR   | 0    | 0     | 1         | 0.00  |
| FRAME_ECCE2 | 0    | 0     | 1         | 0.00  |
| ICAPE2      | 0    | 0     | 2         | 0.00  |
| STARTUPE2   | 0    | 0     | 1         | 0.00  |
| XADC        | 0    | 0     | 1         | 0.00  |

## 7. Primitives

| Ref Name | Used | Functional Category |
|----------|------|---------------------|
| LUT2     | 1328 | LUT                 |
| LUT6     | 1184 | LUT                 |
| FDRE     | 1014 | Flop & Latch        |
| LUT4     | 672  | LUT                 |
| IBUF     | 514  | IO                  |
| CARRY4   | 444  | CarryLogic          |
| LUT5     | 96   | LUT                 |
| LUT3     | 96   | LUT                 |
| OBUF     | 17   | IO                  |
| LUT1     | 5    | LUT                 |
| BUFG     | 1    | Clock               |
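
As a consistency check on the reported numbers, the I/O buffer primitives should account for every bonded IOB, and the Util% column is Used divided by Available. The arithmetic below uses only values reported in this document for the 16x16 and 32x32 designs; the symbols $N_{\text{IOB}}$, $N_{\text{IBUF}}$ and $N_{\text{OBUF}}$ are introduced here purely for illustration.

$$
\begin{aligned}
\text{16x16:}\quad N_{\text{IOB}} &= N_{\text{IBUF}} + N_{\text{OBUF}} = 258 + 17 = 275, &
\text{Util\%} &= \frac{275}{54} \times 100 \approx 509.26 \\
\text{32x32:}\quad N_{\text{IOB}} &= N_{\text{IBUF}} + N_{\text{OBUF}} = 514 + 17 = 531, &
\text{Util\%} &= \frac{531}{54} \times 100 \approx 983.33
\end{aligned}
$$

Both results match the Bonded IOB rows. A utilization above 100% means the top-level ports outnumber the 54 bonded IOBs reported as available, so these figures describe the synthesized netlist only; the design could not be mapped pin-for-pin onto this device without reducing the port count.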

## 8. Black Boxes

| Ref Name | Used |
|----------|------|
|          |      |

## 9. Instantiated Netlists

| Ref Name | Used |
|----------|------|
|          |      |

## Timing estimation report (32x32 matrices)

### Design Timing Summary

| Setup                                | Hold                             | Pulse Width                                       |
|--------------------------------------|----------------------------------|---------------------------------------------------|
| Worst Negative Slack (WNS): 7.847 ns | Worst Hold Slack (WHS): 0.170 ns | Worst Pulse Width Slack (WPWS): 4.500 ns          |
| Total Negative Slack (TNS): 0.000 ns | Total Hold Slack (THS): 0.000 ns | Total Pulse Width Negative Slack (TPWS): 0.000 ns |
| Number of Failing Endpoints: 0       | Number of Failing Endpoints: 0   | Number of Failing Endpoints: 0                    |
| Total Number of Endpoints: 997       | Total Number of Endpoints: 997   | Total Number of Endpoints: 1015                   |

All user specified timing constraints are met.

## Power estimation report (32x32 matrices)

### Summary

Power estimation from Synthesized netlist. Activity derived from constraints files, simulation files or vectorless analysis. Note: these early estimates can change after implementation.

| Metric                              | Value          |
|-------------------------------------|----------------|
| Total On-Chip Power:                | 0.155 W        |
| Design Power Budget:                | Not Specified  |
| Power Budget Margin:                | N/A            |
| Junction Temperature:               | 26.8°C         |
| Thermal Margin:                     | 73.2°C (6.1 W) |
| Effective θ<sub>JA</sub>:           | 11.5°C/W       |
| Power supplied to off-chip devices: | 0 W            |
| Confidence level:                   | Low            |


### On-Chip Power



# APPENDIX 1: Sort - synthesized schematic (64 elements)

(The following 11 pages are the schematic)























# APPENDIX 2: Sort - synthesized schematic (128 elements)

(The following 21 pages are the schematic)













































