

# Experiment-8

Group:

Aryan Jain:EE22BTECH11011  
Avanigadda Chhatrapati:EE22BTECH11012  
Banoth Premsagar:EE22BTECH11013

In this lab assignment, we were tasked with implementing  $8 \times 8$  matrix inversion using the Cholesky decomposition.

Given clock: 491.52MHz

The existing IP performs 16 matrix inversions in  $\sim 0.78\mu s$ , giving a throughput of  $\sim 20.5$  million matrix inversions per second per instance.

This throughput of 20.5 million matrix inversions per second is the target for our Verilog code.

Time per inversion =  $0.78\mu s / 16 = 0.04875\mu s$  per inversion

Required latency (clock cycles per inversion) = clock frequency / target throughput

$$= 491.52\text{MHz} / 20.5 \text{ million } s^{-1}$$

$$\approx 23.98$$

$\sim 24$  cycles per inversion
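The cycle budget can be reproduced with a few lines of Python. This is only a sanity-check sketch of the arithmetic; the variable names are our own and do not come from the Verilog source.

```python
CLOCK_HZ = 491.52e6        # given clock
BATCH_TIME_S = 0.78e-6     # existing IP: time for one batch of inversions
BATCH_SIZE = 16            # inversions per batch
TARGET_THROUGHPUT = 20.5e6 # inversions per second (rounded, as in the report)

# Average time spent on one inversion by the existing IP.
time_per_inversion_s = BATCH_TIME_S / BATCH_SIZE

# Clock cycles available per inversion to match the target throughput.
cycles_per_inversion = CLOCK_HZ / TARGET_THROUGHPUT

print(f"time per inversion: {time_per_inversion_s * 1e6:.5f} us")
print(f"cycle budget      : {cycles_per_inversion:.2f} cycles")
```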

Throughput achieved = Fmax/Latency

The timing analysis reports the following Fmax:

|   | Fmax       | Restricted Fmax | Clock Name | Note                                         |
|---|------------|-----------------|------------|----------------------------------------------|
| 1 | 751.31 MHz | 583.43 MHz      | clk        | limit due to high minimum pulse width violat |

Therefore Throughput = Fmax/Latency

$$= 751.31\text{MHz} / 24$$

$$= 31.3 \text{ million } s^{-1} > 20.5 \text{ million } s^{-1}$$

So the target was reached.
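As a cross-check, the same formula can be evaluated for both frequencies in the timing table. The sketch below assumes, as the calculation above does, that one inversion completes every 24 clock cycles; the helper name `throughput` is hypothetical.

```python
CYCLES_PER_INVERSION = 24     # cycle budget used in the report
TARGET = 20.5e6               # target, inversions per second

# Frequencies taken from the timing report.
FMAX_HZ = 751.31e6
RESTRICTED_FMAX_HZ = 583.43e6

def throughput(clock_hz, cycles=CYCLES_PER_INVERSION):
    """Inversions per second if one inversion completes every `cycles` clocks."""
    return clock_hz / cycles

for name, freq in (("Fmax", FMAX_HZ), ("Restricted Fmax", RESTRICTED_FMAX_HZ)):
    t = throughput(freq)
    print(f"{name:16s}: {t / 1e6:.1f} M inversions/s, meets target: {t > TARGET}")
```

Under this assumption the target is met at the restricted Fmax as well as at the unrestricted one.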

Verilog explanation:

Our Verilog code is divided into the following modules:

`matrix_inv.v` : Top-level module. It takes the input matrix and passes it through the remaining stages of the Cholesky-based matrix inversion.

`cholesky_stage.v` : The input matrix is decomposed into a lower triangular matrix (its Cholesky factor).

`sqrt_nr.v` : Computes the square root using the Newton-Raphson method; the result is used in `cholesky_stage`.
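For reference, the textbook Newton-Raphson square-root iteration can be modelled in Python as follows. This is a floating-point sketch only: the initial guess and the iteration count are our assumptions, and the actual `sqrt_nr.v` may use a fixed-point format and a different form of the iteration.

```python
def sqrt_nr(a, iterations=20):
    """Approximate sqrt(a) for a >= 0 with Newton-Raphson on f(x) = x*x - a."""
    if a < 0:
        raise ValueError("square root of a negative number")
    if a == 0:
        return 0.0
    x = a if a >= 1 else 1.0       # crude initial guess (assumption)
    for _ in range(iterations):    # fixed count, as a hardware unit would use
        x = 0.5 * (x + a / x)      # x_{n+1} = (x_n + a / x_n) / 2
    return x

print(sqrt_nr(2.0))
```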

`lower_inverse_stage.v` : Computes the inverse of the lower triangular matrix obtained from the Cholesky stage.

`transpose_stage.v` : Computes the transpose of the lower triangular inverse obtained from `lower_inverse_stage`.

`matrix_mult_stage.v` : Multiplies the transposed matrix from `transpose_stage` by the inverse matrix from `lower_inverse_stage` to obtain the inverse of the input matrix.

The output of `matrix_mult_stage` is the inverse of the given input matrix.
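The data flow through these stages can be mirrored in a small software reference model, which is useful for checking the hardware output. The sketch below is a floating-point Python model, not the Verilog itself: the function names are our own, it uses `math.sqrt` in place of the Newton-Raphson unit, and it assumes a Hermitian positive definite input so that the Cholesky factor exists. For complex inputs it uses the conjugate transpose, which reduces to the plain transpose for real matrices.

```python
import math

def cholesky(a):
    """Return lower triangular L with A = L * L^H (A Hermitian positive definite)."""
    n = len(a)
    l = [[0j] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: real and positive for a positive definite input.
        d = a[j][j].real - sum(abs(l[j][k]) ** 2 for k in range(j))
        l[j][j] = complex(math.sqrt(d))
        for i in range(j + 1, n):
            s = sum(l[i][k] * l[j][k].conjugate() for k in range(j))
            l[i][j] = (a[i][j] - s) / l[j][j]
    return l

def lower_inverse(l):
    """Invert a lower triangular matrix by forward substitution, column by column."""
    n = len(l)
    inv = [[0j] * n for _ in range(n)]
    for j in range(n):
        inv[j][j] = 1 / l[j][j]
        for i in range(j + 1, n):
            s = sum(l[i][k] * inv[k][j] for k in range(j, i))
            inv[i][j] = -s / l[i][i]
    return inv

def conj_transpose(m):
    """Conjugate transpose of a square matrix."""
    n = len(m)
    return [[m[j][i].conjugate() for j in range(n)] for i in range(n)]

def matmul(x, y):
    """Product of two square matrices of the same size."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def invert(a):
    """A^-1 = (L^-1)^H * L^-1, following the stage order described above."""
    l_inv = lower_inverse(cholesky(a))
    return matmul(conj_transpose(l_inv), l_inv)

# Small Hermitian positive definite example (illustrative, not the lab input).
a = [[4, 1 + 1j], [1 - 1j, 3]]
a_inv = invert(a)
check = matmul(a, a_inv)   # should be close to the identity matrix
print(check)
```

The same functions work unchanged for an $8 \times 8$ input, since the matrix size is taken from the argument.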

**Note:** The Verilog code is given the following input matrix. Real part of the input matrix:

```
1 2 3 4 5 6 7 8  
9 10 11 12 13 14 15 16  
17 18 19 20 21 22 23 24  
25 26 27 28 29 30 31 32  
33 34 35 36 37 38 39 40  
41 42 43 44 45 46 47 48  
49 50 51 52 53 54 55 56  
57 58 59 60 61 62 63 64
```

**Imaginary part of the input matrix:**

```
101 102 103 104 105 106 107 108  
109 110 111 112 113 114 115 116  
...  
161 162 163 164
```

After compiling the Verilog code, the utilization report is:

| Flow Summary                    |                                                 |
|---------------------------------|-------------------------------------------------|
| Flow Status                     | Successful - Sat Nov 15 21:08:54 2025           |
| Quartus Prime Version           | 24.1std.0 Build 1077 03/04/2025 SC Lite Edition |
| Revision Name                   | matrix_inversion                                |
| Top-level Entity Name           | matrix_inv                                      |
| Family                          | Cyclone V                                       |
| Device                          | 5CGTFD9E5F35I7                                  |
| Timing Models                   | Final                                           |
| Logic utilization (in ALMs)     | 5 / 113,560 (< 1 %)                             |
| Total registers                 | 17                                              |
| Total pins                      | 4 / 616 (< 1 %)                                 |
| Total virtual pins              | 0                                               |
| Total block memory bits         | 0 / 12,492,800 (0 %)                            |
| Total DSP Blocks                | 0 / 342 (0 %)                                   |
| Total HSSI RX PCSs              | 0 / 12 (0 %)                                    |
| Total HSSI PMA RX Deserializers | 0 / 12 (0 %)                                    |
| Total HSSI TX PCSs              | 0 / 12 (0 %)                                    |
| Total HSSI PMA TX Serializers   | 0 / 12 (0 %)                                    |
| Total PLLs                      | 0 / 20 (0 %)                                    |
| Total DLLs                      | 0 / 4 (0 %)                                     |