

# CE387: Real-Time Digital Systems Design and Verification with FPGAs

Professor David Zaretsky

## Assignment 6

### CORDIC Sin & Cos

Matias Ketema

tnc5178

## Simulation Results

- **Clock cycle count:** 841.5 cycles (simulation time in ns divided by the 10 ns clock period)
- **Errors reported:** 0

## Synthesis Results

- **Maximum frequency:** The synthesis report estimates a maximum operating frequency of 64.8 MHz.
- **Registers / LUTs / Logic Elements:** The design uses 1288 registers and 2024 combinational functions (logic elements/LUTs).
- **Memory utilization:** The design uses 512 memory bits (Total ESB).
- **Multipliers (DSPs):** No DSP blocks are used (0 of 266).
- **Worst path (timing analysis):** The worst path has a slack of -2.313 ns against the 13.11 ns period of the requested 76.3 MHz clock, which is consistent with the 64.8 MHz estimated frequency (a period of about 15.4 ns). This critical timing path starts at the input FIFO and ends at the z\_out of the last pipeline stage.
- **Schematic architecture (RTL):** This is a streaming CORDIC architecture. The top module wraps the central CORDIC processing core with one input FIFO and two output FIFOs. The CORDIC instance consists of a 16-stage pipeline.
- **Performance / Speedup:** The pipeline increases performance. The ideal speedup is 16x, but the real gain is lower because the pipeline takes time to fill, and because pipelining complicates the data paths and lowers the clock frequency.
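- **Throughput (samples-per-second):** 100,000,000 samples/sec, which is one sample per cycle at the 100 MHz simulation clock. At the 64.8 MHz estimated maximum frequency, the same one-sample-per-cycle rate gives about 64.8 million samples/sec.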
- **Include pictures and description of your architecture from Synplify Premier:** Pictures are at the bottom of the report. Radian inputs arrive through a 32-bit-wide FIFO, pass through the 16-stage pipelined CORDIC instance, and leave through two 16-bit-wide output FIFOs, one for cos and one for sin.
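
The speedup bullet above can be made concrete with a back-of-the-envelope model. This is only a sketch: it assumes a hypothetical baseline, an iterative CORDIC that spends 16 cycles on every sample, and compares it with a 16-stage pipeline that delivers its first result after 16 cycles and one more result each cycle after that. The function names and the baseline are illustrative assumptions; the 721-sample count and the 100 MHz and 64.8 MHz clock rates are the figures quoted in this report.

```python
def pipeline_cycles(samples, stages):
    """Cycles for a pipelined run: first result after `stages` cycles,
    then one additional result per cycle."""
    return stages + samples - 1


def iterative_cycles(samples, iterations):
    """Cycles for the assumed baseline that reuses one stage per iteration."""
    return samples * iterations


def speedup(samples, stages, f_pipe_hz, f_iter_hz):
    """Ratio of baseline run time to pipelined run time."""
    t_iter = iterative_cycles(samples, stages) / f_iter_hz
    t_pipe = pipeline_cycles(samples, stages) / f_pipe_hz
    return t_iter / t_pipe


if __name__ == "__main__":
    # Same clock for both designs: fill time alone keeps the result below 16x.
    print(speedup(721, 16, 100e6, 100e6))
    # Pipelined design clocked at the 64.8 MHz synthesis estimate instead.
    print(speedup(721, 16, 64.8e6, 100e6))
```

With equal clocks the ratio is 11536 / 736, a little under 16, and it drops further if the pipelined design has to run at a lower clock than the baseline.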

This project implements a 16-stage pipelined CORDIC algorithm in SystemVerilog that computes sine and cosine values for any input angle. The design was built to match the structure of the provided grayscale reference project, reusing its FIFO module, UVM testbench architecture, and simulation scripts. The system was verified using the Universal Verification Methodology (UVM) to confirm that the RTL output is bit-true against a C software reference model, and to measure the quantization error of the fixed-point results compared to ideal floating-point sine and cosine.

The software model is a C program that implements CORDIC using 16-bit signed fixed-point arithmetic with 14 fractional bits in Q1.14 format. It sweeps theta from -360 to +360 degrees in 1-degree steps, giving 721 total samples. For each angle it quantizes the radian value to a 32-bit integer, runs 16 CORDIC iterations, and writes the results to three hex reference files: rad.txt, sin.txt, and cos.txt. These files serve as the golden reference for RTL verification.
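
As a rough illustration of what such a fixed-point model computes, here is a minimal rotation-mode CORDIC sketch in Python. It is not the C reference program and is not guaranteed to be bit-identical to it. The names, the arctangent table construction, the rounding choices, and the use of 14 fractional bits for the angle are all assumptions made for the example; only the Q1.14 output format and the 16 iterations come from the description above. The input must already lie in [-pi/2, pi/2].

```python
import math

FRAC_BITS = 14            # Q1.14: 14 fractional bits
ONE = 1 << FRAC_BITS      # fixed-point representation of 1.0
ITERATIONS = 16

# atan(2^-i) for each micro-rotation, quantized to 14 fractional bits (assumed).
ATAN_TABLE = [round(math.atan(2.0 ** -i) * ONE) for i in range(ITERATIONS)]

# CORDIC gain compensation K = prod(1 / sqrt(1 + 2^-2i)), roughly 0.60725.
_k = 1.0
for _i in range(ITERATIONS):
    _k /= math.sqrt(1.0 + 2.0 ** (-2 * _i))
K_FIXED = round(_k * ONE)


def to_fixed(value):
    """Quantize a float to an integer with FRAC_BITS fractional bits."""
    return round(value * ONE)


def cordic_sin_cos(z):
    """Return (sin, cos) as Q1.14 integers for a fixed-point angle z."""
    x, y = K_FIXED, 0
    for i in range(ITERATIONS):
        d = 1 if z >= 0 else -1
        # Python's >> on negative ints is an arithmetic (sign-preserving) shift.
        x, y = x - d * (y >> i), y + d * (x >> i)
        z -= d * ATAN_TABLE[i]
    return y, x


if __name__ == "__main__":
    s, c = cordic_sin_cos(to_fixed(math.radians(30)))
    print(s / ONE, c / ONE)  # close to 0.5 and 0.866
```

Starting `x` at the pre-scaled gain constant avoids a multiplication at the end, which matches the report's observation that the synthesized design uses no DSP blocks.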

The RTL design consists of three modules. The cordic\_stage module implements a single CORDIC micro-rotation with a pipeline register gated by an enable signal. The cordic module does combinational range reduction to bring the input angle into [-pi/2, pi/2], then instantiates 16 cordic\_stage components using a generate-for loop. The inter-stage wiring uses packed arrays in the style of mult\_pipe.sv, declared as logic signed [0:STAGES] [15:0] for x, y, z, and valid. A pipeline stall signal holds all stages when valid output data is present but the output FIFOs are full, preventing data loss for in-flight values. The top-level wrapper cordic\_top connects the core between a 32-bit input FIFO and two 16-bit output FIFOs for sine and cosine, all reused from the grayscale project.
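
The range-reduction step can be sketched as follows. This is one plausible way to fold an angle from the swept range of [-2pi, 2pi] into [-pi/2, pi/2], not necessarily the logic used in the cordic module. The constants, the function name, and the `negate` flag are hypothetical and assume 14 fractional bits for the angle.

```python
import math

FRAC_BITS = 14
ONE = 1 << FRAC_BITS
HALF_PI = round(math.pi / 2 * ONE)
PI = 2 * HALF_PI          # kept an exact multiple so the fold stays in range
TWO_PI = 2 * PI


def reduce_angle(z):
    """Map a fixed-point angle in [-2pi, 2pi] to (z_reduced, negate).

    z_reduced lies in [-pi/2, pi/2]. When negate is True, both the sine
    and the cosine of z_reduced must be negated to recover the originals,
    because sin(z - pi) = -sin(z) and cos(z - pi) = -cos(z).
    """
    # Step 1: wrap into [-pi, pi] by removing one full turn.
    if z > PI:
        z -= TWO_PI
    elif z < -PI:
        z += TWO_PI
    # Step 2: fold the outer quadrants onto the inner ones.
    if z > HALF_PI:
        return z - PI, True
    if z < -HALF_PI:
        return z + PI, True
    return z, False


if __name__ == "__main__":
    z, negate = reduce_angle(round(math.radians(200) * ONE))
    print(z / ONE, negate)  # about 0.349 rad (20 degrees), True
```

Because both steps are comparisons followed by an add or subtract of a constant, the reduction fits in combinational logic ahead of the first pipeline stage.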

The UVM testbench follows the grayscale architecture. The sequence reads quantized radian values directly from rad.txt to avoid any floating-point precision mismatches. The output monitor reads sine and cosine from the RTL FIFOs and writes them to output files for visual comparison. The compare monitor reads expected values from the C reference files in lockstep. The scoreboard performs two checks: a bit-true comparison of every RTL sample against the C reference, and a quantization precision analysis that compares the fixed-point output against ideal floating-point sin and cos. It reports the maximum and average error for both outputs, along with the throughput at 100 MHz.
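
The two scoreboard checks can be modeled outside UVM in a few lines. The sketch below assumes each sample is a 16-bit two's-complement word written as a hex string; the actual file layout of sin.txt and cos.txt is not shown in this report, and the helper names are hypothetical.

```python
FRAC_BITS = 14


def hex_to_signed16(word):
    """Parse a 16-bit hex word (for example 'c000') as a signed integer."""
    value = int(word, 16) & 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def count_mismatches(rtl_words, ref_words):
    """Bit-true check: number of samples where RTL and reference differ."""
    return sum(
        1 for r, e in zip(rtl_words, ref_words)
        if hex_to_signed16(r) != hex_to_signed16(e)
    )


def error_stats(hex_words, ideal_values):
    """Precision check: (max, mean) absolute error of Q1.14 words
    against ideal floating-point values."""
    errors = [
        abs(hex_to_signed16(w) / (1 << FRAC_BITS) - v)
        for w, v in zip(hex_words, ideal_values)
    ]
    return max(errors), sum(errors) / len(errors)


if __name__ == "__main__":
    rtl = ["4000", "c000", "0000"]
    ref = ["4000", "c000", "0000"]
    print(count_mismatches(rtl, ref))            # 0
    print(error_stats(rtl, [1.0, -1.0, 0.0]))    # (0.0, 0.0)
```

Keeping the two checks separate mirrors the scoreboard: the first catches RTL bugs, while the second measures the error inherent in the fixed-point format and would be nonzero even for a perfect implementation.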

The main challenge during development was a signedness bug caused by the packed arrays. When an individual element of a packed array is selected, such as x[k], SystemVerilog treats the result as unsigned even if the array is declared signed. This caused arithmetic right shifts to zero-fill instead of sign-extend, corrupting every CORDIC stage. The fix was to declare the cordic\_stage ports as input logic signed [15:0], which reinterprets the slices correctly at the module boundary. A second issue was that the output monitor ran in an unbounded forever loop, so it read stale FIFO data during the UVM drain time and produced repeated values at the end of the output files. Bounding the loop to exactly 721 samples fixed the alignment.
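
The effect of the signedness bug is easy to reproduce numerically. The following sketch models a 16-bit word in Python and contrasts a zero-filling shift (what an unsigned operand gets) with a sign-extending shift (what a signed operand gets). The helper names are illustrative and do not come from the project.

```python
MASK16 = 0xFFFF


def logical_shift_right(word, n):
    """Zero-fill shift of a 16-bit pattern, as applied to an unsigned operand."""
    return (word & MASK16) >> n


def arithmetic_shift_right(word, n):
    """Sign-extending shift of a 16-bit pattern, as applied to a signed operand."""
    value = word & MASK16
    if value & 0x8000:          # reinterpret the pattern as two's complement
        value -= 0x10000
    return (value >> n) & MASK16


if __name__ == "__main__":
    x = 0xE000  # -0.5 in Q1.14
    print(hex(logical_shift_right(x, 2)))     # 0x3800, reads as +0.875
    print(hex(arithmetic_shift_right(x, 2)))  # 0xf800, reads as -0.125
```

A negative value of -0.5 shifted right by two should become -0.125, but the zero-filling version turns it into +0.875, which is why every stage downstream of the first negative intermediate value was corrupted.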

The simulation passes with zero bit-true mismatches across all 721 samples for both sine and cosine. The Q1.14 fixed-point format has a quantization step of 2^-14, approximately 0.0000610352, and the maximum error versus ideal floating-point is on the order of 0.001. The pipelined design has a 16-cycle latency but produces one sample per clock after filling, giving a throughput of 100 million samples per second at 100 MHz.

## Simulation Proof:



## Synthesis Screenshot

| Project Settings      |        |                                  |                                      |
|-----------------------|--------|----------------------------------|--------------------------------------|
| Project Name          | cordic | Device Name                      | rev_1: Intel CYCLONE IV E : EP4CE115 |
| Implementation Name   | rev_1  | Top Module                       | cordic_top                           |
| Pipelining            | 1      | Retiming                         | 0                                    |
| Resource Sharing      | 1      | Fanout Guide                     | 30                                   |
| Disable I/O Insertion | 0      | Disable Sequential Optimizations | 0                                    |
| Clock Conversion      | 1      | FSM Compiler                     | 1                                    |

  

| Run Status                   |             |       |          |        |          |           |        |                  |
|------------------------------|-------------|-------|----------|--------|----------|-----------|--------|------------------|
| Job Name                     | Status      | Notes | Warnings | Errors | CPU Time | Real Time | Memory | Date/Time        |
| Compile Input (compiler)     | out-of-date | 54    | 0        | 0      | -        | 00m:03s   | -      | 2/20/26 10:19 PM |
| Premap (premap)              | Complete    | 9     | 2        | 0      | 0m:00s   | 0m:00s    | 121MB  | 2/20/26 10:19 PM |
| Map & Optimize (fpga_mapper) | Complete    | 108   | 32       | 0      | 0m:05s   | 0m:06s    | 179MB  | 2/20/26 10:19 PM |

  

| Area Summary                                  |         |                                |      |
|-----------------------------------------------|---------|--------------------------------|------|
| LUTs for combinational functions (total_luts) | 2024    | Non I/O Registers (non_io_reg) | 1288 |
| I/O Pins                                      | 72      | I/O registers (total_io_reg)   | 0    |
| DSP Blocks (dsp_used)                         | 0 (266) | Memory Bits                    | 512  |

  

| Timing Summary          |                     |                     |               |
|-------------------------|---------------------|---------------------|---------------|
| Clock Name (clock_name) | Req Freq (req_freq) | Est Freq (est_freq) | Slack (slack) |
| cordic_top clock        | 76.3 MHz            | 64.8 MHz            | -2.313 ns     |

## Hierarchical Area Report:

| Module name       | ATOMS | ARITHMETIC MOD | REGISTERS | SYNC RAMS | MACs |
|-------------------|-------|----------------|-----------|-----------|------|
| cordic_top        | 692   | 0              | 1288      | 1         | 0    |
| cordic            | 286   | 0              | 711       | 0         | 0    |
| fifo_16s_16s_5s   | 193   | 0              | 283       | 0         | 0    |
| fifo_16s_16s_5s_0 | 194   | 0              | 283       | 0         | 0    |
| fifo_32s_16s_5s   | 18    | 0              | 11        | 1         | 0    |

## RTL Hierarchical:



## Technology Hierarchical:



## CORDIC Pipeline View:

