

# Design and Implementation of a 64-Tap 16-Bit FIR Filter with Dual-Clock Domain Crossing

By: Albert Wang (aw3741)

# Top Level Architecture in Testbench



## Input:

**clk1** - Input sampling clock (10 kHz)  
**clk2** - Processing clock (100 MHz)

**rsth** - Active-low asynchronous reset

**din[15:0]** - Input data samples (Q1.15 signed)  
**valid\_in** - Input data valid

**cin[15:0]** - Coefficient data (Q1.15 signed)  
**caddr[5:0]** - Coefficient address (0-63)

**load** - Coefficient load enable

## Output:

**dout[15:0]** - Output data (Q7.9 signed)

**valid\_out** - Output data valid

## Internal (Simplified):

**fifo\_dout[15:0]** - FIFO → REGFILE

**coeff\_out[15:0]** - CMEM → ALU

**coeff\_raddr[5:0]** - FSM → CMEM

**reg\_dout[15:0]** - REGFILE → ALU

**reg\_raddr[5:0]** - FSM → REGFILE

**reg\_waddr[5:0]** - FSM → REGFILE

**alu\_result[15:0]** - ALU → FSM
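The data path implied by the signal list above can be sketched as a bit-accurate software model. This is a minimal sketch under stated assumptions, not the RTL itself: the helper names `to_q15` and `fir_q15` are hypothetical, the accumulator is assumed to keep full precision, and the conversion to Q7.9 is assumed to truncate (arithmetic right shift) and saturate. The actual rounding and overflow behavior of the design is not stated in this document. One plausible reading of the Q7.9 output format is that summing 64 products adds 6 bits of growth to the integer part.

```python
TAPS = 64          # filter length, from the title
IN_FRAC = 15       # Q1.15 inputs and coefficients
OUT_FRAC = 9       # Q7.9 output

def to_q15(x: float) -> int:
    """Quantize a real value to a signed 16-bit Q1.15 integer (saturating)."""
    v = int(round(x * (1 << IN_FRAC)))
    return max(-(1 << 15), min((1 << 15) - 1, v))

def fir_q15(samples, coeffs):
    """Filter Q1.15 integer samples with Q1.15 coefficients, return Q7.9 integers.

    Assumptions (not confirmed by the source): full-precision accumulation,
    truncation via arithmetic right shift, and saturation to 16 bits.
    """
    assert len(coeffs) == TAPS
    shift = 2 * IN_FRAC - OUT_FRAC        # products carry 30 fractional bits
    delay = [0] * TAPS                    # models the sample register file
    out = []
    for s in samples:
        delay = [s] + delay[:-1]          # newest sample at index 0
        acc = sum(d * c for d, c in zip(delay, coeffs))
        y = acc >> shift                  # Python >> on ints is an arithmetic shift
        y = max(-(1 << 15), min((1 << 15) - 1, y))
        out.append(y)
    return out
```

For example, with every coefficient set to 0.5 and a constant input of 0.5, the model settles at 64 × 0.25 = 16.0, which is 8192 in Q7.9 and fits comfortably in the 16-bit output word.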

# Methods and Results

## Methods

- **Input:** 10,000 randomly generated 16-bit Q1.15 real-valued samples
- **Coefficients:** Randomly generated 16-bit Q1.15 real-valued numbers
- **RTL coding from scratch:** Yes
- **Throughput:** Measured with PrimeTime using DC-generated SDF annotation
- **Maximum clock frequency:** Determined from PrimeTime timing analysis with DC-generated SDF annotation
- **Energy efficiency:** PrimeTime with DC-generated SDF and QuestaSim-generated VCD annotations
- **Area:** From Design Compiler synthesis report
- **Accuracy:** NRMSE against a MATLAB reference computed in 32-bit (single-precision) floating point
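The accuracy metric in the list above can be illustrated with a short sketch. NRMSE has several common normalizations (by range, mean, or RMS of the reference), and this document does not say which one was used, so the version below assumes normalization by the range of the reference signal. The function names `q7_9_to_float` and `nrmse_percent` are hypothetical.

```python
import math

def q7_9_to_float(v: int) -> float:
    """Convert a signed Q7.9 integer to a real value (9 fractional bits)."""
    return v / 512.0

def nrmse_percent(reference, measured):
    """RMSE between two equal-length sequences, normalized by the range of
    the reference and expressed in percent.

    Assumption: range normalization; the source does not specify this.
    """
    if len(reference) != len(measured) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    mse = sum((r - m) ** 2 for r, m in zip(reference, measured)) / len(reference)
    span = max(reference) - min(reference)
    if span == 0:
        raise ValueError("reference has zero range")
    return 100.0 * math.sqrt(mse) / span
```

In use, the hardware outputs would first be converted with `q7_9_to_float` and then compared against the floating-point reference sample by sample.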

## Results

- **Throughput:** 10 kS/s
- **Maximum clock frequency:** 100 MHz
- **Energy efficiency:** 338.4 pJ/sample (0.034 mW/MHz)
- **Area:** 0.0466 mm<sup>2</sup> (46,551 μm<sup>2</sup>)
- **Accuracy NRMSE:**
  - Worst case: 0.012%
  - Average: 0.003%
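
Two of the figures above can be cross-checked with simple arithmetic. The sketch below derives the number of processing-clock cycles available per input sample from the two clock rates and converts the reported area between units. The constant and function names are hypothetical, and the one-multiply-accumulate-per-cycle budget in the comment is an assumption about the architecture rather than something this document states.

```python
CLK1_HZ = 10_000          # input sampling clock, 10 kHz
CLK2_HZ = 100_000_000     # processing clock, 100 MHz
TAPS = 64                 # filter length

# Processing-clock cycles available between consecutive input samples.
cycles_per_sample = CLK2_HZ // CLK1_HZ

# Assumption: if the ALU performed one multiply-accumulate per clk2 cycle,
# 64 taps would use only a small fraction of this budget.
mac_budget_ratio = cycles_per_sample / TAPS

def um2_to_mm2(area_um2: float) -> float:
    """Convert square micrometres to square millimetres (1 mm^2 = 1e6 um^2)."""
    return area_um2 / 1e6

area_mm2 = um2_to_mm2(46_551)   # reported synthesis area
```

Here `cycles_per_sample` evaluates to 10,000, and `area_mm2` rounds to the 0.0466 mm<sup>2</sup> listed in the results.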