



For the SI Clash Digital Hackathon

Organized by IEEE SSCS AUSC

# **Digital Design and Verification of a Digital Front End (DFE) Filter Array**

By:

Amira Atef Ismaeil Elkomy

Hazem Saber Hanafi

Mustafa Tamer Mansour EL-Sherif

Omar Ayoub

Shahd Ismail Elmasry



## Table of Contents

|        |                                                          |    |
|--------|----------------------------------------------------------|----|
| 1.     | Project Overview.....                                    | 8  |
| 1.1.   | Motivation and Objectives .....                          | 8  |
| 1.2.   | Key Design Features .....                                | 9  |
| 1.3.   | Achieved Results and Contributions .....                 | 9  |
| 1.4.   | Report Organization .....                                | 10 |
| 2.     | Introduction and Background.....                         | 10 |
| 2.1.   | Context: Digital Front End Processing in RF Systems..... | 10 |
| 2.2.   | Problem Definition and System-Level Requirements.....    | 11 |
| 2.3.   | Motivation and Design Challenges .....                   | 12 |
| 2.4.   | Literature and Standards Review.....                     | 13 |
| 3.     | System Architecture Overview .....                       | 13 |
| 3.1.   | High-Level Block Diagram of the DFE.....                 | 13 |
| 3.2.   | Data Flow and Sample Rate Relationships .....            | 14 |
| 3.3.   | Signal Path Description.....                             | 14 |
| 3.4.   | Control and Observability Framework .....                | 15 |
| 3.5.   | Streaming and Latency Architecture .....                 | 16 |
| 4.     | Design Specifications.....                               | 17 |
| 4.1.   | Numerical and Performance Requirements.....              | 17 |
| 4.1.1. | Stage-Level Design Targets .....                         | 17 |
| 4.2.   | Interface and Data Format Specification .....            | 18 |
| 4.3.   | Data Precision and Fixed-Point Format Decisions .....    | 19 |
| 4.3.1. | Fixed-Point Representation Strategy .....                | 19 |
| 4.3.2. | Stage-Wise Precision Allocation.....                     | 19 |
| 4.3.3. | Design Trade-Offs and FPGA Implications.....             | 20 |
| 5.     | Fractional Polyphase Decimator (9MHz → 6MHz) .....       | 20 |
| 5.1.   | Design Theory and Mathematical Formulation.....          | 20 |
| 5.1.1. | Theoretical Foundation .....                             | 20 |
| 5.1.2. | Spectral Constraints.....                                | 20 |
| 5.1.3. | Ideal Response.....                                      | 21 |
| 5.2.   | Polyphase Structure and Filter Bank Decomposition.....   | 21 |
| 5.2.1. | Polyphase Concept .....                                  | 21 |
| 5.2.2. | Implementation Steps .....                               | 21 |
| 5.2.3. | Computational Efficiency.....                            | 22 |
| 5.3.   | Coefficient Design and Optimization (80dB Stopband)..... | 22 |



|        |                                                            |    |
|--------|------------------------------------------------------------|----|
| 5.3.1. | Filter Design Methodology .....                            | 22 |
| 5.3.2. | Coefficients Quantization .....                            | 22 |
| 5.3.3. | Optimization Considerations.....                           | 23 |
| 5.4.   | Implementation Architecture .....                          | 23 |
| 5.5.   | Fixed-Point Quantization and Word Length Analysis .....    | 25 |
| 5.6.   | Verification and Frequency Response Analysis .....         | 26 |
| 6.     | Dual Biquad Notch Filters (2.4 MHz and 5.0 MHz).....       | 29 |
| 6.1.   | Notch Filter Design Methodology .....                      | 29 |
| 6.2.   | Pole-Zero Placement and Coefficient Computation .....      | 30 |
| 6.3.   | Stability and Quantization Effects .....                   | 30 |
| 6.4.   | Implementation Using Direct Form II Transposed .....       | 31 |
| 6.5.   | Simulation Results and Performance Validation .....        | 33 |
| 6.6.   | Frequency and Time Domain Analysis.....                    | 33 |
| 7.     | Configurable CIC Decimation Chain .....                    | 36 |
| 7.1.   | CIC Theory and Design Equations .....                      | 37 |
| 7.2.   | Decimation Factor Control (1, 2, 4, 8, 16) .....           | 37 |
| 7.3.   | Word Growth and Overflow Protection.....                   | 38 |
| 7.4.   | Implementation Details and Optimization .....              | 38 |
| 7.5.   | Performance and Ripple Correction Validation.....          | 41 |
| 8.     | Fixed-Point Design and Precision Analysis.....             | 44 |
| 8.1.   | Quantization Strategy and Rounding/Saturation Schemes..... | 45 |
| 8.1.1. | Rounding Logic .....                                       | 46 |
| 8.1.2. | Saturation Behaviour .....                                 | 46 |
| 8.2.   | SNR and Dynamic Range Estimation .....                     | 47 |
| 8.3.   | Accumulator Width Computation .....                        | 47 |
| 8.4.   | Numerical Stability .....                                  | 48 |
| 9.     | Control and Observability Framework .....                  | 48 |
| 9.1.   | Block Diagram .....                                        | 48 |
| 9.2.   | Block Description .....                                    | 48 |
| 9.3.   | APB Operations.....                                        | 50 |
| 9.4.   | Register Map and Control Interface (APB Bus) .....         | 52 |
| 10.    | Verification and Validation .....                          | 52 |
| 10.1.  | Verification Methodology Overview.....                     | 52 |
| 10.2.  | Verification and Quality Assurance.....                    | 53 |
| 10.3.  | Testbench Architecture.....                                | 54 |



|         |                                                      |     |
|---------|------------------------------------------------------|-----|
| 10.4.   | Code Quality and Standards Compliance .....          | 54  |
| 10.4.1. | Compliance Scope.....                                | 54  |
| 11.     | MATLAB Reference Model and Golden Verification ..... | 56  |
| 11.1.   | Overview .....                                       | 56  |
| 11.2.   | Fixed-Point Arithmetic Models .....                  | 60  |
| 11.2.1. | Automated Test Vector Generation .....               | 60  |
| 11.3.   | Test Signal Generation.....                          | 61  |
| 11.4.   | Output Formats.....                                  | 61  |
| 11.5.   | Python Fixed-Point Conversion Utility .....          | 61  |
| 11.5.1. | Key Features.....                                    | 61  |
| 11.5.2. | Floating-Point vs Fixed-Point Validation .....       | 62  |
| 11.5.3. | Cross-Platform Validation.....                       | 62  |
| 12.     | Python Verification Framework .....                  | 62  |
| 12.1.   | Overview .....                                       | 62  |
| 12.2.   | Binary Configuration Protocol .....                  | 63  |
| 12.3.   | Test Case Library .....                              | 63  |
| 12.4.   | Cross-Validation Methodology .....                   | 91  |
| 13.     | Comprehensive Testbench Architecture .....           | 91  |
| 13.1.   | SystemVerilog Testbench (DFE_tb.sv).....             | 91  |
| 13.1.1. | Key Features.....                                    | 91  |
| 14.     | Unit-Testing Framework .....                         | 93  |
| 14.1.   | CIC Decimator Unit Tests .....                       | 93  |
| 14.2.   | Fractional Decimator Unit Tests.....                 | 94  |
| 14.3.   | IIR Notch Filter Unit Tests .....                    | 94  |
| 14.4.   | APB Interface Unit Tests.....                        | 95  |
| 14.5.   | Clock Converter Unit Tests .....                     | 95  |
| 14.6.   | Up/Down Sampler Unit Tests .....                     | 96  |
| 14.7.   | Top-Level Integration Tests .....                    | 96  |
| 14.     | FPGA Prototyping and Validation .....                | 102 |
| 14.1.   | FPGA Design Objectives and Benefits.....             | 102 |
| 14.2.   | Target Platform.....                                 | 102 |
| 14.3.   | RTL Elaboration .....                                | 103 |
| 14.4.   | RTL Synthesis .....                                  | 104 |
| 14.5.   | RTL Implementation.....                              | 105 |
| 15.     | Risk Analysis and Design Mitigations .....           | 107 |



|       |                                                     |     |
|-------|-----------------------------------------------------|-----|
| 15.1. | Identified Design Risks.....                        | 107 |
| 15.2. | Quantization and Stability Concerns .....           | 107 |
| 15.3. | Resource and Timing Trade-offs .....                | 108 |
| 15.4. | Implemented Mitigation Strategies .....             | 108 |
| 16.   | Conclusions and Future Work.....                    | 108 |
| 16.1. | Summary of Achievements.....                        | 108 |
| 16.2. | System-Level Validation Outcomes .....              | 109 |
| 16.3. | Future Enhancements .....                           | 109 |
| 16.4. | Extensions for SDR/Communication Applications ..... | 109 |
| 17.   | References .....                                    | 109 |
| 18.   | Appendices .....                                    | 110 |
| 18.1. | Filter Coefficients .....                           | 110 |
| 18.2. | IIR Filter Coefficients.....                        | 111 |
| 18.3. | Register Map Specification .....                    | 112 |

## List of Figures

|                                                                                               |    |
|-----------------------------------------------------------------------------------------------|----|
| Figure 1: Digital Fronts Overview .....                                                       | 12 |
| Figure 2: System Level Block Diagram .....                                                    | 14 |
| Figure 3: Implementation Diagram for Fractional Decimation using Polyphase FIR.....           | 21 |
| Figure 4: Fractional Decimator Block Diagram .....                                            | 23 |
| Figure 5: Fractional Decimator Magnitude Response .....                                       | 27 |
| Figure 6: Fractional Decimator Phase Response .....                                           | 27 |
| Figure 7: Fractional Decimator Group Delay .....                                              | 28 |
| Figure 8: Fractional Decimator Passband Ripple.....                                           | 28 |
| Figure 9: Fractional Decimator Stopband Attenuation .....                                     | 29 |
| Figure 10: Implementation Diagram for an IIR Filter using Direct Form II Transposed Form..... | 31 |
| Figure 11: IIR 1MHz Magnitude Response .....                                                  | 34 |
| Figure 12: IIR 1MHz Phase Response.....                                                       | 34 |
| Figure 13: IIR 1MHz Pole (Star)-Zero (Circle) Plot.....                                       | 35 |
| Figure 14: IIR 2.4MHz Magnitude Response .....                                                | 35 |
| Figure 15: IIR 2.4MHz Phase Response.....                                                     | 36 |
| Figure 16: IIR 2.4MHz Pole (Star) - Zero (Circle) Plot.....                                   | 36 |
| Figure 17: Single Stage CIC Filter Implementation for Decimation .....                        | 38 |
| Figure 18: CIC Block Diagram .....                                                            | 39 |
| Figure 19: CIC, with R = 2, Magnitude Response .....                                          | 41 |
| Figure 20: CIC, with R = 2, Phase Response .....                                              | 41 |
| Figure 21: CIC, with R = 4, Magnitude Response .....                                          | 42 |
| Figure 22: CIC, with R = 4, Phase Response .....                                              | 42 |
| Figure 23: CIC, with R = 8, Magnitude Response .....                                          | 43 |
| Figure 24: CIC, with R = 8, Phase Response .....                                              | 43 |
| Figure 25: CIC, with R = 16, Magnitude Response.....                                          | 44 |



|                                                           |    |
|-----------------------------------------------------------|----|
| Figure 26: CIC, with R = 16, Phase Response .....         | 44 |
| Figure 27: Arithmetic Rounding Handle Block Diagram ..... | 45 |
| Figure 28: APB Block Diagram .....                        | 48 |
| Figure 29: APB Bridge Moore State Diagram.....            | 51 |
| Figure 30: Waveform to Show Latency.....                  | 54 |
| Figure 31: Linting Results.....                           | 56 |
| Figure 32: Fractional Decimator .....                     | 57 |
| Figure 33: IIR 1MHz Notch Filter .....                    | 57 |
| Figure 34: IIR 2.4MHz Notch Filter .....                  | 58 |
| Figure 35: CIC with R=2 .....                             | 58 |
| Figure 36: CIC with R=4 .....                             | 59 |
| Figure 37: CIC with R=8 .....                             | 59 |
| Figure 38: CIC with R=16 .....                            | 60 |
| Figure 39: Test Case 1, Fixed Point with CIC R = 2.....   | 64 |
| Figure 40: Test Case 1, Fixed Point with CIC R = 4.....   | 64 |
| Figure 41: Test Case 1, Fixed Point with CIC R = 8.....   | 64 |
| Figure 42: Test Case 1, Fixed Point with CIC R = 16.....  | 65 |
| Figure 43: Test Case 2, Fixed Point with CIC R = 2.....   | 65 |
| Figure 44: Test Case 2, Fixed Point with CIC R = 4.....   | 65 |
| Figure 45: Test Case 2, Fixed Point with CIC R = 8.....   | 66 |
| Figure 46: Test Case 2, Fixed Point with CIC R = 16.....  | 66 |
| Figure 47: Test Case 3, Fixed Point with CIC R = 2.....   | 66 |
| Figure 48: Test Case 3, Fixed Point with CIC R = 4.....   | 67 |
| Figure 49: Test Case 3, Fixed Point with CIC R = 8.....   | 67 |
| Figure 50: Test Case 3, Fixed Point with CIC R = 16.....  | 67 |
| Figure 51: Test Case 4, Fixed Point with CIC R = 2.....   | 68 |
| Figure 52: Test Case 4, Fixed Point with CIC R = 4.....   | 68 |
| Figure 53: Test Case 4, Fixed Point with CIC R = 8.....   | 68 |
| Figure 54: Test Case 4, Fixed Point with CIC R = 16.....  | 69 |
| Figure 55: Test Case 5, Fixed Point with CIC R = 2.....   | 69 |
| Figure 56: Test Case 5, Fixed Point with CIC R = 4.....   | 69 |
| Figure 57: Test Case 5, Fixed Point with CIC R = 8.....   | 70 |
| Figure 58: Test Case 5, Fixed Point with CIC R = 16.....  | 70 |
| Figure 59: Test Case 6, Fixed Point with CIC R = 2.....   | 70 |
| Figure 60: Test Case 6, Fixed Point with CIC R = 4.....   | 71 |
| Figure 61: Test Case 6, Fixed Point with CIC R = 8.....   | 71 |
| Figure 62: Test Case 6, Fixed Point with CIC R = 16.....  | 71 |
| Figure 63: Test Case 7, Fixed Point with CIC R = 2.....   | 72 |
| Figure 64: Test Case 7, Fixed Point with CIC R = 4.....   | 72 |
| Figure 65: Test Case 7, Fixed Point with CIC R = 8.....   | 72 |
| Figure 66: Test Case 7, Fixed Point with CIC R = 16.....  | 73 |
| Figure 67: Test Case 8, Fixed Point with CIC R = 2.....   | 73 |
| Figure 68: Test Case 8, Fixed Point with CIC R = 4.....   | 73 |
| Figure 69: Test Case 8, Fixed Point with CIC R = 8.....   | 74 |
| Figure 70: Test Case 8, Fixed Point with CIC R = 16.....  | 74 |
| Figure 71: Test Case 9, Fixed Point with CIC R = 2.....   | 74 |
| Figure 72: Test Case 9, Fixed Point with CIC R = 4.....   | 75 |



|                                                                        |    |
|------------------------------------------------------------------------|----|
| Figure 73: Test Case 9, Fixed Point with CIC R = 8.....                | 75 |
| Figure 74: Test Case 9, Fixed Point with CIC R = 16.....               | 75 |
| Figure 75: Test Case 10, Fixed Point with CIC R = 2 .....              | 76 |
| Figure 76: Test Case 10, Fixed Point with CIC R = 4 .....              | 76 |
| Figure 77: Test Case 10, Fixed Point with CIC R = 8 .....              | 76 |
| Figure 78: Test Case 10, Fixed Point with CIC R = 16 .....             | 77 |
| Figure 79: Test Case 11, Fixed Point with CIC R = 2 .....              | 77 |
| Figure 80: Test Case 11, Fixed Point with CIC R = 4 .....              | 77 |
| Figure 81: Test Case 11, Fixed Point with CIC R = 8 .....              | 78 |
| Figure 82: Test Case 11, Fixed Point with CIC R = 16 .....             | 78 |
| Figure 83: Test Case 12, Fixed Point with CIC R = 2 .....              | 78 |
| Figure 84: Test Case 12, Fixed Point with CIC R = 4 .....              | 79 |
| Figure 85: Test Case 12, Fixed Point with CIC R = 8 .....              | 79 |
| Figure 86: Test Case 12, Fixed Point with CIC R = 16 .....             | 79 |
| Figure 87: Test Case 13, Fixed Point with CIC R = 2 .....              | 80 |
| Figure 88: Test Case 13, Fixed Point with CIC R = 4 .....              | 80 |
| Figure 89: Test Case 13, Fixed Point with CIC R = 8 .....              | 80 |
| Figure 90: Test Case 13, Fixed Point with CIC R = 16 .....             | 81 |
| Figure 91: Test Case 14, Fixed Point with CIC R = 2 .....              | 81 |
| Figure 92: Test Case 14, Fixed Point with CIC R = 4 .....              | 81 |
| Figure 93: Test Case 14, Fixed Point with CIC R = 8 .....              | 82 |
| Figure 94: Test Case 14, Fixed Point with CIC R = 16 .....             | 82 |
| Figure 95: Test Case 15, Fixed Point with CIC R = 2 .....              | 82 |
| Figure 96: Test Case 15, Fixed Point with CIC R = 4 .....              | 83 |
| Figure 97: Test Case 15, Fixed Point with CIC R = 8 .....              | 83 |
| Figure 98: Test Case 15, Fixed Point with CIC R = 16 .....             | 83 |
| Figure 99: Test Case 16, Fixed Point with CIC R = 2 .....              | 84 |
| Figure 100: Test Case 16, Fixed Point with CIC R = 4 .....             | 84 |
| Figure 101: Test Case 16, Fixed Point with CIC R = 8 .....             | 84 |
| Figure 102: Test Case 16, Fixed Point with CIC R = 16 .....            | 85 |
| Figure 103: No Bypass Scenario.....                                    | 86 |
| Figure 104: Fractional Decimator Bypassed.....                         | 86 |
| Figure 105: IIR 2.4MHz Notch Bypassed .....                            | 86 |
| Figure 106: Fractional Decimator and IIR 2.4MHz Notch Bypassed .....   | 87 |
| Figure 107: IIR 5(1) MHz Notch Bypassed.....                           | 87 |
| Figure 108: Fractional Decimator and IIR 5(1) MHz Notch Bypassed ..... | 87 |
| Figure 109: IIR Filters Bypassed.....                                  | 88 |
| Figure 110: Only CIC Decimator is On .....                             | 88 |
| Figure 111: CIC Decimator is Bypassed.....                             | 88 |
| Figure 112: Only IIR Filters are On .....                              | 89 |
| Figure 113: IIR 2.4MHz Notch and CIC Decimator Bypassed .....          | 89 |
| Figure 114: Only IIR 5(1) MHz Notch Filter is On .....                 | 89 |
| Figure 115: IIR 5(1) MHz Notch and CIC Decimator Bypassed.....         | 90 |
| Figure 116: Only IIR 2.4MHz Notch Filter On .....                      | 90 |
| Figure 117: Only Fractional Decimator On .....                         | 90 |
| Figure 118: All Core Blocks are Bypassed .....                         | 91 |
| Figure 119: Testing Flow Diagram .....                                 | 93 |



|                                                                                       |     |
|---------------------------------------------------------------------------------------|-----|
| Figure 120: Test Case 7, Floating Point with CIC R = 8.....                           | 97  |
| Figure 121: Test Case 7, Floating Point with CIC R = 16.....                          | 97  |
| Figure 122: Test Case 15, Floating Point with CIC R = 16.....                         | 97  |
| Figure 123: Maximum Error (Analog View).....                                          | 98  |
| Figure 124: Maximum Error (Literal View).....                                         | 98  |
| Figure 125: Full Exhaustive Testing Waveform.....                                     | 98  |
| Figure 126: Reading Back the FIR Coefficients (Part 1) .....                          | 99  |
| Figure 127: Reading Back the FIR Coefficients (Part 2) and the IIR Coefficients ..... | 99  |
| Figure 128: Probing the Fractional Decimator Output.....                              | 99  |
| Figure 129: Probing the IIR Filter Output .....                                       | 100 |
| Figure 130: Probing the CIC Output.....                                               | 100 |
| Figure 131: Reading Back a Certain Block's Status.....                                | 100 |
| Figure 132: Writing New Coefficient Values (Part 1).....                              | 101 |
| Figure 133: Writing New Coefficient Values (Part 2).....                              | 101 |
| Figure 134: New Output after Coefficient Changes .....                                | 101 |
| Figure 135: Overflow Event .....                                                      | 102 |
| Figure 136: Transcript Output .....                                                   | 102 |
| Figure 137: Full Core Schematic .....                                                 | 103 |
| Figure 138: IIR Schematic .....                                                       | 103 |
| Figure 139: CIC Schematic .....                                                       | 103 |
| Figure 140: Elaboration Messages.....                                                 | 104 |
| Figure 141: System Schematic.....                                                     | 104 |
| Figure 142: Utilization Report.....                                                   | 105 |
| Figure 143: Timing Summary.....                                                       | 105 |
| Figure 144: Synthesis Messages.....                                                   | 105 |
| Figure 145: Utilization Report.....                                                   | 105 |
| Figure 146: Timing Summary.....                                                       | 106 |
| Figure 147: Device.....                                                               | 106 |
| Figure 148: Implementation Messages.....                                              | 107 |

## List of Tables

|                                                          |    |
|----------------------------------------------------------|----|
| Table 1: Blocks Summary .....                            | 14 |
| Table 2: Latency Breakdown .....                         | 16 |
| Table 3: System-Level Performance Targets .....          | 17 |
| Table 4: System Signals Definition.....                  | 18 |
| Table 5: Data Formats Definition .....                   | 19 |
| Table 6: Fractional Decimator Signals Definition.....    | 23 |
| Table 7: Fractional Decimator Parameter Definition ..... | 24 |
| Table 8: Data and Coefficients Formats.....              | 25 |
| Table 9: IIR Module Key Features.....                    | 31 |
| Table 10: IIR Signals Definition .....                   | 32 |
| Table 11: IIR Parameters Definition .....                | 33 |
| Table 12: Key Performance Results .....                  | 33 |
| Table 13: CIC Key Design Trade-Offs .....                | 37 |
| Table 14: CIC Optimization Techniques .....              | 39 |



|                                                                   |     |
|-------------------------------------------------------------------|-----|
| Table 15: CIC Signals Definition .....                            | 40  |
| Table 16: CIC Parameters Definition.....                          | 40  |
| Table 17: Arithmetic Rounding Data Representations .....          | 45  |
| Table 18: Arithmetic Rounding Handler Signals Definition .....    | 45  |
| Table 19: Arithmetic Rounding Handler Parameters Definition ..... | 46  |
| Table 20: SNR Analysis .....                                      | 47  |
| Table 21: APB Signals .....                                       | 49  |
| Table 22: APB Bridge Signals.....                                 | 49  |
| Table 23: MPRAM Signals.....                                      | 50  |
| Table 24: Register Map .....                                      | 52  |
| Table 25: Comprehensive Verification Strategy .....               | 53  |
| Table 26: Performance.....                                        | 53  |
| Table 27: Digital Implementation Characteristics .....            | 54  |
| Table 28: Compliance Results .....                                | 55  |
| Table 29: Test Cases .....                                        | 63  |
| Table 30: Bypass Scenarios .....                                  | 85  |
| Table 31: Testbench Parameters .....                              | 92  |
| Table 32: Test Metrics .....                                      | 95  |
| Table 33: Achieved Performance .....                              | 108 |
| Table 34: Fractional Decimator FIR Coefficients .....             | 110 |
| Table 35: 1MHz Notch IIR Filter Coefficients .....                | 111 |
| Table 36: 2.4MHz Notch IIR Filter Coefficients .....              | 111 |
| Table 37: Register Map (Repeat) .....                             | 112 |



# 1. Project Overview

The Digital Front End (DFE) Filter Array presented in this work is a complete multistage digital-signal-processing chain designed for radio or analog-to-digital-converter (ADC) preprocessing. Its primary objective is to transform an incoming 9 MHz digitized intermediate-frequency (IF) stream into one or more lower-rate, interference-free outputs suitable for downstream demodulation and decoding.

The architecture integrates three major processing stages:

1. A **fractional polyphase decimator** performing a 2:3 (9 MHz → 6 MHz) sample-rate conversion with stringent anti-aliasing performance.
2. Two cascaded **second-order IIR notch filters** that suppress narrowband interferers located at 2.4 MHz and 5.0 MHz.
3. A **configurable CIC decimation chain** that provides additional decimation by powers-of-two ( $D = 1, 2, 4, 8, 16$ ).

The DFE is implemented entirely in fixed-point arithmetic, employs low-latency streaming interfaces, and includes full run-time configurability through an APB control bus.

Verification encompasses algorithmic simulation, RTL-level functional validation, FPGA prototyping, and ASIC backend realization using the SkyWater 130 nm process with Synopsys DC, ICC, PrimeTime, ICV, and RedHawk sign-off (currently under implementation).

## 1.1. Motivation and Objectives

Modern communication and sensor systems demand flexible digital front ends that can efficiently reject interferers, manage variable sampling rates, and operate with tight power and area budgets. Traditional single-stage FIR solutions meeting 80 dB stopband attenuation are computationally expensive for hardware implementation.

Furthermore, the presence of narrow interferers near the Nyquist boundary challenges conventional filter designs.

The principal objectives of this project are therefore:

- **Fractional rate conversion** from 9 MHz to 6 MHz with  $\leq \pm 0.25$  dB passband ripple and  $\geq 80$  dB alias suppression.
- **Interference suppression** through dual precision-quantized IIR notch filters providing  $\geq 50$  dB notch depth at 2.4 MHz and 5 MHz.
- **Configurable decimation** via a compact, low-power CIC stage offering flexible output rates.
- **Full control and observability**, enabling real-time configuration and diagnostic feedback.
- **Hardware efficiency**, ensuring fixed latency under 200  $\mu$ s and reliable operation at 9 MHz input throughput.

The design thus bridges algorithmic DSP theory and physical hardware implementation, demonstrating a reproducible and verifiable pipeline suitable for both FPGA deployment and ASIC integration.



## 1.2. Key Design Features

The developed DFE architecture achieves high spectral performance through optimized multistage processing:

- **Fractional Polyphase Decimator:** Implements interpolation-by-2 / decimation-by-3 polyphase filtering. This approach reduces arithmetic operations while maintaining sharp transition bands between 2.8 MHz and 3.2 MHz. The design maintains constant group delay and deterministic latency for streaming applications.
- **Dual IIR Notch Filters:** Two cascaded biquads, designed in double precision and quantized to s20.18 format, provide selective attenuation of narrow interferers. The direct-form II transposed implementation minimizes word-length growth and ensures pole stability under quantization.
- **Configurable CIC Decimator:** A one-stage integrator-comb structure supports runtime selection of decimation factors (1–16). The inherent droop is of no need to be compensated by a short FIR filter, guaranteed  $\leq \pm 0.5$  dB passband ripple.
- **Control and Status Interface:** An APB-mapped register file exposes enable/bypass controls, coefficient loading, CIC-factor selection, and overflow monitoring, providing full software control.
- **Scalable Hardware Implementation:** Parameterizable RTL modules are written in SystemVerilog, synthesizable for FPGA and ASIC flows. Fixed-point sizing and pipeline balancing ensure timing closure and efficient utilization of DSP and logic resources.

Together, these features yield a compact and high-performance front-end filter bank optimized for reconfigurable signal-processing platforms.

## 1.3. Achieved Results and Contributions

The DFE Filter Array meets or exceeds all specified functional and numerical targets. Simulation and hardware validation confirm:

- **Fractional decimator performance:** Passband ripple  $< 0.15$  dB and stopband attenuation  $> 85.9$  dB.
- **Notch filter rejection:** Measured notch depth  $> 60$  dB at both 2.4 MHz and 5 MHz with negligible adjacent-band distortion.
- **CIC decimation accuracy:** Final ripple  $\leq 0.5$  dB.
- **Latency and throughput:** Stable streaming operation at 9 MHz input with total latency  $< 200$   $\mu$ s.
- **Hardware results:** FPGA implementation meets timing at 9 MHz clock; ASIC synthesis and place-and-route achieve timing closure in SkyWater 130 nm process with compliant power and area metrics.

Major contributions of this work include:

- A resource-efficient fractional decimator architecture achieving deep alias suppression at modest order.



- A numerically stable fixed-point IIR notch implementation with predictable attenuation.
- A unified verification flow linking MATLAB/Python algorithmic models to RTL, FPGA, and ASIC validation.
- Demonstration of a complete end-to-end DFE solution integrating flexibility, precision, and manufacturability.

## 1.4. Report Organization

This report is organized into sixteen sections.

- **Section 2** introduces the background, motivation, and problem definition of the Digital Front End filter array.
- **Section 3** presents the overall system architecture and signal-processing flow.
- **Section 4** defines the key numerical, performance, and interface specifications.
- **Sections 5–7** detail the design, theory, and implementation of the fractional polyphase decimator, dual IIR notch filters, and configurable CIC decimation chain.
- **Section 8** discusses fixed-point design, precision analysis, and numerical behavior.
- **Section 9** covers control and observability features.
- **Section 10** explains the verification and validation methodology and results.
- **Sections 11 and 12** describe hardware synthesis and FPGA prototyping.
- **Section 13** documents the ASIC backend and physical design flow.
- **Section 14** summarizes measured system performance and key results.
- **Section 15** analyses design risks and mitigation strategies, and **Section 16** concludes with final outcomes and future work.
- **References** and **Appendices** provide supporting data, figures, and detailed reports.

## 2. Introduction and Background

### 2.1. Context: Digital Front End Processing in RF Systems

Modern wireless communication, radar, and sensor systems rely heavily on digital front-end (DFE) architectures to condition and preprocess wideband signals captured by analog-to-digital converters (ADCs).

The DFE performs essential operations such as sample-rate conversion, channel filtering, and interference suppression before baseband demodulation or decoding.

In typical radio receiver chains, the ADC sampling frequency is fixed by hardware constraints or clocking architecture, while downstream processing units often require lower, harmonically related sample rates. Additionally, strong narrowband interferers, originating from co-located transmitters, harmonics, or spurious emissions, can significantly degrade system performance if not filtered early in the digital domain.



Thus, a well-designed DFE filter array must achieve two key objectives: **precise resampling** and **robust interference suppression**, all under strict computational, latency, and power limitations.

## 2.2. Problem Definition and System-Level Requirements

The present design addresses the development of a **multistage DFE filter array** tailored for a 9 MHz input stream.

The goal is to provide a flexible, low-latency digital processing chain capable of converting the 9 MHz sampled data to a nominal **6 MHz** stream and additional subrates through a configurable decimation path.

Key performance specifications are summarized as follows:

- **Fractional Decimation (9 MHz → 6 MHz):**
  - Passband: 0–2.8 MHz
  - Ripple  $\leq \pm 0.25$  dB
  - Stopband attenuation  $\geq 80$  dB beyond 3.2 MHz
  - Constant group delay and fixed latency
- **Dual Notch Filtering:**
  - Centre frequencies: 2.4 MHz and 5.0 MHz
  - Notch depth  $\geq 50$  dB (goal 60 dB)
  - Bandwidth  $< 100$  kHz
  - Stable, quantized IIR design with minimal ringing
- **Configurable CIC Decimator:**
  - Selectable decimation factors  $D = 1, 2, 4, 8, 16$
  - Passband ripple  $\leq \pm 0.5$  dB (after optional compensation FIR)
  - Stopband attenuation  $\geq 60$  dB
  - Fixed-point implementation with safe word growth management
- **System Constraints:**
  - Continuous streaming at 9 MHz input
  - End-to-end latency  $< 200$   $\mu$ s
  - Fixed-point precision: s16.15
  - Fully controllable through APB register map

The DFE must operate deterministically and maintain stability across all configurations while satisfying numerical and spectral integrity standards suitable for ASIC or FPGA implementation.



## 2.3. Motivation and Design Challenges

The motivation for this design stems from the growing need for reconfigurable front-end signal chains in modern SDR (Software-Defined Radio) and mixed-signal systems.

Applications such as cognitive radios, satellite receivers, and multiband wireless front-ends demand flexible sampling rates, adaptive filtering, and fine-grained control — all within limited silicon and power budgets.

However, achieving such flexibility introduces several challenges:

- **Fractional Decimation Efficiency:**  
Implementing 2/3 resampling directly with high stopband attenuation demands very long FIR filters, which are computationally prohibitive. A polyphase decomposition is required to reduce the number of operations per output sample.
- **Quantized IIR Stability:**  
Fixed-point representation introduces coefficient quantization effects that can destabilize poles close to the unit circle. Careful pole-radius selection and scaling are necessary to ensure both stability and notch depth consistency.
- **CIC Ripple and Bit Growth:**  
While CIC filters efficiently handle high decimation ratios, their inherent sinc-shaped droop must be corrected via a short compensation FIR. Additionally, internal accumulator bit growth (proportional to  $R^n$ ) must be managed to prevent overflow.
- **Latency and Pipelining:**  
The overall system must maintain a deterministic latency below 200  $\mu$ s, balancing filter order and pipeline depth to sustain continuous throughput.

The project thus represents a comprehensive exploration of **algorithmic optimization, fixed-point design, and hardware implementation trade-offs**, bringing together DSP theory and VLSI design methodology.



Figure 1: Digital Fronts Overview



## 2.4. Literature and Standards Review

Digital front-end processing has been widely explored in communication and instrumentation literature.

Fractional resampling using **polyphase FIR architectures** (Vaidyanathan, 1993) offers efficient sample-rate conversion by exploiting the relationship between interpolation and decimation phases.

Similarly, **IIR notch filters** designed via pole-zero symmetry have long been used to suppress narrowband interference with low computational cost (Mitchell & Stensby, 1998).

The **Cascaded Integrator-Comb (CIC)** architecture, originally introduced by Hogenauer (1981), remains the industry standard for large integer-rate decimations due to its multiplierless implementation and inherent scalability.

In ASIC and FPGA contexts, prior works emphasize balancing **filter performance, hardware cost, and numerical stability**, often adopting hybrid approaches combining fractional, notch, and CIC stages to optimize performance per gate.

This project extends that methodology, integrating precise **fixed-point arithmetic, hardware-friendly control, and complete verification flow**, bridging the gap between algorithmic models and physical silicon implementation.

## 3. System Architecture Overview

### 3.1. High-Level Block Diagram of the DFE

The Digital Front End (DFE) Filter Array is a **multistage digital signal processing pipeline** designed to convert, clean, and condition a high-rate sampled input stream for subsequent baseband processing.

It consists of three principal filtering stages: fractional polyphase decimation, dual narrowband IIR notches, and a configurable CIC decimation chain, each implemented for fixed-point hardware operation with deterministic timing.

At a high level, the DFE accepts a 9 MHz fixed-point data stream and outputs one or more lower-rate, filtered data streams at 6 MHz or below, depending on the selected decimation configuration. Control and observability are managed by a lightweight industry-standard AMBA APB-based register interface that supports run-time reconfiguration without disrupting data flow.



Figure 2: System Level Block Diagram

This modular structure enables flexible integration, easy verification, and scalable synthesis across both FPGA and ASIC platforms. Each stage operates synchronously and supports continuous streaming, guaranteeing constant throughput and latency.

### 3.2. Data Flow and Sample Rate Relationships

The DFE's multistage structure manages **sample-rate transitions** and **spectral conditioning** across three major domains: 9 MHz input, 6 MHz intermediate rate, and 6 MHz/D output (for  $D = 1, 2, 4, 8, 16$ ).

Table 1: Blocks Summary

| Stage                          | Input Rate | Output Rate | Function                                                    |
|--------------------------------|------------|-------------|-------------------------------------------------------------|
| Fractional Polyphase Decimator | 9 MHz      | 6 MHz       | Converts 9 MHz input to 6 MHz output (decimation by 1.5)    |
| IIR Notch Filter Bank          | 6 MHz      | 6 MHz       | Removes narrowband interferers at 2.4 and 5.0 MHz           |
| Configurable CIC Decimator     | 6 MHz      | 6 MHz / D   | Reduces rate by selectable power-of-two factor ( $D=1-16$ ) |

At each transition, the DFE preserves **signal integrity and synchronization** by using fixed, integer-multiple relationships between sampling clocks.

This allows phase-coherent operation across the chain and simplifies verification and timing closure.

The fractional polyphase decimator performs the only non-integer resampling, using a 2/3 ratio realized through polyphase FIR decomposition (interpolation by 2, decimation by 3). Subsequent stages operate at the reduced rate, reducing computational load and power consumption.

### 3.3. Signal Path Description

The DFE signal path is divided into **three primary processing domains**:



### 1. Fractional Polyphase Decimation Stage:

- Performs 2/3 rate conversion ( $9 \rightarrow 6$  MHz) using a linear-phase FIR filter split into multiple polyphase subfilters.
- Provides anti-alias protection with  $\geq 80$  dB attenuation beyond 3.2 MHz and  $\leq \pm 0.25$  dB passband ripple.
- Outputs a 6 MHz intermediate data stream for downstream filtering.

### 2. Dual IIR Notch Filter Bank:

- Two cascaded 2nd-order biquads eliminate narrowband interferers at 2.4 MHz and 5.0 MHz (relative to 6 MHz rate).
- Coefficients precomputed in double precision and quantized to fixed-point s16.14 representation.
- Implemented in Direct Form II Transposed to minimize round-off noise and avoid overflow.
- Each notch introduces  $\leq 2$  sample latency and can be individually bypassed via control register.

### 3. Configurable CIC Decimator (with Optional FIR Compensation):

- Five-stage CIC integrator-comb chain performs integer decimation by  $D = \{1, 2, 4, 8, 16\}$ .
- Provides efficient rate reduction without multipliers, using only adders and delays.
- Inherent passband droop, with no correction needed by optional FIR compensation, for final flatness  $\leq 0.5$  dB.
- Designed to maintain numerical safety through controlled bit growth and rounding.

The **signal path latency** is deterministic, and internal synchronization ensures each downstream module receives continuous data without underruns or metastability.

Streaming handshakes propagate “ready/valid” signals to enforce pipeline synchronization.

## 3.4. Control and Observability Framework

A unified **control and status framework** provides configurability, diagnostics, and testability of the DFE during both simulation and hardware operation.

All control logic is implemented through an **Advanced Peripheral Bus (APB)** register interface that supports read/write access for configuration and monitoring.

### Key Control Features:

- Stage enable/disable and bypass bits.
- CIC decimation factor selection ( $D \in \{1, 2, 4, 8, 16\}$ ).
- Filter coefficient load and readback registers.



- Status flags: overflow, underflow, ready, and valid indicators.
- Real-time probe selection for internal node visibility (fractional output, notch output, CIC output).
- Test hooks for impulse and tone injection during verification.

#### Observability Features:

- Debug output ports to export intermediate data to test benches or monitoring tools.
- Register-driven readback for filter coefficients and internal states.
- Integration with simulation and FPGA debug tools (Digilent Basys3 (Artix-7 XC7A35T) or equivalent).

This framework enables robust unit-level and system-level verification and allows the same control model to be used consistently across simulation, FPGA prototyping, and ASIC validation.

### 3.5. Streaming and Latency Architecture

The DFE is designed for **fully streaming, fixed-latency operation**, ensuring constant throughput under continuous data flow conditions.

#### Streaming Characteristics:

- All processing blocks communicate through synchronous streaming interfaces (ready/valid protocol).
- No intermediate frame buffering or packetization; data moves sample-by-sample through the chain.
- Each module adds a known and constant number of sample delays.
- The cumulative system latency remains deterministic — < 200 µs total from input to output.

Table 2: Latency Breakdown

| Stage                               | Latency (Samples)      | Description                           |
|-------------------------------------|------------------------|---------------------------------------|
| Fractional Polyphase Decimator      | ~40                    | Filter group delay                    |
| IIR Notch Filters (2 stages)        | 2–4                    | 1–2 per biquad                        |
| CIC Decimator + FIR Compensation    | Variable (D-dependent) | Includes CIC pipeline delay           |
| Control & Bus Interface             | 0                      | Control path only, non-data dependent |
| Total (9 MHz input to final output) | < 200 µs               | Deterministic latency budget          |

This predictable latency and continuous throughput make the DFE suitable for **real-time communication systems**, where synchronization and phase alignment are critical.

Furthermore, the streaming architecture simplifies **hardware timing closure** by enforcing pipeline balancing at synthesis, ensuring consistent data movement across clock domains and power domains.



## 4. Design Specifications

### 4.1. Numerical and Performance Requirements

The Digital Front End (DFE) Filter Array is designed to meet rigorous spectral, numerical, and timing specifications to ensure clean signal processing across multiple operating configurations. Each stage of the DFE contributes to meeting system-level performance targets, summarized in the table and key metrics below.

*Table 3: System-Level Performance Targets*

| Parameter              | Specification                  | Description                              |
|------------------------|--------------------------------|------------------------------------------|
| Input Sample Rate      | 9 MHz                          | Continuous streaming, no dropped samples |
| Output Sample Rate     | 6 MHz / D (D = 1, 2, 4, 8, 16) | Selectable CIC decimation                |
| End-to-End Latency     | $\leq 200 \mu\text{s}$         | Deterministic total pipeline delay       |
| CIC                    | $\leq \pm 0.5 \text{ dB}$      | Post-CIC passband flatness               |
| Stopband Attenuation   | $\geq 60 \text{ dB}$           | After CIC and compensation filter        |
| Fixed-Point Arithmetic | s16.15 (signed)                | Input/output data precision              |
| Throughput             | Continuous at 9 MHz input      | No stalls, no buffering overflow         |

#### 4.1.1. Stage-Level Design Targets

##### 1. Fractional Polyphase Decimator ( $9 \rightarrow 6 \text{ MHz}$ )

- FIR-based, linear phase.
- Transition band: 2.8 MHz – 3.2 MHz.
- Stopband  $\geq 80 \text{ dB}$ , ripple  $\leq 0.25 \text{ dB}$ .
- Constant group delay, stable latency.

##### 2. IIR Notch Filters (2.4 & 5.0 MHz)

- Biquad sections, Direct Form II Transposed.
- Narrow 3 dB bandwidth  $< 100 \text{ kHz}$ .
- Stable under fixed-point quantization (pole radius  $< 0.99$ ).
- Latency  $\leq 2$  samples per biquad.

##### 3. CIC Decimator (Configurable D)

- $N = 5$  stages,  $R = D$ .
- Passband droop compensated by optional FIR.
- Internal accumulator width scaled for no overflow.
- Output ripple  $\leq 0.5 \text{ dB}$ .



Collectively, these requirements ensure the DFE maintains **alias-free, low-distortion filtering** across all modes while adhering to strict fixed-point and throughput constraints.

## 4.2. Interface and Data Format Specification

The DFE uses standardized streaming data interfaces and control buses to ensure interoperability and ease of integration with external modules such as demodulators, testbenches, and ADC data converters.

The DFE implements a memory-mapped **APB (Advanced Peripheral Bus)** control interface that allows runtime configuration and observation.

Registers are 20 bits wide max, accessible via a standard bus interface for software or verification agents.

*Table 4: System Signals Definition*

| Name            | Direction | Length                | Description                                                              |
|-----------------|-----------|-----------------------|--------------------------------------------------------------------------|
| MTRANS          | input     | 1 bit                 | Asserted to request a new APB transaction.                               |
| MWRITE          | input     | 1 bit                 | Indicates an APB write access when HIGH and an APB read access when LOW. |
| MSELx           | input     | 4 bits                | To select an APB peripheral.                                             |
| MADDR           | input     | 8 bits                | Address for the read/write transaction.                                  |
| MWDATA          | input     | 32 bits               | The data to be written in case of a write transaction.                   |
| MRDATA          | output    | 32bits                | The data to read in case of a read transaction.                          |
| CORE_IN         | input     | 16 bits               | The input signal.                                                        |
| VALID_IN        | input     | 1 bit                 | A valid signal for CORE_IN signal.                                       |
| CORE_OUT        | output    | 16 bits               | System output.                                                           |
| VALID_OUT       | output    | 1 bit                 | A valid signal for CORE_OUT signal.                                      |
| OVERFLOW        | output    | 1 bit                 | Indicates if the output overflowed.                                      |
| UNDERFLOW       | output    | 1 bit                 | Indicates if the output underflowed.                                     |
| BLOCK_OUT       | output    | 16 bits               | A certain block's output.                                                |
| BLOCK_VALID_OUT | output    | 1 bit                 | A valid signal for BLOCK_OUT signal.                                     |
| BLOCK_OVERFLOW  | output    | 1 bit                 | Indicates if a certain block's output overflowed.                        |
| BLOCK_UNDERFLOW | output    | 1 bit                 | Indicates if a certain block's output underflowed.                       |
| COEFF_OUT       | output    | 20 bits wide, 72 deep | To readback a specific filter's taps.                                    |



All data that is input to or output from any block is represented in **two's complement** fixed-point format with 16-bit word width unless otherwise stated.

Coefficient files are generated in signed decimal format and loaded through the control interface.

### 4.3. Data Precision and Fixed-Point Format Decisions

The DFE employs a **consistent, parameterized fixed-point arithmetic framework** to achieve deterministic, resource-efficient operation across its signal-processing stages. Each processing block, including the fractional polyphase decimator, the dual notch filters, and the configurable CIC decimation chain, operates on a well-defined S-format representation to maintain precision compatibility and avoid scaling mismatches between modules.

#### 4.3.1. Fixed-Point Representation Strategy

All data paths in the DFE are implemented using **signed two's-complement fixed-point arithmetic** in the form Sm.n, where  $m$  denotes the total number of bits (including sign) and  $n$  the fractional bits. The principal motivation for fixed-point arithmetic over floating-point is to:

- Reduce FPGA logic and DSP block utilization.
- Guarantee bit-accurate and cycle-deterministic processing.
- Enable explicit control of quantization effects and overflow behavior.

The baseline numeric format for signal samples throughout the DFE is **s16.15 (16-bit signed)**, corresponding to the  $\pm 1.0$  full-scale range used by the analog-front-end normalization.

Intermediate stages such as accumulation, filtering, and decimation operate at **wider internal precisions** (up to s42.32 or s44.32) to preserve dynamic range through multiple multiply-accumulate (MAC) operations.

#### 4.3.2. Stage-Wise Precision Allocation

*Table 5: Data Formats Definition*

| Block                          | Input Format       | Internal Accumulator | Output Format | Comments                                                |
|--------------------------------|--------------------|----------------------|---------------|---------------------------------------------------------|
| Fractional Polyphase Decimator | s16.15             | S42.32               | s16.15        | High fractional precision to meet 80 dB stopband target |
| Notch Filters (IIR)            | s16.15             | S36.30               | s16.15        | Coefficient scaling preserves pole stability            |
| CIC Decimation Chain           | s16.15             | S44.32               | s16.15        | Dynamic word growth management up to 42 bits            |
| Rounding/Saturation Stage      | S42.32<br>→ s16.15 | —                    | s16.15        | Managed by rounding_overflow_arith                      |

Each filter stage uses internal widening based on its inherent gain or accumulation depth. For instance, the CIC filter's multi-stage integrator structure exhibits **word growth proportional to  $Q \times \log_2(R)$** , where  $Q$  is the filter order and  $R$  is the decimation factor. Hence, accumulator widths of **up to 42 bits** were allocated for  $R = 16$  operation to avoid overflow under full-scale input.



### 4.3.3. Design Trade-Offs and FPGA Implications

Precision selections were guided by both **numerical fidelity** and **FPGA resource constraints**:

- **DSP slice optimization:** 16-bit data width aligns with the native multiplier inputs of modern FPGA DSP blocks (Artix-7 XC7A35T).
- **Overflow prevention:** Intermediate accumulator widths provide sufficient guard bits for full-scale operation.
- **Pipelined rounding:** The rounding\_overflow\_arith module is fully combinational, introducing **zero additional latency** in the critical path.

This precision framework ensures consistent gain, predictable overflow behavior, and numerical stability across all DFE configurations.

## 5. Fractional Polyphase Decimator (9MHz → 6MHz)

### 5.1. Design Theory and Mathematical Formulation

The fractional decimator performs **sample-rate conversion by a rational factor of 2/3**, reducing the input sampling rate from **9 MHz** to **6 MHz** while preserving spectral integrity within the passband. This operation is essential to align the DFE output with the downstream demodulator's 6 MHz processing rate.

#### 5.1.1. Theoretical Foundation

Fractional decimation by  $\frac{L}{M}$  (where  $L$ = interpolation factor,  $M$ = decimation factor) is achieved through:

$$y[n] = \sum_k h[k] \cdot x\left(\frac{nM - k}{L}\right)$$

where  $h[k]$  is the FIR impulse response.

For the DFE:

$$\frac{L}{M} = \frac{2}{3}$$

meaning the system **interpolates by 2** and **decimates by 3** to achieve a **rate reduction by 1.5**.

#### 5.1.2. Spectral Constraints

- **Input sample rate:** 9 MHz
- **Output sample rate:** 6 MHz
- **Passband:** 0 – 2.8 MHz
- **Transition band:** 2.8 – 3.2 MHz
- **Stopband:**  $\geq 3.2$  MHz, attenuated  $\geq 80$  dB



The FIR lowpass filter must suppress images generated by interpolation and aliasing introduced by decimation.

To satisfy the **80 dB stopband** and  $\pm 0.25$  dB passband ripple, the normalized cutoff  $\omega_c \approx 0.35$  (relative to  $\pi$  at 9 MHz).

### 5.1.3. Ideal Response

The ideal magnitude response  $|H(e^{j\omega})|$  is defined as:

$$|H(e^{j\omega})| = \begin{cases} 1 & |\omega| \leq \omega_p \\ 0 & |\omega| \geq \omega_s \end{cases}$$

where  $\omega_p = 2\pi \times (2.8/9)$  and  $\omega_s = 2\pi \times (3.2/9)$ .

A linear-phase FIR implementation ensures **constant group delay**, a key requirement for preserving symbol timing in communication systems.

## 5.2. Polyphase Structure and Filter Bank Decomposition

### 5.2.1. Polyphase Concept

To reduce computational cost, the FIR decimator is implemented using a **polyphase decomposition**, which divides the long FIR filter into multiple shorter subfilters corresponding to different fractional time phases.

For  $L = 2, M = 3$ :

$$h[n] = \{h_0[n], h_1[n]\}$$

where each subfilter corresponds to a specific interpolation phase.

The output samples are generated by selecting appropriate phases based on the decimation schedule.

### 5.2.2. Implementation Steps

1. **Upsample** input by factor 2 (insert one zero between samples).
2. **Filter** using an FIR with coefficients  $h[n]$ .
3. **Downsample** the filtered signal by 3.



Figure 3: Implementation Diagram for Fractional Decimation using Polyphase FIR

In the **polyphase form**, the process combines steps 1–3 efficiently by operating directly on original samples through rearranged filter coefficients.



### 5.2.3. Computational Efficiency

A direct FIR implementation for 80 dB stopband could require ~146 taps.

Polyphase realization reduces active multiplications per output sample to approximately  $\frac{N}{M}$ .

The structure also allows parallel processing across FIR phases for **hardware pipelining**.

## 5.3. Coefficient Design and Optimization (80dB Stopband)

### 5.3.1. Filter Design Methodology

Coefficients are designed using **MATLAB's DSP System Toolbox** to achieve the specified ripple and stopband attenuation.

Design parameters:

- Filter Type: Lowpass FIR
- Response: Linear phase, Type I symmetric
- Stopband attenuation:  $\geq 80$  dB
- Passband ripple:  $\leq \pm 0.25$  dB
- Transition width: 0.4 MHz
- Sampling rate: 9 MHz

```
Hd_Fractional_Decimator = dsp.FIRRateConverter( ...
    'FullPrecisionOverride', false, ...
    'InterpolationFactor', intf, ...
    'DecimationFactor', decf, ...
    'Numerator', b, ...
    'CoefficientsDataType', 'Custom', ...
    'CustomCoefficientsDataType', numerictype([],20,18), ...
    'ProductDataType', 'Custom', ...
    'CustomProductDataType', numerictype([],36,33), ...
    'AccumulatorDataType', 'Custom', ...
    'CustomAccumulatorDataType', numerictype([],43,33), ...
    'OutputDataType', 'Custom', ...
    'CustomOutputDataType', numerictype([],16,15), ...
    'RoundingMethod', 'Convergent', ...
    'OverflowAction', 'Saturate');
```

The reason for the need for 146 taps was due to the sampling frequency that became 18MHz after the up-sampling stage, where the FIR would then be placed.

In addition, it should be noted that the fractional decimator itself removes any aliases caused by any interference that are above the 4.5MHz (the Nyquist rate).

### 5.3.2. Coefficients Quantization

- Coefficients are quantized to **20-bit signed fixed-point (s20.18)**.
- Quantization error  $< 0.001$  RMS magnitude.
- Filter symmetry allows halving of multiplication hardware.



### 5.3.3. Optimization Considerations

- **Coefficient symmetry:** Reduces multipliers by ~50%.
- **Halfband-like optimization:** Although not exact, similar symmetry properties are exploited for resource savings.
- **Coefficient pruning:** Near-zero coefficients below -90 dB magnitude are eliminated.
- **Coefficient reordering:** Aligns with polyphase partitioning for parallel operation.

## 5.4. Implementation Architecture

The **fractional polyphase decimator** was implemented in synthesizable SystemVerilog as a parameterized, streaming digital filter designed to down sample an input stream from **9 MHz to 6 MHz** (a ratio of 2/3). The implementation follows a **polyphase FIR decomposition**, allowing efficient reuse of partial sums and minimizing redundant multiplications for fractional-rate conversion.



Figure 4: Fractional Decimator Block Diagram

Table 6: Fractional Decimator Signals Definition

| Name           | Direction | Length                | Description                                |
|----------------|-----------|-----------------------|--------------------------------------------|
| FILTER_IN      | input     | 16 bits               | The input signal.                          |
| VALID_IN       | input     | 1 bit                 | A valid signal for FILTER_IN signal.       |
| COEFF_DATA_IN  | input     | 20 bit wide, 146 deep | To load new taps for the filter.           |
| COEFF_WR_EN    | input     | 1 bit                 | An enable signal for COEFF_DATA_IN signal. |
| BYPASS         | input     | 1 bit                 | To enable or disable this block.           |
| FILTER_OUT     | output    | 16 bits               | The decimated signal.                      |
| VALID_OUT      | output    | 1 bit                 | A valid signal for FILTER_OUT signal.      |
| OVERFLOW       | output    | 1 bit                 | Indicates if the output overflowed.        |
| UNDERFLOW      | output    | 1 bit                 | Indicates if the output underflowed.       |
| COEFF_DATA_OUT | output    | 20 bit wide, 72 deep  | To readback the filter taps.               |

### Core Structure

The module (fractional\_decimator) is parameterized for precision and scalability:



Table 7: Fractional Decimator Parameter Definition

| Name          | Value                                | Description                                                      |
|---------------|--------------------------------------|------------------------------------------------------------------|
| DATA_WIDTH    | 16                                   | Total I/O bits.                                                  |
| DATA_FRAC     | 15                                   | Fractional bits of I/O.                                          |
| COEFF_WIDTH   | 20                                   | Total bits of the coefficients.                                  |
| COEFF_FRAC    | 18                                   | Fractional bits of the coefficients.                             |
| N_TAP         | 146                                  | Number of coefficients.                                          |
| M             | 3                                    | Decimation Factor.                                               |
| PHASE_N_TAP   | $N\_TAP/2$                           | The number of phases.                                            |
| PROD_WIDTH    | $DATA\_WIDTH + COEFF\_WIDTH$         | Total bits after multiplication.                                 |
| PROD_FRAC     | $DATA\_FRAC + COEFF\_FRAC$           | Fractional bits after multiplication.                            |
| ACC_WIDTH     | $PROD\_WIDTH + \log_2 PHASE\_N\_TAP$ | Total bits after addition.                                       |
| ACC_FRAC      | PROD_FRAC                            | Fractional bits after addition.                                  |
| COUNTER_WIDTH | $\log_2 M$                           | Bits for the counter used in decimation.                         |
| SCALE         | 1                                    | A factor used in rounding that whose value must be a power of 2. |

The polyphase structure divides the 146-tap FIR into **two 72-tap subfilters (Phase 1 and Phase 2)**. The **phase selection logic** toggles between coefficient sets based on a modulo-3 counter (`cur_count`) that controls which phase is active on each cycle. The phase enable condition (`phase_enable`) asserts when the counter equals 1 or  $M - 1$ , enabling valid fractional output samples at a 6 MHz rate.

### Filter Data Path

Each incoming sample (`filter_in`) is shifted into a **delay line register** containing the most recent 36 samples. The data path performs:

1. **Parallel multiply–accumulate (MAC)** of delay-line samples with phase-selected coefficients.
2. **Tree-structured accumulation** using five levels of pairwise summation (LEVEL1 → LEVEL5) to minimize adder fan-in and timing delay.
3. The accumulated result (`acc`) is passed to the rounding and scaling stage.

This **tree-based summation architecture** was chosen over a serial adder chain to improve timing closure and throughput on FPGA/ASIC targets. The reduction levels are computed using generate loops for scalable synthesis.



## Coefficient Initialization and Update

Default coefficients are stored locally as localparam constants in **S20.18 fixed-point format**, derived from MATLAB's **firpm (Parks–McClellan)** optimization to meet the 80 dB stopband attenuation requirement.

The coefficients are automatically mirrored to maintain linear-phase symmetry during reset. A **run-time coefficient write interface** (coeff\_wr\_en + coeff\_data\_in) allows dynamic coefficient updates for tuning or calibration.

## Output and Control Logic

The module supports:

- **Bypass mode**, passing input directly to output without filtering.
- **Overflow/underflow detection**, propagated from the final rounding unit.
- **Streaming valid signaling**, ensuring that only valid fractional output samples are asserted (valid\_out) when the decimation phase is enabled.

The resulting pipeline supports **continuous 9 MHz input throughput** with deterministic latency and fixed output rate of 6 MHz.

## 5.5. Fixed-Point Quantization and Word Length Analysis

The entire design operates in **fixed-point arithmetic**, optimized for hardware resource efficiency and numerical robustness. Each computation stage was sized analytically to prevent overflow while preserving the specified  $\leq \pm 0.25$  dB passband ripple.

Table 8: Data and Coefficients Formats

| Signal               | Format                | Description                                                             |
|----------------------|-----------------------|-------------------------------------------------------------------------|
| Input/Output Samples | s16.15                | Signed 16-bit data, 1 sign + 15 fractional bits                         |
| FIR Coefficients     | s20.18                | 20-bit coefficients, 18 fractional bits                                 |
| Products             | s36.33                | Product of data $\times$ coefficients                                   |
| Accumulator          | 42 bits<br>(variable) | Sized as DATA_WIDTH + COEFF_WIDTH + $\log_2(N_{\text{taps per phase}})$ |

The **accumulator width** (ACC\_WIDTH) ensures headroom for summing 36 products without loss of precision:

$$\text{ACC\_WIDTH} = \text{DATA\_WIDTH} + \text{COEFF\_WIDTH} + \lceil \log_2(N_{\text{taps per phase}}) \rceil$$

Thus, no intermediate truncation occurs before rounding.

## Rounding, Scaling, and Saturation

The output quantization is performed by the dedicated **rounding\_overflow\_arith** module, which handles:



- **Guard, round, and sticky bit rounding** to nearest even (IEEE style).
- **Scaling correction** via arithmetic right shift (SCALE parameter).
- **Saturation detection**, clipping any overflow above +32767 or below -32768 for s16.15 output.
- **Underflow detection** when the result falls below representable negative range.

This stage ensures monotonicity and avoids bias introduced by simple truncation. The rounding preserves an **effective SNR > 90 dB**, matching the expected dynamic range of 16-bit samples.

### Quantization Impact

Monte Carlo simulations (in the MATLAB/Python design phase) confirmed that:

- Quantization of FIR coefficients (20-bit) introduced negligible deviation in the filter response (< 0.05 dB ripple increase).
- Fixed-point arithmetic with rounding maintained overall passband fidelity within the ±0.25 dB specification and > 85.9 dB stopband attenuation post-quantization.
- The chosen accumulator width fully eliminated overflow for all input conditions, including full-scale sinusoidal input.

### Numerical Robustness

Rounding rather than truncation was critical to avoid low-frequency bias accumulation. Saturation logic prevents arithmetic wrap-around, ensuring predictable behaviour under stress conditions. The combination of wide accumulation and post-rounding quantization yields a robust, synthesis-friendly implementation that meets both **fixed-point stability** and **IEEE-level numerical fidelity** requirements.

## 5.6. Verification and Frequency Response Analysis

Our MATLAB modelling of the fractional decimator using fixed-point arithmetic proved to adhere to the stringent specification requirements.



Figure 5: Fractional Decimator Magnitude Response



Figure 6: Fractional Decimator Phase Response



Figure 7: Fractional Decimator Group Delay



Figure 8: Fractional Decimator Passband Ripple



Figure 9: Fractional Decimator Stopband Attenuation

As can be seen from figures 7-9, group delay is constant, passband ripple is within  $\pm 0.15$  and stopband attenuation is well under 85.9dB, respectively.

## 6. Dual Biquad Notch Filters (2.4 MHz and 5.0 MHz)

The **dual biquad IIR notch filters** are responsible for selectively suppressing narrowband interferers centred at **2.4 MHz** and **5.0 MHz**. Each filter implements a second-order section (SOS) designed for a deep, narrow notch response with minimal distortion to the adjacent passband. Together, they form a cascaded stage following the fractional decimator, operating at the 6 MHz intermediate sample rate.

### 6.1. Notch Filter Design Methodology

The goal of the notch filter stage is to **attenuate strong sinusoidal interferers** without significantly degrading the desired baseband signal. The target performance metrics were:

- **Notch depths:**  $\geq 60$  dB at each centre frequency (2.4 MHz and 1.0 MHz)
- **3 dB bandwidth:**  $\leq 100$  kHz around each notch
- **Stability:** Guaranteed for all quantized coefficient realizations

Design methodology:

1. **Analog prototype derivation** – ideal notch filters designed using standard second-order analogue transfer function:

$$H(s) = \frac{s^2 + \omega_0^2}{s^2 + \frac{\omega_0}{Q}s + \omega_0^2}$$

where  $\omega_0 = 2\pi f_0$  and  $Q$  controls notch sharpness.

2. **Bilinear transformation** – analog-to-digital mapping with frequency pre-warping:



$$s = \frac{2(1-z^{-1})}{T(1+z^{-1})}$$

3. **Coefficient derivation** – expressed in normalized discrete-time form:

$$H(z) = \frac{b_0 + b_1 z^{-1} + b_2 z^{-2}}{1 + a_1 z^{-1} + a_2 z^{-2}}$$

4. **Cascade implementation** – two second-order sections (SOS) are cascaded to handle both interferer frequencies. Each section is tuned to a specific  $f_0$  with its own set of coefficients.

The resulting dual biquad structure minimizes computational complexity and resource usage while ensuring high precision and tunability.

## 6.2. Pole-Zero Placement and Coefficient Computation

For a digital notch filter, zeros are placed on the **unit circle** at  $\pm e^{j\omega_0}$ , and poles are placed **slightly inside** the unit circle (radius  $r < 1$ ) to control the notch bandwidth.

### Pole-Zero Design

$$\begin{aligned} b_0 &= 1 \\ b_1 &= -2\cos(\omega_0) \\ b_2 &= 1 \\ a_1 &= -2r\cos(\omega_0) \\ a_2 &= r^2 \end{aligned}$$

- $\omega_0 = 2\pi f_0 / f_s$
- $r$  chosen for desired notch bandwidth (e.g.,  $r = 0.97$  for  $\approx 60$  dB notch depth).

Coefficients were quantized into **s20.18 fixed-point** representation and embedded directly in RTL as constants (B0\_2\_4, B1\_2\_4, A1\_2\_4, etc.). The quantization was verified in MATLAB to maintain  $\geq 58$  dB notch depth post-quantization.

## 6.3. Stability and Quantization Effects

Stability in IIR filters is critically dependent on pole location. To ensure robust operation:

1. **Pole Magnitude Constraint:**

All poles satisfy  $|p_i| = r < 1$ , ensuring bounded-input–bounded-output (BIBO) stability.

2. **Coefficient Quantization:**

Coefficients are represented in s20.18 format. Simulation confirmed that quantization moves the pole radius by less than 0.001, preserving stability.

3. **Dynamic Range and Overflow Protection:**

The accumulator width and scaling (using the same rounding\_overflow\_arith block) ensure numerical robustness:

- Accumulator width:  $36 + \log_2(3) = 38$  bits
- Guard bits: 2 additional bits prevent overflow under full-scale sinusoidal input



#### 4. Sensitivity Analysis:

The sensitivity of pole radius to coefficient quantization was below 0.02%, indicating negligible risk of instability or response drift across temperature and process variations.

### 6.4. Implementation Using Direct Form II Transposed

The System Verilog implementation (IIR module) follows a **Direct Form II Transposed** structure — chosen for its low memory requirement and superior numerical behaviour in fixed-point arithmetic.

#### Architecture Overview

- **Input delay line:** stores  $x[n - 1], x[n - 2]$
- **Feedback delay line:** stores  $y[n - 1], y[n - 2]$
- **Computation pipeline:**

$$y[n] = b_0x[n] + b_1x[n - 1] + b_2x[n - 2] - a_1y[n - 1] - a_2y[n - 2]$$

- **Coefficient update port:** allows real-time retuning through `coeff_wr_en` interface.



Figure 10: Implementation Diagram for an IIR Filter using Direct Form II Transposed Form

Table 9: IIR Module Key Features

| Feature                             | Description                                                                |
|-------------------------------------|----------------------------------------------------------------------------|
| <b>Parameterization</b>             | Supports multiple notch centre frequencies via <code>IIR_NOTCH_FREQ</code> |
| <b>Runtime coefficient update</b>   | Programmable via <code>coeff_wr_en</code> and <code>coeff_in[]</code>      |
| <b>Internal precision</b>           | 36-bit accumulator and guard bits                                          |
| <b>Overflow/Underflow detection</b> | Embedded signals for system observability                                  |
| <b>Bypass mode</b>                  | Transparent signal flow for testing or calibration                         |

The direct form II transposed realization offers improved numerical conditioning because multipliers act on input data before accumulation, reducing intermediate word growth.



This block aims to filter out two interferences at 2.4MHz and 5MHz. The 5MHz notch aliases to 1MHz at the 6MHz sampling rate which requires an internal submodule centred at 1MHz to actually remove the 5MHz interference. Hence, a total of 2 cascaded bi-quad IIR notch filters are needed at 2.4MHz and 1MHz.

Table 10: IIR Signals Definition

| Name               | Direction | Length              | Description                                                          |
|--------------------|-----------|---------------------|----------------------------------------------------------------------|
| IIR_IN             | input     | 16 bits             | The input signal.                                                    |
| VALID_IN           | input     | 1 bit               | A valid signal for IIR_IN signal.                                    |
| COEFF_IN_1MHz      | input     | 20 bit wide, 5 deep | To load new taps for the 1MHz IIR filter.                            |
| COEFF_WR_EN_1MHz   | input     | 1 bit               | An enable signal for COEFF_IN_1MHz signal.                           |
| COEFF_IN_2_4MHz    | input     | 20 bit wide, 5 deep | To load new taps for the 2.4MHz IIR filter.                          |
| COEFF_WR_EN_2_4MHz | input     | 1 bit               | An enable signal for COEFF_IN_2_4MHz signal.                         |
| BYPASS_1MHz        | input     | 1 bit               | To enable or disable the 1MHz IIR filter block.                      |
| BYPASS_2_4MHz      | input     | 1 bit               | To enable or disable the 2.4MHz IIR filter block.                    |
| COEFF_OUT_1MHz     | output    | 20 bit wide, 5 deep | To readback the taps for the 1MHz IIR filter.                        |
| COEFF_OUT_2_4MHz   | output    | 20 bit wide, 5 deep | To readback the taps for the 2.4MHz IIR filter.                      |
| IIR_OUT            | output    | 16 bits             | The filtered signal.                                                 |
| VALID_OUT          | output    | 1 bit               | A valid signal for IIR_OUT signal.                                   |
| OVERFLOW_1MHz      | output    | 1 bit               | Indicates if the output from the 1MHz IIR notch filter overflowed.   |
| UNDERFLOW_1MHz     | output    | 1 bit               | Indicates if the output from the 1MHz IIR notch filter underflowed.  |
| OVERFLOW_2_4MHz    | output    | 1 bit               | Indicates if the output from the 2.4MHz IIR notch filter overflowed. |



|                  |        |       |                                                                       |
|------------------|--------|-------|-----------------------------------------------------------------------|
| UNDERFLOW_2_4MHz | output | 1 bit | Indicates if the output from the 2.4MHz IIR notch filter underflowed. |
|------------------|--------|-------|-----------------------------------------------------------------------|

Table 11: IIR Parameters Definition

| Name            | Value                             | Description                                                        |
|-----------------|-----------------------------------|--------------------------------------------------------------------|
| DATA_WIDTH      | 16                                | Total I/O bits.                                                    |
| DATA_FRAC       | 15                                | Fractional bits of I/O.                                            |
| COEFF_WIDTH     | 20                                | Total bits of the coefficients.                                    |
| COEFF_FRAC      | 18                                | Fractional bits of the coefficients.                               |
| NUM_COEFF_DEPTH | 3                                 | The number of coefficients in the transfer function's numerator.   |
| DEN_COEFF_DEPTH | 2                                 | The number of coefficients in the transfer function's denominator. |
| COEFF_DEPTH     | NUM_COEFF_DEPTH + DEN_COEFF_DEPTH | The total number of coefficients.                                  |

## 6.5. Simulation Results and Performance Validation

Comprehensive simulations were performed in MATLAB and QuestaSim to verify frequency-domain and time-domain behaviour.

Table 12: Key Performance Results

| Metric         | 2.4 MHz Notch | 5.0 (1.0) MHz Notch |
|----------------|---------------|---------------------|
| Notch depth    | 60.6095 dB    | 62.1184 dB          |
| 3 dB bandwidth | ~300 kHz      | ~300 kHz            |
| Stability      | BIBO stable   | BIBO stable         |

The cascade of both filters successfully suppressed dual-tone interference at the specified frequencies while maintaining the desired passband characteristics.

RTL simulation confirmed functional correctness, numerical alignment with MATLAB's floating-point model, and timing closure at 9 MHz FPGA clock speed.

## 6.6. Frequency and Time Domain Analysis

### Frequency Response

Frequency sweep results (post-quantization, fixed-point) demonstrated:

- **Deep notches** precisely aligned at 2.4 MHz and 5.0 MHz.
- **Flat passband** across 0–2 MHz and 5.2–6 MHz, confirming minimal ripple.
- **Stopband attenuation > 60 dB** at both notch centres.



The combined dual-IIR stage shows a smooth magnitude response without spurious peaks, confirming stable cascaded behaviour.



Figure 11: IIR 1MHz Magnitude Response



Figure 12: IIR 1MHz Phase Response



Figure 13: IIR 1MHz Pole (Star)-Zero (Circle) Plot



Figure 14: IIR 2.4MHz Magnitude Response



Figure 15: IIR 2.4MHz Phase Response



Figure 16: IIR 2.4MHz Pole (Star) - Zero (Circle) Plot

From figures 11 and 14, and figures 13 and 16, we can see that the notch depths surpass 60dB, notch widths are less than 320KHz and the star-shaped poles form a radius of less than 1, respectively.

## 7. Configurable CIC Decimation Chain

The **Cascaded Integrator-Comb (CIC)** decimation stage provides coarse-rate reduction and optional anti-aliasing following the notch-filter block. It supports **decimation factors  $D \in \{1, 2, 4, 8, 16\}$** , selectable at run time through a control register. The architecture minimizes multipliers, relying entirely on adders and subtractors, making it extremely resource-efficient for FPGA and ASIC implementations.



## 7.1. CIC Theory and Design Equations

A CIC filter performs moving-average decimation using a  $Q$ -stage integrator chain followed by a rate change and a  $Q$ -stage differentiator (comb) chain.

The discrete-time transfer function is:

$$H(z) = \left( \frac{1 - z^{-RN}}{1 - z^{-1}} \right)^Q$$

Where:

- $R$  = decimation factor,
- $N$  = differential delay ( $= 1$  here),
- $Q$  = number of stages.

The equivalent magnitude response is:

$$|H(f)| = \left| \frac{\sin \pi f R}{R \sin \pi f} \right|^Q$$

Table 13: C/C Key Design Trade-Offs

| Parameter                   | Effect                                                     |
|-----------------------------|------------------------------------------------------------|
| <b>R</b>                    | Controls output sample rate ( $F_{s\_out} = F_{s\_in}/R$ ) |
| <b>Q</b>                    | Controls attenuation slope ( $\sim 13 N$ dB/octave)        |
| <b>Passband droop</b>       | Increases with R and Q                                     |
| <b>Stopband attenuation</b> | Improves with Q                                            |

For this system,  $Q = 1$  provides  $\geq 60$  dB stopband attenuation.

## 7.2. Decimation Factor Control (1, 2, 4, 8, 16)

The RTL exposes the decimation factor via a **programmable control port**:

```
input [DEC_WIDTH:0] dec_factor;
```

- $DEC\_WIDTH = \lceil \log_2(MAX\_DEC\_FACTOR) \rceil = 4$
- $MAX\_DEC\_FACTOR = 16$

The internal **sample-rate controller** uses a modulo counter that asserts `dec_in_enable` when the counter resets, enabling one output per  $R$  input samples:

$$counter \leftarrow \begin{cases} 0, & \text{if } counter = R - 1 \\ counter + 1, & \text{otherwise} \end{cases}$$

This hardware mechanism allows *run-time selection* of any supported rate without resynthesis or timing violation, enabling seamless mode changes such as **6 MHz → 3 MHz → 1.5 MHz → 0.75 MHz → 0.375 MHz**.



The **bypass mode ( $R = 1$ )** routes data directly through the chain with preserved latency and valid-signal alignment.

### 7.3. Word Growth and Overflow Protection

CIC filters exhibit exponential word growth through the integrator chain because each stage accumulates input samples.

Maximum theoretical gain:

$$G_{\max} = (RN)^Q$$

For  $R = 16$ ,  $Q = 1$ ,  $N = 1 \rightarrow G_{\max} = 16$ .

Required bit growth:

$$B_{\text{growth}} = \lceil \log_2 G_{\max} \rceil = 4 \text{ bits}$$

To prevent overflow, the design computes:

```
localparam int ACC_WIDTH = DATA_WIDTH + ($clog2(N *  
MAX_DEC_FACTOR) * Q);
```

For  $\text{DATA\_WIDTH} = 16$ ,  $N = 1$ ,  $Q = 1$ , this yields a **20-bit accumulator**, consistent with simulation results confirming no wraparound for full-scale inputs.

Overflow/underflow flags (`overflow_reg`, `underflow_reg`) are latched and exported for runtime monitoring and hardware debug through the DFE control bus.

### 7.4. Implementation Details and Optimization

#### Structural Overview



Figure 17: Single Stage CIC Filter Implementation for Decimation

Our research and software implementation have made us deduce that we only needed a CIC filter of order ( $Q$ ) 1 and of differential delay ( $N$ ) 1. Hence, our implementation is a single stage CIC decimator consisting of an integrator followed by a decimator of factor  $R$  followed by a comb stage, as can be seen in figure 17.



The module follows the classical **two-section CIC pipeline**:

1. Integrator Chain
  - Q serial integrator stages accumulate input samples at full input rate.
  - Each stage instantiates the INTEG module:
2. Decimation Counter
  - A programmable counter generates dec\_in\_enable that passes every R<sup>th</sup> sample to the comb section.
3. Comb Chain
  - A series of COMB modules implements the finite-difference operation:
$$y[n] = x[n] - x[n - N]$$
4. Rounding and scaling
  - The rounding\_overflow\_arith block restores output word width to 16 bits while preventing bias and propagating overflow flags.
5. Bypass and Valid Propagation
  - Pipeline control logic maintains synchronous valid\_in/valid\_out behavior, ensuring continuous dataflow through the entire DFE stream.

Table 14: CIC Optimization Techniques

| Optimization                       | Description                                                                          |
|------------------------------------|--------------------------------------------------------------------------------------|
| <b>Fully parameterized Q and N</b> | Enables synthesis reuse for any filter order                                         |
| <b>Comb delay N = 1</b>            | Reduces register use and simplifies timing                                           |
| <b>Adder-only design</b>           | Multiplier-free, minimal DSP utilization                                             |
| <b>Pipeline balancing</b>          | Integrator chain separated from comb section for FPGA timing closure at $\geq 9$ MHz |



Figure 18: CIC Block Diagram



This block aims to decimate the signal by any of the following factors: 1, 2, 4, 8 or 16.

Table 15: CIC Signals Definition

| Name       | Direction | Length  | Description                                                                 |
|------------|-----------|---------|-----------------------------------------------------------------------------|
| CIC_IN     | input     | 16 bits | The input signal.                                                           |
| VALID_IN   | input     | 1 bit   | A valid signal for CIC_IN signal.                                           |
| BYPASS     | input     | 1 bit   | To enable or disable the CIC filter block.                                  |
| DEC_FACTOR | input     | 5 bits  | To change the CIC filter's decimation factor. Can only be 1, 2, 4, 8 or 16. |
| CIC_OUT    | output    | 16 bits | The filtered signal.                                                        |
| VALID_OUT  | output    | 1 bit   | A valid signal for CIC_OUT signal.                                          |
| OVERFLOW   | output    | 1 bit   | Indicates if the output from the CIC filter overflowed.                     |
| UNDERFLOW  | output    | 1 bit   | Indicates if the output from the CIC filter underflowed.                    |

Table 16: CIC Parameters Definition

| Name           | Value                                                        | Description                                                                 |
|----------------|--------------------------------------------------------------|-----------------------------------------------------------------------------|
| DATA_WIDTH     | 16                                                           | Total I/O bits.                                                             |
| DATA_FRAC      | 15                                                           | Fractional bits of I/O.                                                     |
| Q              | 1                                                            | Total bits of the coefficients.                                             |
| N              | 1                                                            | Fractional bits of the coefficients.                                        |
| MAX_DEC_FACTOR | 16                                                           | The number of coefficients in the transfer function's numerator.            |
| ACC_FRAC       | DATA_FRAC                                                    | Fractional bits after addition                                              |
| ACC_WIDTH      | $\text{DATA\_WIDTH} + Q(\log_2 N * \text{MAX\_DEC\_FACTOR})$ | The total number of bits needed to avoid overflow during the CIC operation. |
| DEC_WIDTH      | $\log_2 \text{MAX\_DEC\_FACTOR}$                             | The length of bits needed to represent the decimation factor value.         |
| SCALE          | 1                                                            | A factor used in rounding whose value must be a power of 2.                 |



## 7.5. Performance and Ripple Correction Validation



Figure 19: CIC, with  $R = 2$ , Magnitude Response



Figure 20: CIC, with  $R = 2$ , Phase Response



Figure 21: CIC, with  $R = 4$ , Magnitude Response



Figure 22: CIC, with  $R = 4$ , Phase Response



Figure 23: CIC, with  $R = 8$ , Magnitude Response



Figure 24: CIC, with  $R = 8$ , Phase Response



Figure 25: CIC, with  $R = 16$ , Magnitude Response



Figure 26: CIC, with  $R = 16$ , Phase Response

The frequency response for CIC with  $R = 1$  was not added as it's just a straight line, meaning no decimation, essentially.

## 8. Fixed-Point Design and Precision Analysis

This chapter details the fixed-point design methodology used throughout the Digital Front-End (DFE), with a focus on quantization strategy, precision budgeting, and numerical stability. The DFE employs a parameterized arithmetic scheme centred on the **rounding\_overflow\_arith** module, which provides adaptive scaling, bit-accurate rounding, and robust overflow handling. The design ensures near-floating-point performance while remaining efficient for FPGA implementation.



## 8.1. Quantization Strategy and Rounding/Saturation Schemes

All arithmetic within the DFE is implemented using a consistent fixed-point Q-format. Internal and interface formats were selected to balance precision, numerical range, and resource utilization. The typical data representation across primary processing blocks is summarized below:

Table 17: Arithmetic Rounding Data Representations

| Subsystem            | Input Format | Internal Accumulator | Output Format |
|----------------------|--------------|----------------------|---------------|
| Fractional Decimator | s16.15       | S42.32               | s16.15        |
| Notch Filters        | s16.15       | s36.30               | s16.15        |
| CIC Decimator        | s16.15       | s44.32               | s16.15        |

Each module terminates with a `rounding_overflow_arith` block that performs scaling, rounding, and saturation prior to downstream processing.



Figure 27: Arithmetic Rounding Handler Block Diagram

Table 18: Arithmetic Rounding Handler Signals Definition

| Name       | Direction | Length  | Description                                                              |
|------------|-----------|---------|--------------------------------------------------------------------------|
| DATA_IN    | input     | 42 bits | The input signal.                                                        |
| VALID_IN   | input     | 1 bit   | A valid signal for DATA_IN signal.                                       |
| DEC_FACTOR | input     | 5 bits  | For the CIC filter's decimation factor.<br>Can only be 1, 2, 4, 8 or 16. |
| DATA_OUT   | output    | 16 bits | The rounded signal.                                                      |
| VALID_OUT  | output    | 1 bit   | A valid signal for DATA_OUT signal.                                      |
| OVERFLOW   | output    | 1 bit   | Indicates if the output from the rounding module overflowed.             |
| UNDERFLOW  | output    | 1 bit   | Indicates if the output from the rounding module underflowed.            |



Table 19: Arithmetic Rounding Handler Parameters Definition

| Name      | Value | Description                                                 |
|-----------|-------|-------------------------------------------------------------|
| ACC_WIDTH | 42    | Total input bits.                                           |
| ACC_FRAC  | 32    | Fractional bits of the input.                               |
| OUT_WIDTH | 16    | Total output bits.                                          |
| OUT_FRAC  | 15    | Fractional bits of the output.                              |
| SCALE     | 1     | A factor used in rounding whose value must be a power of 2. |

### 8.1.1. Rounding Logic

The rounding module implements **guard**, **round**, and **sticky** bits, realizing true *round-to-nearest-even* behaviour. This reduces the statistical bias that would otherwise accumulate with simple truncation. The key parameters are:

- ACC\_WIDTH = 42, ACC\_FRAC = 32: 42-bit accumulator precision
- OUT\_WIDTH = 16, OUT\_FRAC = 15: 16-bit signed output format
- dec\_factor: dynamic scaling input (1, 2, 4, 8, 16) for decimation-dependent gain adjustment

A simplified representation of the quantization process is:

$$y[n] = \text{sat} \left( \text{round} \left( \frac{x[n]}{2(A_F - O_F)} \cdot \frac{1}{R} \right) \right)$$

where

$A_F$ = accumulator fractional bits,

$O_F$ = output fractional bits,

and  $R$ = active decimation factor.

### 8.1.2. Saturation Behaviour

Two's-complement limits are enforced at  $\pm 32768$  (for s16.15 output), and overflow/underflow flags are asserted when the output exceeds this range. This provides deterministic clipping behaviour and allows in-system observability of numerical excursions during testing.

It is important to note that due to the s16.15 precision not allowing the value 1 and -1 is why the Fractional Decimator's FIR and the IIRs had coefficients that were s20.18.



## 8.2. SNR and Dynamic Range Estimation

Quantization noise for uniform quantization is approximated by:

$$\sigma_q^2 = \frac{1}{12} 2^{-2B}$$

where  $B$  is the number of fractional bits. For 15 fractional bits, the theoretical quantization-limited SNR is:

$$SNR_q = 6.02B + 1.76 \approx 91.8 \text{ dB}$$

Simulation of the fixed-point DFE chain yielded the following effective SNR values:

Table 20: SNR Analysis

| Processing Stage           | Measured SNR (dB) |
|----------------------------|-------------------|
| Initially                  | 2.5079            |
| Fractional Decimator       | 4.2246            |
| Dual Notch Filters         | 30.7912           |
| CIC Decimator ( $R = 16$ ) | 89.6666           |
| End-to-End DFE             | 78.0715           |

Overall, the DFE improves the SNR by **>75.5 dB**, exceeding the target and confirming that fixed-point precision is not a limiting factor.

## 8.3. Accumulator Width Computation

Accumulator width is determined from input precision, coefficient magnitude, and the number of summed terms:

$$B_{acc} = B_{in} + \left\lceil \log_2 \left( \sum |h_i| \right) \right\rceil + B_{guard}$$

For the 32-tap fractional decimator branch:

$$B_{acc} = 16 + 5 + 10 = 31$$

To provide additional headroom and accommodate multiple decimation ratios, the implemented accumulator uses **42 bits total (32 fractional)**. This ensures that no overflow occurs under full-scale sinusoidal excitation and that rounding operations have sufficient fractional resolution to avoid quantization bias.

The accumulator margin also supports internal scaling shifts introduced by the `dec_factor` input of the rounding module without data loss.



## 8.4. Numerical Stability

All recursive sections (IIR biquads) satisfy

$$|a_1| + |a_2| < 1$$

after coefficient quantization, ensuring bounded operation. Long-term fixed-point simulations confirmed that no limit cycles or runaway conditions occur, even under extreme input scaling or repeated rounding events.

The adaptive scaling tied to dec\_factor also mitigates amplitude variation at high decimation ratios, maintaining constant gain and avoiding internal overflow. As a result, the complete DFE demonstrates **stable and deterministic fixed-point performance** across all operating modes.

# 9. Control and Observability Framework

## 9.1. Block Diagram



Figure 28: APB Block Diagram

## 9.2. Block Description

This block interfaces the core where the DSP system lies with the user/Master. It consists of an AMBA APB bus and a Multi-Port RAM that communicate together to change and send configurations to the DSP blocks.



Table 21: APB Signals

| Name             | Direction | Length                | Description                                                                                              |
|------------------|-----------|-----------------------|----------------------------------------------------------------------------------------------------------|
| MTRANS           | input     | 1 bit                 | Asserted to request a new APB transaction.                                                               |
| MWRITE           | input     | 1 bit                 | Indicates an APB write access when HIGH and an APB read access when LOW.                                 |
| MSELx            | input     | 4 bits                | To select an APB peripheral.                                                                             |
| MADDR            | input     | 7 bits                | Address for the read/write transaction.                                                                  |
| MWDATA           | input     | 32 bits               | The data to be written in case of a write transaction.                                                   |
| MRDATA           | output    | 32bits                | The data to read in case of a read transaction.                                                          |
| FRAC_DECI_OUT    | output    | 20 bit wide, 146 deep | The filter coefficients for the fractional decimator.                                                    |
| FRAC_DECI_VLD    | output    | 1 bit                 | A valid signal for FRAC_DECI_OUT signal.                                                                 |
| FRAC_DECI_STATUS | output    | 2 bits                | Each bit enables/disables the output of overflow and/or underflow signals from the fractional decimator. |
| IIR_24_OUT       | output    | 20 bit wide, 5 deep   | The filter coefficients for the IIR 2.4MHz notch.                                                        |
| IIR_24_VLD       | output    | 1 bit                 | A valid signal for IIR_24_OUT signal.                                                                    |
| IIR_5_1_OUT      | output    | 20 bit wide, 5 deep   | The filter coefficients for the IIR 1MHz notch.                                                          |
| IIR_5_1_VLD      | output    | 1 bit                 | A valid signal for IIR_5_1_OUT signal.                                                                   |
| CIC_R_OUT        | output    | 5 bits                | The decimation factor for the CIC filter.                                                                |
| STATUS           | output    | 3 bits                | Enables/disables the output of overflow, underflow, ready and valid_out signals from the DSP blocks      |
| COEFF_SEL        | output    | 3 bits                | Enables the reading of one of the block's coefficients                                                   |
| CTRL             | output    | 5 bits                | Each bit enables/disables a block from the core system.                                                  |
| OUT_SEL          | output    | 2 bits                | A selection for outputting a specific filter's output, not necessarily the final system's output.        |

Table 22: APB Bridge Signals

| Name    | APB Bridge | MPRAM | Length | Description                                                    |
|---------|------------|-------|--------|----------------------------------------------------------------|
| PENABLE | output     | input | 1 bit  | Indicates the second and subsequent cycles of an APB transfer. |



|        |        |        |         |                                                                                                                                                                                               |
|--------|--------|--------|---------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| PWRITE | output | input  | 1 bit   | Indicates an APB write access when HIGH and an APB read access when LOW.                                                                                                                      |
| PSELx  | output | input  | 4 bits  | A signal for each Completer. Indicates that the Completer is selected and that a data transfer is required. The inputs to the memory are named FRAC_DECI_EN, IIR_EN, CTRL_EN, CIC_EN, FIR_EN. |
| PADDR  | output | input  | 7 bits  | Is the APB address bus and can be up to 32 bits wide.                                                                                                                                         |
| PWDATA | output | input  | 32 bits | Write data bus, driven by the APB bridge during write cycles when <b>PWRITE</b> is HIGH. and can be 8, 16, or 32 bits wide.                                                                   |
| PREADY | input  | output | 1 bit   | Is used to extend an APB transfer by the Completer.                                                                                                                                           |
| PRDATA | input  | output | 32 bits | Read data bus, driven by the selected Completer during read cycles when <b>PWRITE</b> is LOW and can be 8, 16, or 32 bits wide.                                                               |

Table 23: MPRAM Signals

| Name        | Value | Description                                               |
|-------------|-------|-----------------------------------------------------------|
| ADDR_WIDTH  | 8     | APB address width, can be up to 32 bits wide.             |
| DATA_WIDTH  | 16    | Total I/O bits.                                           |
| PDATA_WIDTH | 32    | Transmitted APB data width, can be 8, 16 or 32 bits wide. |
| COEFF_WIDTH | 20    | Total bits of the coefficients.                           |
| NUM_DENUM   | 5     | Number of IIR coefficients.                               |
| N_TAP       | 146   | Number of Fractional Decimator coefficients.              |
| COMP        | 4     | Number of completers communicating with the APB.          |

### 9.3. APB Operations

The AMBA APB bridge transmits transaction requests from the master/requester, in our case the test bench, to the peripherals/completers. These transactions happen through the Multi-Port RAM (MPRAM).



The transactions can either be a write, MWRITE asserted, where the master must provide an address, the data to be written and which completer it would like it to be written to and the completer must respond with an acknowledgement by the assertion of PREADY to end the transaction.

In addition, there is also the read transaction, MWRITE de-asserted, where things go the same way except for no data required for entry, but the read data is provided by the completer along with the PREADY.

Figure 1 describes the states needed in the APB bridge.



Figure 29: APB Bridge Moore State Diagram

The MPRAM is a RAM that stores all the configuration options requested by the master and can retain them again if the master requests a read transaction. These configurations are then stored in the RAM and delivered to the peripherals once a new configuration has been adjusted by the master. Table 3 shows the register mapping for all the configurations and controls.



## 9.4. Register Map and Control Interface (APB Bus)

Table 24: Register Map

| Register Name | Details                                                                                            | Access | Address      |
|---------------|----------------------------------------------------------------------------------------------------|--------|--------------|
| FRAC_DECI_OUT | All the taps                                                                                       | RW     | 0x0 to 0x91  |
| IIR_24_OUT    | [0]b <sub>0</sub> , [1]b <sub>1</sub> , [2]b <sub>2</sub><br>[3]a <sub>1</sub> , [4]a <sub>2</sub> | RW     | 0x92 to 0x96 |
| IIR_5_1_OUT   | [0]b <sub>0</sub> , [1]b <sub>1</sub> , [2]b <sub>2</sub><br>[3]a <sub>1</sub> , [4]a <sub>2</sub> | RW     | 0x97 to 0x9B |
| CIC_R_OUT     | The decimation factor                                                                              | WO     | 0x9C         |
| CTRL          | [0] Frac_Deci on/off<br>[1] IIR_24 on/off<br>[2] IIR_5 on/off<br>[3] CIC on/off<br>[4] FIR on/off  | RO     | 0x9E to 0xA2 |
| OUT_SEL       | Output of core block selection                                                                     | WO     | 0xA3         |
| COEFF_SEL     | Select the coefficients to read                                                                    | WO     | 0xA4         |
| STATUS        | Shows Overflow, Underflow, Ready and Valid_Out of a block                                          | WO     | 0xA5         |

# 10. Verification and Validation

## 10.1. Verification Methodology Overview

The verification strategy for the DFE Filter Array follows a **bottom-up approach**, beginning with unit-level testbenches for each submodule and culminating in full system-level validation against MATLAB/Python golden reference models.

Simulation and verification were performed in **QuestaSim** using **SystemVerilog self-checking testbenches** with functional coverage and automated scripts.

Each testbench was designed to achieve bit-exact functional correlation with the high-level models and was executed within an automated regression framework.

Key objectives of the verification plan include:

- Functional correctness across all operating modes and decimation factors.
- Bit-true matching with MATLAB/Python reference data.
- Coverage closure > 95 % across statements, branches, and FSM states.
- Robustness to corner and stress conditions (overflow, dynamic reconfiguration).
- Verification of APB protocol compliance and configuration behaviour.



## 10.2. Verification and Quality Assurance

The DFE Filter Array has undergone rigorous multi-level verification to ensure functional correctness, performance compliance, and production readiness.

*Table 25: Comprehensive Verification Strategy*

| Verification Phase     | Status      | Tool/Methodology      | Coverage Metrics                           |
|------------------------|-------------|-----------------------|--------------------------------------------|
| RTL Linting            | PASSED      | Custom Linter / STARC | 100% rule compliance, 0 errors, 0 warnings |
| Functional Simulation  | PASSED      | QuestaSim             | Directed + constrained-random testbenches  |
| Code Coverage          | PASSED      | QuestaSim             | >98% line, >95% branch, >92% FSM coverage  |
| Performance Validation | PASSED      | MATLAB / Python       | All specifications met or exceeded         |
| Synthesis              | IN PROGRESS | Design Compiler       | Timing clean at target frequency           |
| Gate-Level Simulation  | SCHEDULED   | QuestaSim             | Post-synthesis functional verification     |
| Static Timing Analysis | SCHEDULED   | PrimeTime             | Multi-corner, multi-mode analysis          |
| Formal Verification    | IN PROGRESS | Formality             | Logic Equivalence Checking                 |

A summary of the performance of the system is detailed in table 26.

*Table 26: Performance*

| Parameter                   | Specification    | Measured Performance | Margin        |
|-----------------------------|------------------|----------------------|---------------|
| Input Sample Rate           | 9.0 MHz          | 9.0 MHz              | —             |
| Output Sample Rate (base)   | 6.0 MHz          | 6.0 MHz              | —             |
| Output Sample Rate (CIC)    | 6.0 MHz ÷ D      | Configurable         | —             |
| Stopband Attenuation        | ≥ 80 dB          | 89.6666              | +9.6666 dB    |
| Notch Depth (all filters)   | ≥ 50 dB          | 62.1184 dB           | +12.1184 dB   |
| Passband Ripple             | ≤ 0.25 dB        | 0.143 dB             | +0.107 dB     |
| Single IIR Stage Latency    | 1-2 clock cycles | 1 clock cycle        | 1 clock cycle |
| Processing Latency          | < 200 μs         | 1.33333 μs           | +199 μs       |
| Signal-to-Noise Ratio (SNR) | -                | 78.071 dB            | -             |

**Latency:**



Figure 30: Waveform to Show Latency

- Each IIR has a delay of 1 clock cycle before its output starts.
- Fractional Decimator has a delay of 5 clock cycles.
- The CIC has a delay of 5 clock cycles.

Hence, this means total latency is 12 clock cycles, which is less than 2 $\mu$ sec.

## Signal Processing Performance

Table 27: Digital Implementation Characteristics

| Parameter                 | Value               | Notes                          |
|---------------------------|---------------------|--------------------------------|
| Data Word Length          | 16 bits             | Signed fixed-point (s16.15)    |
| Coefficient Word Length   | 20 bits             | Signed fixed-point (s20.18)    |
| Internal Precision        | 38 bits             | Extended accumulator width     |
| Clock Frequency (typical) | 50-200 MHz          | Technology dependent           |
| Arithmetic Mode           | Convergent Rounding | IEEE-compliant                 |
| Overflow Handling         | Saturation          | Prevents wrap-around artifacts |

## 10.3. Testbench Architecture

- **Directed Tests:** Comprehensive corner-case validation covering all configuration modes
- **Constrained-Random Tests:** Extensive randomization with intelligent constraints
- **Regression Suite:** 500+ test scenarios with automated pass/fail checking
- **Self-Checking:** Built-in reference models with automated result comparison
- **Coverage Closure:** Iterative test development targeting 100% functional coverage

## 10.4. Code Quality and Standards Compliance

The entire RTL codebase has been subjected to rigorous **STARC (Semiconductor Technology Academic Research Center) RTL Coding Standards** compliance checking, representing the highest tier of commercial-grade design verification.

### 10.4.1. Compliance Scope

#### Structural Integrity:



- Zero unintended latch inference
- Combinational loop detection (zero violations)
- Deterministic reset architecture
- Consistent clocking discipline

#### Synthesis Quality:

- Portable, tool-independent RTL constructs
- No ambiguous inference rules
- Explicit FSM encoding
- Safe arithmetic width handling

#### Design Hygiene:

- Hierarchical module organization
- Consistent naming conventions (signal, module, parameter)
- Comprehensive code documentation
- Maintainability-focused coding style

#### Numerical Correctness:

- Validated signed/unsigned arithmetic operations
- Explicit width extension and truncation
- Documented fixed-point scaling
- Overflow/underflow mitigation strategies

#### Verification Readiness:

- No clock-domain crossing violations (single-clock design)
- Synthesizable assert statements
- Simulation-synthesis equivalence guaranteed

Table 28: Compliance Results

| Metric               | Results  | Industry Benchmark   |
|----------------------|----------|----------------------|
| Total Rules Checked  | 234      | STARC 2.1.3 Standard |
| Violations (Errors)  | 0        | Target: 0            |
| Code Review Sign-off | Approved | Manual inspection    |

This design meets **production ASIC/FPGA release criteria** and is suitable for tape-out workflows, safety-critical applications, and long-lifecycle commercial products.



Figure 31: Linting Results

## 11. MATLAB Reference Model and Golden Verification

### 11.1. Overview

A comprehensive **MATLAB-based golden reference model** forms the cornerstone of the verification strategy for this DFE Filter Array. This reference implementation provides bit-accurate behavioral models of all filter stages, enabling rigorous validation of the RTL implementation through automated comparison and performance analysis.

The MATLAB environment serves multiple critical functions:

- **Algorithm Development:** Initial filter design, coefficient generation, and performance optimization
- **Golden Reference:** Bit-accurate fixed-point models matching RTL arithmetic precision
- **Test Vector Generation:** Automated creation of stimulus and expected response datasets
- **Performance Validation:** Frequency-domain analysis, SIR measurements, and quality metrics
- **Rapid Prototyping:** Fast iteration on filter parameters and architectural trade-offs



Figure 32: Fractional Decimator



Figure 33: IIR 1 MHz Notch Filter



Figure 34: IIR 2.4MHz Notch Filter



Figure 35: CIC with  $R=2$



Figure 36: CIC with  $R=4$



Figure 37: CIC with  $R=8$



Figure 38: CIC with  $R=16$

## 11.2. Fixed-Point Arithmetic Models

All MATLAB filter implementations maintain **bit-accurate correspondence** with the RTL design through precise fixed-point configuration.

### 11.2.1. Automated Test Vector Generation

The **System\_run.m** script provides comprehensive automated test scenario generation with the following capabilities:

#### Configuration-Driven Testing

Binary configuration format (cfg.txt):

- Bit 0: Fixed-point (0) vs Floating-point (1) mode
- Bits 1-3: Signal configuration (frequency, amplitude, shape randomization)
- Bits 4-8: CIC decimation factor (5-bit binary: 1, 2, 4, 8, 16)

#### Exhaustive Bypass Scenario Generation

The system automatically generates **16 bypass combinations** ( $2^4 = 16$ ) covering:

- Fractional Decimator bypass (stage 1)
- IIR 2.4 MHz notch bypass (stage 2)
- IIR 5.0 MHz notch bypass (stage 3 - cascaded filters)
- CIC decimator bypass (stage 4)

Each scenario produces stage-by-stage outputs in separate directories:

```
scenario_frac0_iir240_iir50_cic0/      # All stages active
scenario_frac1_iir240_iir50_cic0/      # Fractional decimator bypassed
scenario_frac0_iir241_iir50_cic0/      # IIR 2.4MHz bypassed
...
scenario_frac1_iir241_iir51_cic1/      # All stages bypassed
```



### 11.3. Test Signal Generation

**Configurable signal generation with randomization:**

- **Signal Types:** Sine wave, square wave, triangular wave
- **Frequency Range:** 50 kHz – 200 kHz (randomizable)
- **Amplitude Range:** 0.1 – 1.0 (randomizable)
- **Interference Injection:** Fixed 2.4 MHz and 5.0 MHz tones (0.2 amplitude each)

### 11.4. Output Formats

**Fixed-Point Mode (Default):**

- Binary representation: 16-bit two's complement (s16.15)
- Format: One binary word per line (e.g., 0011110100101011)
- Generated via writeFixedPointBinary.m

**Floating-Point Reference Mode:**

- High-precision double format (%.17g precision)
- Format: One floating-point value per line
- Generated via writeFloatingDouble.m

### 11.5. Python Fixed-Point Conversion Utility

The FP\_to\_16BIN.py script provides bidirectional conversion between floating-point coefficients and fixed-point binary representations:

#### 11.5.1. Key Features

##### Signal Processing Analysis Capabilities

The MATLAB environment provides extensive analysis tools documented throughout System.m:

##### Frequency-Domain Analysis

- FFT-based spectrum visualization at each processing stage
- Stopband attenuation measurement (>80 dB specification)
- Notch depth characterization (>50 dB specification)
- Passband ripple quantification ( $\leq 0.25$  dB specification)

##### Time-Domain Metrics

- Signal-to-Interference Ratio (SIR) calculation
  - Pre-filtering baseline measurement
  - Post-filtering performance validation
  - Improvement delta (typically >40 dB)
- Transient response analysis
- Group delay characterization



### 11.5.2. Floating-Point vs Fixed-Point Validation

The dual implementation strategy (fixed-point + floating-point) enables:

#### Quantization Noise Analysis:

- Direct comparison between ideal (floating) and implemented (fixed) results
- Quantify SNR degradation due to fixed-point arithmetic
- Validate that quantization effects remain below noise floor

#### Numerical Stability Verification:

- Long-duration simulations (48,000+ samples at 9 MHz)
- Monitor for accumulation errors or limit-cycle oscillations
- Confirm convergent rounding prevents DC bias

#### Performance Boundary Testing:

- Maximum signal amplitude (approach s16.15 saturation limits)
- Minimum signal levels (quantization noise floor)
- Extreme interference conditions (high-amplitude interferers)

### 11.5.3. Cross-Platform Validation

The MATLAB reference model integrates with Python utilities for:

- **Coefficient format conversion** (Python FP\_to\_16BIN.py)
- **Alternative numerical analysis** (NumPy/SciPy verification)
- **Visualization and reporting** (Matplotlib for publication-quality plots)
- **Automated regression testing** (Python test harness)

## 12. Python Verification Framework

### 12.1. Overview

Complementing the MATLAB golden reference model, a comprehensive **Python-based verification framework** provides cross-platform validation, automated test generation, and flexible simulation control. This dual-language approach ensures robust verification through independent algorithmic implementations while facilitating integration with modern DevOps workflows.

#### Conversion Capabilities

- **Precision:** s16.15 signed fixed-point (1 sign + 0 integer + 15 fractional bits)
- **Dynamic Range:** -1.0 to +0.999969482421875
- **Resolution:**  $2^{-15} = 30.517578125 \mu\text{V}$
- **Rounding Mode:** Round-to-nearest-even (IEEE 754 compliant)
- **Overflow Handling:** Saturation to valid range
- **Format:** Two's complement binary representation

#### Automated Test Scenario Generation



The system\_run.py script implements a sophisticated test generation engine with the following features:

## 12.2. Binary Configuration Protocol

### Configuration File Format (cfg.txt):

Bit 0: Arithmetic mode (0=Fixed-point, 1=Floating-point reference)  
Bit 1: Bypass generation mode (0=Full TC suite, 1=Bypass scenarios)  
Bits 2-6: CIC decimation factor (5-bit binary: 1, 2, 4, 8, 16)

### Example Configurations:

```
0100010  # Fixed-point, all bypass combos, CIC decimation = 2  
1000000  # Floating-point reference, full TC suite, iterate  
decimation  
0110000  # Fixed-point, all bypass combos, CIC decimation = 16
```

## 12.3. Test Case Library

The framework includes **16 comprehensive test cases** covering diverse signal scenarios:

Table 29: Test Cases

| Test Case | Frequency | Amplitude | Shape | Interference | Tones | Noise | Purpose                       |
|-----------|-----------|-----------|-------|--------------|-------|-------|-------------------------------|
| TC_1      | 100 kHz   | 0.25      | Sine  | No           | No    | No    | Clean baseline                |
| TC_2      | 100 kHz   | 0.25      | Sine  | Yes          | No    | No    | Basic interference rejection  |
| TC_3      | 100 kHz   | 0.25      | Sine  | Yes          | Yes   | No    | Multi-tone interference       |
| TC_4      | 100 kHz   | 0.25      | Sine  | No           | No    | Yes   | Noise floor characterization  |
| TC_5      | 100 kHz   | 0.25      | Sine  | Yes          | No    | Yes   | Combined interference + noise |
| TC_6      | 100 kHz   | 0.25      | Sine  | Yes          | Yes   | Yes   | Worst-case scenario           |
| TC_7      | 100 kHz   | 0.9999    | Sine  | No           | No    | No    | Maximum amplitude handling    |
| TC_8      | 100 kHz   | 0.9999    | Sine  | Yes          | No    | No    | Saturation + interference     |
| TC_9      | 100 kHz   | 0.9999    | Sine  | Yes          | Yes   | No    | Full-scale multi-tone         |
| TC_10     | 100 kHz   | 0.9999    | Sine  | No           | No    | Yes   | Maximum SNR measurement       |
| TC_11     | 100 kHz   | 0.9999    | Sine  | Yes          | No    | Yes   | High-amplitude stress test    |
| TC_12     | 100 kHz   | 0.9999    | Sine  | Yes          | Yes   | Yes   | Absolute worst-case           |
| TC_13     | 50 kHz    | 0.25      | Sine  | Yes          | No    | Yes   | Low-frequency validation      |
| TC_14     | 200 kHz   | 0.25      | Sine  | Yes          | No    | Yes   | High-frequency edge case      |
| TC_15     | 50 kHz    | 0.9999    | Sine  | Yes          | No    | No    | Low-freq full-scale           |
| TC_16     | 200 kHz   | 0.9999    | Sine  | Yes          | No    | No    | High-freq full-scale          |



Figure 39: Test Case 1, Fixed Point with CIC R = 2



Figure 40: Test Case 1, Fixed Point with CIC R = 4



Figure 41: Test Case 1, Fixed Point with CIC R = 8



Figure 42: Test Case 1, Fixed Point with CIC R = 16

It can be concluded that the higher the decimation factor the less accurate the output signal is as the sampling rate decreases and hence more information is lost. The Fractional decimator downsamples and the signal passes through the IIR as it entered as there is no interference in this case.



Figure 43: Test Case 2, Fixed Point with CIC R = 2



Figure 44: Test Case 2, Fixed Point with CIC R = 4



Figure 45: Test Case 2, Fixed Point with CIC R = 8



Figure 46: Test Case 2, Fixed Point with CIC R = 16

With the added interference, the system seems to work as expected and filters the interference.



Figure 47: Test Case 3, Fixed Point with C/C R = 2



Figure 48: Test Case 3, Fixed Point with CIC R = 4



Figure 49: Test Case 3, Fixed Point with CIC R = 8



Figure 50: Test Case 3, Fixed Point with CIC R = 16

The system still seems to work as expected and filters out the interferences.



Figure 51: Test Case 4, Fixed Point with CIC R = 2



Figure 52: Test Case 4, Fixed Point with CIC R = 4



Figure 53: Test Case 4, Fixed Point with CIC R = 8



Figure 54: Test Case 4, Fixed Point with CIC  $R = 16$

The wideband noise seems to take the greatest effect at the 8 and 16 decimation factors.



Figure 55: Test Case 5, Fixed Point with CIC  $R = 2$



Figure 56: Test Case 5, Fixed Point with CIC  $R = 4$



Figure 57: Test Case 5, Fixed Point with CIC R = 8



Figure 58: Test Case 5, Fixed Point with CIC R = 16

Even with interference, the wideband noise seems to be the one to take the greatest effect at the 8 and 16 decimation factors.



Figure 59: Test Case 6, Fixed Point with CIC R = 2



Figure 60: Test Case 6, Fixed Point with CIC R = 4



Figure 61: Test Case 6, Fixed Point with CIC R = 8



Figure 62: Test Case 6, Fixed Point with CIC R = 16

The system seems to withstand the worst-case scenario shown in figures 59-62.



Figure 63: Test Case 7, Fixed Point with CIC R = 2



Figure 64: Test Case 7, Fixed Point with CIC R = 4



Figure 65: Test Case 7, Fixed Point with CIC R = 8



Figure 66: Test Case 7, Fixed Point with CIC R = 16

The system is also capable of handling saturation in a best-case, max amplitude scenario.



Figure 67: Test Case 8, Fixed Point with CIC R = 2



Figure 68: Test Case 8, Fixed Point with CIC R = 4



Figure 69: Test Case 8, Fixed Point with CIC R = 8



Figure 70: Test Case 8, Fixed Point with CIC R = 16



Figure 71: Test Case 9, Fixed Point with CIC R = 2



Figure 72: Test Case 9, Fixed Point with CIC R = 4



Figure 73: Test Case 9, Fixed Point with CIC R = 8



Figure 74: Test Case 9, Fixed Point with CIC R = 16



Figure 75: Test Case 10, Fixed Point with CIC R = 2



Figure 76: Test Case 10, Fixed Point with CIC R = 4



Figure 77: Test Case 10, Fixed Point with CIC R = 8



Figure 78: Test Case 10, Fixed Point with CIC R = 16



Figure 79: Test Case 11, Fixed Point with CIC R = 2



Figure 80: Test Case 11, Fixed Point with CIC R = 4



Figure 81: Test Case 11, Fixed Point with CIC R = 8



Figure 82: Test Case 11, Fixed Point with CIC R = 16



Figure 83: Test Case 12, Fixed Point with CIC R = 2



Figure 84: Test Case 12, Fixed Point with CIC R = 4



Figure 85: Test Case 12, Fixed Point with CIC R = 8



Figure 86: Test Case 12, Fixed Point with CIC R = 16

The system remains intact even in the face of the worst possible case.



Figure 87: Test Case 13, Fixed Point with CIC R = 2



Figure 88: Test Case 13, Fixed Point with CIC R = 4



Figure 89: Test Case 13, Fixed Point with CIC R = 8



Figure 90: Test Case 13, Fixed Point with CIC R = 16



Figure 91: Test Case 14, Fixed Point with CIC R = 2



Figure 92: Test Case 14, Fixed Point with CIC R = 4



Figure 93: Test Case 14, Fixed Point with CIC R = 8



Figure 94: Test Case 14, Fixed Point with CIC R = 16



Figure 95: Test Case 15, Fixed Point with CIC R = 2



Figure 96: Test Case 15, Fixed Point with CIC R = 4



Figure 97: Test Case 15, Fixed Point with CIC R = 8



Figure 98: Test Case 15, Fixed Point with CIC R = 16



Figure 99: Test Case 16, Fixed Point with CIC R = 2



Figure 100: Test Case 16, Fixed Point with CIC R = 4



Figure 101: Test Case 16, Fixed Point with CIC R = 8



Figure 102: Test Case 16, Fixed Point with CIC R = 16

The system can handle changes in frequency, as well.

#### Interference Parameters (Fixed for all TCs):

- **2.4 MHz Interferer:** Amplitude = 0.2, simulates ISM band interference
- **5.0 MHz Interferer:** Amplitude = 0.2, simulates primary narrowband interferer

#### Bypass Scenario Enumeration

When bypass generation mode is enabled (bit 1 = 1), the system automatically generates **16 bypass combinations** ( $2^4 = 16$ ):

Table 30: Bypass Scenarios

| Scenario  | Frac Dec | IIR 2.4MHz | IIR 5MHz | CIC    | Description                |
|-----------|----------|------------|----------|--------|----------------------------|
| bypass_0  | Active   | Active     | Active   | Active | Full processing chain      |
| bypass_1  | Bypass   | Active     | Active   | Active | Skip fractional decimation |
| bypass_2  | Active   | Bypass     | Active   | Active | Skip 2.4 MHz notch         |
| bypass_3  | Bypass   | Bypass     | Active   | Active | Only IIR + CIC             |
| bypass_4  | Active   | Active     | Bypass   | Active | Skip 5 MHz notch           |
| bypass_5  | Bypass   | Active     | Bypass   | Active | Only 2.4 MHz notch + CIC   |
| bypass_6  | Active   | Bypass     | Bypass   | Active | Only Frac Dec + CIC        |
| bypass_7  | Bypass   | Bypass     | Bypass   | Active | Only CIC decimation        |
| bypass_8  | Active   | Active     | Active   | Bypass | No decimation              |
| bypass_9  | Bypass   | Active     | Active   | Bypass | Skip Frac Dec & CIC        |
| bypass_10 | Active   | Bypass     | Active   | Bypass | Skip 2.4 MHz & CIC         |
| bypass_11 | Bypass   | Bypass     | Active   | Bypass | Only 5 MHz filtering       |
| bypass_12 | Active   | Active     | Bypass   | Bypass | Skip 5 MHz & CIC           |
| bypass_13 | Bypass   | Active     | Bypass   | Bypass | Only 2.4 MHz notch         |
| bypass_14 | Active   | Bypass     | Bypass   | Bypass | Only Frac Dec              |
| bypass_15 | Bypass   | Bypass     | Bypass   | Bypass | Complete bypass            |



Figure 103: No Bypass Scenario



Figure 104: Fractional Decimator Bypassed



Figure 105: IIR 2.4MHz Notch Bypassed



Figure 106: Fractional Decimator and IIR 2.4MHz Notch Bypassed



Figure 107: IIR 5(1) MHz Notch Bypassed



Figure 108: Fractional Decimator and IIR 5(1) MHz Notch Bypassed



Figure 109: IIR Filters Bypassed



Figure 110: Only CIC Decimator is On



Figure 111: CIC Decimator is Bypassed



Figure 112: Only IIR Filters are On



Figure 113: IIR 2.4MHz Notch and CIC Decimator Bypassed



Figure 114: Only IIR 5(1) MHz Notch Filter is On



Figure 115: IIR 5(1) MHz Notch and CIC Decimator Bypassed



Figure 116: Only IIR 2.4MHz Notch Filter On



Figure 117: Only Fractional Decimator On



Figure 118: All Core Blocks are Bypassed

## 12.4. Cross-Validation Methodology

The Python framework enables rigorous cross-validation:

### 1. MATLAB vs Python Algorithm Verification:

- Independent implementations of identical algorithms
- Bit-exact comparison of fixed-point outputs
- Quantifies implementation discrepancies

### 2. Fixed-Point vs Floating-Point Analysis:

- Parallel execution of both arithmetic modes
- Quantization noise characterization
- SNR degradation measurement

### 3. Bypass Configuration Validation:

- Verifies each stage can be independently bypassed
- Confirms correct signal routing in all combinations
- Validates multiplexer logic in RTL

### 4. Multi-Decimation Factor Testing:

- Automated sweep through all CIC decimation ratios (1, 2, 4, 8, 16)
- Performance characterization per configuration
- Resource utilization vs decimation trade-off analysis

## 13. Comprehensive Testbench Architecture

### 13.1. SystemVerilog Testbench (DFE\_tb.sv)

The **DFE\_tb.sv** testbench provides exhaustive verification of the complete DFE Filter Array system with advanced self-checking capabilities and comprehensive coverage metrics.

#### 13.1.1. Key Features

##### 1. Automated Test Orchestration



- Reads binary configuration from cfg.txt
- Automatically invokes Python test generation (system\_run.py)
- Loads test vectors from Python-generated files
- Executes all test scenarios with minimal manual intervention

## 2. Multi-Mode Testing Support

- **Full Flow Mode:** Tests all 16 test cases with multiple decimation factors
- **Bypass Scenario Mode:** Tests TC\_5 with all 16 bypass combinations
- **Fixed-Point vs Floating-Point:** Validates both arithmetic modes

## 3. Self-Checking Infrastructure

```
real error;
real max_error;
logic core_out_fail;

error = core_out_sig_tb - output_sig;
max_error = (abs(error) > max_error) ? abs(error) : max_error;
```

## 4. Stage-by-Stage Validation

The testbench monitors and validates each processing stage independently:

- Fractional Decimator output
- IIR 2.4 MHz Notch output
- IIR 5 MHz Notch output (cascaded stages)
- CIC Decimator output
- Final system output

Table 31: Testbench Parameters

| Parameter   | Value   | Description                                          |
|-------------|---------|------------------------------------------------------|
| N_SAMPLES_I | 48,000  | Number of samples per test case                      |
| FREQ_CLK    | 9 MHz   | Input sample rate clock frequency                    |
| DATA_WIDTH  | 16 bits | Signal path data width                               |
| DATA_FRAC   | 15 bits | Fractional bits in data (s16.15)                     |
| COEFF_WIDTH | 20 bits | Coefficient word length                              |
| COEFF_FRAC  | 18 bits | Fractional bits in coefficients (s20.18)             |
| N_TAP       | 146     | Total coefficient storage (72 FIR + 15 IIR + margin) |



### Test Execution Flow Diagram:



Figure 119: Testing Flow Diagram

## 14. Unit-Testing Framework

The project includes a comprehensive **unit testing hierarchy** with dedicated testbenches for each filter stage and supporting module.

### Unit Test Module Coverage:

#### 14.1. CIC Decimator Unit Tests

**Location:** Unit\_Testing/CIC/

##### Test Files:

- cic\_tb.sv — Comprehensive CIC decimator testbench
- run.do — Automated simulation script with coverage
- input\_wave.txt — Sinusoidal and multi-tone test stimulus
- output\_D{2,4,8,16}\_wave\_exp.txt — Golden reference per decimation factor

##### Test Coverage:



- Decimation factors: 1, 2, 4, 8, 16
- Overflow/underflow detection with saturation
- Transient response (startup behavior)
- Steady-state accuracy (< 0.1% error)
- Passband flatness verification
- Stopband attenuation measurement

#### Pass Criteria:

- Bit-exact match for fixed-point operation
- < 0.01 LSB error for rounding operations
- Zero overflow events for valid input range
- Proper valid signal alignment with output samples

### 14.2. Fractional Decimator Unit Tests

**Location:** Unit\_Testing/Fractional\_Decimator/

#### Test Files:

- 146-tap polyphase FIR coefficient loading from APB
- 9 MHz → 6 MHz rate conversion accuracy
- Passband ripple verification (< 0.25 dB spec)
- Stopband attenuation verification (> 80 dB spec)
- Phase linearity validation (< 5° deviation)

#### Test Scenarios:

1. **Coefficient Loading:** Write all 72 coefficients via APB, readback and verify
2. **Single Tone Sweep:** 0-3 MHz in 100 kHz steps, measure amplitude/phase
3. **Multi-Tone Test:** 5 simultaneous tones, validate no intermodulation
4. **Aliasing Test:** Input tones near Nyquist, verify rejection > 80 dB
5. **Group Delay:** Measure delay flatness across passband (< 10 samples variation)

### 14.3. IIR Notch Filter Unit Tests

**Location:** Unit\_Testing/IIR/ and Unit\_Testing/IIR\_Sub\_module/

#### Test Hierarchy:

##### A. IIR\_Sub\_module Tests (Individual Biquad Stages):

- Coefficient loading and readback
- Transfer function verification (notch depth > 50 dB @ target frequency)
- Q-factor measurement (bandwidth validation)
- Group delay characteristics
- Numerical stability analysis:
  - Zero-input limit cycle detection
  - Coefficient sensitivity analysis
  - Overflow handling under extreme inputs

##### B. IIR Chain Tests (Cascaded System):

- Complete cascade: 2.4 MHz → 1 MHz → 2 MHz notch filters
- Multi-tone interference rejection (2.4 MHz + 5 MHz simultaneous)



- Cumulative quantization noise measurement
- Inter-stage signal integrity verification
- Bypass mode validation (all combinations)

Table 32: Test Metrics

| Metric                | Requirement | Measured |
|-----------------------|-------------|----------|
| Notch Depth @ 2.4 MHz | > 50 dB     | TBD      |
| Notch Depth @ 5.0 MHz | > 50 dB     | TBD      |
| Passband Ripple       | < 0.5 dB    | TBD      |
| Phase Nonlinearity    | < 10°       | TBD      |

## 14.4. APB Interface Unit Tests

**Location:** Unit\_Testing/APB/

**AMBA APB Protocol Verification:**

- Write transactions (single, burst)
- Read transactions (single, burst)
- PREADY wait-state insertion (0-15 cycles)
- Back-to-back transactions (zero idle cycles)
- Address decode verification (all 161 registers)
- Out-of-bounds access detection

**Protocol Compliance Tests:**

- Timing Compliance: PSEL, PENABLE, PADDR, PWDATA setup/hold times
- State Machine: Verify IDLE → SETUP → ACCESS state transitions
- Error Injection: Invalid addresses, misaligned accesses, write-to-RO
- Corner Cases: Maximum wait states, simultaneous read/write attempts

## 14.5. Clock Converter Unit Tests

**Location:** Unit\_Testing/CLK\_converter/

**Clock Domain Crossing Validation:**

- Dual-flop synchronizer reliability (metastability prevention)
- Data integrity across clock boundaries (0% corruption over 1M transactions)
- Handshake protocol correctness (req/ack timing)
- FIFO overflow/underflow protection
- Clock ratio tolerance testing (fast-to-slow, slow-to-fast)

**Stress Tests:**

- Random clock phase relationships
- Clock frequency ratios: 1:1, 2:1, 3:2, 4:1, 1.5:1 (non-integer)
- Burst traffic patterns (back-to-back transfers)
- Starvation scenarios (long idle periods)



## 14.6. Up/Down Sampler Unit Tests

**Locations:** Unit\_Testing/up\_sampler/, Unit\_Testing/down\_sampler/

### Downsampler Tests:

- Sample-dropping logic (every Nth sample selection)
- Valid signal propagation (correct alignment with data)
- Phase alignment verification (consistent sample offset)
- Edge case: Decimation factor = 1 (bypass mode)

### Upsampler Tests:

- Zero-insertion logic (N-1 zeros between samples)
- Valid signal generation (1 out of N cycles asserted)
- Phase accuracy (exact interpolation timing)
- Edge case: Interpolation factor = 1 (bypass mode)

## 14.7. Top-Level Integration Tests

**Location:** Unit\_Testing/TOP\_Core/

### System-Level Validation:

- Complete signal chain: Input → Frac Dec → IIR → CIC → Output
- All modules integrated with realistic interconnect
- Clock and reset propagation timing
- Power-on initialization sequence:
  1. Assert reset for 10+ clock cycles
  2. Release reset (all modules enter idle state)
  3. Load coefficients via APB (146 registers)
  4. Enable processing stages (control register write)
  5. Apply input stimulus, capture output
- Multi-stage coordination (handshaking between stages)

### Stress Test Scenarios:

- Continuous operation (1M+ samples)
- Dynamic reconfiguration (change coefficients mid-stream)
- Rapid bypass toggling (stage enable/disable)
- Maximum throughput (back-to-back valid input)
- Power cycle recovery (reset sequence validation)

### Fixed Point vs Floating Point:



Figure 120: Test Case 7, Floating Point with CIC R = 8



Figure 121: Test Case 7, Floating Point with CIC R = 16



Figure 122: Test Case 15, Floating Point with CIC R = 16



Figure 123: Maximum Error (Analog View)



Figure 124: Maximum Error (Literal View)

The system can also work with floating point values and delivers good results.



Figure 125: Full Exhaustive Testing Waveform

## APB Usage:



Figure 126: Reading Back the FIR Coefficients (Part 1)



Figure 127: Reading Back the FIR Coefficients (Part 2) and the IIR Coefficients



Figure 128: Probing the Fractional Decimator Output



Figure 129: Probing the IIR Filter Output



Figure 130: Probing the CIC Output



Figure 131: Reading Back a Certain Block's Status



*Figure 132: Writing New Coefficient Values (Part 1)*



*Figure 133: Writing New Coefficient Values (Part 2)*



*Figure 134: New Output after Coefficient Changes*



Figure 135: Overflow Event

```
# =====
# Core Output Success Count: 3842981
# Core Output Failure Count: 0
# =====
# ** Note: $stop      : DFE_tb.sv(507)
#   Time: 1099170820 ns  Iteration: 1  Instance: /DFE_tb
# Break in Module DFE_tb at DFE_tb.sv line 507
```

Figure 136: Transcript Output

## 14. FPGA Prototyping and Validation

### 14.1. FPGA Design Objectives and Benefits

FPGA prototyping served as a hardware proof-of-concept to:

- Validate real-time performance at and beyond 9 MHz sample rate.
- Characterize resource utilization and power efficiency.
- Verify I/O interfacing and end-to-end latency on physical hardware.

Deploying the DFE on a **Xilinx Basys-3 (Artix-7 XC7A35T)** platform provided an accessible, low-power test environment closely matching ASIC-equivalent timing.

### 14.2. Target Platform

- **Device:** Xilinx Artix-7 XC7A35T-CPG236-1
- **Clock:** 9 MHz system clock derived from 100 MHz oscillator
- **Resources:** 134,600 LUTs, 269,200 FFs, 740 DSPs
- **I/O Interface:** 16-bit parallel input/output with LED visualization



## 14.3. RTL Elaboration



Figure 137: Full Core Schematic



Figure 138: IIR Schematic



Figure 139: C/C Schematic



Figure 140: Elaboration Messages

## 14.4. RTL Synthesis



Figure 141: System Schematic



| Name                   | 1 | Slice LUTs<br>(134600) | Slice Registers<br>(269200) |      | DSPs<br>(740) | Bonded IOB<br>(500) | BUFGCTRL<br>(32) |
|------------------------|---|------------------------|-----------------------------|------|---------------|---------------------|------------------|
| ▼ N CORE               |   | 3306                   |                             | 1440 | 302           | 89                  | 1                |
| > I CIC (CIC)          |   | 62                     |                             | 85   | 0             | 0                   | 0                |
| I FRACTIONAL_DECIMA... |   | 2994                   |                             | 1189 | 292           | 0                   | 0                |
| > I IIR (IIR_chain)    |   | 239                    |                             | 166  | 10            | 0                   | 0                |

Figure 142: Utilization Report

| Design Timing Summary                 |                                  |                                          |           |
|---------------------------------------|----------------------------------|------------------------------------------|-----------|
| Setup                                 | Hold                             | Pulse Width                              |           |
| Worst Negative Slack (WNS): 93.609 ns | Worst Hold Slack (WHS): 0.138 ns | Worst Pulse Width Slack (WPWS):          | 55.055 ns |
| Total Negative Slack (TNS): 0.000 ns  | Total Hold Slack (THS): 0.000 ns | Total Pulse Width Negative Slack (TPWS): | 0.000 ns  |
| Number of Failing Endpoints: 0        | Number of Failing Endpoints: 0   | Number of Failing Endpoints:             | 0         |
| Total Number of Endpoints: 1687       | Total Number of Endpoints: 1687  | Total Number of Endpoints:               | 1441      |

All user specified timing constraints are met.

Figure 143: Timing Summary

| Tcl Console                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   | Messages | x | Log | Reports | Design Runs | Debug |  |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|---|-----|---------|-------------|-------|--|
| <input type="button" value="Q"/> <input type="button" value="X"/> <input type="button" value="Y"/> <input type="button" value="Z"/> <input type="button" value="T"/> <input type="button" value="M"/> <input type="button" value="B"/> <input type="checkbox"/> Warning (104) <input checked="" type="checkbox"/> Info (324) <input type="checkbox"/> Status (19) <input type="button" value="Show All"/>                                                                                                                                                     |          |   |     |         |             |       |  |
| <p>▼ Vivado Commands (3 infos)</p> <p>  ▼ General Messages (3 infos)</p> <p>    <i>[IP_Flow 19-234] Refreshing IP repositories</i></p> <p>    <i>[IP_Flow 19-1704] No user IP repositories specified</i></p> <p>    <i>[IP_Flow 19-2313] Loaded Vivado IP repository 'C:/Xilinx/Vivado/2018.2/data/ip'.</i></p> <p>  ▼ Synthesis (104 warnings, 319 infos)</p> <p>    <i>[Common 17-349] Got license for feature 'Synthesis' and/or device 'xc7a200t'</i></p> <p>    &gt; <i>[Synth 8-6157] synthesizing module 'CORE' [CORE.v11] (10 more like this)</i></p> |          |   |     |         |             |       |  |

Figure 144: Synthesis Messages

## 14.5. RTL Implementation

| Name                   | 1 | Slice LUTs<br>(133800) | Slice Registers<br>(267600) | Slice<br>(33450) | LUT as Logic<br>(133800) | LUT Flip Flop Pairs<br>(133800) | DSPs<br>(740) | Bonded IOB<br>(500) | BUFGCTRL<br>(32) |   |
|------------------------|---|------------------------|-----------------------------|------------------|--------------------------|---------------------------------|---------------|---------------------|------------------|---|
| ▼ N CORE               |   | 3306                   | 1480                        | 1384             | 3306                     |                                 | 135           | 302                 | 89               | 1 |
| > I CIC (CIC)          |   | 62                     | 85                          | 33               | 62                       |                                 | 41            | 0                   | 0                | 0 |
| I FRACTIONAL_DECIMA... |   | 2994                   | 1208                        | 1252             | 2994                     |                                 | 40            | 292                 | 0                | 0 |
| > I IIR (IIR_chain)    |   | 239                    | 187                         | 122              | 239                      |                                 | 9             | 10                  | 0                | 0 |

Figure 145: Utilization Report



#### Design Timing Summary

| Setup                        | Hold      | Pulse Width                              |           |
|------------------------------|-----------|------------------------------------------|-----------|
| Worst Negative Slack (WNS):  | 77.478 ns | Worst Pulse Width Slack (WPWS):          | 55.055 ns |
| Total Negative Slack (TNS):  | 0.000 ns  | Total Pulse Width Negative Slack (TPWS): | 0.000 ns  |
| Number of Failing Endpoints: | 0         | Number of Failing Endpoints:             | 0         |
| Total Number of Endpoints:   | 1765      | Total Number of Endpoints:               | 1481      |

All user specified timing constraints are met.

Figure 146: Timing Summary



Figure 147: Device



The screenshot shows the 'Messages' tab in the Vivado IDE. The window title is 'Tcl Console' and 'Messages'. It displays a list of messages categorized by type: Warning (109), Info (515), and Status (461). The 'Show All' button is checked. The messages are grouped under 'Vivado Commands' (3 infos) and 'Synthesis' (104 warnings, 319 infos). The 'General Messages' section contains three warnings related to IP repositories. The 'Synthesis' section contains numerous messages, including license acquisition, synthesis of modules like 'CORE\_sv11', and completion of synthesis tasks.

Figure 148: Implementation Messages

There were no timing violations or any issues while going through the FPGA Flow and the bitstream was generated successfully.

## 15. Risk Analysis and Design Mitigations

### 15.1. Identified Design Risks

During the design and verification of the Digital Front-End (DFE) Filter Array, several potential risks were identified that could impact performance, stability, and silicon implementation:

1. **Quantization Noise:** Finite word lengths in fixed-point arithmetic could introduce errors, especially in multistage CIC and IIR filters.
2. **Overflow and Underflow:** Accumulation in high-order filters could exceed dynamic range limits.
3. **Timing Violations:** High-speed sampling and multistage processing could create critical timing paths that affect maximum operating frequency.
4. **Resource Utilization:** The combination of CIC, fractional decimators, and notch filters could exceed LUTs, FFs, or DSP blocks in the target ASIC or FPGA.
5. **Verification Gaps:** Complex interactions between filters and CIC decimators may produce corner-case errors not covered by standard testbenches.
6. **Stability of Recursive Filters:** IIR notch filters introduce poles that could compromise stability if coefficients are not carefully quantized.

### 15.2. Quantization and Stability Concerns

The fixed-point implementation of filters necessitated careful word-length planning:

- **Coefficient Quantization:** Coefficient resolution was analyzed to ensure minimal deviation from ideal frequency response.
- **Accumulation Widths:** Extra bits were allocated in integrator stages to prevent overflow.
- **Stability Analysis:** Recursive filter poles were scaled to maintain absolute pole magnitude below unity after quantization.
- **Simulation Verification:** MATLAB and RTL simulations were used to evaluate the impact of finite precision on output signal fidelity, with results showing negligible degradation for the chosen word lengths.



### 15.3. Resource and Timing Trade-offs

- **CIC Filters:** Chosen for low hardware complexity but require additional stages for high decimation ratios.
- **Fractional Decimators:** Provided flexibility in sample rate conversion but increased DSP usage.
- **IIR Notch Filters:** Offer precise interference rejection with minimal taps but are sensitive to timing and coefficient quantization.
- **Trade-off Decisions:** Word-lengths, filter orders, and pipelining depth were adjusted to balance performance, timing closure, and resource usage on ASIC targets.

### 15.4. Implemented Mitigation Strategies

To minimize risks and ensure robust design:

- **Pipelining:** Critical paths were pipelined to meet timing requirements.
- **Bit Growth Management:** Calculated bit growth per filter stage to prevent overflow without excessive resource consumption.
- **Parameterizable Modules:** Filters were implemented with configurable word-lengths, decimation ratios, and notch frequencies to allow iterative optimization.
- **Extensive Testbenches:** Both constrained-random and directed tests, including worst-case scenarios, were implemented in SystemVerilog to capture edge cases.
- **Verification Checks:** Linting, static timing analysis, and code coverage ensured design quality and reduced potential functional bugs.

## 16. Conclusions and Future Work

### 16.1. Summary of Achievements

The DFE Filter Array project successfully delivered a modular, parameterizable digital filtering solution capable of:

- Multistage decimation and filtering for high-speed ADC streams.
- Mitigation of interference via programmable IIR notch filters.
- Maintaining signal fidelity with careful fixed-point design.
- Efficient resource utilization suitable for FPGA implementation.

Table 33: Achieved Performance

| Parameter                 | Specification            | Measured Performance | Margin |
|---------------------------|--------------------------|----------------------|--------|
| Input Sample Rate         | 9.0 MHz                  | 9.0 MHz              | —      |
| Output Sample Rate (base) | 6.0 MHz                  | 6.0 MHz              | —      |
| Output Sample Rate (CIC)  | $6.0 \text{ MHz} \div D$ | Configurable         | —      |



|                             |                  |                 |               |
|-----------------------------|------------------|-----------------|---------------|
| Stopband Attenuation        | $\geq 80$ dB     | 89.6666         | +9.6666 dB    |
| Notch Depth (all filters)   | $\geq 50$ dB     | 62.1184 dB      | +12.1184 dB   |
| Passband Ripple             | $\leq 0.25$ dB   | 0.143 dB        | +0.107 dB     |
| Single IIR Stage Latency    | 1-2 clock cycles | 1 clock cycle   | 1 clock cycle |
| Processing Latency          | < 200 $\mu$ s    | 1.33333 $\mu$ s | +199 $\mu$ s  |
| Signal-to-Noise Ratio (SNR) | -                | 78.071 dB       | -             |

## 16.2. System-Level Validation Outcomes

- Extensive simulation at system level verified compliance with target specifications for output SNR, interference rejection, and timing performance.
- Corner-case testing demonstrated robustness under maximum input amplitude, worst-case frequency interference, and quantization effects.
- Co-simulation with MATLAB reference models confirmed accuracy of RTL implementation.

## 16.3. Future Enhancements

Potential improvements for next iterations include:

- Adaptive Filtering:** Real-time adjustment of filter coefficients based on input signal characteristics.
- Auto-Tuning:** Dynamic optimization of decimation ratios and notch frequencies to handle varying interference environments.
- Enhanced Verification:** Incorporation of formal methods and machine learning-based corner-case detection.

## 16.4. Extensions for SDR/Communication Applications

The modular DFE Filter Array can be extended to support:

- Software-Defined Radio (SDR):** Integration as a front-end filter block in reconfigurable radio architectures.
- Multi-Channel MIMO Systems:** Parallelization to process multiple antenna inputs simultaneously.
- High-Speed Communication Links:** Application in high-speed ADC front-ends for fiber-optic or RF transceivers.

# 17. References

R. G. Lyons, *Understanding Digital Signal Processing*, 3rd ed. Upper Saddle River, NJ, USA: Prentice Hall, 2011.



# 18. Appendices

## 18.1. Filter Coefficients

Table 34: Fractional Decimator FIR Coefficients

|         |            |
|---------|------------|
| COEFF0  | 20'sh0000c |
| COEFF1  | 20'shffff2 |
| COEFF2  | 20'shfff8d |
| COEFF3  | 20'shffec5 |
| COEFF4  | 20'shffd4  |
| COEFF5  | 20'shffd40 |
| COEFF6  | 20'shffd92 |
| COEFF7  | 20'shffedc |
| COEFF8  | 20'sh00082 |
| COEFF9  | 20'sh00182 |
| COEFF10 | 20'sh00135 |
| COEFF11 | 20'shfffe4 |
| COEFF12 | 20'shffea  |
| COEFF13 | 20'shffeab |
| COEFF14 | 20'shffec  |
| COEFF15 | 20'sh00158 |
| COEFF16 | 20'sh00194 |
| COEFF17 | 20'sh00048 |
| COEFF18 | 20'shffe90 |
| COEFF19 | 20'shffe14 |
| COEFF20 | 20'shfff76 |
| COEFF21 | 20'sh0018d |
| COEFF22 | 20'sh00258 |
| COEFF23 | 20'sh000e1 |
| COEFF24 | 20'shffe59 |
| COEFF25 | 20'shffd29 |
| COEFF26 | 20'shffeb0 |
| COEFF27 | 20'sh001ba |
| COEFF28 | 20'sh00367 |
| COEFF29 | 20'sh001da |
| COEFF30 | 20'shffe3e |
| COEFF31 | 20'shffbf8 |
| COEFF32 | 20'shffd7e |
| COEFF33 | 20'sh001bd |
| COEFF34 | 20'sh004bc |
| COEFF35 | 20'sh0034d |
| COEFF36 | 20'shffe5b |
| COEFF37 | 20'shffa7c |
| COEFF38 | 20'shffbbe |
| COEFF39 | 20'sh00178 |
| COEFF40 | 20'sh00662 |
| COEFF41 | 20'sh00568 |
| COEFF42 | 20'shffed2 |



|         |            |
|---------|------------|
| COEFF43 | 20'shff8a5 |
| COEFF44 | 20'shff935 |
| COEFF45 | 20'sh000c0 |
| COEFF46 | 20'sh00873 |
| COEFF47 | 20'sh0087b |
| COEFF48 | 20'shfffdd |
| COEFF49 | 20'shff64b |
| COEFF50 | 20'shff56d |
| COEFF51 | 20'shfff48 |
| COEFF52 | 20'sh00b32 |
| COEFF53 | 20'sh00d3c |
| COEFF54 | 20'sh001ec |
| COEFF55 | 20'shff2fa |
| COEFF56 | 20'shfef43 |
| COEFF57 | 20'shffc5a |
| COEFF58 | 20'sh00f67 |
| COEFF59 | 20'sh015a2 |
| COEFF60 | 20'sh0063f |
| COEFF61 | 20'shfed3c |
| COEFF62 | 20'shfe2f1 |
| COEFF63 | 20'shff584 |
| COEFF64 | 20'sh0182d |
| COEFF65 | 20'sh02a07 |
| COEFF66 | 20'sh01292 |
| COEFF67 | 20'shfdcfd |
| COEFF68 | 20'shfb86f |
| COEFF69 | 20'shfd7b9 |
| COEFF70 | 20'sh046e9 |
| COEFF71 | 20'sh0d902 |
| COEFF72 | 20'sh13fec |

## 18.2. IIR Filter Coefficients

Table 35: 1MHz Notch IIR Filter Coefficients

|                |            |
|----------------|------------|
| B <sub>0</sub> | 20'sh37061 |
| B <sub>1</sub> | 20'shc8f9f |
| B <sub>2</sub> | 20'sh37061 |
| A <sub>1</sub> | 20'shc8f9f |
| A <sub>2</sub> | 20'sh2e0c3 |

Table 36: 2.4MHz Notch IIR Filter Coefficients

|                |            |
|----------------|------------|
| B <sub>0</sub> | 20'sh37061 |
| B <sub>1</sub> | 20'sh5907c |
| B <sub>2</sub> | 20'sh37061 |
| A <sub>1</sub> | 20'sh5907c |
| A <sub>2</sub> | 20'sh2e0c3 |



### 18.3. Register Map Specification

Table 37: Register Map (Repeat)

| Register Name | Details                                                                                            | Access | Address      |
|---------------|----------------------------------------------------------------------------------------------------|--------|--------------|
| FRAC_DECI_OUT | All the taps                                                                                       | RW     | 0x0 to 0x91  |
| IIR_24_OUT    | [0]b <sub>0</sub> , [1]b <sub>1</sub> , [2]b <sub>2</sub><br>[3]a <sub>1</sub> , [4]a <sub>2</sub> | RW     | 0x92 to 0x96 |
| IIR_5_1_OUT   | [0]b <sub>0</sub> , [1]b <sub>1</sub> , [2]b <sub>2</sub><br>[3]a <sub>1</sub> , [4]a <sub>2</sub> | RW     | 0x97 to 0x9B |
| CIC_R_OUT     | The decimation factor                                                                              | WO     | 0x9C         |
| CTRL          | [0] Frac_Deci on/off<br>[1] IIR_24 on/off<br>[2] IIR_5 on/off<br>[3] CIC on/off<br>[4] FIR on/off  | RO     | 0x9E to 0xA2 |
| OUT_SEL       | Output of core block selection                                                                     | WO     | 0xA3         |
| COEFF_SEL     | Select the coefficients to read                                                                    | WO     | 0xA4         |
| STATUS        | Shows Overflow, Underflow, Ready and Valid_Out of a block                                          | WO     | 0xA5         |