

# Comprehensive Deployment Roadmap: Multi-Modal Gunshot Detection on FPGA

This document outlines a structured, 6-week learning and implementation plan for deploying the CNN14 + BiLSTM + Attention gunshot detection (GSD) model onto a Field-Programmable Gate Array (FPGA), integrated with a 6-axis microphone array and a thermal camera for multi-modal localization.

The roadmap is divided into three major phases: Foundational Preparation, Core ML/DSP Acceleration, and System Integration, ensuring a systematic build-up of knowledge and hardware functionality.

## Phase I: Foundational Preparation (Weeks 1–2)

This phase establishes proficiency in Vitis High-Level Synthesis (HLS), the core tool for custom hardware development, and initiates the preparation of the CNN component using the Vitis AI framework.

### Week 1: Vitis HLS Fundamentals and CNN Preparation

The primary focus is mastering the HLS workflow, optimization pragmas, and preparing the CNN14 feature extractor for the Deep Learning Processor Unit (DPU).
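The Day 2 and Day 3 objectives below can be made concrete with a fixed-point dot product annotated with HLS pragmas. This is a minimal sketch, assuming a Q1.15 format emulated with standard integer types because `ap_fixed` needs the Vitis HLS headers. `dot_q15`, `to_q15`, and `kLen` are illustrative names. The `#pragma HLS` lines only mean something to Vitis HLS (an ordinary C++ compiler ignores them), and the `type=complete` spelling follows recent UG1399 releases.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical vector length for the example.
constexpr int kLen = 8;

// Q1.15 fixed point: real value = raw / 2^15, range [-1, 1).
// Vitis HLS code would normally use ap_fixed<16, 1>; plain integers keep this
// sketch compilable with any C++ compiler.
using q15_t = int16_t;

// Convert a double to Q1.15 with round-to-nearest and saturation.
q15_t to_q15(double x) {
    long r = std::lround(x * 32768.0);
    if (r > 32767) r = 32767;
    if (r < -32768) r = -32768;
    return static_cast<q15_t>(r);
}

// Dot product of two Q1.15 vectors, returned in Q1.15 with saturation.
q15_t dot_q15(const q15_t a[kLen], const q15_t b[kLen]) {
    // Split each array into kLen registers so every element can be read in the same cycle.
#pragma HLS ARRAY_PARTITION variable=a type=complete
#pragma HLS ARRAY_PARTITION variable=b type=complete
    int64_t acc = 0;  // each product is Q2.30; a 64-bit accumulator cannot overflow here
    for (int i = 0; i < kLen; ++i) {
        // Replicate the multiplier kLen times instead of reusing one over kLen cycles.
#pragma HLS UNROLL
        acc += static_cast<int32_t>(a[i]) * static_cast<int32_t>(b[i]);
    }
    int64_t r = acc >> 15;  // Q2.30 -> Q1.15 (arithmetic shift, rounds toward -infinity)
    if (r > 32767) r = 32767;
    if (r < -32768) r = -32768;
    return static_cast<q15_t>(r);
}
```

Instead of the explicit `UNROLL`, one could place `#pragma HLS PIPELINE II=1` on the enclosing function or loop; Vitis HLS then unrolls the inner loop automatically.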

| Day | Topic Focus | Key Learning Objectives | Free Resource / Tutorial |
|-----|-------------|-------------------------|--------------------------|
| <b>Day 1</b> | <b>Vitis HLS Introduction</b> | Understand the HLS flow (C/C++ to RTL), C simulation, and C synthesis.<sup>1</sup> Learn when and why to use HLS for hardware acceleration.<sup>4</sup> | BYU Computing Bootcamp: Vitis HLS<sup>3</sup>; Webinar: HLS - What Is It and When Do You Use It?<sup>4</sup>; Vitis HLS Tutorials (Getting Started).<sup>5</sup> |
| <b>Day 2</b> | <b>HLS Data Types &amp; Pragma Basics</b> | Implement <b>fixed-point arithmetic</b> (<code>ap_fixed</code>) for efficiency; it underpins all ML and DSP kernels.<sup>6</sup> Learn and apply the primary optimization pragmas (PIPELINE, UNROLL). | Vitis HLS User Guide Tutorials: Data Types, Loops Primer.<sup>6</sup> |
| <b>Day 3</b> | <b>Array Management in HLS</b> | Map software arrays to hardware Block RAMs (BRAMs). Implement <b>Array Partitioning</b> (complete, block) for parallel weight access, which is critical for the BiLSTM.<sup>6</sup> | Vitis HLS Introductory Examples: Array Partitioning.<sup>7</sup> |
| <b>Day 4</b> | <b>Vitis AI Framework &amp; DPU</b> | Get an overview of the Vitis AI framework, the DPU architecture, and the flow for deploying neural networks such as CNN14.<sup>8</sup> | Introduction to Vitis AI (Video)<sup>9</sup>; Vitis AI Playlist.<sup>8</sup> |
| <b>Day 5</b> | <b>Vitis AI Model Quantization</b> | Start quantizing the CNN14 model to the 8-bit fixed-point (INT8) format required by the DPU.<sup>11</sup> Understand how the Vitis AI Quantizer (and, optionally, the Vitis AI Optimizer, which prunes the model) prepares it for compilation.<sup>11</sup> | Vitis AI Playlist: Quantization/Deep Dive.<sup>8</sup> |
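The Day 5 quantization step can be pictured with a simplified model: the DPU executes INT8 arithmetic in which each tensor's scale is a power of two, so a weight w is stored as q = round(w · 2^f) for some fractional-bit count f. The sketch below models only that mapping for one weight tensor. The function names are hypothetical, and the real Vitis AI Quantizer also calibrates activation ranges on sample inputs and can fine-tune the model, which this code does not attempt.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Pick the largest fractional-bit count f such that max|w| * 2^f still fits in [-128, 127].
int choose_frac_bits(const std::vector<float>& w) {
    float max_abs = 0.0f;
    for (float v : w) max_abs = std::max(max_abs, std::fabs(v));
    if (max_abs == 0.0f) return 0;  // all-zero tensor: any choice is exact
    return static_cast<int>(std::floor(std::log2(127.0f / max_abs)));
}

// Quantize to int8 with scale 2^frac_bits: q = clamp(round(w * 2^f), -128, 127).
std::vector<int8_t> quantize_pow2(const std::vector<float>& w, int frac_bits) {
    const float scale = std::ldexp(1.0f, frac_bits);  // 2^frac_bits
    std::vector<int8_t> q;
    q.reserve(w.size());
    for (float v : w) {
        const long r = std::clamp(std::lround(v * scale), -128L, 127L);
        q.push_back(static_cast<int8_t>(r));
    }
    return q;
}

// Recover the approximate real value: w ~= q * 2^-frac_bits.
float dequantize(int8_t q, int frac_bits) {
    return std::ldexp(static_cast<float>(q), -frac_bits);
}
```

For weights {0.5, -0.75, 0.1} the largest magnitude is 0.75, so f = 7 and the stored values are 64, -96, and 13; the last one dequantizes to 13/128 ≈ 0.1016, an error below half a quantization step.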

### Week 2: Acoustic Signal Chain (PDM Demodulation)

This week focuses on the first custom hardware kernel: processing the raw audio from the 6-axis microphone array with highly parallelized PDM demodulation.
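Before writing the CIC kernel it helps to size it. The sketch below works through the rate and register-width arithmetic for one channel under assumed numbers (a 3.072 MHz PDM clock and a decimation ratio R = 64, neither taken from this roadmap), using Hogenauer's rule B_out = B_in + ⌈N·log2(R·M)⌉ for an N-stage CIC with differential delay M. The names are illustrative.

```cpp
// Assumed front-end numbers (not from this roadmap): 3.072 MHz PDM clock, R = 64.
constexpr long kPdmClockHz = 3072000;
constexpr int kDecimation = 64;                          // R
constexpr long kPcmRateHz = kPdmClockHz / kDecimation;   // 48 kHz PCM output

// Hogenauer's register-width rule for an N-stage CIC decimator with differential
// delay M: B_out = B_in + ceil(N * log2(R * M)). Evaluated with integers to avoid
// floating-point rounding; assumes the DC gain (R * M)^N is below 2^63.
int cic_register_bits(int input_bits, int stages, int decimation, int diff_delay) {
    unsigned long long gain = 1;  // DC gain of the filter, (R * M)^N
    for (int i = 0; i < stages; ++i) {
        gain *= static_cast<unsigned long long>(decimation) *
                static_cast<unsigned long long>(diff_delay);
    }
    int growth = 0;  // smallest growth with 2^growth >= gain, i.e. ceil(log2(gain))
    while ((1ULL << growth) < gain) ++growth;
    return input_bits + growth;
}
```

For example, a 4-stage CIC fed with 2-bit (±1) samples at R = 64 needs 26-bit registers, and a 5-stage CIC fed with 1-bit samples needs 31 bits.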

| Day | Topic Focus | Key Learning Objectives | Free Resource / Tutorial |
|-----|-------------|-------------------------|--------------------------|
| <b>Day 6</b> | <b>PDM Demodulation Theory</b> | Understand Pulse Density Modulation (PDM) and why a Cascaded Integrator-Comb (CIC) filter chain is needed to convert it to PCM audio.<sup>12</sup> | Research on CIC Decimation Filter Architecture.<sup>14</sup> |
| <b>Day 7</b> | <b>HLS CIC Filter Implementation</b> | Implement a single-channel CIC filter as a C++ function using <b>fixed-point data types</b>. Use the <code>#pragma HLS PIPELINE</code> directive for high throughput.<sup>16</sup> | HLS Matched Filter/Decimation Filter Tutorials.<sup>16</sup> |
| <b>Day 8</b> | <b>Parallel Dataflow</b> | Implement an architecture that runs <b>six CIC filters concurrently</b> using the <code>#pragma HLS DATAFLOW</code> directive. Processing the six channels in lockstep is essential for phase coherence across the 6-axis array.<sup>6</sup> | Vitis HLS User Guide Tutorials: Dataflow Paradigm.<sup>6</sup> |
| <b>Day 9</b> | <b>C/RTL Co-simulation</b> | Verify the CIC filter bank's functional correctness with C simulation, then confirm the generated RTL's cycle-level behavior and latency with C/RTL co-simulation.<sup>18</sup> | Vitis HLS User Guide Tutorials (RTL Co-simulation).<sup>6</sup> |
| <b>Day 10</b> | <b>IP Packaging &amp; AXI Interfaces</b> | Package the CIC filter bank as a reusable IP core for Vivado.<sup>19</sup> Define the I/O using the <b>AXI4-Stream</b> protocol for high-speed data flow and <b>AXI4-Lite</b> for control.<sup>20</sup> | Vitis HLS IP Integration Tutorial.<sup>20</sup> |
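Days 7 and 8 can be sketched as a software model of the CIC filter bank, assuming N = 4 stages, R = 64, and M = 1 (these values and the names `CicState`, `cic_push`, and `cic_bank_push` are illustrative). It uses plain integers rather than `ap_int` so it compiles anywhere. One deliberate substitution: the six-way parallelism here comes from a pipelined function whose channel loop HLS unrolls, not from `#pragma HLS DATAFLOW`, which instead overlaps separate stream-connected tasks (for example, a CIC followed by a compensation filter) and remains the Day 8 topic.

```cpp
#include <array>
#include <cstdint>

constexpr int kStages = 4;    // N, number of integrator/comb pairs (assumed)
constexpr int kDecim = 64;    // R, decimation ratio (assumed); differential delay M = 1
constexpr int kChannels = 6;  // one CIC per microphone

// Per-channel CIC state. Arithmetic uses uint64_t so overflow wraps modulo 2^64,
// which is the two's-complement wraparound a CIC relies on: the combs cancel the
// integrators' wrap as long as the register is at least B_out bits wide.
struct CicState {
    std::array<uint64_t, kStages> integ{};       // integrator accumulators (PDM rate)
    std::array<uint64_t, kStages> comb_delay{};  // one-sample comb delays (PCM rate)
    int phase = 0;                               // inputs seen in the current output period
};

// Push one PDM bit (1 -> +1, 0 -> -1). Every kDecim inputs, writes one PCM sample
// (gain R^N, not yet normalized) to `out` and returns true.
bool cic_push(CicState& s, int pdm_bit, int64_t& out) {
    uint64_t x = pdm_bit ? 1ULL : ~0ULL;  // ~0ULL is -1 modulo 2^64
    for (int i = 0; i < kStages; ++i) {   // integrator cascade at the PDM rate
        s.integ[i] += x;
        x = s.integ[i];
    }
    if (++s.phase < kDecim) return false;  // decimate: keep 1 of every R samples
    s.phase = 0;
    for (int i = 0; i < kStages; ++i) {    // comb cascade at the PCM rate: y[n] = x[n] - x[n-1]
        const uint64_t y = x - s.comb_delay[i];
        s.comb_delay[i] = x;
        x = y;
    }
    out = static_cast<int64_t>(x);  // reinterpret the wrapped value as signed
    return true;
}

// Advance all six microphones by one PDM clock tick in lockstep, keeping their PCM
// outputs sample-aligned. Pipelining the function makes HLS unroll the channel loop
// into six parallel CIC datapaths.
bool cic_bank_push(std::array<CicState, kChannels>& bank,
                   const std::array<int, kChannels>& bits,
                   std::array<int64_t, kChannels>& out) {
#pragma HLS PIPELINE II=1
    bool ready = false;
    for (int ch = 0; ch < kChannels; ++ch) {
        ready = cic_push(bank[ch], bits[ch], out[ch]);  // all channels share the same phase
    }
    return ready;
}
```

With a constant all-ones input the output settles at R^N = 64^4 = 16,777,216, so a right shift by 24 bits normalizes it; an alternating 0/1 pattern (zero mean) settles at 0.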

## Phase II: Core ML and DSP Acceleration (Weeks 3–4)

This phase accelerates the complex ML classification logic (BiLSTM-Attention) and the core DSP localization engine (DOA).

### Week 3: BiLSTM-Attention Layer Acceleration

Optimize the recurrent neural network component using custom HLS techniques to minimize the inherent latency of sequential models.
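The latency problem comes from the recurrence: step t needs h and c from step t-1, so iterations of the time loop cannot overlap, whereas the 4H gate pre-activations within one step are independent. That is why Day 12 pipelines inside a time step. The sketch below is a float reference model of one unidirectional LSTM step, assuming PyTorch's gate ordering (i, f, g, o) for the stacked weight matrices; `kIn`, `kHid`, and `lstm_step` are hypothetical names.

```cpp
#include <cmath>

constexpr int kIn = 4;   // input features per time step (hypothetical)
constexpr int kHid = 3;  // hidden units per direction (hypothetical)

inline float sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// One LSTM time step, updating h and c in place. Rows of w_ih, w_hh and bias are
// stacked per gate in the order i, f, g, o (PyTorch's layout), kHid rows per gate.
void lstm_step(const float w_ih[4 * kHid][kIn], const float w_hh[4 * kHid][kHid],
               const float bias[4 * kHid], const float x[kIn],
               float h[kHid], float c[kHid]) {
    float gates[4 * kHid];
    // The 4*kHid pre-activations are independent of each other, so this loop can be
    // pipelined (its inner loops unroll). Partitioning w_ih/w_hh along their second
    // dimension lets it read a full row of weights per cycle.
    for (int r = 0; r < 4 * kHid; ++r) {
#pragma HLS PIPELINE II=1
        float acc = bias[r];
        for (int k = 0; k < kIn; ++k) acc += w_ih[r][k] * x[k];
        for (int k = 0; k < kHid; ++k) acc += w_hh[r][k] * h[k];
        gates[r] = acc;
    }
    // Element-wise state update. The next time step cannot start until this
    // finishes, because it reads the new h and c: that is the recurrence.
    for (int j = 0; j < kHid; ++j) {
        const float i_g = sigmoid(gates[j]);
        const float f_g = sigmoid(gates[kHid + j]);
        const float g_g = std::tanh(gates[2 * kHid + j]);
        const float o_g = sigmoid(gates[3 * kHid + j]);
        c[j] = f_g * c[j] + i_g * g_g;
        h[j] = o_g * std::tanh(c[j]);
    }
}
```

A BiLSTM layer calls this step once per frame in each direction with separate weights (forward over t = 0..T-1, backward over t = T-1..0) and concatenates the two hidden states. Because the directions are independent, they can map to two hardware instances running concurrently.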

| <b>Day</b> | <b>Topic Focus</b> | <b>Key Learning Objectives</b> | <b>Free Resource / Tutorial</b> |
|------------|--------------------|--------------------------------|---------------------------------|
| <b>Day 11</b> | <b>BiLSTM Structure in C++</b> | Review the mathematical structure of the LSTM cell (gates, recurrence). Begin writing the unoptimized BiLSTM C++ implementation.<sup>22</sup> | Open-source FPGA-based LSTM HLS C++ Example.<sup>22</sup> |
| <b>Day 12</b> | <b>BiLSTM Optimization: Pipelining</b> | Apply aggressive <code>#pragma HLS PIPELINE II=1</code> to the loops inside the LSTM gate computations to maximize parallelism within each sequential time step.<sup>6</sup> | Vitis HLS User Guide: Pipelining Loops.<sup>6</sup> |
| <b>Day 13</b> | <b>BiLSTM Optimization: Memory</b> | Apply Array Partitioning directives to the BiLSTM's weight and bias matrices (e.g., $W_{ih}$, $W_{hh}$) to enable parallel reads from BRAMs, supporting the pipelined computations.<sup>6</sup> | Vitis HLS User Guide: Array Partitioning.<sup>6</sup> |
| <b>Day 14</b> | <b>Attention Mechanism: Matrix Math</b> | Implement the core matrix-vector multiplications required for the attention mechanism's scoring function.<sup>24</sup> Optimize these operations using loop unrolling and pipelining for GEMM acceleration.<sup>25</sup> | HLS Matrix Multiplication/GEMM tutorials.<sup>25</sup> |
| <b>Day 15</b> | <b>Attention Mechanism: Softmax/Output</b> | Implement the <b>Softmax</b> function using optimized fixed-point methods (e.g., CORDIC/LUTs).<sup>27</sup> Package the combined BiLSTM-Attention as an HLS IP core.<sup>20</sup> | Vitis HLS Tutorials (Beamformer Analysis for DSP techniques).<sup>24</sup> |
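Days 14 and 15 can be sketched as a floating-point reference model. The scoring function is an assumption (a learned vector u dotted with each BiLSTM output), not necessarily the one in the trained model, and exp is approximated with a 129-entry lookup table over [-8, 0] as one instance of the LUT approach. A hardware version would swap `float` for `ap_fixed` and keep the table in ROM; all names here are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

constexpr int kLutSize = 129;             // covers exp(x) for x in [-8, 0] (assumed range)
constexpr float kLutStep = 1.0f / 16.0f;  // table resolution (assumed)

// exp(x) for x <= 0 by nearest-entry lookup; inputs below -8 return 0 and positive
// inputs are clamped to exp(0). Here the table is filled once with std::exp.
float exp_lut(float x) {
    static const std::array<float, kLutSize> lut = [] {
        std::array<float, kLutSize> t{};
        for (int i = 0; i < kLutSize; ++i) t[i] = std::exp(-i * kLutStep);
        return t;
    }();
    const long idx = std::max(0L, std::lround(-x / kLutStep));
    return idx >= kLutSize ? 0.0f : lut[idx];
}

// Numerically stable softmax: subtracting the maximum makes every exponent <= 0,
// so the table only needs to cover non-positive inputs.
template <int N>
void softmax_lut(const float (&in)[N], float (&out)[N]) {
    float m = in[0];
    for (int i = 1; i < N; ++i) m = std::max(m, in[i]);
    float sum = 0.0f;
    for (int i = 0; i < N; ++i) {
        out[i] = exp_lut(in[i] - m);
        sum += out[i];
    }
    for (int i = 0; i < N; ++i) out[i] /= sum;  // sum >= 1 because the max maps to exp(0)
}

// Hypothetical simplified attention pooling over T BiLSTM outputs of width H:
// score_t = u . h_t, alpha = softmax(score), context = sum_t alpha_t * h_t.
template <int T, int H>
void attention_pool(const float (&hs)[T][H], const float (&u)[H], float (&context)[H]) {
    float score[T];
    float alpha[T];
    for (int t = 0; t < T; ++t) {  // matrix-vector product: hs (T x H) times u (H)
        float s = 0.0f;
        for (int k = 0; k < H; ++k) s += u[k] * hs[t][k];
        score[t] = s;
    }
    softmax_lut(score, alpha);
    for (int k = 0; k < H; ++k) {  // attention-weighted sum over time steps
        float acc = 0.0f;
        for (int t = 0; t < T; ++t) acc += alpha[t] * hs[t][k];
        context[k] = acc;
    }
}
```

The max-subtraction step is what makes a small table practical: without it, the LUT would have to span the full dynamic range of the scores.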

### Week 4: Spatial Processing and DPU Compilation

Develop the Direction of Arrival (DOA) kernel and finalize the preparation of the CNN component for the target FPGA.
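Whichever DOA algorithm is chosen, each microphone pair ultimately relates a time difference of arrival τ to a bearing. In the far field this is θ = arcsin(cτ/d) for a pair spaced d apart. The sketch below assumes c = 343 m/s (air at about 20 °C) and uses hypothetical function names; it also bounds the lag range a correlator needs to search.

```cpp
#include <algorithm>
#include <cmath>

constexpr double kSpeedOfSound = 343.0;  // m/s, dry air at about 20 degrees C
constexpr double kPi = 3.14159265358979323846;

// Far-field bearing (degrees from broadside, in [-90, 90]) for one microphone pair
// spaced spacing_m apart, given the time difference of arrival tau_s in seconds.
double tdoa_to_angle_deg(double tau_s, double spacing_m) {
    // Measurement noise can push |c * tau / d| slightly above 1; clamp before asin.
    const double s = std::clamp(kSpeedOfSound * tau_s / spacing_m, -1.0, 1.0);
    return std::asin(s) * 180.0 / kPi;
}

// Largest physically possible delay between the pair, in samples; a correlator
// only needs to search lags in [-max_lag, max_lag].
int max_lag_samples(double spacing_m, double sample_rate_hz) {
    return static_cast<int>(std::ceil(spacing_m / kSpeedOfSound * sample_rate_hz));
}
```

For a 10 cm pair sampled at 48 kHz, only 14 lags on either side are physically meaningful. A single pair leaves a front/back ambiguity about its axis, which combining several pairs of the array resolves.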

| Day | Topic Focus | Key Learning Objectives | Free Resource / Tutorial |
|-----|-------------|-------------------------|--------------------------|
| <b>Day 16</b> | <b>DOA Algorithm Selection &amp; Theory</b> | Select a suitable real-time algorithm such as Beamscan or GCC-PHAT.<sup>28</sup> Understand the core DSP requirements (FFT, matrix operations).<sup>30</sup> | Research: Beamscan and GCC-PHAT algorithms.<sup>28</sup> |
| <b>Day 17</b> | <b>DOA HLS: FFT Implementation</b> | Implement a fixed-point Fast Fourier Transform (FFT) kernel in HLS; the FFT is at the heart of GCC-PHAT and of frequency-domain (wideband) Beamscan. Focus on achieving the optimal pipeline Initiation Interval (II).<sup>6</sup> | Vitis HLS Introductory Examples: FFT.<sup>7</sup> |
| <b>Day 18</b> | <b>DOA HLS: Matrix Operations</b> | Implement the matrix/vector mathematics (e.g., spatial covariance estimation) required by the DOA algorithm, reusing the optimization techniques from the attention layer.<sup>25</sup> | HLS Matrix Multiplication Tutorials.<sup>25</sup> |
| <b>Day 19</b> | <b>Vitis AI Compilation (VAI_C)</b> | Use the Vitis AI Compiler (VAI_C) to compile the quantized CNN14 model into the DPU executable (XMODEL) for the target platform.<sup>32</sup> | Vitis AI Tutorials: DPU Integration.<sup>32</sup> |
| <b>Day 20</b> | <b>Final IP Core Generation</b> | Perform final synthesis and IP packaging for the DOA engine HLS core. Ensure all custom IP cores (CIC, BiLSTM, DOA) have robust AXI4-Stream data and AXI4-Lite control interfaces.<sup>20</sup> | Vitis HLS IP Packaging Review.<sup>20</sup> |
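A floating-point GCC-PHAT reference for one microphone pair makes a useful C-simulation golden model for Days 16 to 18. The sketch below uses a textbook iterative radix-2 FFT (not the Vitis HLS FFT library) in double precision, and the function names are hypothetical. It whitens the cross-spectrum X*·Y to unit magnitude (the PHAT weighting), transforms back, and picks the correlation peak within a physically possible lag range.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cd = std::complex<double>;
constexpr double kPi = 3.14159265358979323846;

// In-place iterative radix-2 FFT; a.size() must be a power of two.
// invert = true computes the inverse transform without the 1/n scaling.
void fft(std::vector<cd>& a, bool invert) {
    const std::size_t n = a.size();
    for (std::size_t i = 1, j = 0; i < n; ++i) {  // bit-reversal permutation
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(a[i], a[j]);
    }
    for (std::size_t len = 2; len <= n; len <<= 1) {  // butterfly stages
        const double ang = 2.0 * kPi / static_cast<double>(len) * (invert ? 1.0 : -1.0);
        const cd wlen(std::cos(ang), std::sin(ang));
        for (std::size_t i = 0; i < n; i += len) {
            cd w(1.0);
            for (std::size_t k = 0; k < len / 2; ++k) {
                const cd u = a[i + k];
                const cd v = a[i + k + len / 2] * w;
                a[i + k] = u + v;
                a[i + k + len / 2] = u - v;
                w *= wlen;
            }
        }
    }
}

// Delay of y relative to x in samples (positive: y arrives later), estimated with
// GCC-PHAT and searched over [-max_lag, max_lag].
int gcc_phat_delay(const std::vector<double>& x, const std::vector<double>& y, int max_lag) {
    std::size_t n = 1;
    while (n < x.size() + y.size()) n <<= 1;  // zero-pad so the correlation does not wrap
    std::vector<cd> X(n), Y(n);
    for (std::size_t i = 0; i < x.size(); ++i) X[i] = x[i];
    for (std::size_t i = 0; i < y.size(); ++i) Y[i] = y[i];
    fft(X, false);
    fft(Y, false);
    std::vector<cd> R(n);
    for (std::size_t k = 0; k < n; ++k) {
        const cd cross = std::conj(X[k]) * Y[k];
        const double mag = std::abs(cross);
        R[k] = mag > 1e-12 ? cross / mag : cd(0.0);  // PHAT: keep phase, drop magnitude
    }
    fft(R, true);  // unscaled inverse; the scale does not move the peak
    int best_lag = 0;
    double best = R[0].real();
    for (int lag = -max_lag; lag <= max_lag; ++lag) {
        const std::size_t idx = static_cast<std::size_t>(lag + static_cast<long>(n)) % n;
        if (R[idx].real() > best) {
            best = R[idx].real();
            best_lag = lag;
        }
    }
    return best_lag;
}
```

Repeating this for several pairs of the array and converting each lag to a bearing with the far-field relation θ = arcsin(cτ/d) yields the DOA estimate; a fixed-point HLS version would keep the same structure.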

## Phase III: System Integration and Finalization (Weeks 5–6)

The final phase brings all the accelerated kernels together into a single hardware design, then develops the control software and the multi-modal sensor fusion.

### Week 5: Vivado System Integration

Assemble all hardware components in the Vivado Block Design and generate the final hardware file (bitstream).

| Day | Topic Focus | Key Learning Objectives | Free Resource / Tutorial |
|-----|-------------|-------------------------|--------------------------|
| <b>Day 21</b> | <b>Vivado Project Setup</b> | Create the Vivado project, instantiate the target Zynq Processing System (PS), and configure the memory system. | Implementing a Vitis HLS RTL IP in Xilinx Vivado.<sup>21</sup> |
| <b>Day 22</b> | <b>PL Integration: DPU and Custom Cores</b> | Integrate the Vitis AI DPU IP core and the custom HLS IP cores (CIC, BiLSTM, DOA) into the Programmable Logic (PL).<sup>32</sup> | Integrate DPU-TRD PL Acceleration Kernel.<sup>32</sup> |
| <b>Day 23</b> | <b>AXI Interconnection</b> | Connect the streaming data path between the custom HLS cores using <b>AXI4-Stream</b> (e.g., CIC $\rightarrow$ DOA $\rightarrow$ BiLSTM); the DPU instead exchanges its tensors with DDR through AXI memory-mapped master ports. Use the Vivado <b>Run Connection Automation</b> tool to link the AXI4-Lite control interfaces to the PS.<sup>20</sup> | Vitis HLS IP Integration Tutorial (Connecting AXI Interfaces).<sup>20</sup> |
| <b>Day 24</b> | <b>Thermal Camera Interface</b> | Implement <b>SPI/I2C</b> controller logic in the PL for the thermal camera. These simple serial interfaces keep integration effort low, leaving the focus on the core audio processing.<sup>33</sup> | I2C and SPI communication on FPGA example.<sup>33</sup> |
| <b>Day 25</b> | <b>Hardware Bitstream Generation</b> | Run synthesis, placement, and routing in Vivado to generate the final hardware bitstream (.bit file) and the XSA hardware definition file. Review the implementation reports for resource utilization and timing.<sup>34</sup> | Vivado Implementation (Vitis HLS Documentation).<sup>34</sup> |

### Week 6: Software, Verification, and Fusion

Develop the host application on the Processing System (PS) to manage the accelerators and perform the multi-modal fusion logic.
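Managing the accelerators from the PS comes down to the AXI4-Lite control register each HLS core exposes. The sketch below shows the handshake using the default `ap_ctrl_hs` bit layout at offset 0x00 (bit 0 `ap_start`, bit 1 `ap_done`, bit 2 `ap_idle`). The auto-generated driver wraps the same register accesses in functions and is usually preferable in a real project. The function names here are hypothetical, and how `ctrl` gets mapped (Linux `mmap`, bare-metal base address) is an assumption about the platform.

```cpp
#include <cstdint>

// Control register (offset 0x00) bits of a Vitis HLS kernel using the default
// ap_ctrl_hs protocol over an s_axilite interface.
constexpr uint32_t kApStart = 1u << 0;
constexpr uint32_t kApDone = 1u << 1;
constexpr uint32_t kApIdle = 1u << 2;

// `ctrl` points at the kernel's AXI4-Lite register block, for example obtained by
// mmap-ing its base address under Linux or taken from the base address on bare
// metal. Argument registers live at kernel-specific offsets listed in the
// generated driver header.
void kernel_start(volatile uint32_t* ctrl) {
    ctrl[0] = kApStart;  // writing the whole word also clears auto_restart (bit 7)
}

// Poll ap_done, giving up after max_polls reads. On real hardware ap_done is
// typically clear-on-read, so only the first read after completion sees it set.
bool kernel_wait(volatile uint32_t* ctrl, long max_polls) {
    for (long i = 0; i < max_polls; ++i) {
        if (ctrl[0] & kApDone) return true;
    }
    return false;
}

bool kernel_is_idle(volatile uint32_t* ctrl) {
    return (ctrl[0] & kApIdle) != 0;
}
```

The bounded poll matters in practice: a kernel stalled on an empty AXI4-Stream input never asserts `ap_done`, and a timeout turns that hang into a reportable error.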

| Day | Topic Focus | Key Learning Objectives | Free Resource / Tutorial |
|-----|-------------|-------------------------|--------------------------|
| <b>Day 26</b> | <b>PS Software Development Setup</b> | Set up the Vitis Software Platform project using the generated XSA file.<sup>35</sup> Access the auto-generated driver files for the custom AXI4-Lite IP cores.<sup>20</sup> | Vitis Tutorials: Getting Started.<sup>35</sup> |
| <b>Day 27</b> | <b>Hardware Control Software</b> | Write the C/C++ application on the PS to control the flow: reading sensor data and using AXI4-Lite control registers to start and manage the DPU and HLS kernels.<sup>20</sup> | Vitis HLS IP Integration Tutorial (Creating a Software Application).<sup>20</sup> |
| <b>Day 28</b> | <b>Multi-Modal Fusion Logic</b> | Implement the sensor fusion algorithm in PS software. Correlate the acoustic classification (BiLSTM output) with the DOA localization estimate and the thermal camera detection to confirm the source location.<sup>6</sup> | Custom C/C++ application development using PS resources.<sup>6</sup> |
| <b>Day 29</b> | <b>Real-Time Testing &amp; Debug</b> | Load the final application onto the target FPGA/SoC. Debug the data flow and verify the low-latency inference performance (target: 100–200 ms per segment) by timing execution on the hardware, as in the cited tutorial.<sup>20</sup> | Measuring the run time on the hardware.<sup>20</sup> |
| <b>Day 30</b> | <b>Final Testing and Documentation</b> | Conduct comprehensive end-to-end testing with audio and thermal inputs. Document the system performance, resource utilization, and lessons learned. | Project Finalization and Reporting. |
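The roadmap leaves the Day 28 fusion rule open. One simple possibility, offered purely as an assumption, raises an alarm when the classifier's gunshot probability exceeds a threshold and upgrades it to confirmed when the thermal camera reports a hot spot whose bearing agrees with the acoustic DOA within a tolerance. The `Detection` fields, the 0.8 threshold, and the 15° tolerance are illustrative values, and the thermal bearing is assumed to be computed upstream from the hot-spot pixel and the camera's mounting angle.

```cpp
#include <cmath>

// Hypothetical fusion inputs; field names and units are illustrative assumptions.
struct Detection {
    float gunshot_prob;  // BiLSTM-Attention output in [0, 1]
    float doa_deg;       // acoustic bearing, 0-360 degrees
    bool thermal_hit;    // thermal camera reports a hot spot
    float thermal_deg;   // bearing of that hot spot, 0-360 degrees
};

enum class Verdict { None, AcousticOnly, Confirmed };

// Smallest absolute difference between two bearings, handling the 360 -> 0 wrap.
float bearing_diff_deg(float a, float b) {
    const float d = std::fmod(std::fabs(a - b), 360.0f);
    return d > 180.0f ? 360.0f - d : d;
}

// Threshold the classifier, then require thermal agreement for confirmation.
Verdict fuse(const Detection& d, float prob_threshold = 0.8f, float max_bearing_err_deg = 15.0f) {
    if (d.gunshot_prob < prob_threshold) return Verdict::None;
    if (d.thermal_hit && bearing_diff_deg(d.doa_deg, d.thermal_deg) <= max_bearing_err_deg)
        return Verdict::Confirmed;
    return Verdict::AcousticOnly;
}
```

The bearing comparison must respect the wrap-around: a DOA of 350° and a thermal bearing of 5° are only 15° apart, not 345°.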

### Works cited

1. Vitis High-Level Synthesis User Guide - AMD, accessed October 25, 2025, <https://www.xilinx.com/support/documents/sw_manuals/xilinx2022_2/ug1399-vitis-hls.pdf>
2. Vivado Design Suite User Guide: High-Level Synthesis, accessed October 25, 2025, <https://www.xilinx.com/support/documents/sw_manuals/xilinx2020_1/ug902-vivado-high-level-synthesis.pdf>
3. Vitis: HLS • BYU Computing Bootcamp, accessed October 25, 2025, <https://byu-cpe.github.io/ComputingBootCamp/tutorials/vitis_hls/>
4. Webinar: HLS - What Is It and When Do You Use It? - YouTube, accessed October 25, 2025, <https://www.youtube.com/watch?v=RLCpw7RyhZM>
5. Vitis HLS — Vitis™ Tutorials 2021.1 documentation - GitHub Pages, accessed October 25, 2025, <https://xilinx.github.io/Vitis-Tutorials/2021-1/build/html/docs/Getting_Started/Vitis_HLS/Getting_Started_Vitis_HLS.html>
6. Tutorials and Examples - 2025.1 English - UG1399, accessed October 25, 2025, <https://docs.amd.com/r/en-US/ug1399-vitis-hls/Tutorials-and-Examples>
7. Xilinx/Vitis-HLS-Introductory-Examples - GitHub, accessed October 25, 2025, <https://github.com/Xilinx/Vitis-HLS-Introductory-Examples>
8. Vitis AI - YouTube, accessed October 25, 2025, <https://www.youtube.com/playlist?list=PLx15eYqzJiffUiCaTxzFFwJSKT7eKnfq>
9. Introduction to Vitis AI - YouTube, accessed October 25, 2025, <https://www.youtube.com/watch?v=Z6jkUz0Re7Q>
10. Xilinx/Vitis-AI-Tutorials - GitHub, accessed October 25, 2025, <https://github.com/Xilinx/Vitis-AI-Tutorials>
11. [2401.17544] Trainable Fixed-Point Quantization for Deep Learning Acceleration on FPGAs, accessed October 25, 2025, <https://arxiv.org/abs/2401.17544>
12. HIGH-ACCURACY ACOUSTIC SENSING SYSTEM WITH A 2D TRANSCEIVER ARRAY: AN FPGA-BASED DESIGN, accessed October 25, 2025, <https://hammer.purdue.edu/ndownloader/files/44937940>
13. M3-AC: A Multi-Mode Multithread SoC FPGA Based Acoustic Camera - MDPI, accessed October 25, 2025, <https://www.mdpi.com/2079-9292/10/3/317>
14. Design and Implementation of a Decimation Filter For High Performance Audio Applications, accessed October 25, 2025, <https://www.researchgate.net/publication/224311738_Design_and_Implementation_of_a_Decimation_Filter_For_High_Performance_Audio_Applications>
15. Master theses CIC filter design with HLS - Webthesis - Politecnico di Torino, accessed October 25, 2025, <https://webthesis.biblio.polito.it/11004/1/tesi.pdf>
16. Matched Filter: Vitis HLS Implementation - YouTube, accessed October 25, 2025, <https://www.youtube.com/watch?v=hvtVcxL1-xE>
17. HLS Programmers Guide - 2025.1 English - UG1399, accessed October 25, 2025, <https://docs.amd.com/r/en-US/ug1399-vitis-hls/HLS-Programmers-Guide>
18. Introduction to High-Level Synthesis with Vivado HLS, accessed October 25, 2025, <https://users.ece.utexas.edu/~gerstl/ee382v_f14/soc/vivado_hls/VivadoHLS_Overview.pdf>
19. AMD Vitis™ HLS, accessed October 25, 2025, <https://www.amd.com/en/products/software/adaptive-socs-and-fpgas/vitis-vitis-hls.html>
20. Vitis HLS Integration Tutorial • ECEn 625 - GitHub Pages, accessed October 25, 2025, <https://byu-cpe.github.io/ecen625/hls-integration-tutorial/>
21. From Xilinx Vitis HLS to FPGA IP - YouTube, accessed October 25, 2025, <https://www.youtube.com/watch?v=bHig4zQpq2o>
22. ntampouratzis/FPGA-based-LSTM - GitHub, accessed October 25, 2025, <https://github.com/ntampouratzis/FPGA-based-LSTM>
23. The community version of HLS_BLSTM (A BLSTM FPGA accelerator of an OCR application, using CAPI/SNAP) - GitHub, accessed October 25, 2025, <https://github.com/oprecomp/HLS_BLSTM>
24. These tutorials offer a broader introduction to the Vitis Unified IDE, in addition to briefly describing the most simple HLS flows and use cases. - XD261, accessed October 25, 2025, <https://docs.amd.com/r/en-US/Vitis-Tutorials-Vitis-HLS>
25. spcl/gemm_hls: Scalable systolic array-based matrix-matrix multiplication implemented in Vivado HLS for Xilinx FPGAs. - GitHub, accessed October 25, 2025, <https://github.com/spcl/gemm_hls>
26. Matrix-vector multiplication - SLING user documentation, accessed October 25, 2025, <https://doc.sling.si/en/workshops/hls-for-fpga/04-programming/02-matv_mult/>
27. HLS FPGA Design Guide - Siemens HLS Academy, accessed October 25, 2025, <https://hls.academy/topics/hls-fpga/>
28. Beamscan Direction of Arrival Estimation Using FPGA - MATLAB & Simulink - MathWorks, accessed October 25, 2025, <https://www.mathworks.com/help/phased/ug/direction-of-arrival-beamformer.html>
29. Beamforming & DOA | PySDR: A Guide to SDR and DSP using Python, accessed October 25, 2025, <https://pysdr.org/content/doa.html>
30. Digital Beamforming Implementation on an FPGA Platform - UPCommons, accessed October 25, 2025, <https://upcommons.upc.edu/bitstreams/df9e1453-da34-4d71-a712-0d0d444cb25c/download>
31. FPGA-Based Architectures for Acoustic Beamforming with Microphone Arrays: Trends, Challenges and Research Opportunities - MDPI, accessed October 25, 2025, <https://www.mdpi.com/2073-431X/7/3/41>
32. Vitis AI Tutorial – Part 4 - Octavo Systems, accessed October 25, 2025, <https://octavosystems.com/app_notes/vitis-ai-tutorial-part-4/>
33. I2C and SPI communication on FPGA, accessed October 25, 2025, <https://forums.ni.com/t5/Example-Code/I2C-and-SPI-communication-on-FPGA/t-a-p/3823804>
34. Running Implementation - 2025.1 English - UG1399, accessed October 25, 2025, <https://docs.amd.com/r/en-US/ug1399-vitis-hls/Running-Implementation>
35. Xilinx/Vitis-Tutorials: Vitis In-Depth Tutorials - GitHub, accessed October 25, 2025, <https://github.com/Xilinx/Vitis-Tutorials>