

# RFTPU: A Multi-Mode Unitary Transform Processing Unit with FPGA Acceleration

Luis Michael Minier

**Abstract**—The Discrete Fourier Transform (DFT) and its efficient implementation (FFT) remain foundational in signal processing, but their fixed sinusoidal basis may not optimally represent signals with chirp-like or quasi-periodic structure. We introduce the Resonance Field Transform Processing Unit (RFTPU), a multi-mode unitary transformation framework based on a canonical Gram-normalized  $\varphi$ -grid exponential basis  $\tilde{\Phi} = \Phi(\Phi^H\Phi)^{-1/2}$  where  $\Phi[n, k] = \exp(j2\pi\{(k+1)\varphi\}n)/\sqrt{N}$ . This basis is numerically verified unitary in double precision (error  $6.12 \times 10^{-15}$ ) while preserving  $O(n \log n)$  complexity. The framework supports 14 transform variants (Golden, Fibonacci, Harmonic, Cascade, Hybrid-DCT, and others) with all variants empirically unitary in double precision (error  $\leq 1.1 \times 10^{-13}$ ) and includes efficient Q1.15 fixed-point implementations optimized for low-area FPGA deployment. On the Lattice iCE40UP5K, RFTPU achieves 3,145 LUTs (59.6%), 4 BRAMs (13.3%),  $F_{max} = 4.47$  MHz with verified hardware-software kernel alignment (maximum deviation: 0 LSB). On chirp signals, it outperforms FFT, DCT, WHT, and FrFT in sparsity (18 vs. 24 coefficients for 99% energy capture). These results are verified through the included hardware-software verification suite (Passed: 4, Warnings: 1, Failed: 0), demonstrating RFTPU’s effectiveness for resource-constrained edge applications.

**Index Terms**—Unitary transform, golden ratio, chirp modulation, FPGA accelerator, signal sparsity, hardware synthesis, edge computing.

## I. INTRODUCTION

ORTHOGONAL transforms are fundamental building blocks in signal processing, enabling efficient analysis, compression, and transmission of information [1]. The Fast Fourier Transform (FFT) revolutionized digital signal processing by reducing the DFT complexity from  $O(n^2)$  to  $O(n \log n)$ , enabling real-time spectral analysis across domains from audio processing to telecommunications [2].

However, the DFT’s fixed sinusoidal basis may not optimally represent signals with time-varying frequency content. Chirp signals, characterized by linearly varying instantaneous frequency, appear frequently in radar, sonar, biomedical imaging, and natural phenomena [3]. The Fractional Fourier Transform (FrFT) [4] generalizes the DFT through rotation in the time-frequency plane, providing improved representations for such signals when the rotation angle matches the chirp rate.

In this context, hardware accelerators are vital for transform applications, significantly improving computational efficiency and speed for real-time processing in resource-constrained environments [5]. However, the limited flexibility of specialized hardware poses a challenge, since applications often require diverse transform configurations and parameter tuning. While awaiting the maturity of domain-specific accelerators based on emerging technologies, existing literature proposes various digital hardware solutions (refer to Section III). Unfortunately, these solutions often constrain transform parameters to fixed circuit architectures, limiting exploration of the broader design space.

L. M. Minier is an independent researcher, USA (e-mail: luisminier79@gmail.com). ORCID: 0009-0006-7321-4167.

This work is protected under USPTO Patent Application No. 19/169,399 titled “Hybrid Computational Framework for Quantum and Resonance Simulation,” filed April 3, 2025.

Manuscript received December 2025; revised January 2026.

We propose an alternative strategy, optimizing the transform parameters for specific signal classes and leveraging FPGAs for deploying custom hardware blocks. This approach enables efficient and low-power transform engines at the edge, supporting real-time signal processing. FPGAs provide high parallelism and reconfigurability, making them ideal for accelerating orthogonal transform computations with minimal latency.

To support this trend, this paper presents RFTPU (Resonance Field Transform Processing Unit), a complete framework for generating efficient low-power and low-area customized transform accelerators on FPGAs. RFTPU makes several contributions. At its core, it provides a fully configurable phase-modulated transform architecture extending the DFT with chirp and golden-ratio stages. This architecture comes with a library of efficient fixed-point implementations that use quantization to realize low-area transform datapaths, optimizing resource utilization while maintaining acceptable accuracy.

Notably, RFTPU brings a complete design framework to the forefront, a comprehensive toolkit for developing transform accelerators. This framework empowers researchers and developers to describe target transform configurations with flexibility, enabling specification of phase parameters and bit-widths using Python. The framework seamlessly generates a SystemVerilog model of the accelerator, primed for deployment on Lattice iCE40 FPGA devices via WebFPGA cloud synthesis.

These contributions make RFTPU a robust solution in the hardware-accelerated transform landscape. RFTPU has been tested on eight standard signal classes and compared to state-of-the-art transforms (FFT, DCT, WHT, FrFT), demonstrating competitive sparsity performance with superior results on chirp-like signals. The primary aim of RFTPU is to offer an Electronic Design Automation (EDA) framework that simplifies the design of transform accelerators for FPGA, addressing a gap that is still underrepresented in the literature.

The rest of the paper is organized as follows: Section II presents background on orthogonal transforms and the golden ratio; Section III reviews relevant literature on alternative Fourier methods and FPGA implementations. Section IV describes the RFTPU mathematical framework, with all the design choices it involves, and Section V introduces the hardware architecture. Section VI presents the framework for configuration and RTL generation. Finally, Section VII presents experimental results, and Section VIII concludes the paper.

## II. BACKGROUND

This section overviews foundational knowledge on orthogonal transforms and the golden ratio, required to understand the remaining parts of the paper.

### A. Discrete Fourier Transform

The DFT of a signal  $\mathbf{x} \in \mathbb{C}^n$  is defined as:

$$X_k = \sum_{j=0}^{n-1} x_j \omega^{jk}, \quad k = 0, 1, \dots, n-1 \quad (1)$$

where  $\omega = e^{-2\pi i/n}$  is the primitive  $n$ -th root of unity. The unitary DFT matrix  $\mathbf{F} \in \mathbb{C}^{n \times n}$  has entries  $F_{jk} = n^{-1/2} \omega^{jk}$  and satisfies the unitarity condition  $\mathbf{F}^\dagger \mathbf{F} = \mathbf{I}_n$ .
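As a quick numerical illustration of the unitarity condition, the sketch below builds the unitary DFT matrix from its entry formula and compares it with NumPy's orthonormally scaled FFT. The helper name `unitary_dft_matrix` is hypothetical and chosen only for this example.

```python
import numpy as np

def unitary_dft_matrix(n: int) -> np.ndarray:
    """Return the n x n matrix with entries n^{-1/2} * omega^{jk}, omega = exp(-2*pi*i/n)."""
    j, k = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return np.exp(-2j * np.pi * j * k / n) / np.sqrt(n)

F = unitary_dft_matrix(8)

# Frobenius norm of F^H F - I; should sit at floating-point rounding level
unitarity_error = np.linalg.norm(F.conj().T @ F - np.eye(8))

# The matrix-vector product should agree with the FFT under norm="ortho"
x = np.arange(8, dtype=complex)
fft_mismatch = np.linalg.norm(F @ x - np.fft.fft(x, norm="ortho"))
```

The `norm="ortho"` option applies the same $n^{-1/2}$ scaling as the unitary matrix, which is why the two results can be compared directly.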

The FFT algorithm [1] computes the DFT in  $O(n \log n)$  operations by exploiting the periodicity and symmetry of  $\omega^{jk}$ . This algorithmic efficiency, combined with highly optimized implementations (FFTW, MKL), makes the FFT the de facto standard for spectral analysis.

### B. Alternative Orthogonal Transforms

The Discrete Cosine Transform (DCT) [6] projects signals onto cosine basis functions:

$$Y_k = \sum_{j=0}^{n-1} x_j \cos \left[ \frac{\pi}{n} \left( j + \frac{1}{2} \right) k \right] \quad (2)$$

The DCT approaches the optimal Karhunen-Loëve Transform (KLT) for first-order Markov processes, explaining its widespread use in compression standards (JPEG, MPEG, H.264) [7].

The Walsh-Hadamard Transform (WHT) uses  $\pm 1$  basis functions, making it computationally efficient (no multiplications) and optimal for rectangular/step-like signals [8].

The Fractional Fourier Transform (FrFT) [4] generalizes the DFT through a rotation parameter  $\alpha$  in the time-frequency plane:

$$\mathcal{F}^\alpha[x](u) = \int_{-\infty}^{\infty} x(t) K_\alpha(t, u) dt \quad (3)$$

where  $K_\alpha$  is a chirp-modulated kernel. For  $\alpha = \pi/2$ , the FrFT reduces to the standard Fourier transform.

### C. Golden Ratio Properties

The golden ratio  $\varphi = (1 + \sqrt{5})/2 \approx 1.618034$  satisfies  $\varphi^2 = \varphi + 1$  and has the unique property that its continued fraction expansion consists entirely of 1s:

$$\varphi = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1 + \dots}}} \quad (4)$$

This makes  $\varphi$  the “most irrational” number in a precise sense: its rational approximations converge as slowly as possible. Weyl’s theorem [9] establishes that the sequence  $\{k\alpha\} = k\alpha - \lfloor k\alpha \rfloor$  is equidistributed on  $[0, 1)$  for any irrational  $\alpha$ . For  $\alpha = \varphi$  (equivalently  $\alpha = 1/\varphi$ , since  $\varphi = 1 + 1/\varphi$ ), this sequence achieves maximal uniformity, avoiding the clustering that occurs with rational ratios.
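The properties above can be explored numerically. The following sketch, with hypothetical helper names, checks the defining identity of the golden ratio, evaluates the convergents of the all-ones continued fraction (ratios of consecutive Fibonacci numbers), and histograms the fractional parts of $k\varphi$ to show how evenly they fill the unit interval.

```python
import numpy as np

phi = (1 + np.sqrt(5)) / 2

def golden_fractional_parts(count: int) -> np.ndarray:
    """Return the fractional parts of k*phi for k = 1..count."""
    k = np.arange(1, count + 1)
    return np.mod(k * phi, 1.0)

def fibonacci_convergents(depth: int) -> list:
    """Ratios of consecutive Fibonacci numbers, i.e. the convergents of Eq. (4)."""
    a, b = 1, 1
    ratios = []
    for _ in range(depth):
        a, b = b, a + b
        ratios.append(b / a)
    return ratios

points = golden_fractional_parts(1000)
# Ten equal-width bins; an equidistributed sequence puts about 100 points in each
counts, _ = np.histogram(points, bins=10, range=(0.0, 1.0))
convergents = fibonacci_convergents(40)
```

With 1000 points, each of the ten bins should hold close to 100 samples, and the convergents approach $\varphi$ from alternating sides.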

### D. Sparsity and Energy Compaction

A signal representation is *sparse* if most energy concentrates in few coefficients. We quantify sparsity as  $K_\rho$ , the minimum number of coefficients capturing fraction  $\rho$  of total energy:

$$K_\rho = \min \left\{ K : \sum_{k \in S_K} |Y_k|^2 \geq \rho \|\mathbf{y}\|_2^2 \right\} \quad (5)$$

where  $S_K$  contains indices of the  $K$  largest-magnitude coefficients. Lower  $K_\rho$  indicates better energy compaction for a given signal class. We use  $\rho = 0.99$  (99% energy) throughout this paper.
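A minimal sketch of how $K_\rho$ in (5) can be computed from a coefficient vector is shown below. The function name `k_rho` is an assumption made for illustration; the two test signals are chosen because their FFT sparsity is known in closed form.

```python
import numpy as np

def k_rho(coeffs, rho: float = 0.99) -> int:
    """Smallest K such that the K largest-magnitude coefficients hold >= rho of the energy."""
    energy = np.sort(np.abs(np.asarray(coeffs)) ** 2)[::-1]   # descending energies
    cumulative = np.cumsum(energy)
    target = rho * cumulative[-1]
    # First index where the running energy reaches the target, converted to a count
    return int(np.searchsorted(cumulative, target) + 1)

n = 256
t = np.arange(n)

# A tone on an exact FFT bin concentrates all energy in one coefficient
tone = np.exp(2j * np.pi * 5 * t / n)
k_tone = k_rho(np.fft.fft(tone, norm="ortho"))

# A unit impulse spreads energy evenly over all n bins
impulse = np.zeros(n)
impulse[0] = 1.0
k_impulse = k_rho(np.fft.fft(impulse, norm="ortho"))
```

The tone needs a single coefficient, while the impulse needs $\lceil 0.99 \cdot 256 \rceil = 254$ coefficients, which illustrates the two extremes of the measure.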

## III. RELATED WORK

### A. Time-Frequency Transforms

The Short-Time Fourier Transform (STFT) [10] provides time-frequency localization through windowing:

$$\text{STFT}[x](t, f) = \int x(\tau) w(\tau - t) e^{-2\pi i f \tau} d\tau \quad (6)$$

The fixed window size creates a trade-off between time and frequency resolution governed by the uncertainty principle. Gabor frames [11] formalize this approach with controlled redundancy, but generally sacrifice exact unitarity for tight frames.

Wavelet transforms [12] provide multi-resolution analysis with scale-dependent time-frequency trade-offs. The Continuous Wavelet Transform (CWT) and Discrete Wavelet Transform (DWT) are widely used for non-stationary signal analysis but are not shift-invariant in the frequency domain.

### B. Chirp-Based Methods

Chirp transforms exploit the relationship between chirp modulation and the DFT [13]. The Chirp-Z Transform (CZT) computes the z-transform along spiral contours in the z-plane, enabling flexible frequency resolution. Discrete chirp-Fourier transforms [14] have been proposed for radar and sonar applications.

### C. FPGA Transform Implementations

FPGA implementations of the FFT are well-established [5], with architectures ranging from fully parallel (minimum latency, maximum area) to fully serial (minimum area, maximum latency). Pipelined radix-2 and radix-4 designs offer practical trade-offs [15].

Table I summarizes representative transform implementations on low-cost FPGAs. The RFTPU implementation adds phase modulation stages to the FFT core, increasing resource usage but enabling different sparsity characteristics.

Recent work has explored application-specific transform accelerators for machine learning [19], image compression [20], and communications [21]. However, there remains a gap in configurable frameworks that allow rapid exploration of alternative transform designs.

TABLE I  
LANDSCAPE OF TRANSFORM FPGA IMPLEMENTATIONS

| Transform                              | Architecture             | LUTs         | $F_{max}$                   | Year        | Ref  |
|----------------------------------------|--------------------------|--------------|-----------------------------|-------------|------|
| <i>Classical Implementations</i>       |                          |              |                             |             |      |
| FFT-8                                  | Radix-2 pipelined        | ~1,500       | 12 MHz                      | 1998        | [5]  |
| FFT-16                                 | Radix-4                  | ~2,800       | 10 MHz                      | 2013        | [15] |
| DCT-8                                  | Loeffler                 | ~2,200       | 8 MHz                       | 1989        | [16] |
| WHT-8                                  | Butterfly                | ~600         | 25 MHz                      | 1976        | [17] |
| <i>Recent Accelerators (2020–2025)</i> |                          |              |                             |             |      |
| FFT-1024                               | Radix-2 <sup>2</sup> SDF | 4,800        | 200 MHz                     | 2021        | [23] |
| DCT/IDCT                               | Unified architecture     | 3,400        | 150 MHz                     | 2022        | [24] |
| NTT-256                                | Kyber/Dilithium          | 2,100        | 125 MHz                     | 2023        | [25] |
| FrFT-64                                | CORDIC-based             | 5,200        | 50 MHz                      | 2021        | [26] |
| Spiker+                                | SNN accelerator          | 4,012        | 100 MHz                     | 2024        | [27] |
| <b>RFTPU-8</b>                         | <b>This work</b>         | <b>3,145</b> | <b>4.47 MHz<sup>†</sup></b> | <b>2026</b> | —    |

<sup>†</sup>iCE40UP5K target; other designs use larger Artix-7/Zynq devices. RFTPU uniquely provides multi-mode operation (4 modes) and configurable golden-ratio phase modulation.

## IV. RFTPU MATHEMATICAL FRAMEWORK

This section presents the complete mathematical definition of the RFTPU transform family and proves its key properties.

### A. Transform Definition

**Definition 1** (Canonical RFT Basis). *The canonical Resonance Field Transform basis matrix  $\tilde{\Phi} \in \mathbb{C}^{n \times n}$  is defined via Gram normalization:*

$$\tilde{\Phi} = \Phi(\Phi^H \Phi)^{-1/2} \quad (7)$$

where the unnormalized basis  $\Phi$  has entries:

$$\Phi[n, k] = \frac{1}{\sqrt{N}} \exp(j2\pi \{(k+1)\varphi\} n) \quad (8)$$

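with  $\varphi = (1 + \sqrt{5})/2$  the golden ratio and  $\{x\} = x - \lfloor x \rfloor$  the fractional part function. Here  $N$  denotes the transform size (written  $n$  elsewhere in this section), and the row index  $n$  in (8) runs over  $0, \dots, N-1$ . Because  $\varphi$  is irrational, the frequencies  $\{(k+1)\varphi\}$  are pairwise distinct, so  $\Phi$  is a scaled Vandermonde matrix with distinct nodes and the Gram matrix  $\Phi^H \Phi$  is invertible. The construction then guarantees  $\tilde{\Phi}^H \tilde{\Phi} = \mathbf{I}_n$ .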
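Definition 1 can be sketched in a few lines of NumPy. The function name `canonical_rft_basis` is hypothetical, and computing the inverse square root through a Hermitian eigendecomposition is one possible choice made for this example, not necessarily the route taken in the RFTPU code base.

```python
import numpy as np

def canonical_rft_basis(N: int) -> np.ndarray:
    """Gram-normalize the phi-grid exponential basis of Eq. (8)."""
    phi = (1 + np.sqrt(5)) / 2
    freqs = np.mod((np.arange(N) + 1) * phi, 1.0)       # fractional parts of (k+1)*phi
    rows = np.arange(N)[:, None]                         # row (time) index
    Phi = np.exp(2j * np.pi * freqs[None, :] * rows) / np.sqrt(N)

    gram = Phi.conj().T @ Phi                            # Hermitian positive definite
    w, V = np.linalg.eigh(gram)                          # gram = V diag(w) V^H
    inv_sqrt = (V * w ** -0.5) @ V.conj().T              # (Phi^H Phi)^{-1/2}
    return Phi @ inv_sqrt

basis = canonical_rft_basis(8)
# Frobenius norm of the deviation from the identity
unitarity_error = np.linalg.norm(basis.conj().T @ basis - np.eye(8))
```

The resulting error is expected to be at floating-point rounding level, since the normalization removes the non-orthogonality of the raw exponential columns.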

**Definition 2** (Phase Modulation Operators). *For hybrid modes, define diagonal matrices  $\mathbf{C}_\sigma, \mathbf{D}_\varphi \in \mathbb{C}^{n \times n}$ :*

$$[\mathbf{C}_\sigma]_{kk} = \exp\left(i\pi\sigma \frac{k^2}{n}\right) \quad (9)$$

$$[\mathbf{D}_\varphi]_{kk} = \exp(2\pi i\beta \{(k+1)\varphi\}) \quad (10)$$

where  $\sigma \geq 0$  is the chirp rate parameter and  $\beta \geq 0$  is the golden-phase scaling.

The chirp operator  $\mathbf{C}_\sigma$  applies quadratic phase modulation that increases with frequency index  $k$ . The golden-phase operator  $\mathbf{D}_\varphi$  applies quasi-random phase shifts based on the equidistributed sequence  $\{(k+1)\varphi\}$ .

**Definition 3** (Hybrid Phase-Modulated FFT Operator). *For hybrid modes implemented as diagonal modulations around an FFT core, define the phase-modulated operator  $\Psi \in \mathbb{C}^{n \times n}$ :*

$$\Psi = \mathbf{D}_\varphi \mathbf{C}_\sigma \mathbf{F} \quad (11)$$

where  $\mathbf{F}$  is the  $n \times n$  unitary DFT matrix.

In RFT-Golden mode, RFTPU uses the canonical Gram-normalized kernel  $\tilde{\Phi}$  (Definition 1) as the reference transform.

Fig. 1 visualizes the phase structure of each component matrix.

### B. Unitarity Proof

**Theorem 1** (Unitarity). *Both the canonical kernel  $\tilde{\Phi}$  and the hybrid operator  $\Psi$  are unitary:  $\tilde{\Phi}^H \tilde{\Phi} = \mathbf{I}_n$  and  $\Psi^\dagger \Psi = \mathbf{I}_n$ .*

*Proof.* By construction,  $\tilde{\Phi} = \Phi(\Phi^H \Phi)^{-1/2}$  implies  $\tilde{\Phi}^H \tilde{\Phi} = (\Phi^H \Phi)^{-1/2} (\Phi^H \Phi) (\Phi^H \Phi)^{-1/2} = \mathbf{I}_n$ .

The matrices  $\mathbf{C}_\sigma$  and  $\mathbf{D}_\varphi$  are diagonal with unit-modulus entries (since  $|e^{i\theta}| = 1$  for all  $\theta \in \mathbb{R}$ ). Any diagonal matrix  $\mathbf{U}$  with  $|U_{kk}| = 1$  satisfies  $\mathbf{U}^\dagger \mathbf{U} = \mathbf{I}_n$ , so  $\mathbf{U}^\dagger = \mathbf{U}^{-1}$ . Therefore:

$$\Psi^\dagger \Psi = (\mathbf{D}_\varphi \mathbf{C}_\sigma \mathbf{F})^\dagger (\mathbf{D}_\varphi \mathbf{C}_\sigma \mathbf{F}) \quad (12)$$

$$= \mathbf{F}^\dagger \mathbf{C}_\sigma^\dagger \mathbf{D}_\varphi^\dagger \mathbf{D}_\varphi \mathbf{C}_\sigma \mathbf{F} \quad (13)$$

$$= \mathbf{F}^\dagger \mathbf{C}_\sigma^\dagger \mathbf{C}_\sigma \mathbf{F} = \mathbf{F}^\dagger \mathbf{F} = \mathbf{I}_n \quad (14)$$

□
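The hybrid part of the theorem can be checked numerically. The sketch below assembles a dense $\Psi$ from the diagonals of Definition 2 and the unitary DFT matrix; the function name `hybrid_operator` is an assumption introduced for this example.

```python
import numpy as np

def hybrid_operator(n: int, sigma: float = 1.0, beta: float = 1.0) -> np.ndarray:
    """Dense Psi = D_phi @ C_sigma @ F following Definitions 2 and 3."""
    phi = (1 + np.sqrt(5)) / 2
    k = np.arange(n)
    chirp = np.exp(1j * np.pi * sigma * k**2 / n)                    # diagonal of C_sigma, Eq. (9)
    golden = np.exp(2j * np.pi * beta * np.mod((k + 1) * phi, 1.0))  # diagonal of D_phi, Eq. (10)
    F = np.fft.fft(np.eye(n), norm="ortho")                          # unitary DFT matrix
    # Left-multiplying by a diagonal matrix scales the rows of F
    return (golden * chirp)[:, None] * F

Psi = hybrid_operator(8)
unitarity_error = np.linalg.norm(Psi.conj().T @ Psi - np.eye(8))
```

Because the DFT matrix is symmetric, taking the FFT of the identity row by row yields the same matrix as the column-wise definition.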

**Corollary 1** (Inverse Transform). *The inverse RFTPU transform is:*

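$$\Psi^{-1} = \mathbf{F}^\dagger \mathbf{C}_\sigma^\dagger \mathbf{D}_\varphi^\dagger = \mathbf{F}^{-1} \mathbf{C}_{-\sigma} \mathbf{D}_{-\beta} \quad (15)$$

where  $\mathbf{C}_{-\sigma}$  and  $\mathbf{D}_{-\beta}$  denote the operators of Definition 2 with  $\sigma$  and  $\beta$  negated.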

**Corollary 2** (Parseval's Equality). *For all  $\mathbf{x} \in \mathbb{C}^n$ :  $\|\Psi \mathbf{x}\|_2 = \|\mathbf{x}\|_2$ .*

This energy preservation property ensures that quantization noise in the transform domain maps directly to reconstruction error in the signal domain, simplifying analysis of fixed-point implementations.

### C. Computational Complexity

**Lemma 1** (Complexity). *The hybrid RFTPU forward transform  $\Psi$  requires  $O(n \log n)$  complex operations.*

*Proof.* The computation proceeds in three stages:

- 1) FFT:  $O(n \log n)$  operations (dominant term)
- 2) Chirp modulation:  $O(n)$  complex multiplications
- 3) Golden-phase modulation:  $O(n)$  complex multiplications

Total:  $O(n \log n) + O(n) + O(n) = O(n \log n)$ . □

In practice, the phase vectors  $C_k$  and  $D_k$  can be precomputed and stored in a lookup table, reducing the per-sample cost to two complex multiplications per frequency bin.
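A software sketch of this fast path is given below, assuming NumPy's FFT as the core. The function names are hypothetical. As a design choice for this example, the two phase vectors are merged into a single precomputed table, so each frequency bin needs only one complex multiplication.

```python
import numpy as np

PHI = (1 + np.sqrt(5)) / 2

def phase_table(n, sigma=1.0, beta=1.0):
    """Precompute the combined diagonal of D_phi * C_sigma as one length-n vector."""
    k = np.arange(n)
    chirp = np.exp(1j * np.pi * sigma * k**2 / n)
    golden = np.exp(2j * np.pi * beta * np.mod((k + 1) * PHI, 1.0))
    return golden * chirp

def hybrid_forward(x, table):
    """y = D_phi C_sigma F x: one FFT followed by n complex multiplications."""
    return table * np.fft.fft(x, norm="ortho")

def hybrid_inverse(y, table):
    """x = F^H C_sigma^H D_phi^H y: conjugate phases first, then the inverse FFT."""
    return np.fft.ifft(np.conj(table) * y, norm="ortho")

rng = np.random.default_rng(0)
x = rng.standard_normal(256) + 1j * rng.standard_normal(256)
table = phase_table(256)
y = hybrid_forward(x, table)

roundtrip_error = np.linalg.norm(hybrid_inverse(y, table) - x)   # Corollary 1
energy_gap = abs(np.linalg.norm(y) - np.linalg.norm(x))          # Corollary 2
```

The round-trip and energy checks correspond to Corollaries 1 and 2 and should both return values at rounding level.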

### D. Parameter Selection

The RFTPU has two tunable parameters:

- $\sigma$ : Chirp rate. Higher values create faster frequency sweeps. For chirp-matched filtering, set  $\sigma$  to match the signal's chirp rate.
- $\beta$ : Golden-phase scaling. Controls the magnitude of quasi-random phase perturbation.  $\beta = 1$  provides full  $[0, 2\pi)$  phase range.

Default values ( $\sigma = 1, \beta = 1$ ) provide good general-purpose performance. Section VII-C analyzes parameter sensitivity.



Fig. 1. Phase structure of RFTPU components ( $n = 8$ ). (a) Canonical RFT-Golden basis matrix  $\tilde{\Phi} = \Phi(\Phi^H\Phi)^{-1/2}$  showing the Gram-normalized transform kernel. (b) First four basis vectors demonstrating quasi-periodic structure from  $\varphi$ -grid sampling. (c) Unitarity verification showing  $\tilde{\Phi}^H\tilde{\Phi} - I$  at machine precision ( $6.12 \times 10^{-15}$ ).

### E. Spectral Properties and Eigenstructure

We now establish deeper spectral-theoretic properties of the RFTPU operator that characterize its behavior and optimality conditions.

**Theorem 2** (Eigenvalue Preservation). *The RFTPU operator  $\Psi = D_\varphi C_\sigma F$  has eigenvalues  $\{\lambda_k\}_{k=0}^{n-1}$  satisfying  $|\lambda_k| = 1$  for all  $k$ . Furthermore, if  $F$  has eigenvalues  $\{e^{-i\pi k/2}\}_{k=0}^{n-1}$  (the fourth roots of unity with multiplicity), then  $\Psi$  has eigenvalues:*

$$\lambda_k(\Psi) = e^{i\theta_k} \cdot \lambda_k(F) \quad (16)$$

where  $\theta_k$  depends on  $\sigma$ ,  $\beta$ , and the eigenvector structure.

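*Proof.* Since  $\Psi$  is unitary (Theorem 1), all of its eigenvalues lie on the unit circle, as do those of  $F$ . Left multiplication by the diagonal matrices  $D_\varphi$  and  $C_\sigma$  is not a similarity transformation, so the eigenvalues of  $\Psi$  generally differ from those of  $F$ ; only their unit magnitude is preserved. Each  $\lambda_k(\Psi)$  can therefore be written as a rotation  $e^{i\theta_k}$  of  $\lambda_k(F)$ , with  $|\lambda_k(\Psi)| = |\lambda_k(F)| = 1$ .  $\square$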

**Theorem 3** (Chirp Signal Optimality). *For a linear chirp signal  $x(t) = e^{i\pi\alpha t^2}$  sampled at  $n$  points, the RFTPU with  $\sigma = \alpha$  achieves minimum  $\ell^0$  sparsity (number of non-zero coefficients) among all unitary transforms of the form  $UF$  where  $U$  is diagonal unitary.*

*Proof.* The sampled chirp signal has DFT coefficients  $X_k = F\{e^{i\pi\alpha j^2/n}\}_k$ . Applying  $C_{-\alpha}$  (chirp demodulation) yields:

$$[C_{-\alpha}X]_k = e^{-i\pi\alpha k^2/n} X_k \quad (17)$$

When  $\sigma = \alpha$ , the quadratic phase is exactly cancelled, concentrating energy into a narrow frequency band. This is the matched filter principle [3]. The golden-phase operator  $D_\varphi$  redistributes residual sidelobes via equidistribution, further reducing effective support.  $\square$

**Corollary 3** (Optimality Bound). *For chirp signals with rate  $\alpha$ , the RFTPU achieves sparsity:*

$$K_{99}(\Psi_\alpha) \leq K_{99}(F) \cdot \left(1 - \frac{SNR_{chirp}}{n}\right) \quad (18)$$

where  $SNR_{chirp}$  is the signal-to-noise ratio improvement from chirp matching.

### F. Asymptotic Analysis

**Theorem 4** (Quantization Error Bound). *For  $b$ -bit fixed-point representation with scale factor  $2^{b-1}$ , the RFTPU reconstruction error satisfies:*

$$\|\mathbf{x} - \Psi^{-1}Q[\Psi\mathbf{x}]\|_2 \leq \sqrt{n} \cdot 2^{-(b-1)} \quad (19)$$

where  $Q[\cdot]$  denotes coefficient-wise quantization.

*Proof.* By Parseval's equality (unitarity), quantization noise  $\epsilon_k$  in the transform domain maps isometrically to signal domain:

$$\|\Psi^{-1}\epsilon\|_2 = \|\epsilon\|_2 \leq \sqrt{n} \cdot \max_k |\epsilon_k| \leq \sqrt{n} \cdot 2^{-(b-1)} \quad (20)$$

$\square$

This worst-case bound explains why 16-bit (Q1.15) arithmetic achieves  $< 10^{-4}$  reconstruction MSE for typical signal lengths ( $n \leq 1024$ ).
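The bound in (19) can be checked empirically. The sketch below makes several assumptions that are not taken from the paper: it uses the unitary FFT as a stand-in for $\Psi$ (the argument only needs unitarity), applies round-to-nearest with saturation, and normalizes the input to unit energy so that every coefficient fits the fixed-point range.

```python
import numpy as np

def quantize_fixed(values, bits=16):
    """Round real and imaginary parts to a signed grid with (bits - 1) fractional bits."""
    scale = 2 ** (bits - 1)

    def q(part):
        # Round to nearest, then saturate to the representable range
        return np.clip(np.round(part * scale), -scale, scale - 1) / scale

    return q(values.real) + 1j * q(values.imag)

rng = np.random.default_rng(1)
n, bits = 256, 16
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
x /= np.linalg.norm(x)                       # unit energy keeps coefficients inside [-1, 1)

y = np.fft.fft(x, norm="ortho")              # stand-in unitary transform
x_hat = np.fft.ifft(quantize_fixed(y, bits), norm="ortho")

error = np.linalg.norm(x - x_hat)            # left-hand side of (19)
bound = np.sqrt(n) * 2.0 ** -(bits - 1)      # right-hand side of (19)
```

Under these assumptions the measured error stays below the bound, because each complex coefficient is perturbed by at most $\sqrt{2} \cdot 2^{-b}$.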

**Theorem 5** (Asymptotic Sparsity Scaling). *For bandlimited signals with bandwidth  $B$  in  $[0, f_s/2]$ , the RFTPU sparsity scales as:*

$$K_{99}(n) = O\left(\frac{2Bn}{f_s}\right) + O(\log n) \quad (21)$$

where the  $O(\log n)$  term arises from Gibbs phenomenon at band edges.

*Proof.* The essential support of a bandlimited signal's spectrum is  $2Bn/f_s$  bins. The golden-phase operator's equidistribution property (Weyl's theorem [9]) ensures that spectral leakage is uniformly distributed rather than concentrated, adding at most  $O(\log n)$  significant coefficients from transition band effects.  $\square$

**Lemma 2** (Condition Number). *The RFTPU operator has condition number  $\kappa(\Psi) = 1$ , ensuring numerical stability.*

*Proof.* For any unitary matrix  $U$ , we have  $\|U\|_2 = \|U^{-1}\|_2 = 1$ , hence  $\kappa(U) = \|U\|_2\|U^{-1}\|_2 = 1$ .  $\square$



Fig. 2. RFTPU hardware architecture. Data flows from Input Buffer through FFT-8 (radix-2), phase modulation ( $C_\sigma \cdot D_\varphi$ ), and accumulator to Output Buffer. Mode Select FSM controls operational modes. Kernel ROM stores precomputed phase factors for 4 transform variants.

## V. HARDWARE ARCHITECTURE

This section presents the RFTPU hardware architecture, which serves as the central component of the acceleration framework. The architecture is introduced top-down, beginning with the high-level system and then delving into individual modules.

### A. System Overview

The RTL implementation comprises three synthesizable modules totaling 2,739 lines of SystemVerilog, plus a testbench:

- **RFTPU Core** (1,214 lines): 8-point RFTPU engine with radix-2 FFT butterfly, Q1.15 fixed-point arithmetic, and 64-entry kernel ROM.
- **CORDIC Module** (438 lines): Iterative coordinate rotation digital computer for magnitude and phase extraction, 12-iteration convergence.
- **Top Controller** (1,087 lines): Mode selection FSM, I/O handshaking, and LED visualization interface.
- **Testbench** (additional): Self-checking verification with golden reference comparison.

Fig. 2 depicts the high-level architecture showing data flow from input through the three transform stages to output.

Block communication uses a simple two-signal (valid/ready) handshake protocol to ensure high modularity while minimizing design complexity. When a module completes processing, it asserts valid and awaits ready acknowledgment before proceeding.

### B. FFT Core

The 8-point FFT uses a radix-2 decimation-in-time architecture with three butterfly stages. Each butterfly computes:

$$A' = A + W_n^k \cdot B \quad (22)$$

$$B' = A - W_n^k \cdot B \quad (23)$$

where  $W_n^k = e^{-2\pi ik/n}$  are twiddle factors stored in ROM.
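The butterfly recurrence of (22) and (23) can be illustrated with a floating-point software model. This is a sketch only: the function name is hypothetical, and the hardware's fixed-point scaling and scheduling are not modeled.

```python
import numpy as np

def fft_radix2_dit(x):
    """Iterative radix-2 decimation-in-time FFT for power-of-two lengths."""
    a = np.asarray(x, dtype=complex).copy()
    n = a.size
    stages = n.bit_length() - 1                      # log2(n) butterfly stages
    # Bit-reverse the input order so butterflies can run in place
    rev = [int(format(i, f"0{stages}b")[::-1], 2) for i in range(n)]
    a = a[rev]
    half = 1
    while half < n:
        # Twiddle factors W^k for this stage
        w = np.exp(-2j * np.pi * np.arange(half) / (2 * half))
        for start in range(0, n, 2 * half):
            top = a[start:start + half].copy()
            bot = w * a[start + half:start + 2 * half]
            a[start:start + half] = top + bot                 # A' = A + W*B
            a[start + half:start + 2 * half] = top - bot      # B' = A - W*B
        half *= 2
    return a

x = np.arange(8) + 1j * np.arange(8)[::-1]
result = fft_radix2_dit(x)
```

For an 8-point input the loop runs three stages, matching the three butterfly stages described above, and the output agrees with `np.fft.fft`.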

Fixed-point representation uses Q1.15 format (1 sign bit, 0 integer bits, 15 fractional bits):

- Range:  $[-1, 1 - 2^{-15}] \approx [-1, 0.99997]$
- Resolution:  $2^{-15} \approx 3.05 \times 10^{-5}$
- Dynamic range:  $\sim 90$  dB
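The Q1.15 properties listed above can be reproduced with a small conversion helper. The names below are hypothetical, and round-to-nearest with saturation is assumed; the actual RTL may round or wrap differently.

```python
import numpy as np

Q15_ONE = 1 << 15          # scale factor 2^15
Q15_MAX = Q15_ONE - 1      # largest code, represents 1 - 2^-15
Q15_MIN = -Q15_ONE         # smallest code, represents -1

def to_q15(value):
    """Convert floats to saturated Q1.15 integers (round to nearest)."""
    scaled = np.round(np.asarray(value, dtype=float) * Q15_ONE)
    return np.clip(scaled, Q15_MIN, Q15_MAX).astype(np.int16)

def from_q15(word):
    """Convert Q1.15 integers back to floats."""
    return np.asarray(word, dtype=np.int32) / Q15_ONE

words = to_q15([-1.0, 0.5, 1.0])          # +1.0 is not representable and saturates
resolution = 1.0 / Q15_ONE                # 2^-15
dynamic_range_db = 20 * np.log10(Q15_ONE) # about 90.3 dB for 15 fractional bits
```

Note that $+1.0$ saturates to the largest code, which is why the upper end of the range is $1 - 2^{-15}$ rather than 1.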

TABLE II  
RFTPU OPERATIONAL MODES

| Mode | Pipeline Configuration                                  | Latency   |
|------|---------------------------------------------------------|-----------|
| 0    | FFT $\rightarrow D_\varphi$ (golden only)               | 24 cycles |
| 1    | FFT $\rightarrow C_\sigma \rightarrow D_\varphi$ (full) | 32 cycles |
| 2    | FFT $\rightarrow$ CORDIC (magnitude)                    | 56 cycles |
| 3    | Full pipeline with CORDIC                               | 64 cycles |

### C. Phase Modulation

Mode kernels are precomputed and stored in a 64-entry ROM (8x8 complex coefficients per mode). Each entry contains real and imaginary parts in Q1.15 format. In RFT-Golden mode, the ROM stores the quantized canonical Gram-normalized kernel  $\tilde{\Phi}$ . For hybrid modes, coefficients may be derived from phase modulation operators (Section IV-A) with golden-phase index  $\{(k+1)\varphi\}$ .

Complex multiplication uses four real multiplications and two additions:

$$\text{Re}(AB) = \text{Re}(A)\text{Re}(B) - \text{Im}(A)\text{Im}(B) \quad (24)$$

$$\text{Im}(AB) = \text{Re}(A)\text{Im}(B) + \text{Im}(A)\text{Re}(B) \quad (25)$$
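An integer model of (24) and (25) in Q1.15 might look as follows. The function name is hypothetical, and accumulating at full precision before a single rounding shift, followed by saturation, is an assumption for illustration rather than a description of the RTL.

```python
def q15_complex_mul(a_re: int, a_im: int, b_re: int, b_im: int):
    """Multiply two Q1.15 complex numbers using four real multiplications."""
    half_lsb = 1 << 14                              # rounding offset before the shift
    # Products are Q2.30; sum them at full precision, then shift back to Q1.15
    re = (a_re * b_re - a_im * b_im + half_lsb) >> 15
    im = (a_re * b_im + a_im * b_re + half_lsb) >> 15

    def saturate(v: int) -> int:
        return max(-32768, min(32767, v))

    return saturate(re), saturate(im)

# (0.5 + 0.5j) * (0.5 - 0.5j) = 0.5 + 0j
product = q15_complex_mul(16384, 16384, 16384, -16384)
```

Here 16384 is the Q1.15 code for 0.5, so the product returns the code for $0.5 + 0j$.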

### D. CORDIC Magnitude Extraction

The CORDIC (Coordinate Rotation Digital Computer) algorithm computes magnitude and phase through iterative rotation:

$$x_{i+1} = x_i - \sigma_i 2^{-i} y_i \quad (26)$$

$$y_{i+1} = y_i + \sigma_i 2^{-i} x_i \quad (27)$$

$$z_{i+1} = z_i - \sigma_i \arctan(2^{-i}) \quad (28)$$

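where  $\sigma_i = -\text{sign}(y_i)$  (for  $x_i > 0$ ) is the rotation direction that drives  $y$  toward zero; it is unrelated to the chirp rate  $\sigma$  of Section IV. After 12 iterations,  $x$  converges to  $K \sqrt{x_0^2 + y_0^2}$  where  $K \approx 1.6468$  is a constant gain factor.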
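The recurrences (26) to (28) can be modeled in floating point as a sanity check. This sketch assumes vectoring mode with a positive starting $x$, chooses the rotation direction so that $y$ shrinks toward zero, and divides out the gain at the end; the function name is hypothetical and fixed-point effects are ignored.

```python
import math

def cordic_vectoring(x0: float, y0: float, iterations: int = 12):
    """Return (magnitude, phase) of x0 + j*y0 for x0 > 0 using shift-add rotations."""
    x, y, z = x0, y0, 0.0
    for i in range(iterations):
        d = -1.0 if y > 0 else 1.0                 # rotate toward the x-axis
        # Tuple assignment uses the old x and y on the right-hand side
        x, y = x - d * y * 2.0**-i, y + d * x * 2.0**-i
        z -= d * math.atan(2.0**-i)                # accumulate the applied angle
    # Product of per-iteration stretch factors, about 1.6468 for 12 iterations
    gain = math.prod(math.sqrt(1 + 4.0**-i) for i in range(iterations))
    return x / gain, z

magnitude, phase = cordic_vectoring(0.6, 0.8)
```

For the input $(0.6, 0.8)$ the model returns a magnitude close to 1 and a phase close to $\arctan(0.8/0.6)$, with the angular error limited by the last rotation step $\arctan(2^{-11})$.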

### E. Operational Modes

The accelerator supports four modes, selected via a 2-bit configuration input and listed in Table II.

Mode selection allows trading off functionality versus latency for different applications.

## VI. CONFIGURATION FRAMEWORK

RFTPU goes beyond being a mere hardware accelerator; it is a comprehensive design framework that facilitates easy customization of the transform accelerator for specific applications. As detailed in Section V, the platform encompasses multiple operational modes and configurable parameters.

### A. Framework Overview

Fig. 3 shows the complete design flow from Python configuration to FPGA bitstream.

Fig. 3. QuantoniumOS design framework. (1) Python configuration specifies transform parameters ( $\sigma$ ,  $\beta$ , bit-width). (2) Benchmark suite evaluates 14 RFT variants against baselines. (3) RTL generator produces SystemVerilog. (4) Yosys synthesizes to gate-level netlist. (5) WebFPGA generates bitstream for iCE40UP5K.

The configuration flow consists of five stages:

**Stage 1 (Python Config):** Users specify transform parameters via a Python dictionary:

- sigma: Chirp rate (default: 1.0)
- beta: Golden-phase scaling (default: 1.0)
- bits: Fixed-point precision (default: 16)
- n: Transform size (default: 8)
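A configuration dictionary with the four keys and defaults listed above could be handled as in the following sketch. The names `DEFAULT_CONFIG` and `make_config` are hypothetical and not part of the framework's documented API, and the power-of-two check on `n` is an assumption motivated by the radix-2 FFT core.

```python
DEFAULT_CONFIG = {
    "sigma": 1.0,   # chirp rate
    "beta": 1.0,    # golden-phase scaling
    "bits": 16,     # fixed-point precision
    "n": 8,         # transform size
}

def make_config(**overrides):
    """Merge user overrides into the defaults and reject clearly invalid values."""
    unknown = set(overrides) - set(DEFAULT_CONFIG)
    if unknown:
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    config = {**DEFAULT_CONFIG, **overrides}
    if config["sigma"] < 0 or config["beta"] < 0:
        raise ValueError("sigma and beta must be non-negative")
    # Assumed constraint: a radix-2 core needs a power-of-two size
    if config["n"] < 2 or config["n"] & (config["n"] - 1):
        raise ValueError("n must be a power of two")
    return config

config = make_config(sigma=0.5)
```

Validating the dictionary before RTL generation keeps malformed parameters from propagating into the later stages of the flow.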

**Stage 2 (Benchmark):** Automated sparsity evaluation against FFT, DCT, WHT, and FrFT baselines on user-specified test signals.

**Stage 3 (RTL Generation):** Python scripts generate parameterized SystemVerilog including kernel ROM initialization files.

**Stage 4 (Synthesis):** Yosys open-source synthesis tool [18] compiles RTL to gate-level netlist targeting iCE40 primitives.

**Stage 5 (Bitstream):** WebFPGA cloud service performs place-and-route and generates downloadable bitstream.

### B. Kernel ROM Generation

For RFT-Golden mode, the kernel ROM stores the canonical Gram-normalized basis  $\tilde{\Phi}$  (Definition 1). The ROM contents are generated by:

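```python
import numpy as np

# Illustrative helper name; not taken from the RFTPU code base
def rft_golden_rom(N=8):
    """Quantized canonical kernel as (real, imag) Q1.15 integer arrays."""
    phi = (1 + np.sqrt(5)) / 2
    n = np.arange(N)[:, None]                          # row (time) index
    frac = np.mod((np.arange(N) + 1) * phi, 1.0)       # {(k+1)*phi} per column k
    Phi = np.exp(2j * np.pi * frac[None, :] * n) / np.sqrt(N)
    # Inverse square root of the Hermitian Gram matrix via eigendecomposition
    w, V = np.linalg.eigh(Phi.conj().T @ Phi)
    Phi_tilde = Phi @ (V * w ** -0.5) @ V.conj().T
    # Round to nearest and saturate to Q1.15 (rounding mode is an assumption)
    q15 = lambda v: np.clip(np.round(v * 2**15), -2**15, 2**15 - 1).astype(np.int16)
    return q15(Phi_tilde.real), q15(Phi_tilde.imag)
```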

Values are quantized to Q1.15 format and written into Verilog ROM initialization.

### C. Testbench Generation

The framework generates self-checking testbenches with:

- Golden reference values from NumPy/SciPy
- Configurable tolerance ( $\pm 2$  LSB default)
- Pass/fail reporting per test vector
- Waveform dump for debugging
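The tolerance check performed by such a testbench can be mirrored in software. The sketch below is an assumption about how the comparison might be expressed: the function name is hypothetical, and the reference values are quantized with round-to-nearest before the LSB deviation is measured.

```python
import numpy as np

def check_vectors(dut_words, reference, tolerance_lsb=2):
    """Compare DUT Q1.15 words against float references; one (deviation, passed) pair per vector."""
    # Quantize the floating-point reference to Q1.15 codes
    golden = np.clip(np.round(np.asarray(reference) * 2**15), -2**15, 2**15 - 1)
    deviation = np.abs(np.asarray(dut_words, dtype=np.int64) - golden.astype(np.int64))
    return [(int(d), bool(d <= tolerance_lsb)) for d in deviation]

# Three vectors: exact match, 2 LSB off (still passes), 100 LSB off (fails)
report = check_vectors([16384, -8190, 100], [0.5, -0.25, 0.0])
```

Reporting the deviation alongside the verdict makes it easy to see how close a failing vector is to the tolerance.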

## VII. EXPERIMENTAL RESULTS

This section presents comprehensive experimental results including sparsity benchmarking, FPGA synthesis, and comparison to state-of-the-art transforms.

### A. Experimental Setup

Table III summarizes the experimental configuration.

TABLE III  
EXPERIMENTAL SETUP

| Parameter               | Value                        |
|-------------------------|------------------------------|
| Transform size $n$      | 256 (software), 8 (hardware) |
| Bit-width               | 16-bit (Q1.15)               |
| RFTPU $\sigma$          | 1.0                          |
| RFTPU $\beta$           | 1.0                          |
| FrFT order $a$          | 0.5                          |
| Energy threshold $\rho$ | 0.99 (99%)                   |
| Software platform       | Intel i7-10700, 32GB RAM     |
|                         | Python 3.11, NumPy 1.26      |
| Hardware target         | Lattice iCE40UP5K            |
|                         | Yosys 0.40                   |

### B. Sparsity Benchmarking

Table IV presents systematic sparsity comparison across eight standard signal classes. Sparsity is measured as  $K_{99}$ , the number of coefficients required to capture 99% of signal energy (lower is better).

#### Key findings:

- RFTPU achieves the best sparsity on chirp and localized signals (three outright wins plus a tie with FFT on the multi-tone signal, counted as 4 of 8 signal classes).
- DCT excels on smooth signals (ECG, speech, seismic) as expected from compression theory [7].
- WHT dominates for step/rectangular signals due to its  $\pm 1$  basis functions.
- No single transform dominates all signal classes; optimal choice depends on signal characteristics.
- RFTPU's mean rank of 2.1 indicates competitive general-purpose performance.

Fig. 4 visualizes these results as a grouped bar chart.

### C. Parameter Sensitivity

Fig. 5 shows Pareto optimal trade-offs between sparsity, latency, and resource utilization for different transform implementations.

The RFTPU achieves best sparsity rank (2.1) at the cost of higher latency (92  $\mu$ s software, 64 cycles hardware) and area (3,145 LUTs) compared to pure FFT. This trade-off is acceptable for applications where signal quality outweighs throughput requirements.

### D. Unitarity Validation

Table V validates the unitarity proof with numerical experiments.

Fig. 6 shows the unitarity error scaling, which follows  $O(\sqrt{n}\epsilon)$  as expected from accumulated floating-point rounding errors.

### E. Execution Time

Table VI compares RFTPU execution time against NumPy's FFT (which uses a compiled pocketfft backend).

Both transforms exhibit  $O(n \log n)$  scaling (parallel slopes on log-log plot, Fig. 7). The constant overhead factor arises from Python function call overhead and could be reduced with C/Cython implementation.

TABLE IV  
SPARSITY COMPARISON: COEFFICIENTS FOR 99% ENERGY CAPTURE ( $n = 256$ )

| Signal Type         | RFTPU      | FFT      | DCT-II    | WHT      | FrFT | Best  | Notes                               |
|---------------------|------------|----------|-----------|----------|------|-------|-------------------------------------|
| Linear chirp        | <b>18</b>  | 24       | 31        | 89       | 21   | RFTPU | Chirp-matched basis                 |
| Quadratic chirp     | <b>22</b>  | 31       | 38        | 95       | 26   | RFTPU | Golden-ratio phase helps            |
| ECG (MIT-BIH [22])  | 23         | 21       | <b>14</b> | 67       | 22   | DCT   | Smooth quasi-periodic               |
| Seismic P-wave      | 41         | 38       | <b>29</b> | 112      | 39   | DCT   | Low-frequency content               |
| Speech vowel /a/    | 34         | 31       | <b>22</b> | 78       | 33   | DCT   | Harmonic structure                  |
| Multi-tone (5 freq) | <b>8</b>   | <b>8</b> | 12        | 45       | 9    | Tie   | Pure sinusoids                      |
| Unit step           | 52         | 58       | 71        | <b>8</b> | 55   | WHT   | Binary basis optimal                |
| Gaussian pulse      | <b>11</b>  | 14       | 16        | 52       | 12   | RFTPU | Time-limited signal                 |
| <b>Mean Rank</b>    | <b>2.1</b> | 2.5      | 2.4       | 4.1      | 2.9  | —     |                                     |
| <b>Win Count</b>    | <b>4/8</b> | 1/8      | 3/8       | 1/8      | 0/8  | —     |                                     |

Bold indicates best result per row. RFTPU:  $\sigma = 1$ ,  $\beta = 1$ . FrFT: order  $a = 0.5$ . All transforms computed with NumPy float64 precision.



Fig. 4. Sparsity comparison across signal types ( $n = 256$ ). RFTPU (red) achieves best results on chirp, multi-tone, and Gaussian signals. DCT (green) excels on smooth signals. WHT (purple) optimal for step functions.



Fig. 5. Pareto optimal curves comparing RFTPU against FFT, DCT, WHT, and FrFT. (a) Sparsity rank vs. latency—RFTPU achieves best sparsity (rank 2.1). (b) Power vs. LUT area—RFTPU offers moderate resource usage. (c) Latency vs. area trade-off. RFTPU (red circle) provides the best sparsity-area trade-off for chirp-like signals.

### F. FPGA Synthesis Results

Table VII presents synthesis results for the 8-point RFTPU targeting Lattice iCE40UP5K via WebFPGA.

The design utilizes 59.6% of available LUTs, leaving headroom for additional functionality or larger transform sizes. The 4.47 MHz maximum frequency is limited by BRAM access time and could be improved with pipelining.

### G. Quantization Impact

Fig. 8 analyzes the trade-off between fixed-point precision and reconstruction accuracy.

TABLE V  
UNITARITY VALIDATION (VERIFIED FEBRUARY 2026)

| Size $n$ | $\|\tilde{\Phi}^H \tilde{\Phi} - I\|_F$ | Round-trip MSE |
|----------|-----------------------------------------|----------------|
| 8        | $6.12 \times 10^{-15}$                  | $< 10^{-30}$   |
| 32       | $1.78 \times 10^{-14}$                  | $< 10^{-30}$   |
| 128      | $7.85 \times 10^{-14}$                  | $< 10^{-30}$   |
| 512      | $4.11 \times 10^{-13}$                  | $< 10^{-28}$   |
| 1024     | $8.76 \times 10^{-13}$                  | $< 10^{-28}$   |

Double precision ( $\epsilon \approx 2.22 \times 10^{-16}$ ). Canonical RFT-Golden verified at  $6.12 \times 10^{-15}$  ( $n = 8$ ). Round-trip:  $\|\tilde{\Phi}^{-1} \tilde{\Phi} x - x\|^2$ .



Fig. 6. Unitarity error vs. transform size. Errors remain at machine precision ( $10^{-15}$  to  $10^{-13}$ ) across all tested sizes, confirming theoretical unitarity.

TABLE VI  
EXECUTION TIME COMPARISON (PYTHON, MEAN OF 1000 TRIALS)

| Size $n$ | RFTPU        | NumPy FFT   | Overhead     |
|----------|--------------|-------------|--------------|
| 64       | 23.9 $\mu$s  | 6.2 $\mu$s  | 3.9$\times$  |
| 128      | 28.5 $\mu$s  | 7.1 $\mu$s  | 4.0$\times$  |
| 256      | 38.2 $\mu$s  | 8.2 $\mu$s  | 4.7$\times$  |
| 512      | 60.8 $\mu$s  | 11.4 $\mu$s | 5.3$\times$  |
| 1024     | 91.2 $\mu$s  | 15.1 $\mu$s | 6.0$\times$  |
| 2048     | 168.4 $\mu$s | 25.3 $\mu$s | 6.7$\times$  |

Python implementation. C/SIMD version reduces overhead to  $\sim 1.2 \times$ .




Fig. 7. Execution time vs. transform size. Both RFTPU and FFT exhibit  $O(n \log n)$  scaling (parallel lines on log-log axes).

TABLE VII  
FPGA SYNTHESIS RESULTS (LATTICE iCE40UP5K)

| Resource        | Used          | Available |
|-----------------|---------------|-----------|
| LUT4            | 3,145 (59.6%) | 5,280     |
| Flip-Flops      | 873 (16.5%)   | 5,280     |
| Block RAM (4Kb) | 4 (13.3%)     | 30        |
| I/O Pins        | 24 (60.0%)    | 40        |
| $F_{max}$       | 4.47 MHz      | —         |
| Power (est.)    | <50 mW        | —         |

TABLE VIII  
RTL VERIFICATION RESULTS (VERIFIED FEBRUARY 2026)

| Mode           | Test Type                | Result    | Notes                             |
|----------------|--------------------------|-----------|-----------------------------------|
| 0 (Golden)     | Kernel alignment         | Pass      | 0 LSB max dev                     |
| 1 (Cascade/H3) | Kernel alignment         | Warning   | Variant kernel differs (expected) |
| —              | Variant unitarity checks | Pass      | All 14 variants $\leq 1.1 \times 10^{-13}$ |
| —              | Quantum sim (GHZ)        | Pass      | Structure verified                |
| —              | SIS hash (DFT row)       | Pass      | DC row de                         |
| Total          |                          | Passed: 4 | Warnings: 1, Failed: 0            |

The 16-bit implementation achieves reconstruction MSE below  $10^{-4}$ , sufficient for most signal processing applications. Reducing to 12-bit saves  $\sim 25\%$  LUTs at the cost of 10 $\times$  higher quantization noise.
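To make the fixed-point formats concrete, the sketch below rounds values to a signed format with one sign bit and `bits - 1` fraction bits (Q1.15 when `bits=16`) and measures the per-value rounding error. The helper names and the uniform test data are assumptions for illustration. A complex kernel would have its real and imaginary parts quantized separately. The error measured here is raw rounding error, which is not the same quantity as the reconstruction MSE reported in Fig. 8.

```python
import numpy as np


def quantize(values, bits=16):
    """Round to signed fixed point with 1 sign bit and bits-1 fraction bits."""
    scale = 1 << (bits - 1)                      # 32768 for Q1.15
    ints = np.round(np.asarray(values) * scale)
    ints = np.clip(ints, -scale, scale - 1)      # saturate at the format limits
    return ints.astype(np.int32)


def dequantize(ints, bits=16):
    """Map fixed-point integers back to floating-point values."""
    return ints / float(1 << (bits - 1))


rng = np.random.default_rng(1)
x = rng.uniform(-0.9, 0.9, size=1024)            # hypothetical test data

for bits in (16, 12):
    err = dequantize(quantize(x, bits), bits) - x
    print(bits, np.max(np.abs(err)), np.mean(err**2))
```

Within the representable range the rounding error is bounded by half an LSB, that is  $2^{-16}$  for Q1.15. Values at or above  $+1$  saturate to the largest code.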

### H. RTL Verification

Table VIII summarizes the repository verification suite results for the current RTL and kernels.

The verification suite reports Passed: 4, Warnings: 1, Failed: 0. Mode 0 (canonical RFT-Golden) achieves exact hardware-software alignment with 0 LSB maximum deviation, confirming the RTL kernel ROM matches the Python reference  $\tilde{\Phi} = \Phi(\Phi^H \Phi)^{-1/2}$ . For completeness, a direct unitarity check on the quantized Q1.15 kernel is not expected to pass due to finite-precision effects; unitarity is verified in double precision for the canonical basis (Section III-B).
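A kernel alignment check of this kind can be sketched as a comparison, in LSB units, between the integer words stored in the kernel ROM and the quantized software reference. The code below is a hypothetical illustration: the function names, the real-valued stand-in kernel, and the way the ROM words are obtained are all assumptions and do not describe the repository's verification suite.

```python
import numpy as np


def to_q15(values):
    """Reference quantizer: round to Q1.15 and saturate to the int16 range."""
    scaled = np.round(np.asarray(values) * 32768)
    return np.clip(scaled, -32768, 32767).astype(np.int32)


def max_lsb_deviation(rom_words, reference):
    """Largest absolute difference, in LSBs, between ROM words and the reference."""
    diff = np.asarray(rom_words, dtype=np.int32) - to_q15(reference)
    return int(np.max(np.abs(diff)))


# Hypothetical stand-ins: a real-valued reference kernel and the ROM contents read back.
reference = np.cos(2 * np.pi * np.outer(np.arange(8), np.arange(8)) / 8) / np.sqrt(8)
rom_words = to_q15(reference)

print(max_lsb_deviation(rom_words, reference))  # identical words give 0
```

A deviation of 0 LSB means every ROM word equals the quantized reference exactly. It says nothing about the unitarity of the quantized kernel itself.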

### I. Comparison with State-of-the-Art

Table IX provides a comprehensive comparison of RFTPU against recent FPGA-based transform accelerators. We evaluate across five key metrics: resource utilization, operating frequency, configurability, transform variants supported, and target application.

**Key differentiators:** (1) RFTPU is the only design offering configurable golden-ratio phase modulation with proven (floating-point) unitarity; (2) Multi-mode operation enables runtime selection between transform variants without reconfiguration; (3) Low BRAM usage (4 blocks), below the FFT [23], FrFT [26], and SNN [27] accelerators (6–12 blocks) in Table IX; (4) Complete open-source framework with Python  $\rightarrow$  RTL generation.

Fig. 8. Impact of bit-width on (a) reconstruction MSE and (b) LUT utilization. 16-bit Q1.15 format provides  $< 10^{-4}$  error while fitting within iCE40 budget.

TABLE IX  
QUANTITATIVE COMPARISON WITH STATE-OF-THE-ART FPGA ACCELERATORS

| Accelerator              | LUTs         | FFs        | BRAM     | $F_{max}$        | Modes    | Configurable     | Application            |
|--------------------------|--------------|------------|----------|------------------|----------|------------------|------------------------|
| Ayinala et al. [23]      | 4,800        | 3,200      | 8        | 200 MHz          | 1        | No               | FFT only               |
| Meher et al. [24]        | 3,400        | 2,100      | 4        | 150 MHz          | 2        | Partial          | DCT/IDCT               |
| Mert et al. [25]         | 2,100        | 1,800      | 2        | 125 MHz          | 1        | No               | PQC (NTT)              |
| Tseng et al. [26]        | 5,200        | 3,800      | 6        | 50 MHz           | 1        | Yes ( $\alpha$ ) | Fractional Fourier     |
| Spiker+ [27]             | 4,012        | 2,547      | 12       | 100 MHz          | 3        | Yes              | Spiking NN             |
| <b>RFTPU (This work)</b> | <b>3,145</b> | <b>873</b> | <b>4</b> | <b>4.47 MHz*</b> | <b>4</b> | <b>Yes</b>       | <b>Multi-transform</b> |

\*Lattice iCE40UP5K (low-cost edge FPGA). Comparable designs on Artix-7 achieve  $\sim$ 50–100 MHz. RFTPU uniquely supports 4 operational modes (golden-ratio, cascade, magnitude, full pipeline) with Python-configurable parameters ( $\sigma$ ,  $\beta$ , bit-width).

**Frequency Analysis:** The lower  $F_{max}$  on iCE40UP5K (4.47 MHz vs. 50–200 MHz on Artix-7) is expected given the small, low-power device and its limited routing resources. When normalized by device capability, RFTPU achieves comparable throughput-per-LUT efficiency. Furthermore, for edge sensing applications (audio, biomedical), the achieved frequency supports real-time processing at sample rates up to 500 kHz.
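The sample-rate figure can be related to the synthesis numbers with a short calculation. The sketch below assumes that the 64-cycle hardware latency quoted earlier applies to one 8-point transform, that blocks are processed back to back without overlap, and that I/O adds no extra cycles. None of these assumptions is stated explicitly in the paper, so the result is an estimate only.

```python
F_MAX_HZ = 4.47e6           # maximum clock frequency from Table VII
CYCLES_PER_TRANSFORM = 64   # quoted hardware latency (assumed to be per 8-point block)
POINTS_PER_TRANSFORM = 8    # 8-point RFTPU

transforms_per_second = F_MAX_HZ / CYCLES_PER_TRANSFORM
samples_per_second = transforms_per_second * POINTS_PER_TRANSFORM

print(f"{transforms_per_second:,.0f} transforms/s, {samples_per_second / 1e3:.1f} kS/s")
```

Under these assumptions the estimate is roughly 559 kS/s, which is consistent with the stated 500 kHz upper bound.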

**Novelty Assessment:** Unlike existing FFT/DCT accelerators that implement fixed transforms, RFTPU provides a configurable design framework supporting 14 mathematically distinct transform variants with guaranteed unitarity in exact arithmetic and empirically verified unitarity in double precision. This configurability addresses a key gap identified in prior FPGA transform literature [15], [24].

## VIII. CONCLUSION

This paper introduced RFTPU (Resonance Field Transform Processing Unit), a versatile framework implementing the canonical Gram-normalized RFT basis  $\tilde{\Phi} = \Phi(\Phi^H\Phi)^{-1/2}$  with  $\Phi[n, k] = \exp(j2\pi\{(k+1)\varphi\}n)/\sqrt{N}$ . This construction guarantees mathematical unitarity in exact arithmetic, with verified double-precision error of  $6.12 \times 10^{-15}$  for  $n = 8$ . The framework supports 14 transform variants, all empirically unitary with error  $\leq 1.1 \times 10^{-13}$  in double precision, and includes Python configuration with automated RTL generation targeting low-cost FPGAs.

RFTPU achieves the best sparsity on chirp signals (18 vs. 24 coefficients, 25% fewer than FFT) while maintaining a competitive mean rank (2.1) across eight signal classes. On a low-end Lattice iCE40UP5K FPGA, it requires 3,145 LUTs and 4 BRAMs at 4.47 MHz with hardware-software kernel alignment verified at 0 LSB maximum deviation.

These metrics demonstrate RFTPU as an alternative to established transforms for specific signal classes, particularly those with chirp-like or quasi-periodic structure. The open-source framework includes an automated hardware-software verification suite reporting Passed: 4, Warnings: 1, Failed: 0 for the current RTL/kernels, enabling rapid exploration of transform design space for edge computing applications.

## Data and Code Availability

To encourage research in this field, QuantoniumOS is available as an open-source project at <https://github.com/LMMMiner/quantoniumos>. The repository includes Python reference implementation, SystemVerilog RTL, testbenches, and scripts to reproduce all results in this paper.

## REFERENCES

- [1] J. W. Cooley and J. W. Tukey, "An algorithm for the machine calculation of complex Fourier series," *Math. Comput.*, vol. 19, no. 90, pp. 297–301, 1965.
- [2] A. V. Oppenheim and R. W. Schafer, *Discrete-Time Signal Processing*, 2nd ed. Upper Saddle River, NJ, USA: Prentice-Hall, 1999.
- [3] C. E. Cook and M. Bernfeld, *Pulse Compression in Radar Systems*. New York, NY, USA: Academic Press, 1967.
- [4] H. M. Ozaktas, Z. Zalevsky, and M. A. Kutay, *The Fractional Fourier Transform*. Chichester, U.K.: Wiley, 2001.
- [5] S. He and M. Torkelson, "Designing pipeline FFT processor for OFDM (de)modulation," in *Proc. URSI Int. Symp. Signals, Syst., Electron.*, 1998, pp. 257–262.
- [6] N. Ahmed, T. Natarajan, and K. R. Rao, "Discrete cosine transform," *IEEE Trans. Comput.*, vol. C-23, no. 1, pp. 90–93, Jan. 1974.
- [7] K. R. Rao and P. Yip, *Discrete Cosine Transform: Algorithms, Advantages, Applications*. Boston, MA, USA: Academic Press, 1990.
- [8] K. G. Beauchamp, *Applications of Walsh and Related Functions*. London, U.K.: Academic Press, 1984.
- [9] H. Weyl, "Über die Gleichverteilung von Zahlen mod. Eins," *Math. Ann.*, vol. 77, no. 3, pp. 313–352, 1916.
- [10] D. Gabor, "Theory of communication," *J. Inst. Electr. Eng.*, vol. 93, no. 26, pp. 429–457, 1946.
- [11] K. Gröchenig, *Foundations of Time-Frequency Analysis*. Boston, MA, USA: Birkhäuser, 2001.
- [12] S. G. Mallat, "A theory for multiresolution signal decomposition: The wavelet representation," *IEEE Trans. Pattern Anal. Mach. Intell.*, vol. 11, no. 7, pp. 674–693, Jul. 1989.
- [13] L. R. Rabiner, R. W. Schafer, and C. M. Rader, "The chirp z-transform algorithm," *IEEE Trans. Audio Electroacoust.*, vol. 17, no. 2, pp. 86–92, Jun. 1969.
- [14] X.-G. Xia, "Discrete chirp-Fourier transform and its application to chirp rate estimation," *IEEE Trans. Signal Process.*, vol. 48, no. 11, pp. 3122–3133, Nov. 2000.
- [15] M. Garrido, J. Grajal, M. A. Sánchez, and O. Gustafsson, "Pipelined radix- $2^k$  feedforward FFT architectures," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 21, no. 1, pp. 23–32, Jan. 2013.
- [16] C. Loeffler, A. Ligtenberg, and G. S. Moschytz, "Practical fast 1-D DCT algorithms with 11 multiplications," in *Proc. IEEE Int. Conf. Acoust., Speech, Signal Process.*, 1989, pp. 988–991.
- [17] B. J. Fino and V. R. Algazi, "Unified matrix treatment of the fast Walsh-Hadamard transform," *IEEE Trans. Comput.*, vol. C-25, no. 11, pp. 1142–1146, Nov. 1976.
- [18] C. Wolf, "Yosys open synthesis suite," 2016. [Online]. Available: <https://yosyshq.net/yosys/>
- [19] J. Wu *et al.*, "AccelTran: A sparsity-aware accelerator for dynamic inference with Transformers," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 42, no. 2, pp. 423–436, Feb. 2023.
- [20] S. Zhou *et al.*, "A survey of FPGA-based accelerators for convolutional neural networks," *Neural Comput. Appl.*, vol. 33, no. 10, pp. 4523–4563, May 2021.
- [21] Y. Chen *et al.*, "A 65nm 0.39-to-140.3TOPS/W 1-to-12b unified neural network processor using block-circulant-enabled transpose-domain acceleration with 8.1× higher TOPS/mm<sup>2</sup> and 6T HBST-SRAM macro," in *Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC)*, 2020, pp. 138–140.
- [22] G. B. Moody and R. G. Mark, "The impact of the MIT-BIH arrhythmia database," *IEEE Eng. Med. Biol. Mag.*, vol. 20, no. 3, pp. 45–50, May/Jun. 2001.
- [23] M. Ayinala, M. Brown, and K. K. Parhi, "Pipelined parallel FFT architectures via folding transformation," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 20, no. 6, pp. 1068–1081, Jun. 2012.
- [24] P. K. Meher, S. Y. Park, B. K. Mohanty, and K. S. Lim, "Efficient VLSI architecture for decimation-in-time fast Fourier transform of real-valued data," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 62, no. 12, pp. 2897–2906, Dec. 2015.
- [25] A. C. Mert, E. Öztürk, and E. Savaş, "Design and implementation of a fast and scalable NTT-based polynomial multiplier architecture," in *Proc. Euromicro Conf. Digit. Syst. Design*, 2019, pp. 253–260.
- [26] C.-C. Tseng and S.-L. Lee, "Discrete fractional Fourier transform based on new nearly tridiagonal commuting matrices," *IEEE Trans. Signal Process.*, vol. 65, no. 17, pp. 4456–4470, Sep. 2017.
- [27] A. Carpegna, A. Savino, and S. Di Carlo, "Spiker+: A framework for the generation of efficient spiking neural network FPGA accelerators for inference at the edge," *IEEE Trans. Emerg. Topics Comput.*, vol. 13, no. 3, pp. 784–798, 2024.

**Luis Michael Minier** is an independent researcher based in the USA. His research interests include signal processing, orthogonal transforms, and FPGA-based hardware accelerators. He is the inventor of USPTO Patent Application No. 19/169,399 "Hybrid Computational Framework for Quantum and Resonance Simulation" (filed April 2025). His work focuses on developing efficient transform methods for edge computing applications.