

# Vincent Hwang

[Email](#) | [Github](#) | [Personal Website](#) | [Google Scholar](#) | [DBLP](#)

Date of this document: 2025-10-31.

## 1 Education

|                                                                                                                                         |                                                                |
|-----------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------|
| PhD, Cryptographic Engineering<br>Max Planck Institute for Security and Privacy<br>Advisor: Peter Schwabe                               | Germany   Jan. 2023 – Feb. 2026 (expected, defense scheduling) |
| MSc., Department of Computer Science and Information Engineering<br>National Taiwan University<br>Advisors: Yen-Huan Li and Bo-Yin Yang | Taiwan   Sept. 2021 - Jun. 2022                                |
| BSc., Department of Computer Science and Information Engineering<br>National Taiwan University                                          | Taiwan   Sept. 2016 - Jun. 2021                                |

## 2 Programming Skills

- Assembly (very familiar): Armv7-M, Armv7E-M, Armv8-A, AVX2. Low-level optimizations.
- Assembly (somewhat familiar): Armv9-A, AVX-512. Low-level optimizations. Ongoing research.
- C (very familiar): Primary for interfacing between the assembly and the high-level api. Sometimes I use function pointers for unit tests.
- C++ (somewhat familiar): Primary about templates for scalability. I rarely use standard libraries as they are not built for cryptographic uses where various secure programming practice must be employed.
- CUDA (somewhat familiar): Ongoing research.
- Haskell (some experience): Scripts for generating some programs and constants used in low-level optimizations.

## 3 Research Interests

I'm doing research in cryptographic engineering. Cryptographic engineering is a field exploring how the mathematical constructs in cryptography could/should be best implemented in real-world computing devices. I position myself as a computer scientist turning the high-level ideas into optimized computer programs. Frequently, this amounts to several iterations of refinements for the high-level ideas and experiments through assembly programming. I programmed the computationally-intensive polynomial arithmetic in lattice-based cryptosystems for embedded devices (Armv7-M onward) and high performance processors (Armv8-A onward for `aarch64`, AVX2 extension onward for `x86-64`). I coauthored implementation papers targeting lattice-based candidates NTRU, NTRU Prime, and Saber, and standards `m1-kem` and `m1-dsa` (previously known as Kyber and Dilithium) of the Post-Quantum Cryptography Standardization by the National Institute of Standards and Technology (NIST). Optimizing polynomial arithmetic for cryptography usually involves some basics of algebra and various assembly languages. I'm also working on other topics like formal verification, and exploring interesting computing devices from time to time. Recently, I'm mainly programming for the elliptic-curve discrete logarithm on the H100 GPU, and also optimizing the mathematically-heavy NIST standard `fn-dsa` (previously known as Falcon) in assembly.

**Key words:** Cryptographic engineering, assembly programming, post-quantum cryptography, and practical integer and polynomial multiplications.

## 4 Services

- 2026: Reviewer of TCCHES 2026 ( $\times 1$ ), (Incoming) Artifact Committee Member of TCCHES 2026.
- 2025: Reviewer of TCCHES 2025 ( $\times 8$ ), ArcticCrypt 2025 ( $\times 1$ ), CT-RSA 2025 ( $\times 2$ ), JCEN ( $\times 1$ ); Artifact Committee Member of TCCHES 2025 ( $\times 5$ ).
- 2024: Reviewer of Crypto 2024 ( $\times 1$ ), TCCHES 2024 ( $\times 3$ ).
- 2023: Artifact Committee Member of TCCHES 2023 ( $\times 2$ ).

TCCHES = Transactions on Cryptographic Hardware and Embedded Systems;

CT-RSA = The Cryptographers' Track at RSA Conference;

JCEN = Journal of Cryptographic Engineering;

( $\cdot$ ) = The number of reviews submitted.

## 5 Publications (Reversed Chronological Order)

Author names in alphabetical order. \* = Contributions included in the PhD thesis. \*\* = Contributions included in the Master's thesis.

---

PhD program ongoing.

---

- 15\*. Gilles Barthe, Gustavo Xavier Delerue Marinho Alves, Hugo Pacheco, José Bacelar Almeida, Luís Esquivel, Manuel Barbosa, Peter Schwabe, Pierre-Yves Strub, Tiago Oliveira, and Vincent Hwang. Faster Verification of Faster Implementations: Combining Deductive and Circuit-Based Reasoning in EasyCrypt. In *2025 IEEE Symposium on Security and Privacy (SP)*, pages 3526–3544. IEEE Computer Society, 2025. Paper. IACR ePrint. Reference [AAB<sup>+</sup>25].  
**Summary:** We combined deductive reasoning and circuit-based reasoning in EasyCrypt, verified the AVX2-optimized rejection sampling in Kyber for the first time and the equivalences of the optimized implementations of the compression functions, and simplified the verification of the AVX2-optimized Keccak permutation. I proposed the optimizations for the compression functions and wrote the corresponding part in the paper.
- 14\*. Vincent Hwang, YoungBeom Kim, and Seog Chung Seo. Multiplying Polynomials without Powerful Multiplication Instructions. *IACR Transactions on Cryptographic Hardware and Embedded Systems*, 2025(1):160–202, 2024. Paper. Artifact. Slides. IACR ePrint. Reference [HKS24].  
**Summary:** We optimized Dilithium and Saber on Cortex-M3 and 8-bit AVR. For the Dilithium coefficient ring modulo a prime, we generalized Barrett multiplication and proposed several versions suitable for multi-limb arithmetic on platforms without powerful multiplication instructions. For the power-of-two-coefficient ring case, we showed that Nussbaumer performs the best if there are no precision issues. I proposed the ideas, implemented the Cortex-M3 optimizations, and wrote the corresponding parts of the paper.
- 13\*. Vincent Hwang. Formal Verification of Emulated Floating-Point Arithmetic in Falcon. In *International Workshop on Security*, pages 125–141. Springer, 2024. Paper. Artifact. Slides. IACR ePrint. Reference [Hwa24b].  
**Summary:** This paper verified the functional equivalences of the software-emulated floating-point additions/multiplications and the range of the intermediate floating-point values of the FFT computations in the signature generation of Falcon with the domain-specific language CryptoLine. I proposed the ideas, carried out the verification, and wrote the paper.
- 12\*. Vincent Hwang. A Survey of Polynomial Multiplications for Lattice-Based Cryptosystems. *IACR Communications in Cryptology*, 1(2), 2024. Paper. IACR ePrint. Reference [Hwa24a].  
**Summary:** This paper reviewed several techniques for multiplying polynomials in polynomial rings of the form  $\mathbb{Z}_q[x]/\langle x^n - \alpha x - \beta \rangle$  in practice. There are three emphases: modular arithmetic, homomorphisms, and vectorization. I wrote the paper.
- 11\*. Vincent Hwang. Pushing the Limit of Vectorized Polynomial Multiplication for NTRU Prime. In *Australasian Conference on Information Security and Privacy*, pages 84–102. Springer, 2024. Paper. Artifact. Slides. IACR ePrint. Reference [Hwa24c].  
**Summary:** We furthered the NTRU Prime optimizations by [HLY24] and optimized on Cortex-A72 with Armv8-A Neon and on Haswell with AVX2. The paper systematically reviewed the notion of vectorization on various platforms, and replaced Rader's FFT with truncated Rader's FFT. I proposed the ideas, implemented the optimizations, and wrote the paper.
- 10\*. Vincent Hwang, Chi-Ting Liu, and Bo-Yin Yang. Algorithmic Views of Vectorized Polynomial Multipliers – NTRU Prime. In *International Conference on Applied Cryptography and Network Security*, pages 24–46. Springer, 2024. Paper. Artifact. Slides. IACR ePrint. Reference [HLY24].  
**Summary:** We optimized NTRU Prime on Cortex-A72 with Armv8-A Neon. We explored Schönhage and Bruun's FFTs, integrated them with Good–Thomas and Rader's FFTs, and proposed two approaches for multiplying polynomials in NTRU Prime. Our approaches significantly reduced the number of small-dimensional polynomial multiplications while vectorizing for NTRU Prime. I proposed the optimizations, implemented the Bruun's FFT and the fastest approach, and wrote the paper.
- 9\*. Han-Ting Chen, Yi-Hua Chung, Vincent Hwang, and Bo-Yin Yang. Algorithmic Views of Vectorized Polynomial Multipliers – NTRU. In Anupam Chattopadhyay, Shivam Bhasin, Stjepan Picek, and Chester Rebeiro, editors, *Progress in Cryptology – INDOCRYPT 2023*, pages 177–196. Springer, 2024. Paper. Artifact. Slides. IACR ePrint. Reference [CCHY24].  
**Summary:** We optimized NTRU on Cortex-A72 with Armv8-A Neon. We improved the design choice of Toom–Cook and implemented the Toeplitz matrix-vector product approach for the polynomial multiplications in NTRU. I proposed the improvement of Toom–Cook, integrated the polynomial multipliers by coauthors, implemented the then-not-yet-deployed obvious optimizations, wrote the non-implementation part of the paper, and refined the implementation part of the paper.

- 8\*. Vincent Hwang, Jiaxiang Liu, Gregor Seiler, Xiaomu Shi, Ming-Hsien Tsai, Bow-Yaw Wang, and Bo-Yin Yang. Verified NTT Multiplications for NISTPQC KEM Lattice Finalists: Kyber, SABER, and NTRU. 2022. *IACR Transactions on Cryptographic Hardware and Embedded Systems*, 2022(4):718–750, 2022. Paper. Reference [HLS<sup>+</sup>22].  
**Summary:** We verified the assembly-optimized NTT-based polynomial multiplications in Kyber, Saber, and NTRU on Cortex-M4 and on Skylake with AVX2 with the domain-specific language CryptoLine. I rewrote the assembly implementation of the polynomial multiplication for NTRU on Cortex-M4, explained the program structure of the assembly implementations on Cortex-M4, and drew some figures in the paper.
- 7\*\*. Erdem Alkim, Vincent Hwang, and Bo-Yin Yang. Multi-Parameter Support with NTTs for NTRU and NTRU Prime on Cortex-M4. *IACR Transactions on Cryptographic Hardware and Embedded Systems*, 2022(4):349–371, 2022. Paper. Artifact. Talk. IACR ePrint. Reference [AHY22].  
**Summary:** We revisited NTRU and NTRU Prime on Cortex-M4. We improved the initial butterflies of the Good–Thomas FFT, and the odd-radix butterflies, and proposed incomplete Good–Thomas FFT for code-size optimization. I proposed the optimizations, implemented the optimizations, and wrote the paper.
- 6\*. Hanno Becker, Vincent Hwang, Matthias J. Kannwischer, Lorenz Panny, and Bo-Yin Yang. Efficient Multiplication of Somewhat Small Integers using Number-Theoretic Transforms. In *International Workshop on Security*, pages 3–23. Springer, 2022. Paper. Artifact. IACR ePrint. Reference [BHK<sup>+</sup>22].  
**Summary:** We explored the FFT-based integer multiplications on Cortex-M3 and Cortex-M55, and found that the crossover point of the bit-size/performance of the FFT-based and the  $\Theta(n^2)$  non-FFT-based integer multiplications is around 2048 bits. I was involved in the Cortex-M3 implementation.
- 5\*. Amin Abdulrahman, Vincent Hwang, Matthias J. Kannwischer, and Amber Sprinkels. Faster Kyber and Dilithium on the Cortex-M4. In *International Conference on Applied Cryptography and Network Security*, pages 853–871. Springer, 2022. Paper. Artifact. IACR ePrint. Reference [AHKS22].  
**Summary:** We revisited the optimization of Dilithium and Kyber on Cortex-M4. We incorporated assembly optimizations from prior Cortex-M4 works [ACC<sup>+</sup>20, CHK<sup>+</sup>21, ACC<sup>+</sup>21] and Cortex-A72 work [BHK<sup>+</sup>21], and optimized the challenge polynomial multiplications in Dilithium. I proposed the Fermat number transform for the challenge polynomial multiplication in Dilithium, an optimization for the base multiplication in Kyber, and wrote the corresponding part of the paper with other coauthors.
- 4\*\*. Hanno Becker, Vincent Hwang, Matthias J. Kannwischer, Bo-Yin Yang, and Shang-Yi Yang. Neon NTT: Faster Dilithium, Kyber, and Saber on Cortex-A72 and Apple M1. *IACR Transactions on Cryptographic Hardware and Embedded Systems*, 2022(1):221–244, 2021. Paper. IACR ePrint. Reference [BHK<sup>+</sup>21].  
**Summary:** We rediscovered the multiplicative form of Barrett reduction, discovered a correspondence between Montgomery and Barrett multiplications, implemented the optimizations on Cortex-A72 and Apple M1 with Armv8-A Neon, and integrated the optimizations to Dilithium, Kyber, and Saber. I implemented the optimizations and wrote the corresponding parts of the paper.
- 3\*\*. Amin Abdulrahman, Jiun-Peng Chen, Yu-Jia Chen, Vincent Hwang, Matthias J. Kannwischer, and Bo-Yin Yang. Multi-moduli NTTs for Saber on Cortex-M3 and Cortex-M4. *IACR Transactions on Cryptographic Hardware and Embedded Systems*, 2022(1):127–151, 2021. Paper. Artifact. Talk. Slides. IACR ePrint. Reference [ACC<sup>+</sup>21].  
**Summary:** Continuing the Saber optimizations on Cortex-M4 by [CHK<sup>+</sup>21], we further optimized Saber on Cortex-M3, the memory footprint on Cortex-M3 and Cortex-M4, and the masked implementation on Cortex-M4, and concluded that NTT is the fastest approach for Saber. I proposed the ideas, implemented the Cortex-M4 optimizations and the fastest approach on Cortex-M3, and wrote the corresponding parts of the paper.

- 2\*. Chi-Ming Marvin Chung, Vincent Hwang, Matthias J. Kannwischer, Gregor Seiler, Cheng-Jhih Shih, and Bo-Yin Yang. NTT Multiplication for NTT-unfriendly Rings: New Speed Records for Saber and NTRU on Cortex-M4 and AVX2. *IACR Transactions on Cryptographic Hardware and Embedded Systems*, 2021(2):159–188, 2021. Paper. Artifact. Talk. Slides. IACR ePrint. Reference [CHK<sup>+</sup>21].  
**Summary:** We migrated and implemented the Good–Thomas approach by [ACC<sup>+</sup>20] to all the parameter sets of NTRU on Cortex-M4. Additionally, we also optimized Saber on Cortex-M4 and NTRU and Saber on Skylake with AVX2. I was deeply involved in the Cortex-M4 implementations.
- 1\*. Erdem Alkim, Dean Yun-Li Cheng, Chi-Ming Marvin Chung, Hülya Evkan, Leo Wei-Lun Huang, Vincent Hwang, Ching-Lin Trista Li, Ruben Niederhagen, Cheng-Jhih Shih, Julian Wälde, and Bo-Yin Yang. Polynomial Multiplication in NTRU Prime: Comparison of Optimization Strategies on Cortex-M4. *IACR Transactions on Cryptographic Hardware and Embedded Systems*, 2021(1):217–238, 2020. Paper. Artifact. Talk. Slides. IACR ePrint. Reference [ACC<sup>+</sup>20].  
**Summary:** We proposed three assembly-optimized NTT-based implementations for the polynomial multiplication in NTRU Prime on Cortex-M4. I was deeply involved in the Good–Thomas implementation.

## References

- [AAB<sup>+</sup>25] José Bacelar Almeida, Gustavo Xavier Delerue Marinho Alves, Manuel Barbosa, Gilles Barthe, Luís Esquivel, Vincent Hwang, Tiago Oliveira, Hugo Pacheco, Peter Schwabe, and Pierre-Yves Strub. Faster Verification of Faster Implementations: Combining Deductive and Circuit-Based Reasoning in EasyCrypt. In *2025 IEEE Symposium on Security and Privacy (SP)*, pages 3820–3838. IEEE, 2025.
- [ACC<sup>+</sup>20] Erdem Alkim, Dean Yun-Li Cheng, Chi-Ming Marvin Chung, Hülya Evkan, Leo Wei-Lun Huang, Vincent Hwang, Ching-Lin Trista Li, Ruben Niederhagen, Cheng-Jhih Shih, Julian Wälde, and Bo-Yin Yang. Polynomial Multiplication in NTRU Prime: Comparison of Optimization Strategies on Cortex-M4. *IACR Transactions on Cryptographic Hardware and Embedded Systems*, 2021(1):217–238, 2020. <https://tches.iacr.org/index.php/TCHES/article/view/8733>.
- [ACC<sup>+</sup>21] Amin Abdulrahman, Jiun-Peng Chen, Yu-Jia Chen, Vincent Hwang, Matthias J. Kannwischer, and Bo-Yin Yang. Multi-moduli NTTs for Saber on Cortex-M3 and Cortex-M4. *IACR Transactions on Cryptographic Hardware and Embedded Systems*, 2022(1):127–151, 2021. <https://tches.iacr.org/index.php/TCHES/article/view/9292>.
- [AHKS22] Amin Abdulrahman, Vincent Hwang, Matthias J. Kannwischer, and Amber Sprenkels. Faster Kyber and Dilithium on the Cortex-M4. In *Applied Cryptography and Network Security: 20th International Conference, ACNS 2022, Rome, Italy, June 20–23, 2022, Proceedings*, pages 853–871. Springer, 2022. [https://link.springer.com/chapter/10.1007/978-3-031-09234-3\\_42](https://link.springer.com/chapter/10.1007/978-3-031-09234-3_42).
- [AHY22] Erdem Alkim, Vincent Hwang, and Bo-Yin Yang. Multi-Parameter Support with NTTs for NTRU and NTRU Prime on Cortex-M4. *IACR Transactions on Cryptographic Hardware and Embedded Systems*, 2022(4):349–371, 2022. <https://tches.iacr.org/index.php/TCHES/article/view/9823>.
- [BHK<sup>+</sup>21] Hanno Becker, Vincent Hwang, Matthias J. Kannwischer, Bo-Yin Yang, and Shang-Yi Yang. Neon NTT: Faster Dilithium, Kyber, and Saber on Cortex-A72 and Apple M1. *IACR Transactions on Cryptographic Hardware and Embedded Systems*, 2022(1):221–244, 2021. <https://tches.iacr.org/index.php/TCHES/article/view/9295>.
- [BHK<sup>+</sup>22] Hanno Becker, Vincent Hwang, Matthias J. Kannwischer, Lorenz Panny, and Bo-Yin Yang. Efficient Multiplication of Somewhat Small Integers using Number-Theoretic Transforms. In *International Workshop on Security*, pages 3–23. Springer, 2022. [https://link.springer.com/chapter/10.1007/978-3-031-15255-9\\_1](https://link.springer.com/chapter/10.1007/978-3-031-15255-9_1).
- [CCHY24] Han-Ting Chen, Yi-Hua Chung, Vincent Hwang, and Bo-Yin Yang. Algorithmic Views of Vectorized Polynomial Multipliers – NTRU. In Anupam Chattopadhyay, Shivam Bhasin, Stjepan Picek, and Chester Rebeiro, editors, *Progress in Cryptology – INDOCRYPT 2023*, pages 177–196. Springer, 2024. [https://link.springer.com/chapter/10.1007/978-3-031-56235-8\\_9](https://link.springer.com/chapter/10.1007/978-3-031-56235-8_9).
- [CHK<sup>+</sup>21] Chi-Ming Marvin Chung, Vincent Hwang, Matthias J. Kannwischer, Gregor Seiler, Cheng-Jhih Shih, and Bo-Yin Yang. NTT Multiplication for NTT-unfriendly Rings: New Speed Records for Saber and NTRU on Cortex-M4 and AVX2. *IACR Transactions on Cryptographic Hardware and Embedded Systems*, 2021(2):159–188, 2021. <https://tches.iacr.org/index.php/TCHES/article/view/8791>.
- [HKS24] Vincent Hwang, YoungBeom Kim, and Seog Chung Seo. Multiplying Polynomials without Powerful Multiplication Instructions. *IACR Transactions on Cryptographic Hardware and Embedded Systems*, 2025(1):160–202, 2024. <https://tches.iacr.org/index.php/TCHES/article/view/11926>. Extended from <https://eprint.iacr.org/2023/1955>. Full version available at <https://eprint.iacr.org/2024/1649>.
- [HLS<sup>+</sup>22] Vincent Hwang, Jiaxiang Liu, Gregor Seiler, Xiaomu Shi, Ming-Hsien Tsai, Bow-Yaw Wang, and Bo-Yin Yang. Verified NTT Multiplications for NISTPQC KEM Lattice Finalists: Kyber, SABER, and NTRU. *IACR Transactions on Cryptographic Hardware and Embedded Systems*, pages 718–750, 2022. <https://tches.iacr.org/index.php/TCHES/article/view/9838>.
- [HLY24] Vincent Hwang, Chi-Ting Liu, and Bo-Yin Yang. Algorithmic Views of Vectorized Polynomial Multipliers – NTRU Prime. In *International Conference on Applied Cryptography and Network Security*, pages 24–46. Springer, 2024. [https://link.springer.com/chapter/10.1007/978-3-031-54773-7\\_2](https://link.springer.com/chapter/10.1007/978-3-031-54773-7_2).
- [Hwa24a] Vincent Hwang. A Survey of Polynomial Multiplications for Lattice-Based Cryptosystems. *IACR Communications in Cryptology*, 1(2), 2024. <https://cic.iacr.org/p/1/2/1>.
- [Hwa24b] Vincent Hwang. Formal Verification of Emulated Floating-Point Arithmetic in Falcon. In *International Workshop on Security*, pages 125–141. Springer, 2024. [https://dl.acm.org/doi/10.1007/978-981-97-7737-2\\_7](https://dl.acm.org/doi/10.1007/978-981-97-7737-2_7).
- [Hwa24c] Vincent Hwang. Pushing the Limit of Vectorized Polynomial Multiplication for NTRU Prime. In *Australasian Conference on Information Security and Privacy*, pages 84–102. Springer, 2024. [https://link.springer.com/chapter/10.1007/978-981-97-5028-3\\_5](https://link.springer.com/chapter/10.1007/978-981-97-5028-3_5).