

# A Dataflow Overlay for Monte Carlo Multi-Asset Option Pricing on AMD Versal AI Engines

**Mohamed Bouaziz**<sup>1</sup>, Michael Samet<sup>2</sup>, Suhaib A. Fahmy<sup>1</sup>

<sup>1</sup>: KAUST, Saudi Arabia, <sup>2</sup>: RWTH Aachen University, Germany



# Multi-Asset Option Pricing



Fig.: Asset price over time.

Price evolution  
(GBM model)  
*Multiple Assets*

$$\left. \begin{array}{l} S_0(T) = S_0(0) \exp \left( \left[ r - \frac{1}{2} \sigma_0^2 \right] T + \sigma_0 \sqrt{T} \sum_{1 \leq j \leq d} L_{0,j} Z_j \right) \\ S_1(T) = S_1(0) \exp \left( \left[ r - \frac{1}{2} \sigma_1^2 \right] T + \sigma_1 \sqrt{T} \sum_{1 \leq j \leq d} L_{1,j} Z_j \right) \\ \vdots \\ S_d(T) = S_d(0) \exp \left( \left[ r - \frac{1}{2} \sigma_d^2 \right] T + \sigma_d \sqrt{T} \sum_{1 \leq j \leq d} L_{d,j} Z_j \right) \end{array} \right\}$$

*Correlated variates of the option's multiple assets*

# Multi-Asset Option Pricing

*Brownian motion (source of randomness)*

Price  
evaluation  
(GBM model)

$$S_i(T) = S_i(0) \exp \left( \left[ r - \frac{1}{2} \sigma_i^2 \right] T + \sigma_i \sqrt{T} \sum_{1 \leq j \leq d} L_{i,j} Z_j \right)$$
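
The formula above can be evaluated for one simulated path as in the sketch below. It assumes that $L$ is the lower-triangular Cholesky factor of the asset correlation matrix, which is the usual choice but is not stated on the slide. The helper names `cholesky` and `terminal_prices` and all parameter values are illustrative, not taken from the authors' implementation.

```python
import math
import random

def cholesky(corr):
    """Lower-triangular L with L * L^T == corr (corr must be positive definite)."""
    d = len(corr)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(corr[i][i] - s)
            else:
                L[i][j] = (corr[i][j] - s) / L[j][j]
    return L

def terminal_prices(s0, sigma, r, T, L, z):
    """Evaluate S_i(T) for every asset from independent standard normals z."""
    d = len(s0)
    prices = []
    for i in range(d):
        w = sum(L[i][j] * z[j] for j in range(d))  # correlated variate for asset i
        drift = (r - 0.5 * sigma[i] ** 2) * T
        prices.append(s0[i] * math.exp(drift + sigma[i] * math.sqrt(T) * w))
    return prices

# Illustrative two-asset setup (assumed values, not from the slides).
corr = [[1.0, 0.5], [0.5, 1.0]]
L = cholesky(corr)
rng = random.Random(42)
z = [rng.gauss(0.0, 1.0) for _ in range(2)]
print(terminal_prices([100.0, 90.0], [0.2, 0.3], 0.05, 1.0, L, z))
```

Each path needs $d$ normal draws and a $d \times d$ triangular product, so the cost per path grows with the number of assets.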



- Monte Carlo Simulation ✓  
 → Compute intensive ✗
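
Why Monte Carlo is compute intensive can be seen from its standard error, which shrinks only as $1/\sqrt{N}$ in the number of paths $N$. The sketch below is a minimal estimator under stated assumptions: the payoff (a call on the average of two independent assets) and every parameter value are hypothetical and chosen only to keep the example short.

```python
import math
import random

def mc_price(sample_payoff, n_paths, r, T, seed=0):
    """Discounted mean payoff and its standard error over n_paths samples."""
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    for _ in range(n_paths):
        p = sample_payoff(rng)
        total += p
        total_sq += p * p
    mean = total / n_paths
    var = max(total_sq / n_paths - mean * mean, 0.0)
    disc = math.exp(-r * T)
    return disc * mean, disc * math.sqrt(var / n_paths)

def basket_call_payoff(rng, s0=(100.0, 100.0), sigma=(0.2, 0.3),
                       r=0.05, T=1.0, strike=100.0):
    """Hypothetical payoff: call on the average of two independent GBM assets."""
    prices = [
        s * math.exp((r - 0.5 * v * v) * T + v * math.sqrt(T) * rng.gauss(0.0, 1.0))
        for s, v in zip(s0, sigma)
    ]
    return max(sum(prices) / len(prices) - strike, 0.0)

# Quadrupling the path count should roughly halve the standard error.
for n in (1_000, 4_000, 16_000):
    price, stderr = mc_price(basket_call_payoff, n, r=0.05, T=1.0)
    print(n, round(price, 3), round(stderr, 3))
```

Gaining one extra decimal digit of accuracy therefore costs about 100 times more paths, which is what motivates a highly parallel design.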

Existing work:

- Mostly focuses on pricing many single-asset options in parallel.
- Lacks a proper, scalable dataflow design.

→ ***Need for a highly parallel dataflow design of the MC-based pricer***

# AMD Versal SoC



# Dataflow Overlay Architecture



**Fig.: Pricer flow.**



**Fig.: Pricer flow mapping to single CU.**

# Single Compute Unit Design



- 128-bit (4 x 32-bit) based SIMD operations.
- Long period generator.

# Single Compute Unit Design



- 128-bit (4 x 32-bit) based SIMD operations:
  - Bitwise logical operations.
  - Per-lane shifting operations.
  - Vector shifting operations.

*These are not natively supported!*

*Emulating these operations by  
composing supported ones.*



**Fig.: 128-bit vector shift left operation**
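
As a rough illustration of emulation by composition, a 128-bit left shift can be built from per-lane shifts, a one-lane move, and a bitwise OR. The sketch below models this in plain Python on four 32-bit words. The lane order (lane 0 least significant) and the function names are assumptions, and the code says nothing about which AI Engine intrinsics would implement each step.

```python
MASK32 = 0xFFFFFFFF

def shl128(lanes, n):
    """Shift a 128-bit value left by n bits (0 <= n < 32).

    lanes holds four 32-bit words, lanes[0] being the least significant
    (an assumed layout).
    """
    if not 0 <= n < 32:
        raise ValueError("n must be in [0, 32)")
    if n == 0:
        return list(lanes)
    # Per-lane shifts: the bits that stay in each lane, and the bits that
    # spill out of the top of each lane.
    kept = [(x << n) & MASK32 for x in lanes]
    carry = [x >> (32 - n) for x in lanes]
    # Lane move: each carry goes one lane up; lane 0 receives zeros and the
    # carry out of lane 3 is dropped.
    carry_up = [0] + carry[:3]
    # Bitwise OR merges the two parts.
    return [k | c for k, c in zip(kept, carry_up)]

def to_int(lanes):
    """Pack four 32-bit lanes into one Python integer for checking."""
    return sum(x << (32 * i) for i, x in enumerate(lanes))

v = [0x80000001, 0x00000000, 0xFFFFFFFF, 0x12345678]
print([hex(x) for x in shl128(v, 4)])
```

The result can be checked against Python's arbitrary-precision shift, `(to_int(v) << n) & ((1 << 128) - 1)`.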

For more PRNG exploration on AI Engines: *M. Bouaziz, S. A. Fahmy: “[PRNGine](#): Massively Parallel Pseudo-Random Number Generation and Probability Distribution Approximations on AMD AI Engines”, IPDPSW’25*

# The Overlay Architecture

- Compute units (CUs) of 3 AIEs each are replicated across the entire array.
- Simulation parameters and the results are streamed in and out.
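
The replication idea can be sketched on the host side as follows: the path budget is split across the CUs, each CU returns a partial payoff sum, and the partial sums are combined into one price. This is a software stand-in under stated assumptions, not the authors' design. `run_cu` uses a single-asset call payoff for brevity, and the per-CU seeding scheme and all parameter values are hypothetical; only the CU count of 133 comes from the slides.

```python
import math
import random

N_CUS = 133  # CU count from the mapping figure caption

def split_paths(n_paths, n_cus):
    """Spread n_paths over n_cus as evenly as possible."""
    base, extra = divmod(n_paths, n_cus)
    return [base + (1 if cu < extra else 0) for cu in range(n_cus)]

def run_cu(cu_id, n_paths, params):
    """Stand-in for one CU: returns the partial payoff sum for its paths."""
    rng = random.Random(params["seed"] + cu_id)  # assumed: one stream per CU
    s0, k, r, sigma, T = (params[x] for x in ("s0", "strike", "r", "sigma", "T"))
    total = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)
        s_t = s0 * math.exp((r - 0.5 * sigma ** 2) * T + sigma * math.sqrt(T) * z)
        total += max(s_t - k, 0.0)
    return total

def price(n_paths, params, n_cus=N_CUS):
    """Stream parameters to every CU, then reduce the partial sums."""
    shares = split_paths(n_paths, n_cus)
    partial = [run_cu(cu, n, params) for cu, n in enumerate(shares)]
    return math.exp(-params["r"] * params["T"]) * sum(partial) / n_paths

params = {"s0": 100.0, "strike": 100.0, "r": 0.05, "sigma": 0.2, "T": 1.0, "seed": 7}
print(round(price(53_200, params), 3))
```

Because the CUs only exchange parameters and partial sums, the work scales with the number of CUs as long as each one draws from its own random stream.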



Fig.: Mapping of 133 CUs to AIE array.



Fig.: Task allocation on Versal SoC.

# Experimental Setup

## FPGA

- Dataflow design using the AMD Vitis Quantitative Finance library.
- Dataflow with 1 to **12 CUs**.



## CPU

- C++ based multi-threaded implementation using OpenMP.
- 1 to **128 parallel threads** on AMD EPYC 7763.



## GPU

- CUDA-based implementation on Nvidia RTX A6000 GPU.

# Performance results



Fig.: Execution time over multiple paths.



Fig.: Execution time over multiple time steps.



12.9 to 25.7x faster than FPGA, 10.66 to 13.41x faster than multi-threaded CPU,  
0.45 to 0.73x as fast as GPU but 1.82x more energy efficient.

# Performance projection

For a comparable-sized device, what would performance look like?

$$\rightarrow [maximum\ speedup]_{ij} = speedup_{ij} \times \frac{para\_cu(design_i)}{para\_cu(design_j)}$$

$$para\_cu(AIE) = 133 \quad \begin{cases} para\_cu(CPU) = 128 & \rightarrow maximum\ speedup = 10.26 - 12.9x \\ para\_cu(FPGA) = 12 & \rightarrow maximum\ speedup = 1.16 - 2.32x \end{cases}$$

$$para\_cu(AIE) = 3192 \quad para\_cu(GPU) = 10752 \quad \rightarrow \quad maximum\ speedup = 1.52 - 2.29x$$
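
The CPU and FPGA rows can be reproduced with a few lines of arithmetic. The sketch below assumes that $i$ denotes the baseline design and $j$ the AIE design, so that the measured speedup is scaled by the baseline's parallel-unit count divided by the AIE's. The function name `projected_speedup` is illustrative; the measured ranges are the ones reported in the performance results.

```python
def projected_speedup(measured, baseline_cus, aie_cus=133):
    """Scale a measured AIE speedup by the ratio of parallel compute units."""
    return measured * baseline_cus / aie_cus

rows = [
    ("CPU", 128, (10.66, 13.41)),  # measured speedup range over the CPU
    ("FPGA", 12, (12.9, 25.7)),    # measured speedup range over the FPGA
]
for name, cus, (low, high) in rows:
    print(name,
          round(projected_speedup(low, cus), 2),
          round(projected_speedup(high, cus), 2))
```

Under this reading, the projection answers how the AIE design would compare if the baseline had as many parallel units as the AIE array.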

# Conclusion

- We leverage the network of high-performance stream interconnects and the SIMD capabilities of the AMD AI Engines to compose a dataflow overlay for Monte Carlo multi-asset option pricing.
- We propose a design that efficiently makes use of the heterogeneous components (AIEs, FPGA) to run the accelerator at high performance.
- The overlay outperforms the FPGA and multi-threaded CPU baselines in execution time and the GPU in energy efficiency.

# A Dataflow Overlay for Monte Carlo Multi-Asset Option Pricing on AMD Versal AI Engines



[GitHub Repository](#)



[Reach out via email](#)



[Connect on LinkedIn](#)