

### An Integrated FPGA Implementation of Complete GNN-Based Trajectory Reconstruction

Yun-Chen Yang\*, Hao-Chun Liang\*, Hsuan-Wei Yu\*, Bo-Cheng Lai\*, Shih-Chieh Hsu†, Mark Neubauer‡, Santosh Parajuli‡

National Yang Ming Chiao Tung University, Hsinchu, Taiwan\*; University of Washington, USA†; University of Illinois Urbana-Champaign, USA‡

#### Motivation & Challenge



- Collaboration with CERN
  - Proton-Beam Collisions
  - Detectors Produce Hits
- High-Volume Collision Hits
  - Level-1 Trigger (L1T) System Filtering
  - via Track Reconstruction
- HL-LHC Requirements
  - Latency: 4 $\mu$s
  - Throughput: 2.22 MHz

- Accuracy-Focused, Execution-Speed-Agnostic Approaches<sup>[7,8]</sup>
  - CPU-Based: Platform Limits Hinder Task-Specific Tuning
  - GPU-Based: Inefficient in Latency-Critical Scenarios
    - Misses Microsecond Targets by Milliseconds
- FPGA-Accelerated Frameworks
  - Scope Limited to the GNN Edge-Classification Stage<sup>[9,10]</sup>
    - Host-to-Device Data Transfer Latency
    - FPGA Resource Underutilization
  - Throughput-Driven Processing of Small Graph Subregions<sup>[11]</sup>
    - Small Subgraphs Lower Accuracy
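
As a back-of-the-envelope reading of the HL-LHC requirements (our arithmetic, not a figure from the poster), the throughput target $f$ fixes the initiation interval between events, and Little's law (events in flight = throughput × latency) gives how many events a design must hold concurrently at the latency bound $L_{\max}$:

```latex
T_{\mathrm{II}} = \frac{1}{f} = \frac{1}{2.22\,\mathrm{MHz}} \approx 450\,\mathrm{ns},
\qquad
N_{\mathrm{in\,flight}} = f \cdot L_{\max} = (2.22 \times 10^{6}\,\mathrm{s}^{-1})(4 \times 10^{-6}\,\mathrm{s}) \approx 8.9
```

A single non-pipelined engine that spends the full 4 $\mu$s on each event would reach only 250 kHz, roughly 9× short of the target, so the design must overlap work on several events through pipelining or spatial parallelism.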

#### Trajectory Reconstruction



- Graph Construction – Map Hits to Nodes and Filter Directed Edges
- Edge Classification – Assign Probabilities to Edges
- Track Construction – Link High-Probability Edges into Tracks
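
The three stages above can be sketched end to end in plain Python. This is a minimal illustration, not the poster's FPGA design: it assumes each hit carries only a layer index and an azimuthal angle, and the Δφ cut, the stand-in edge score (used where the real pipeline runs the GNN), and the union-find grouping in stage 3 are all hypothetical choices.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    layer: int    # detector layer index (0 = innermost), hypothetical encoding
    phi: float    # azimuthal angle in radians


def wrap_dphi(a: float, b: float) -> float:
    """Signed azimuthal difference b - a, wrapped into [-pi, pi)."""
    return (b - a + math.pi) % (2 * math.pi) - math.pi


def build_graph(hits, max_dphi=0.1):
    """Stage 1: every hit is a node; keep a directed edge to the next layer
    only if the azimuthal gap passes a (hypothetical) cut."""
    return [(i, j)
            for i, a in enumerate(hits)
            for j, b in enumerate(hits)
            if b.layer == a.layer + 1 and abs(wrap_dphi(a.phi, b.phi)) < max_dphi]


def classify_edges(hits, edges, max_dphi=0.1):
    """Stage 2 placeholder: a score in (0, 1] standing in for the GNN output."""
    return [1.0 - abs(wrap_dphi(hits[i].phi, hits[j].phi)) / max_dphi
            for i, j in edges]


def build_tracks(n_hits, edges, probs, threshold=0.5):
    """Stage 3: group hits joined by high-probability edges (union-find)."""
    parent = list(range(n_hits))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for (i, j), p in zip(edges, probs):
        if p >= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for h in range(n_hits):
        groups.setdefault(find(h), []).append(h)
    return sorted(groups.values())


def reconstruct(hits):
    edges = build_graph(hits)
    probs = classify_edges(hits, edges)
    return build_tracks(len(hits), edges, probs)


hits = [Hit(0, 0.00), Hit(1, 0.01), Hit(2, 0.02),
        Hit(0, 1.00), Hit(1, 1.01), Hit(2, 1.02)]
print(reconstruct(hits))   # [[0, 1, 2], [3, 4, 5]]
```

Two well-separated toy tracks come out as two groups of hit indices; in the real system, stage 2 is the trained edge classifier and the cuts come from detector geometry.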



- Interaction Network (IN) Framework<sup>[6]</sup>
  - Graph Neural Network for Object-Object Interaction Modeling
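
To make the Interaction Network idea concrete, here is a minimal, dependency-free sketch of one IN pass used as an edge classifier: a relational model computes a per-edge "effect", effects are summed at each receiving node, an object model updates the nodes, and a second relational model emits one probability per edge. Each model here is a single untrained dense layer with hypothetical sizes and random weights, whereas the IN of <sup>[6]</sup> uses multilayer perceptrons; the sketch shows the data flow only, not the poster's network.

```python
import math
import random


def dense(x, w, b):
    """y = W x + b, with W stored as a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(v):
    return [max(0.0, t) for t in v]


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def init_layer(n_out, n_in, rng):
    """Random (untrained) weights and zero biases."""
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


class InteractionNetwork:
    def __init__(self, node_dim, edge_dim, hidden=8, seed=0):
        rng = random.Random(seed)
        self.hidden = hidden
        self.phi_r1 = init_layer(hidden, 2 * node_dim + edge_dim, rng)  # relational model
        self.phi_o = init_layer(node_dim, node_dim + hidden, rng)       # object model
        self.phi_r2 = init_layer(1, 2 * node_dim + edge_dim, rng)       # edge scorer

    def __call__(self, x, edges, edge_attr):
        """x: node feature lists; edges: (sender, receiver) pairs; edge_attr: per-edge lists."""
        # 1) Relational model: one "effect" vector per edge.
        effects = [relu(dense(x[s] + x[r] + a, *self.phi_r1))
                   for (s, r), a in zip(edges, edge_attr)]
        # 2) Sum the effects arriving at each receiver node.
        agg = [[0.0] * self.hidden for _ in x]
        for (_, r), e in zip(edges, effects):
            agg[r] = [u + v for u, v in zip(agg[r], e)]
        # 3) Object model: update each node from its features and aggregated effects.
        x_new = [relu(dense(xi + ai, *self.phi_o)) for xi, ai in zip(x, agg)]
        # 4) Second relational pass on updated nodes: one probability per edge.
        return [sigmoid(dense(x_new[s] + x_new[r] + a, *self.phi_r2)[0])
                for (s, r), a in zip(edges, edge_attr)]


net = InteractionNetwork(node_dim=3, edge_dim=2)
x = [[0.1, 0.2, 0.3], [0.2, 0.1, 0.0], [0.3, 0.3, 0.3]]
edges = [(0, 1), (1, 2)]
edge_attr = [[0.05, 0.1], [0.02, 0.2]]
probs = net(x, edges, edge_attr)   # two untrained scores in (0, 1)
```

Because every edge and node is processed by the same small models, this structure maps naturally onto replicated hardware units, which is one reason IN-style GNNs are popular targets for FPGA acceleration.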

#### Configuration



- Cross-Sectional View of Cylindrical Collider Detector Architecture
  - 4x Cylindrical Barrels and 14x Planar Endcaps
- Hit Distribution across Segments
  - Longitudinal Segmentation along the $z$-Axis
  - Azimuthal Segmentation along the $\phi$-Axis into 2/4/8 Sectors
    - Adapted to Support Three Distinct Design Variants
  - Spatially Multiplexed FPGA for Parallel Processing
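
The azimuthal segmentation can be sketched as binning each hit's $\phi$ into $N$ equal sectors of width $2\pi/N$, so $N$ = 2/4/8 gives sectors spanning $\pi$, $\pi/2$, and $\pi/4$, matching the $\phi$ ranges listed in the pruning table. Each bucket would then feed one parallel processing unit. This is a minimal sketch under stated assumptions: sectors start at $\phi = 0$, the hit-to-unit mapping and the `(hit_id, phi)` input format are hypothetical, and tracks that cross sector boundaries are ignored here (a real design must handle them, for example with overlapping sectors).

```python
import math


def phi_sector(phi: float, n_sectors: int) -> int:
    """Index of the azimuthal sector containing phi (radians); sectors have
    width 2*pi/n_sectors and start at phi = 0 (an assumption)."""
    width = 2 * math.pi / n_sectors
    # The final modulo guards against phi % (2*pi) rounding up to exactly 2*pi.
    return int((phi % (2 * math.pi)) // width) % n_sectors


def partition_hits(hits, n_sectors):
    """Group (hit_id, phi) pairs into one bucket per sector / processing unit."""
    buckets = {k: [] for k in range(n_sectors)}
    for hit_id, phi in hits:
        buckets[phi_sector(phi, n_sectors)].append(hit_id)
    return buckets


print(partition_hits([(0, 0.1), (1, 2.0), (2, -0.1)], 4))
# {0: [0], 1: [1], 2: [], 3: [2]}
```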

#### Architecture



- Proposed LLGNNNT10
  - Edge Classification from Batch Processing<sup>[10]</sup> to Data Streaming
    - 52.3% Latency Reduction, from 2.86 $\mu$s to 1.365 $\mu$s
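
Why moving from batch processing to data streaming cuts latency can be seen with a textbook pipeline model (an illustration, not the authors' cycle-level analysis). In batch mode each stage processes all $N$ items before the next stage starts; in streaming mode items flow through pipelined stages, and each later item exits one initiation interval, set by the slowest stage, after the previous one. The stage cycle counts below are hypothetical; the last line only re-derives the reported 52.3% from 2.86 $\mu$s and 1.365 $\mu$s.

```python
def batch_latency(n_items, stage_cycles):
    """Each stage finishes the whole batch before the next stage starts."""
    return sum(n_items * c for c in stage_cycles)


def streaming_latency(n_items, stage_cycles):
    """Pipelined stages: the first item pays every stage once, and each later
    item exits one initiation interval (the slowest stage) after the previous."""
    return sum(stage_cycles) + (n_items - 1) * max(stage_cycles)


def reduction_pct(before, after):
    return 100.0 * (before - after) / before


stages = [2, 3, 1]                              # hypothetical cycles per item per stage
print(batch_latency(100, stages))               # 600
print(streaming_latency(100, stages))           # 303
print(round(reduction_pct(2.86, 1.365), 1))     # 52.3
```

The model also shows the design lever: once stages overlap, latency is dominated by the slowest stage, so balancing stage throughput matters more than shortening any single stage.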

#### Algorithm & Conclusion

- Geometry-Aware Edge Pruning in Graph Construction
  - Edge-Count Reduction of 37.5%–71.0%<sup>[7]</sup>
  - [Figure: Edge Count (K) Without vs. With $\phi$-Subdivision]

| $\phi$ Range | Proposed | Baseline |
|--------------|----------|----------|
| $\pi/4$      | 37.5%    | 55.1%    |
| $\pi/2$      | 55.1%    | 71.0%    |
| $\pi$        | 71.0%    | 71.0%    |

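
Geometry-aware pruning can be illustrated with two geometric cuts commonly used when building GNN tracking graphs: the azimuthal slope $\Delta\phi/\Delta r$ between hits on adjacent layers, and the longitudinal intercept $z_0$ of the straight line through both hits at $r = 0$. Whether the poster uses exactly these criteria is an assumption; the cut values, the `(r, phi, z)` hit tuples (in mm and radians), and the helper names are hypothetical, and $\phi$-subdivision is not modeled.

```python
import math
from itertools import product


def dphi(a, b):
    """Signed azimuthal difference b - a, wrapped into [-pi, pi)."""
    return (b - a + math.pi) % (2 * math.pi) - math.pi


def keep_edge(h1, h2, max_phi_slope, max_z0):
    """h = (r, phi, z). Keep the edge h1 -> h2 only if it looks like part of a
    track that originates near the beam line."""
    dr = h2[0] - h1[0]
    if dr <= 0:                                    # edges must point outward in r
        return False
    phi_slope = dphi(h1[1], h2[1]) / dr
    z0 = h1[2] - h1[0] * (h2[2] - h1[2]) / dr      # z of the segment extrapolated to r = 0
    return abs(phi_slope) <= max_phi_slope and abs(z0) <= max_z0


def prune(inner, outer, max_phi_slope=6e-4, max_z0=100.0):
    """Return surviving (inner, outer) index pairs and the % of all pairs pruned."""
    pairs = list(product(range(len(inner)), range(len(outer))))
    kept = [(i, j) for i, j in pairs
            if keep_edge(inner[i], outer[j], max_phi_slope, max_z0)]
    pruned_pct = 100.0 * (1 - len(kept) / len(pairs)) if pairs else 0.0
    return kept, pruned_pct


inner = [(30.0, 0.0, 0.0), (30.0, 1.0, 0.0)]
outer = [(70.0, 0.01, 0.0), (70.0, 1.01, 0.0)]
print(prune(inner, outer))   # ([(0, 0), (1, 1)], 50.0)
```

Only the two geometrically consistent pairs survive; with realistic hit counts per layer, most fully connected candidate pairs fail such cuts, which is what makes pruning before edge classification worthwhile.
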
- Probability-Based Sequential Building in Track Construction
  - LUT-Based Mapping from $B_0$ (Blue), $E_0$ (Brown), Other (Green)
  - Latency Reduction Ranging from 107x to 641x<sup>[7]</sup>
  - Baseline: DBSCAN-Based Track Construction<sup>[7]</sup>
- First End-to-End FPGA Implementation of GNN-Based Trajectory Reconstruction
  - 65024x Speedup with Improved Accuracy Metrics<sup>[7,9]</sup>
  - 2.35 MHz Throughput and 2.36 $\mu$s Latency Meet L1T Criteria
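
The probability-based sequential building step can be sketched as a greedy walk, assuming $B_0$ and $E_0$ denote the first barrel and endcap layers: a lookup table maps each hit's layer to a role, only $B_0$/$E_0$ hits seed tracks, and each seed repeatedly follows its highest-probability outgoing edge above a threshold. The layer labels, the LUT contents, the 0.5 threshold, and the rule of stopping when the best successor is already used are hypothetical simplifications, not the poster's hardware logic.

```python
# Hypothetical LUT: layer label -> role in track building.
LAYER_ROLE = {"B0": "seed", "E0": "seed"}   # every other layer maps to "other"


def build_tracks(hit_layers, edges, probs, threshold=0.5):
    """hit_layers[i]: layer label of hit i; edges: (src, dst) pairs; probs: edge scores."""
    # Keep only the best outgoing edge per hit that clears the threshold.
    best_next = {}
    for (src, dst), p in zip(edges, probs):
        if p >= threshold and p > best_next.get(src, (None, -1.0))[1]:
            best_next[src] = (dst, p)

    used, tracks = set(), []
    for seed, layer in enumerate(hit_layers):
        if LAYER_ROLE.get(layer, "other") != "seed" or seed in used:
            continue
        track, cur = [seed], seed
        used.add(seed)
        # Follow the highest-probability successor until none is left.
        while cur in best_next and best_next[cur][0] not in used:
            cur = best_next[cur][0]
            track.append(cur)
            used.add(cur)
        tracks.append(track)
    return tracks


layers = ["B0", "B1", "B2", "E0", "E1"]
edges = [(0, 1), (1, 2), (3, 4), (0, 4)]
print(build_tracks(layers, edges, [0.9, 0.8, 0.7, 0.6]))   # [[0, 1, 2], [3, 4]]
```

Unlike density-based clustering over the whole graph, each step here is a fixed-cost table lookup and comparison, which is the kind of operation that pipelines cheaply on an FPGA.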