

# Research Timeline of Master's Thesis

Hao-Chun Liang

November 2024 – January 2026

| Period | Activities |
|----------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 2024.11–2025.1 | <ul style="list-style-type: none"><li>• Established the traccc (GPU track reconstruction) environment</li></ul>                                                                                                                                                                                                                                                                                                                                    |
| 2025.2–2025.3  | <ul style="list-style-type: none"><li>• Studied traccc code and algorithms (over 70,000 lines)</li><li>• Analyzed next steps and reported to parallelization meeting</li><li>• Revised the manuscript for YunChen's paper</li></ul>                                                                                                                                                                                                                |
| 2025.4         | <ul style="list-style-type: none"><li>• Set up the Nvidia Nsight Systems (profiler) environment</li><li>• Profiled traccc to identify bottlenecks</li><li>• Created figures for YunChen's paper</li></ul>                                                                                                                                                                                                                                          |
| 2025.5         | <ul style="list-style-type: none"><li>• Analyzed bottlenecks; attempted code modifications and debugging</li><li>• Split the fit kernel, increasing throughput by <b>10%</b></li><li>• Assisted YunChen with the <b>VLSICAD</b> (conference) submission</li></ul>                                                                                                                                                                                  |
| 2025.6         | <ul style="list-style-type: none"><li>• Replaced Kalman gain matrix (track fitting computation) operations with INT8 MLP</li><li>• Achieved <b>186%</b> speedup but observed <b>physics accuracy degradation</b></li><li>• Reported results to parallelization meeting</li><li>• Assisted YunChen with the <b>TJCAS</b> (conference) submission</li><li>• Attempted Nsight Compute (kernel profiler) setup (severe environmental issues)</li></ul> |
| 2025.7         | <ul style="list-style-type: none"><li>• Prepared slides &amp; scripts (EN/CN) for YunChen's <b>VLSICAD 2025</b> oral</li><li>• Prepared slides &amp; scripts for the <b>TJCAS</b> oral presentation</li><li>• Successfully established the Nvidia Nsight Compute (NCU) environment</li></ul>                                                                                                                                                       |
| 2025.8         | <ul style="list-style-type: none"><li>• Created posters for <b>TJCAS</b> and <b>FastML</b></li><li>• Attended <b>VLSICAD</b> and <b>TJCAS</b></li></ul>                                                                                                                                                                                                                                                                                            |
| 2025.9         | <ul style="list-style-type: none"><li>• Attended <b>FastML</b>; started planning for Weak Lensing competition</li><li>• Assisted Edwin with NCU profiling of Token Reduction</li><li>• Established basic NCU Profile Flow Optimization for large profile data</li><li>→ <i>Begin: Ongoing NCU Profile Flow Optimization</i></li></ul> |
| 2025.10        | <ul style="list-style-type: none"><li>• Put coursework aside for Weak Lensing competition (solo sprint)</li></ul> |
| 2025.11        | <ul style="list-style-type: none"><li>• Continued Weak Lensing sprint until mid-November; caught up on coursework</li><li>• Conducted NCU profiling on traccc; identified potential for batching</li></ul> |
| 2025.12        | <ul style="list-style-type: none"><li>• Completed multi-event batching optimization (mid-month)</li><li>• Achieved <b>93%</b> speedup with no physics accuracy loss</li><li>• Researched next steps; attempted several incorrect approaches</li><li>• Recorded an algorithm tutorial video for junior students</li></ul> |
| 2026.1         | <ul style="list-style-type: none"><li>• Implemented conditional Jacobian matrix aggregation (<b>18%</b> speedup, no degradation)</li><li>• Observation: Batching shifted find/fit (track finding and fitting) from memory-bound to compute-bound</li><li>• Analysis: FPGA infeasible for high FLOPS tasks (strict FP64 requirements)</li><li>• Future: Thesis focuses on GPU register pressure optimization</li></ul> |

### Legend

• Technical/Optimization    ● Conference/Paper    • Collaboration/Other

**Green Badge** = Performance Achievement    **Red Text** = Accuracy Concern    **Blue Sidebar** = NCU Flow Optimization Period

### Key Performance Achievements

| Improvement | Optimization | Date |
|-------------|-----------------------------------------------------|---------|
| <b>10%</b>  | Fit kernel splitting                                | 2025.5  |
| <b>186%</b> | INT8 MLP replacement (with accuracy trade-off)      | 2025.6  |
| <b>93%</b>  | Multi-event batching (no accuracy loss)             | 2025.12 |
| <b>18%</b>  | Conditional Jacobian aggregation (no accuracy loss) | 2026.1  |
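
The percentages above are easier to compare as throughput ratios. The following is a minimal sketch, assuming that "N% speedup" means throughput rose by N% over the baseline measured at that time (so 93% corresponds to a factor of 1.93). The function names are hypothetical, and the gains are deliberately not multiplied together, since each figure was measured against its own baseline.

```python
def speedup_factor(percent_gain: float) -> float:
    """Convert an 'N% speedup' figure to a throughput ratio (new / old).

    Assumption: N% means throughput increased by N% over the baseline.
    """
    return 1.0 + percent_gain / 100.0


def runtime_reduction(percent_gain: float) -> float:
    """Fraction of the baseline runtime saved for a given throughput gain."""
    return 1.0 - 1.0 / speedup_factor(percent_gain)


# Gains taken from the table; each has its own baseline, so they are
# reported side by side rather than compounded.
gains = {
    "Fit kernel splitting": 10,
    "INT8 MLP replacement": 186,
    "Multi-event batching": 93,
    "Conditional Jacobian aggregation": 18,
}

for name, pct in gains.items():
    factor = speedup_factor(pct)
    saved = runtime_reduction(pct) * 100
    print(f"{name}: x{factor:.2f} throughput, {saved:.1f}% less runtime")
```

If a reported figure instead means "new time is N% of old time" or "N% less runtime", the conversion differs, so the definition should be checked against the original measurements before quoting these ratios.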