

# PHOTONIC COMPUTING SYSTEM

Technical Design Document

Silicon Photonics + TFLN Modulator Architecture

Version 1.0 | February 20, 2026 | CONFIDENTIAL

| Peak Throughput | Aggregate BW   | Efficiency     | Speedup vs GPU |
|-----------------|----------------|----------------|----------------|
| **49,585 TOPS** | **19.71 Tbps** | **953 TOPS/W** | **150x**       |

---

# Table of Contents

| §   | Section                           |
|-----|-----------------------------------|
| 1.  | Introduction & System Overview    |
| 2.  | Architecture Overview             |
| 3.  | Photonic Core Module              |
| 4.  | TFLN Modulator Subsystem          |
| 5.  | PCIe Interface & Host Integration |
| 6.  | FPGA Hybrid Integration           |
| 7.  | WDM Optical Fabric                |
| 8.  | REST API Reference                |
| 9.  | Performance Metrics               |
| 10. | PCB Stack-Up & Manufacturing      |
| 11. | FEA Simulation Results            |
| 12. | Test Plan Summary                 |
| 13. | Security & Reliability            |
| 14. | Glossary                          |

---

# 1. Introduction & System Overview

The **LightRails AI Photonic Computing System** is a ground-up silicon-photonics accelerator platform that replaces electronic matrix arithmetic with optical interference, achieving sub-nanosecond matrix-vector products at energy costs orders of magnitude below GPU equivalents. This document is the authoritative Technical Design Document (TDD) for the system, covering hardware architecture, software interfaces, performance data, manufacturing constraints, and test methodology.

## Document Scope

- Photonic core: Clements MZI mesh with ring-resonator amplitude weighting and in-situ Statistical Congruential Learning (SCL).
- TFLN modulator subsystem: MZMs, ring modulators, and EO switches at 400G to 800G data rates.
- PCIe Gen 5 host interface: DMA engine, MMIO register map, multi-board cluster.
- FPGA hybrid integration: logic + photonic co-scheduling.
- WDM optical fabric: 64-channel C-band multiplexing.
- REST API server (Flask), Gerber/CNC/KiCad file services, FEA solver.

# 2. Architecture Overview

The system follows a layered architecture separating the optical compute plane from the electronic control and I/O plane. A PCIe Gen 5 link bridges the host CPU and the photonic accelerator board; an FPGA provides programmable glue logic for scheduling, calibration, and low-latency control.

| Layer           | Component                            | Technology                                      |
|-----------------|--------------------------------------|-------------------------------------------------|
| Host Software   | Flask REST API + Python SDK          | Python 3.9 / NumPy / Numba                      |
| Host Interface  | PCIe Gen 5 x16 + DMA Engine          | 8 ch. DMA, MMIO @ 0xF000_0000                   |
| Control Plane   | Hybrid FPGA-Photonic Scheduler       | 1 M logic cells, 512-wide mesh                  |
| Compute Plane   | Photonic Matrix Multiplier (128x128) | Clements MZI + Ring Resonators                  |
| Modulation      | TFLN MZM + Ring Modulators           | X-cut LiNbO<sub>3</sub>, V<sub>π</sub> < 2 V    |
| Optical Fabric  | WDM Multiplexer (64 ch.)             | C-band 1530–1565 nm, 100 Gbps/ch                |
| PCB / Packaging | 4-layer FR-4, HHHL PCIe card         | Gerber / KiCad design files                     |

---

## 3. Photonic Core Module

### 3.1 Photonic Matrix Multiplier

The core compute element implements a **Clements decomposition MZI mesh** that realises an arbitrary $N \times N$ unitary matrix using $N(N-1)/2$ MZIs arranged in a rectangular mesh of optical depth $N$. Matrix-vector multiplication is performed fully in the optical domain with a propagation latency of ~10 ps (a single optical transit of the mesh).

| Parameter              | Value                                 |
|------------------------|---------------------------------------|
| Matrix dimension (N)   | $128 \times 128$                      |
| Number of MZI elements | $N(N-1)/2 = 8{,}128$                  |
| Phase encoding range   | $0 - 2\pi$ per MZI ( $\theta, \phi$ ) |
| Propagation latency    | 10 ps (optical)                       |
| Base throughput        | ~16,384 TOPS                          |
| Operating wavelength   | 1550 nm                               |
| Q-factor (resonators)  | 15,000 (default)                      |
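
The element count in the table follows from $N(N-1)/2$, and each element acts as a 2×2 unitary. The sketch below checks both for $N = 128$. The closed-form MZI matrix is one common textbook convention (internal phase $\theta$, external phase $\phi$) and is an assumption here, as are the function names; the LightRails implementation may parameterise its MZIs differently.

```python
import cmath
import math


def mzi_count(n: int) -> int:
    """Number of MZIs in an n x n Clements mesh: n(n-1)/2."""
    return n * (n - 1) // 2


def mzi_transfer(theta: float, phi: float):
    """2x2 transfer matrix of one MZI (assumed convention, not from the TDD).

    theta is the internal phase (sets the splitting ratio),
    phi is the external phase on the first input port.
    """
    g = 1j * cmath.exp(1j * theta / 2)          # common global phase
    s, c = math.sin(theta / 2), math.cos(theta / 2)
    e = cmath.exp(1j * phi)
    return ((g * e * s, g * c),
            (g * e * c, -g * s))


def is_unitary(u, tol: float = 1e-12) -> bool:
    """Check that U^dagger U equals the 2x2 identity within tol."""
    for i in range(2):
        for j in range(2):
            entry = sum(u[k][i].conjugate() * u[k][j] for k in range(2))
            if abs(entry - (1.0 if i == j else 0.0)) > tol:
                return False
    return True


print(mzi_count(128))                      # 8128, matching the table
print(is_unitary(mzi_transfer(0.7, 1.3)))  # True
```

Under this convention the power coupled from input 0 to output 0 is $\sin^2(\theta/2)$, so $\theta = 0$ sends all light to the cross port.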

### 3.2 Ring-Resonator Amplitude Weighting

Each MZI stage is preceded by a micro-ring resonator that applies a Lorentzian amplitude weight, boosting extinction ratio and SNR. Weights are computed via a vectorised binomial moment expansion (order  $M=6$ ) across the full resonator array —  $O(M \cdot N)$  with zero Python loops.
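
As an illustration of the Lorentzian weighting, the sketch below evaluates the standard Lorentzian line shape directly for a ring with the default Q-factor of 15,000 at 1550 nm. It does not reproduce the binomial moment expansion, whose details are not given in this document, and it uses plain Python instead of vectorised NumPy to stay self-contained. The function names are hypothetical.

```python
def linewidth_nm(resonance_nm: float = 1550.0, q_factor: float = 15_000.0) -> float:
    """Full width at half maximum of the resonance: lambda_0 / Q."""
    return resonance_nm / q_factor


def lorentzian_weight(wavelength_nm: float,
                      resonance_nm: float = 1550.0,
                      q_factor: float = 15_000.0) -> float:
    """Normalised Lorentzian weight in [0, 1]; equals 1 on resonance."""
    # Detuning expressed in half-linewidths.
    x = 2.0 * q_factor * (wavelength_nm - resonance_nm) / resonance_nm
    return 1.0 / (1.0 + x * x)


fwhm = linewidth_nm()
# Weights for a small array of detuned channels (illustrative detunings).
detunings_nm = [-0.10, -0.05, 0.0, 0.05, 0.10]
weights = [lorentzian_weight(1550.0 + d) for d in detunings_nm]
print(f"FWHM = {fwhm:.4f} nm")
print([round(w, 3) for w in weights])
```

The weight drops to one half at a detuning of half the linewidth, which gives a quick sanity check for any faster approximation.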

### 3.3 Statistical Congruential Learning (SCL)

SCL is an in-situ adaptive calibration loop that tunes phase settings without back-propagation. It uses a Knuth/MMIX-style linear congruential generator (LCG, $a=6364136223846793005$) and the **pigeonhole optimisation**: the $n$ resonators are partitioned into $B = \lceil \sqrt{n} \rceil$ bins, requiring only $O(\sqrt{n})$ random draws per step instead of $O(n)$, guaranteeing full coverage by the pigeonhole principle. Here $n$ is the number of MZI elements (`num_mzi`), not the matrix dimension $N$.

| SCL Parameter      | Value                                                                  |
|--------------------|------------------------------------------------------------------------|
| LCG multiplier (a) | 6,364,136,223,846,793,005                                              |
| LCG increment (c)  | 1,442,695,040,888,963,407                                              |
| Modulus (m)        | $2^{32}$                                                               |
| SCL bins (B)       | $\lceil \sqrt{\text{num\_mzi}} \rceil = 91$ for $128 \times 128$       |
| Default epochs     | 50                                                                     |
| Default lr         | 0.02                                                                   |
| Training samples   | 24 per stream session                                                  |
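
The parameters above can be exercised with a small sketch of the LCG and the bin partition. It follows the table literally, including the $2^{32}$ modulus (Knuth's MMIX generator uses the same $a$ and $c$ with modulus $2^{64}$). How SCL uses the drawn indices to update phases is not specified in this document, so `draw_one_per_bin` is a hypothetical illustration of "one draw per bin" only.

```python
import math

A = 6364136223846793005   # LCG multiplier from the table
C = 1442695040888963407   # LCG increment from the table
M = 2 ** 32               # modulus as listed in the table


def lcg(seed: int):
    """Yield successive LCG states x <- (A * x + C) mod M."""
    state = seed % M
    while True:
        state = (A * state + C) % M
        yield state


def scl_bins(num_mzi: int) -> int:
    """Number of bins B = ceil(sqrt(num_mzi))."""
    return math.ceil(math.sqrt(num_mzi))


def draw_one_per_bin(num_mzi: int, rng):
    """Pick one element index from each contiguous bin (B draws in total)."""
    bins = scl_bins(num_mzi)
    bin_size = math.ceil(num_mzi / bins)
    picks = []
    for k in range(bins):
        lo = k * bin_size
        hi = min(lo + bin_size, num_mzi)
        if lo >= hi:          # no elements left for this bin
            break
        # Scale the 32-bit state into [0, hi - lo) using its high bits.
        picks.append(lo + (next(rng) * (hi - lo) >> 32))
    return picks


num_mzi = 128 * 127 // 2
picks = draw_one_per_bin(num_mzi, lcg(seed=1))
print(scl_bins(num_mzi), len(picks))   # 91 91
```

Scaling by the high bits avoids the short-period low bits that power-of-two LCGs are known for.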

## 4. TFLN Modulator Subsystem

Thin-Film Lithium Niobate (TFLN) modulators are used for all high-speed optical I/O. TFLN offers a 3x lower $V_\pi$ than silicon EO modulators, enabling PAM4/PAM8 links at 400G to 800G with sub-watt drive power.

### 4.1 Material Properties

| Property                            | Value                               |
|-------------------------------------|-------------------------------------|
| Ordinary refractive index ( $n_o$ ) | 2.211 @ 1550 nm                     |
| Extraordinary index ( $n_e$ )       | 2.138 @ 1550 nm                     |
| EO coefficient $r_{33}$             | 30.8 pm/V                           |
| EO coefficient $r_{13}$             | 8.6 pm/V                            |
| Propagation loss (TE)               | 0.27 dB/cm                          |
| Propagation loss (TM)               | 0.30 dB/cm                          |
| $\chi^{(2)}$ (SHG)                  | 27 pm/V                             |
| Thermo-optic dn/dT                  | $3.0 \times 10^{-4} \text{ K}^{-1}$ |

### 4.2 Mach-Zehnder Modulator Specs

| Parameter              | Spec                                                     |
|------------------------|----------------------------------------------------------|
| Interaction length     | 5,000 $\mu\text{m}$ (active), 15 mm (link)               |
| Electrode gap          | 10 $\mu\text{m}$ (photonic core), 6 $\mu\text{m}$ (link) |
| Wafer cut              | X-cut ( $r_{33}$ mode)                                   |
| Bandwidth              | 100 GHz (3 dB)                                           |
| Insertion loss         | 3.5 dB                                                   |
| Extinction ratio       | 25 dB                                                    |
| $V_\pi$ (API spec)     | 2.5 V                                                    |
| Energy/bit @ 400G PAM4 | < 2 pJ/bit                                               |
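
For orientation, the half-wave voltage can be estimated to first order from the material and geometry values above using $V_\pi = \lambda g / (n_e^3 r_{33} \Gamma L)$, halved for a push-pull drive. The sketch below is such an estimate only: the electro-optic overlap factor $\Gamma$ and the push-pull assumption are not stated in this document, and the result is not the model behind the API spec value.

```python
def v_pi_estimate(wavelength_m: float,
                  gap_m: float,
                  length_m: float,
                  n_e: float = 2.138,            # extraordinary index (Section 4.1)
                  r33_m_per_v: float = 30.8e-12,  # EO coefficient (Section 4.1)
                  overlap: float = 1.0,           # assumed EO overlap factor Gamma
                  push_pull: bool = True) -> float:
    """First-order half-wave voltage of a lithium niobate MZM, in volts."""
    v_single_arm = wavelength_m * gap_m / (
        n_e ** 3 * r33_m_per_v * overlap * length_m
    )
    return v_single_arm / 2.0 if push_pull else v_single_arm


# Geometry values from the table; overlap values are illustrative assumptions.
for label, gap, length in [("core", 10e-6, 5e-3), ("link", 6e-6, 15e-3)]:
    for gamma in (1.0, 0.5):
        v = v_pi_estimate(1.55e-6, gap, length, overlap=gamma)
        print(f"{label}: Gamma={gamma:.1f} -> V_pi ~ {v:.2f} V")
```

Because $V_\pi$ scales as $g / (\Gamma L)$, the longer, narrower-gap link geometry gives a markedly lower estimate than the short photonic-core geometry.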

### 4.3 Photonic Link Performance

| Link Config        | Data Rate | Reach  | Power  | Energy/bit |
|--------------------|-----------|--------|--------|------------|
| 400G PAM4 (8-lane) | 400 Gbps  | 2.0 km | ~0.8 W | 2.0 pJ/bit |
| 800G PAM4 (8-lane) | 800 Gbps  | 2.0 km | ~1.5 W | 1.9 pJ/bit |
| 800G PAM8 (short)  | 800 Gbps  | 0.5 km | <1.5 W | <2 pJ/bit  |
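
The energy-per-bit column is simply link power divided by data rate. A minimal sketch reproducing the first two rows (the helper name is hypothetical):

```python
def energy_per_bit_pj(power_w: float, rate_gbps: float) -> float:
    """Energy per bit in picojoules: power [W] / bit rate [bit/s]."""
    return power_w / (rate_gbps * 1e9) * 1e12


for name, power_w, rate_gbps in [("400G PAM4", 0.8, 400.0),
                                 ("800G PAM4", 1.5, 800.0)]:
    print(f"{name}: {energy_per_bit_pj(power_w, rate_gbps):.2f} pJ/bit")
```

The 800G PAM4 row evaluates to 1.875 pJ/bit, which the table rounds to 1.9 pJ/bit.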

## 5. PCIe Interface & Host Integration

The photonic accelerator board connects to the host via a **PCIe Gen 5 x16** slot, providing ~492 Gbps (~61.5 GB/s) of effective bandwidth in each direction. An 8-channel DMA engine handles zero-copy transfers; memory-mapped I/O at base address **0xF000\_0000** exposes all photonic component control registers.

| Parameter                | Value                          |
|--------------------------|--------------------------------|
| PCIe generation          | Gen 5 (32 GT/s per lane)       |
| Lane width               | x16                            |
| Raw bandwidth            | 512 Gbps                       |
| Effective BW (128b/130b) | ~492 Gbps (~61.5 GB/s)         |
| DMA channels             | 8 independent                  |
| MMIO base address        | 0xF000_0000                    |
| Form factor              | HHHL (Half-Height Half-Length) |
| Power consumption        | 75 W (board only)              |
| Optical ports            | 16 (4 laser sources)           |
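
The raw and encoded line rates follow from simple arithmetic. In the sketch below, 128b/130b coding on its own gives about 504 Gbps per direction, so the ~492 Gbps quoted above presumably also includes protocol overhead that this document does not itemise. That interpretation, and the `protocol_efficiency` parameter used to model it, are assumptions.

```python
def pcie_bandwidth_gbps(gt_per_s: float = 32.0,
                        lanes: int = 16,
                        payload_bits: int = 128,
                        coded_bits: int = 130,
                        protocol_efficiency: float = 1.0):
    """Return (raw, effective) per-direction bandwidth in Gbps.

    protocol_efficiency is a free parameter (assumption) for packet and
    flow-control overhead beyond line coding; 1.0 means coding only.
    """
    raw = gt_per_s * lanes
    effective = raw * payload_bits / coded_bits * protocol_efficiency
    return raw, effective


raw, eff = pcie_bandwidth_gbps()
print(f"raw: {raw:.0f} Gbps")
print(f"after 128b/130b: {eff:.1f} Gbps ({eff / 8:.1f} GB/s)")
```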

### 5.1 MMIO Register Map

| Register            | Offset | Description                    |
|---------------------|--------|--------------------------------|
| CONTROL             | 0x0000 | System control (reset, enable) |
| STATUS              | 0x0004 | System status flags            |
| LASER_POWER         | 0x0008 | 12-bit DAC laser power         |
| MODULATOR_BIAS      | 0x000C | Modulator bias voltage         |
| PHASE_SHIFTER_0     | 0x0010 | Phase shifter ch.0 (16-bit)    |
| PHASE_SHIFTER_1     | 0x0014 | Phase shifter ch.1 (16-bit)    |
| WDM_CHANNEL_SELECT  | 0x0018 | Active WDM channel select      |
| DMA_CONTROL         | 0x0028 | DMA engine control             |
| PERFORMANCE_COUNTER | 0x002C | TOPS measurement counter       |
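
The register map can be captured as a lookup table for host-side tooling. The sketch below computes absolute addresses from the MMIO base and converts a requested laser power into a 12-bit DAC code, using the 100 mW full-scale limit given in Section 13. The linear power-to-code mapping and the helper names are assumptions; no actual hardware access is performed.

```python
MMIO_BASE = 0xF000_0000

# Offsets copied from the register map above.
REGISTERS = {
    "CONTROL": 0x0000,
    "STATUS": 0x0004,
    "LASER_POWER": 0x0008,
    "MODULATOR_BIAS": 0x000C,
    "PHASE_SHIFTER_0": 0x0010,
    "PHASE_SHIFTER_1": 0x0014,
    "WDM_CHANNEL_SELECT": 0x0018,
    "DMA_CONTROL": 0x0028,
    "PERFORMANCE_COUNTER": 0x002C,
}


def register_address(name: str) -> int:
    """Absolute MMIO address of a named register."""
    return MMIO_BASE + REGISTERS[name]


def laser_power_to_dac(power_mw: float,
                       full_scale_mw: float = 100.0,
                       bits: int = 12) -> int:
    """Clamp the request to [0, full scale] and map it linearly to a DAC code."""
    max_code = (1 << bits) - 1          # 4095 for a 12-bit DAC
    clamped = min(max(power_mw, 0.0), full_scale_mw)
    return round(clamped / full_scale_mw * max_code)


print(hex(register_address("LASER_POWER")))  # 0xf0000008
print(laser_power_to_dac(250.0))             # 4095 (clamped to full scale)
```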

## 6. FPGA Hybrid Integration

A hybrid FPGA-photonic co-processor handles workloads that benefit from reconfigurable digital logic alongside optical matrix operations. The FPGA provides 1 M logic cells and partitions and schedules work across the digital and photonic compute paths.

| Parameter            | Value                  |
|----------------------|------------------------|
| FPGA logic cells     | 1,000,000              |
| FPGA compute         | 1,500 GFLOPS           |
| Photonic mesh size   | 512-wide               |
| Photonic throughput  | 250 TOPS               |
| Optical I/O          | 3.2 Tbps               |
| Combined throughput  | ~250 TOPS + 1.5 TFLOPS |
| Total power (hybrid) | 25 W                   |

## 7. WDM Optical Fabric

The wavelength-division multiplexing subsystem carries 64 independent data channels across the C-band (1530 – 1565 nm). Each channel operates at 100 Gbps, yielding 64 × 100 Gbps = 6.4 Tbps aggregate throughput. Ring resonators tune each channel for add/drop routing.

| Parameter           | Value                    |
|---------------------|--------------------------|
| Number of channels  | 64                       |
| Wavelength range    | 1530 – 1565 nm (C-band)  |
| Channel spacing     | ~0.56 nm (~70 GHz)       |
| Per-channel rate    | 100 Gbps                 |
| Aggregate bandwidth | 6.4 Tbps                 |
| Per-channel power   | -3 dBm (nominal)         |
| Resonator type      | Micro-ring (FSR = 20 nm) |
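
The channel spacing and aggregate bandwidth in the table can be checked with a short calculation. The sketch assumes the 64 channels are evenly spaced in wavelength with the outer channels at the band edges; the actual channel plan is not specified in this document, and the function names are hypothetical.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def wdm_grid_nm(n_channels: int = 64,
                start_nm: float = 1530.0,
                stop_nm: float = 1565.0):
    """Evenly spaced channel wavelengths, endpoints included (assumption)."""
    step = (stop_nm - start_nm) / (n_channels - 1)
    return [start_nm + i * step for i in range(n_channels)]


def spacing_ghz(spacing_nm: float, centre_nm: float) -> float:
    """Convert a wavelength spacing to frequency: df = c * dlambda / lambda^2."""
    return C_M_PER_S * (spacing_nm * 1e-9) / (centre_nm * 1e-9) ** 2 / 1e9


grid = wdm_grid_nm()
step_nm = grid[1] - grid[0]
centre_nm = (grid[0] + grid[-1]) / 2
aggregate_tbps = len(grid) * 100.0 / 1000.0   # 100 Gbps per channel

print(f"spacing: {step_nm:.3f} nm (~{spacing_ghz(step_nm, centre_nm):.1f} GHz)")
print(f"aggregate: {aggregate_tbps:.1f} Tbps")
```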

## 8. REST API Reference

The Flask application server listens on `0.0.0.0:5001` (all interfaces) and exposes the REST API listed below. All data endpoints return JSON. Simulation endpoints accept POST with a JSON body.

| Endpoint                          | Method   | Description                               |
|-----------------------------------|----------|-------------------------------------------|
| /api/performance                  | GET      | Aggregate system performance metrics      |
| /api/matrix_multiplier            | GET      | Matrix multiplier specs & throughput      |
| /api/matrix_multiplier/resonator  | GET      | Ring-resonator weight statistics          |
| /api/matrix_multiplier/scl_stream | GET      | SSE stream: real-time SCL training epochs |
| /api/matrix_multiplier/scl_train  | POST     | Blocking SCL train (legacy)               |
| /api/matrix_multiplier/scl_reset  | POST     | Reset MZI phases & resonances             |
| /api/matrix_multiply              | POST     | Run matrix-vector multiply simulation     |
| /api/wdm                          | GET      | WDM system performance                    |
| /api/wdm/channels                 | GET      | Per-channel wavelength & power            |
| /api/fft                          | GET/POST | FFT specs or run simulation               |
| /api/fft_transform                | POST     | Execute optical FFT simulation            |
| /api/pcie                         | GET      | PCIe board raw performance                |
| /api/pcie_board                   | GET      | PCIe board (frontend format)              |
| /api/pcie_transfer                | POST     | Simulate DMA transfer                     |
| /api/hybrid                       | GET      | FPGA-photonic hybrid specs                |
| /api/cluster                      | GET      | 64-board cluster aggregate metrics        |
| /api/tfln/modulator               | GET      | TFLN MZM specs                            |
| /api/tfln/link                    | GET      | TFLN link specs (800G)                    |
| /api/tfln/link_400g               | GET      | 400G link parameters                      |
| /api/tfln/link_800g               | GET      | 800G link parameters                      |
| /api/tfln/comparison              | GET      | TFLN vs. Si-photonics comparison          |
| /api/tfln/plots                   | GET      | Generate TFLN characterisation plots      |
| /api/gerber/files                 | GET      | List Gerber PCB files                     |
| /api/gerber/view/<file>           | GET      | Parse & return single Gerber layer        |
| /api/gerber/layers                | GET      | All Gerber layers for visualisation       |
| /api/gerber/projections           | GET      | Orthographic PCB projection views         |
| /api/cnc/files                    | GET      | List CNC G-code files                     |
| /api/cnc/view/<file>              | GET      | Parse G-code commands                     |
| /api/3d_models        | GET  | List STL/OBJ 3D model files          |
| /api/vlsi/layout      | GET  | VLSI die layout (multi-layer Gerber) |
| /api/fea/simulate     | POST | Run optical mode FEA simulation      |
| /api/kicad/visualize  | GET  | Parse KiCad PCB geometry             |
| /api/kicad/fea        | POST | Thermal + EM FEA on KiCad board      |
| /api/execute_workload | POST | Partition & execute DNN workload     |
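
A minimal client sketch for the GET endpoints, using only the Python standard library. The base URL `http://localhost:5001` assumes the server runs on the same machine, and the helper names are hypothetical. The response schema is not documented here, so the example makes no assumptions about field names.

```python
import json
from urllib.parse import urljoin
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:5001"  # assumption: server on the local host


def endpoint(path: str, base: str = BASE_URL) -> str:
    """Build the full URL for an API path such as '/api/performance'."""
    return urljoin(base, path)


def get_json(path: str, base: str = BASE_URL, timeout: float = 5.0):
    """Issue a GET request and decode the JSON response body."""
    request = Request(endpoint(path, base), headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example (requires a running server):
#     metrics = get_json("/api/performance")
#     print(json.dumps(metrics, indent=2))
print(endpoint("/api/performance"))
```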

## 9. Performance Metrics

System-level performance figures are derived from the live API (`/api/performance`), which combines the matrix multiplier, WDM, FFT, PCIe, hybrid, and cluster subsystems.

| Metric            | Single Board | 64-Board Cluster | Unit   |
|-------------------|--------------|------------------|--------|
| Peak Throughput   | ~49,586      | ~500             | TOPS   |
| Aggregate BW      | ~19.71       | 12.8             | Tbps   |
| Energy Efficiency | 953.6        | ~2,500           | TOPS/W |
| Speedup vs. GPU   | 150x         | 150x             | —      |
| Matrix Mult. TOPS | 16,384       | 1.05 M           | TOPS   |
| WDM Bandwidth     | 6.4          | 409.6            | Tbps   |
| PCIe Bandwidth    | ~492         | ~31,488          | Gbps   |
| FFT Throughput    | 50           | 3,200            | TOPS   |
| Total Power       | 52           | 200              | W      |
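
Several rows can be cross-checked by hand. The per-subsystem cluster figures (matrix multiplier, WDM, PCIe, FFT) equal the single-board values multiplied by 64, and the single-board efficiency equals peak throughput divided by total power. The sketch below reproduces those entries; the headline cluster rows (peak throughput, aggregate BW, efficiency, power) do not follow this linear scaling and are not reproduced here.

```python
BOARDS = 64

# Single-board values copied from the table above.
single_board = {
    "matrix_mult_tops": 16_384,
    "wdm_tbps": 6.4,
    "pcie_gbps": 492,
    "fft_tops": 50,
}

# Linear scaling across the cluster.
cluster = {name: value * BOARDS for name, value in single_board.items()}

# Single-board efficiency: peak throughput [TOPS] / total power [W].
efficiency_tops_per_w = 49_586 / 52

for name, value in cluster.items():
    print(f"{name}: {value:,.1f}")
print(f"efficiency: {efficiency_tops_per_w:.1f} TOPS/W")
```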

## 10. PCB Stack-Up & Manufacturing

The LightRails AI photonic board uses a cost-optimised **4-layer FR-4** PCB targeting standard fabrication at JLCPCB / PCBWay. The original 12-layer Rogers design was rationalised to reduce BOM cost by ~60%.

| Layer      | Type   | Net  | Thickness |
|------------|--------|------|-----------|
| F.Cu (Top) | Signal | —    | 35 µm     |
| In1.Cu     | Plane  | GND  | 35 µm     |
| In2.Cu     | Plane  | +3V3 | 35 µm     |
| B.Cu (Bot) | Signal | —    | 35 µm     |

**Gerber file set** (24 files): copper layers, solder mask, silkscreen, board outline, and NC drill files (0.3 mm minimum via). An optimised Gerber ZIP is available at `gerber_optimized/`.

## 11. FEA Simulation Results

The FEA solver computes optical mode profiles and thermal maps using a finite-difference method across a 2D cross-section of the photonic waveguide.

| Simulation Type  | Parameter                                 | Simulated Result |
|------------------|-------------------------------------------|------------------|
| Optical mode     | Waveguide: w=0.5 μm, h=0.22 μm, λ=1.55 μm | Single-mode TE₀₀ |
| Thermal (PCB)    | Max component temp (TFLN modulator)       | 45.2 °C          |
| Thermal (PCB)    | Average board temperature                 | 32.1 °C          |
| EM – Impedance   | 50 Ω trace target / actual                | 50.0 / 49.8 Ω    |
| EM – Crosstalk   | Differential pair isolation               | -42.3 dB         |
| EM – Return loss | Signal path return loss                   | -18.5 dB         |

## 12. Test Plan Summary

Production testing follows the *Production Test Plan Final* document. Key stages are summarised below.

| Test Stage           | Criteria                                                       | Tool                            |
|----------------------|----------------------------------------------------------------|---------------------------------|
| Unit – Photonic Core | <code>multiply()</code> output norm ≠ 0; throughput > 1 TOPS   | pytest / NumPy                  |
| Unit – TFLN MZM      | $V_\pi \in [1.5, 5]$ V; BW > 50 GHz                            | pytest                          |
| Unit – WDM           | 64 channels, BW = 6.4 Tbps                                     | pytest                          |
| Integration – API    | All <code>/api/*</code> routes return HTTP 200                 | requests + pytest               |
| Integration – SCL    | Loss decreasing over 50 epochs                                 | SSE stream parser               |
| System – Throughput  | Live TOPS ≥ 49,000                                             | <code>/api/performance</code>   |
| System – PCB FEA     | Max temp < 85 °C; EM crosstalk < -35 dB                        | <code>fea_integration.py</code> |
| Acceptance – BER     | BER < $10^{-12}$ @ 400G PAM4, 2 km                             | BERT tester                     |
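
Two of the unit-level criteria can be written as pytest-style checks. The sketch below runs them against the spec values quoted in Sections 4.2 and 7 in place of live module output; the dictionary layout and test names are hypothetical and do not reflect the actual test suite.

```python
# Spec values copied from Section 4.2 (TFLN MZM) and Section 7 (WDM).
TFLN_MZM_SPEC = {"v_pi_v": 2.5, "bandwidth_ghz": 100.0}
WDM_SPEC = {"channels": 64, "rate_gbps": 100.0}


def test_tfln_mzm_unit():
    """Unit - TFLN MZM: V_pi within [1.5, 5] V and bandwidth above 50 GHz."""
    assert 1.5 <= TFLN_MZM_SPEC["v_pi_v"] <= 5.0
    assert TFLN_MZM_SPEC["bandwidth_ghz"] > 50.0


def test_wdm_unit():
    """Unit - WDM: 64 channels and 6.4 Tbps aggregate bandwidth."""
    aggregate_tbps = WDM_SPEC["channels"] * WDM_SPEC["rate_gbps"] / 1000.0
    assert WDM_SPEC["channels"] == 64
    assert abs(aggregate_tbps - 6.4) < 1e-9
```

Saved as a file such as `test_specs.py` (a hypothetical name), these functions are collected automatically by `pytest`.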

## 13. Security & Reliability

- **API Security:** Flask debug mode is disabled in production; server binds to 0.0.0.0:5001 — firewall rules must restrict external access.
- **Input Validation:** All POST endpoints validate JSON fields and clamp numeric parameters before passing to simulation kernels.
- **Error Handling:** Uncaught exceptions in FEA/Gerber routes return HTTP 500 with a JSON error body; tracebacks are logged to `app.log`.
- **Optical Safety:** Laser power is DAC-limited to 100 mW max via the LASER\_POWER register (12-bit, full-scale = 4095).
- **Thermal Protection:** FEA hotspot alerting flags components exceeding 60 °C in simulation; a hardware over-temperature protection (OTP) fuse trips at 85 °C.
- **MTBF Estimate:** Optical components (LDs, photodetectors) rated > 100,000 hours; PCIe re-link on error via PERST#.
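
The validate-and-clamp behaviour described under Input Validation can be sketched as follows. The field names, bounds, and helper names are hypothetical (the defaults of 50 epochs and a learning rate of 0.02 come from Section 3.3); the actual server-side validation may differ.

```python
def clamp(value, lower, upper):
    """Limit value to the closed interval [lower, upper]."""
    return max(lower, min(upper, value))


def validate_numeric(body: dict, field: str, lower, upper, default):
    """Read a numeric field from a JSON body, reject non-numbers, then clamp."""
    raw = body.get(field, default)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(raw, bool) or not isinstance(raw, (int, float)):
        raise ValueError(f"{field!r} must be a number")
    if raw != raw:  # NaN is the only value not equal to itself
        raise ValueError(f"{field!r} must not be NaN")
    return clamp(raw, lower, upper)


# Hypothetical request body for an SCL training call.
body = {"epochs": 10_000, "lr": 0.02}
epochs = validate_numeric(body, "epochs", 1, 500, default=50)
lr = validate_numeric(body, "lr", 1e-4, 1.0, default=0.02)
print(epochs, lr)   # 500 0.02
```

Clamping keeps out-of-range requests from reaching the simulation kernels, while type errors are rejected outright so the caller gets a clear failure.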

## 14. Glossary

| Term          | Definition                                                                             |
|---------------|----------------------------------------------------------------------------------------|
| BER           | Bit Error Rate: probability of an incorrectly decoded bit.                             |
| Clements mesh | Optimal rectangular MZI decomposition for universal unitaries (Clements et al., 2016). |
| DMA           | Direct Memory Access: zero-CPU data transfer between host RAM and device.              |
| EO            | Electro-Optic: optical effect driven by applied electric field.                        |
| FSR           | Free Spectral Range: wavelength spacing between resonator modes.                       |
| LCG           | Linear Congruential Generator: fast pseudo-random number generator.                    |
| MMIO          | Memory-Mapped I/O: device registers mapped into CPU address space.                     |
| MZI           | Mach-Zehnder Interferometer: beam splitter + phase shift + combiner.                   |
| PAM4          | 4-level Pulse Amplitude Modulation: 2 bits per symbol.                                 |
| SCL           | Statistical Congruential Learning: in-situ adaptive phase calibration algorithm.       |
| TFLN          | Thin-Film Lithium Niobate: EO platform with low $V_\pi$ and high bandwidth.            |
| TOPS          | Tera Operations Per Second: unit of compute throughput.                                |
| WDM           | Wavelength Division Multiplexing: simultaneous multi-channel optical transmission.     |
| $V_\pi$       | Half-wave voltage: voltage required for a $\pi$ phase shift in an MZM.                 |