

# Runyang Tian

📍 San Diego, CA 📩 r3tian@ucsd.edu ☎ 858-405-7762 🌐 Homepage

## Education

|                                                                                                          |                        |
|----------------------------------------------------------------------------------------------------------|------------------------|
| <b>University of California San Diego</b>                                                                | <i>09/2024-06/2026</i> |
| ◦ M.S. in Electrical Engineering, GPA: 3.5/4.0                                                           |                        |
| ◦ Courses: Computer Architecture, VLSI for ML, VLSI Physical Design, VLSI Testing, CMOS IC Lab           |                        |
| <b>Xi'an Jiaotong University</b>                                                                         | <i>09/2020-06/2024</i> |
| ◦ B.E. in Microelectronics, GPA: 3.5/4.0                                                                 |                        |
| ◦ Courses: Digital Integrated Circuits, Analog Integrated Circuits, C Programming, Computer Architecture |                        |

## Skills

**Languages:** Verilog, SystemVerilog, Python, C/C++, MATLAB, SPICE, Tcl

**Tools:** Cadence Virtuoso, Cadence Innovus, Synopsys DC, Altium, Xcelium, Vivado, HSPICE, Linux

## Publications

- Y. Chen\*, R. Tian\*, Y. Pan, Z. Li, W. Xu, T. Rosing, “CHIME: Chiplet-based Heterogeneous Near-Memory Acceleration for Edge Multimodal LLM Inference,” DATE, 2026.
- Y. Chen\*, Z. Li\*, K. Fan, R. Tian, J. Hsu, M. Zhou, T. Rosing, “RAPID-Graph: Recursive All-Pairs Shortest Paths Using Processing-in-Memory for Dynamic Programming on Graph,” DATE, 2026.

## Research Experience

- **Heterogeneous Near-Memory Acceleration for Edge LLM Inference** Advisor: Tajana Rosing  
◦ Proposed a compiler-supported near-memory accelerator that integrates monolithic 3D DRAM (low-latency, high-endurance) and 3D RRAM (high-density, low-refresh-energy) chiplets.  
◦ Proposed a mapping and scheduling framework for heterogeneous near-memory kernels.  
◦ Built an in-house cycle- and energy-accurate simulator to evaluate performance.
- **Sparse-Aware LUT-Based NMP Acceleration for Edge LLM Inference** Advisor: Tajana Rosing  
◦ Simulated and analyzed the performance of MAC-based and LUT-based NMP/PIM acceleration methods.  
◦ Developed a LUT-based Processing Engine (LUT-PE) for the NMP accelerator that leverages mixed-precision lookup to avoid redundant operations.  
◦ Designed a Progressive Token Elimination Unit (PTEU) to dynamically prune tokens during inference.
- **Recursive All-Pairs Shortest Paths Using PIM on Graph** Advisor: Tajana Rosing  
◦ Developed RTL modules, including a custom permutation unit, min-comparator tree, and controller, and synthesized them in 40 nm CMOS technology.  
◦ Contributed to a PIM-based system for all-pairs shortest paths with algorithm–architecture co-optimization.
- **In-Storage Database Search Using HDC on Ferroelectric NAND Flash** Advisor: Tajana Rosing  
◦ Designed peripherals to implement binding (XOR) and arithmetic operations (SUM, AVG, MIN, MAX).  
◦ Modeled and optimized the energy efficiency of the peripherals.
- **Design and Optimization of Optoelectronic Interface Driver** Advisor: Prof. Dan Li  
◦ Designed an MZI-based optical DAC with binary-weighted phase control and a LUT-based linearity optimization unit.  
◦ Developed a linear driver array that drives an 8-bit optical DAC at 1 GHz, achieving electronic-optical speed matching in a TSMC 28 nm CMOS process.  
◦ Reduced nonlinearity from 10.8% to 1.8% and improved INL from 26 to 5 LSB, with a power consumption of 9.81 mW.  
◦ Outstanding Graduation Thesis Nomination.

## Projects

---

|                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |                                     |
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------|
| <b>9-bit SAR ADC Design and Tape-Out</b>                                                                                                                                                                                                                                                                                                                                                                                                                        | 03/2025-05/2025,<br>09/2025-11/2025 |
| <ul style="list-style-type: none"><li>○ Designed the schematic and layout, applying dummy devices, a common-centroid pattern, and power shielding to minimize mismatch and noise. Monte Carlo simulation showed 2 mV offset and 0.01% capacitor mismatch.</li><li>○ Developed the SAR controller in Verilog, synthesized it with Genus, and completed PnR with Innovus.</li><li>○ Designed a PCB for testing the chip, including an anti-aliasing filter and a single-ended-to-differential (SE-to-DE) converter.</li></ul> |                                     |
| <b>Reconfigurable 2D Systolic-Array NPU Design</b>                                                                                                                                                                                                                                                                                                                                                                                                              | 01/2025-03/2025                     |
| <ul style="list-style-type: none"><li>○ FIFO, SRAM, and special function unit (SFU), supporting convolution, matrix multiplication, normalization, and ReLU operators.</li><li>○ FIFOs, and developed control logic to switch between weight-stationary and output-stationary modes.</li><li>○ Completed synthesis, STA, and place-and-route (PnR) in TSMC 65 nm technology, optimizing PPA through multi-cycle paths, pipelining, and clock gating.</li></ul>  |                                     |
| <b>TinyCPU: Single-Stage Single-Issue 9-bit Microprocessor</b>                                                                                                                                                                                                                                                                                                                                                                                                  | 03/2025-06/2025                     |
| <ul style="list-style-type: none"><li>○ Developed an accumulator-based architecture to meet the compact 9-bit instruction-encoding constraint.</li><li>○ Defined a custom ISA that supports arithmetic, logic, branch, jump, and load-store operations in six instruction formats (SR, LS, SI, DR, RI, J).</li><li>○ Designed the control logic and datapath, and implemented a floating-point-to-fixed-point conversion program on TinyCPU.</li></ul>                        |                                     |
| <b>FAST-9 Corner Detection Algorithm Accelerator</b>                                                                                                                                                                                                                                                                                                                                                                                                            | 07/2023-08/2023                     |
| <ul style="list-style-type: none"><li>○ Developed the RTL design for the buffer, detection, and non-maximum suppression modules to process images.</li><li>○ Designed corner detection as a 4-stage pipeline (difference value, absolute value, threshold compare, result) to improve throughput.</li></ul>                                                                                                                                                               |                                     |