

# RAHUL RAJ D N

+1 (404) 452-4865 | [r84@gatech.edu](mailto:r84@gatech.edu) | <https://www.linkedin.com/in/rahulrajdn/>

## Objective

Seeking Full-time Opportunities starting May 2026 in CPU/GPU/ASIC Architecture and RTL Design.

## Education

### Georgia Institute of Technology | Atlanta, GA

Master of Science in Electrical and Computer Engineering, **GPA: 4.00/4.00**

August 2024 – Present

**Expected Graduation: May 2026**

**Concentration: VLSI Systems & Digital Design, Computer Systems and Software**

**Courses:** VLSI Design: Theory-to-Tapeout, Advanced Computer Architecture, Hardware Software Co-Design for ML Systems, Silicon Validation, Advanced VLSI Systems, GPU Hardware & Software, Advanced Programming Techniques

### R V College of Engineering | Bengaluru, India

August 2017 – May 2021

Bachelor of Engineering in Electronics and Communication Engineering, **CGPA: 9.32/10**

**Courses:** Microprocessors and Microcontrollers, Analysis and Design of Digital Circuits, Advanced Digital System Design using Verilog HDL, Digital VLSI Design, Low Power VLSI Design, VLSI Testing for ICs

## Skills

**Languages:** C / C++, Assembly Language, **Verilog / SystemVerilog, Python**, Perl, MATLAB, Bash scripting, CUDA

**Tools:** Synopsys VCS, Verdi, Xilinx Vivado, Cadence Virtuoso, Intel ModelSim

## Work Experience

### Wi-Fi MAC Design Intern | Qualcomm, Santa Clara, CA

[ May 2025 – August 2025 ]

- RTL implementation of Wi-Fi MAC based on a novel architecture to reduce protocol dependency on hardware.
- Segregated and achieved RX packet transactions via ring interface.
- Proposed and implemented a few hardware architectural optimizations, reducing area by ~18% at block level.

### ASIC Engineer | Cisco Systems, Bengaluru, India

[ February 2023 – July 2024 ]

- RTL implementation of submodules in the Packet parsing block of **SmartNIC**.
- Devised **micro-architecture** to meet system specifications in collaboration with the Architecture team.
- Owned one of the pivotal blocks in the networking SoC and created its verification test plans.
- Added feature-directed and random test cases to drive constrained traffic using UVM methodology.
- Automated regression flow and proactively debugged failures in regressions.
- Learned Cisco-specific bus protocols, tools, and industry practices in the ASIC design flow.

### Software Engineer | Cisco Systems, Bengaluru, India

[ October 2021 – February 2023 ]

- Added a toggle feature to switch between transport protocols in the B2BUA, which offers media services.
- Generated an event description at every TLS handshake; validated end-to-end TLS calls.
- Integrated the LibSRTP library into the application for encryption and decryption of RTP packets.

### Technical Undergraduate Intern | Cisco Systems, Bengaluru, India

[ February 2021 – July 2021 ]

- Developed a decoupled architecture for Signaling & Media planes using gRPC for inter-process communication.

## Academic Projects

- **Design and Tapeout of CORDIC module with TSMC 65nm Process Node**  
RTL design of a COordinate Rotation DIgital Computer module. Carried out synthesis and APR, verifying compliance with target specifications at each stage. Performed post-silicon validation of the taped-out chip using scan chain tests controlled by a Raspberry Pi. [SystemVerilog / Tcl] (Sponsored by Apple in collaboration with Georgia Tech)
- **Integration of Streaming Buffer into Vortex GPGPU Architecture**  
RTL implementation of a BRAM-based, MMIO-accessible streaming buffer integrated into the Vortex GPGPU. Designed address decoding and AXI-based routing logic for core-to-buffer communication. Verified functionality on a U50 FPGA using Xilinx XRT simulation. [SystemVerilog] (Research under Prof. Hyesoon Kim at Georgia Tech)
- **Performance Evaluation of 5-stage Pipelined Superscalar architecture**  
Created a hazard detection unit that supports stalling on RAW hazards and **forwarding** of data across the pipelines; integrated Always-Taken and G-Share **Branch Predictors**. Further, incorporated Tomasulo's algorithm to implement **Out-Of-Order execution** using register renaming and a Reorder Buffer. [C++]
- **Multi-Level Cache and Memory System Design Simulator**  
Developed and optimized a multi-level cache simulator with configurable **cache sizes** and **replacement policies**. Modeled DRAM with open-page and closed-page policies to study their effect on memory access latency and performance. Benchmarked with SPEC2006 workloads to evaluate cache performance. [C++]
- **RTL Design of Ethernet address swapping module and testing using SystemVerilog**  
Designed a module for swapping source and destination addresses in Ethernet packets. Developed a **SystemVerilog**-based test bench environment to verify the functional correctness of the block using constrained random verification in ModelSim. [SystemVerilog]
- **RTL implementation of Error Correction Techniques: Convolution Encoder with Viterbi Decoder and LDPC Encoder with bit-flipping algorithm**  
Designed an encoder and decoder for each of the above techniques and verified functionality on the Vivado (Xilinx) simulator. Compared the techniques in terms of power, area, and resource usage. [Verilog]
- **GPU Warp Scheduling and Compute Core Simulation**  
Implemented a GPU simulator with warp scheduling algorithms (**RR**, **GTO**, **CCWS**) and modeled compute / tensor core pipelines with realistic latency and dependency handling. Analyzed performance trade-offs across varying tensor latencies and execution widths using benchmark traces. [C++]
- **GPU-Accelerated Bitonic Sort using CUDA**  
Developed and optimized a parallel Bitonic sort on an NVIDIA H100 using shared memory tiling, coarsened merging, and multi-stream transfers. Achieved ~147x speedup over CPU, 90% occupancy, and ~800M elements/sec throughput through memory coalescing and kernel tuning. [CUDA]
- **SRAM Design, Layout, Extraction and Simulation**  
Created SRAM peripherals (decoder logic, WL generation, column MUX); designed the SRAM cell, array, and read / write circuits; laid out the design and simulated timing from the extracted circuit. [Cadence]
- **Collective Communication Optimization using MSCC-Lang on Hierarchical Mesh Topologies**  
Implemented and optimized hierarchical all-reduce collectives using MSCC-Lang. Designed multi-phase communication patterns on 2D mesh topologies and validated performance using XML generation. Focused on ring-based collective strategies with buffer management and chunk indexing. [Python]

## Technical Publications

- Udit Subramanya, Rahul Raj, Jeff Young, and Hyesoon Kim. Deploying Vortex FPGA Development Environment with Apptainer. Open-Source Computer Architecture Research Workshop (OSCAR 2025).
- Sowmya, K. B., Rahul Raj, D. N., & Shetty, S. K. (2021). Error Correction Technique Using Convolution Encoder with Viterbi Decoder. In P. Karuppusamy, I. Perikos, F. Shi, & T. N. Nguyen (Eds.), *Sustainable Communication Networks and Application* (pp. 243–252). Springer Singapore.
- Kiran, V., & Rahul Raj, D. N. (2021). Segregating Signaling and Media Planes into Different Containers. *Journal of University of Shanghai for Science and Technology*, 23, 1373–1382. doi:10.51201/JUSST/21/06452.