Denton, Texas, USA
When a large language model confidently returns a wrong answer to a high-stakes clinical or financial query, who bears the cost, and how do we prevent it? This question has defined my research trajectory. Working at the intersection of machine learning and real-world data infrastructure, from building financial data pipelines at scale to studying the failure modes of generative AI, I have come to focus on a single, urgent problem: how do we build AI systems that know what they don’t know? I aim to develop scalable frameworks for uncertainty quantification and trustworthiness in AI systems operating on high-volume, real-world data.
- Reliability and Trustworthiness of AI Systems
- Machine Learning and Statistical Learning Theory
- Scalable Data Systems and Large-Scale Data Analytics
- Uncertainty Quantification in Predictive Models
University of North Texas — Master of Science in Advanced Data Analytics
May 2024 • GPA 3.636 / 4.0
Gokaraju Rangaraju Institute of Engineering and Technology — B.Tech in Electronics and Communication Engineering
May 2022 • GPA 3.36 / 4.0
Graduate Research Affiliate — Reliability of Large Language Models
Computational Healthcare & BioTechnology Lab, Dr. Mohammed Aledhari, University of North Texas • Jan 2025 – Present
- Investigating failure modes and uncertainty-aware decision-making processes in large language models (LLMs) to address the critical problem of confident yet inaccurate AI outputs.
- Designing experimental protocols to evaluate model calibration and reliability across diverse query domains, contributing to the development of trustworthy AI evaluation frameworks.
- Analyzing patterns of model hallucination and overconfidence, identifying systematic failure categories that inform uncertainty quantification strategies for real-world deployment.
Capstone Research — Energy Consumption Forecasting
University of North Texas • Fall 2023 – Spring 2024
- Built predictive machine learning pipelines to forecast U.S. energy consumption using large-scale time series datasets, implementing preprocessing, feature engineering, and anomaly detection workflows.
- Evaluated multiple model architectures using RMSE and MSE metrics, revealing how temporal dependencies and distributional shift degrade prediction accuracy in long-horizon forecasting.
- Produced research-style technical reports and visualizations supporting sustainable energy planning applications.
Transportation Safety and Spatiotemporal Analytics Project
National Student Data Corps • Jan 2025
- Conducted end-to-end geospatial and temporal analysis of large-scale NYC traffic collision data, including schema validation, missing-value diagnostics, and bias-aware preprocessing.
- Applied time-series decomposition, seasonality extraction, anomaly detection, and geospatial hotspot analysis to identify temporal risk patterns and spatial crash clusters.
- Synthesized findings into a professional research poster presented to the U.S. Department of Transportation Federal Highway Administration.
Independent Research — Structural and Demographic Patterns in Custodial Arrests
University of North Texas • May 2024
- Performed large-scale exploratory data analysis on Dallas Police Department arrest records using statistical aggregation, categorical encoding, and spatial pattern detection.
- Uncovered non-intuitive patterns in arrests in high-density residential environments and in the prevalence of weapon involvement, generating evidence-based insights for municipal resource allocation policy.
- B. V. Kumar, A. Bharat, G. V. Eathamokkala, Y. S. S. Harsha, A. U. Sree, "Analysis of an IoT based Water Quality Monitoring System," I-SMAC 2022. DOI: 10.1109/I-SMAC55078.2022.9987360.
- Al-Edhari, A., Eathamokkala, G. V., & Rahouti, M. (2026). Response drift across frontier large language models. Manuscript under review at Nature Machine Intelligence.
- Spatiotemporal Analysis of New York City Traffic Crash Data — Research Poster, National Student Data Corps; presented to USDOT FHWA, Jan 2025.
Data Engineer — Vsion Technologies, Austin TX • Sep 2024 – Present
- Design scalable data pipelines integrating Kafka streaming with PostgreSQL analytical storage layers for large-scale structured financial datasets, identifying computational bottlenecks that motivate research in scalable data systems.
- Develop optimized relational data models and layered analytical views enabling downstream statistical analysis and machine learning workflows, with emphasis on reproducibility and experimental consistency.
- Implement feature engineering pipelines and apply query optimization and modular schema design to improve computational efficiency for large-scale ML deployment.
Data Analyst — Zetatek Technologies Pvt Ltd, Hyderabad, India • Jan 2022 – Dec 2022
- Analyzed operational and financial datasets using SQL Server and SSIS to identify statistical trends and optimize resource allocation strategies across business units.
- Developed automated analytical dashboards and integrated SQL-based data pipelines for structured reporting and visual analytics communication.
TriSQL Framework — Text-to-SQL Research Implementation
Independently implemented a three-stage Text-to-SQL framework inspired by the TriSQL architecture (Nature Scientific Reports, 2026), converting plain English questions into executable SQL queries using open-source tools running entirely on local hardware.
- Designed and built a semantic schema selector using sentence-transformers (all-MiniLM-L6-v2) to filter relevant database tables via cosine similarity, reducing prompt noise and improving generation quality.
- Developed a two-step structured SQL generator that first identifies required SQL clauses (JOIN, GROUP BY, WHERE) before generating the complete query — improving syntactic correctness over single-prompt approaches.
- Implemented a complexity-aware refinement stage that classifies generated SQL as Easy, Medium, or Hard and applies tiered error correction including execution feedback loops for hard queries.
- Evaluated on the Spider benchmark dataset (Yale University) — achieving 70% Execution Accuracy and 100% Executability Rate using SQLCoder via Ollama with no GPU or API costs.
- Deployed a FastAPI web interface enabling non-technical users to query any SQLite database in plain English and view results directly in a browser.
GitHub: https://github.com/Gowthamch9/trisql-framework
IoT-Based Water Quality Monitoring System (E-Aqua)
- Engineered an IoT-enabled prototype integrating pH, turbidity, TDS, temperature, and flow sensors with an Arduino-based controller for real-time environmental data acquisition.
- Implemented wireless data transmission via ESP8266 Wi-Fi module, enabling continuous remote monitoring through a cloud-connected dashboard and mobile interface.
- Conducted comparative analysis of monitoring technologies, demonstrating a low-cost, scalable system for municipal, aquaculture, and agricultural applications. Published at IEEE I-SMAC 2022.
- Programming and Data Science: Python (NumPy, Pandas, scikit-learn, PyTorch, TensorFlow), SQL, PySpark, R
- Machine Learning and Statistics: Regression, Classification, Clustering, Neural Networks, Time Series Forecasting, PCA, Random Forests, Support Vector Machines, Cross-Validation, Model Evaluation, Uncertainty Quantification
- Big Data and Cloud Systems: Apache Spark, Kafka, Google Cloud Platform (BigQuery, Dataflow, Pub/Sub, Dataproc, Cloud SQL, Vertex AI), Snowflake
- Databases: PostgreSQL, MySQL, Microsoft SQL Server
- Visualization and Tools: Matplotlib, Seaborn, Power BI, Tableau, Git, Jupyter Notebook, LaTeX
- Coding Tutor: Taught Python and Scratch to middle and high school students through project-based learning (~25 students).
- Volunteer Data Science Mentor: National Student Data Corps — guided teams on preprocessing, analysis, and poster development.
Certifications:
- Google Cloud Data Engineering and ML Specialization
- Google Advanced Data Analytics Professional Certificate
- Microsoft Power BI for Data Analysts
- Advanced SQL for Data Engineering
Honors:
- Selected Participant, National Student Data Corps
- Graduate GPA 3.636 / 4.0, University of North Texas
- English: Professional proficiency
- Telugu: Native
- Hindi: Conversational proficiency
- Tamil: Conversational proficiency
