Gowtham Venkat Eathamokkala
GitHub: Gowthamch9

Denton, Texas, USA

Research Agenda

When a large language model confidently returns a wrong answer to a high-stakes clinical or financial query, who bears the cost, and how do we prevent it? This question has defined my research trajectory. Working at the intersection of machine learning and real-world data infrastructure, from building financial data pipelines at scale to studying the failure modes of generative AI, I have come to focus on a single, urgent problem: how do we build AI systems that know what they don’t know? My goal is to develop scalable frameworks for uncertainty quantification and trustworthiness in AI systems that operate on high-volume, real-world data.

Research Interests

Reliability and Trustworthiness of AI Systems
Machine Learning and Statistical Learning Theory
Scalable Data Systems and Large-Scale Data Analytics
Uncertainty Quantification in Predictive Models

Education

University of North Texas — Master of Science in Advanced Data Analytics
May 2024 • GPA 3.636 / 4.0

Gokaraju Rangaraju Institute of Engineering and Technology — B.Tech in Electronics and Communication Engineering
May 2022 • GPA 3.36 / 4.0

Research Experience

Graduate Research Affiliate — Reliability of Large Language Models
Computational Healthcare & BioTechnology Lab, Dr. Mohammed Aledhari, University of North Texas • Jan 2025 – Present

Investigating failure modes and uncertainty-aware decision-making processes in large language models (LLMs) to address the critical problem of confident yet inaccurate AI outputs.
Designing experimental protocols to evaluate model calibration and reliability across diverse query domains, contributing to the development of trustworthy AI evaluation frameworks.
Analyzing patterns of model hallucination and overconfidence, identifying systematic failure categories that inform uncertainty quantification strategies for real-world deployment.

Capstone Research — Energy Consumption Forecasting
University of North Texas • Fall 2023 – Spring 2024

Built predictive machine learning pipelines to forecast U.S. energy consumption using large-scale time series datasets, implementing preprocessing, feature engineering, and anomaly detection workflows.
Evaluated multiple model architectures using RMSE and MSE metrics, revealing how temporal dependencies and distributional shift degrade prediction accuracy in long-horizon forecasting.
Produced research-style technical reports and visualizations supporting sustainable energy planning applications.

Transportation Safety and Spatiotemporal Analytics Project
National Student Data Corps • Jan 2025

Conducted end-to-end geospatial and temporal analysis of large-scale NYC traffic collision data, including schema validation, missing-value diagnostics, and bias-aware preprocessing.
Applied time-series decomposition, seasonality extraction, anomaly detection, and geospatial hotspot analysis to identify temporal risk patterns and spatial crash clusters.
Synthesized findings into a professional research poster presented to the Federal Highway Administration (FHWA) of the U.S. Department of Transportation.

Independent Research — Structural and Demographic Patterns in Custodial Arrests
University of North Texas • May 2024

Performed large-scale exploratory data analysis on Dallas Police Department arrest records using statistical aggregation, categorical encoding, and spatial pattern detection.
Uncovered non-intuitive patterns in arrests within high-density residential environments and in the prevalence of weapon involvement, generating evidence-based insights for municipal resource-allocation policy.

Publications and Manuscripts

B. V. Kumar, A. Bharat, G. V. Eathamokkala, Y. S. S. Harsha, A. U. Sree, "Analysis of an IoT based Water Quality Monitoring System," I-SMAC 2022. DOI: 10.1109/I-SMAC55078.2022.9987360.
A. Al-Edhari, G. V. Eathamokkala, M. Rahouti, "Response drift across frontier large language models," manuscript under review at Nature Machine Intelligence, 2026.

Presentations and Posters

Spatiotemporal Analysis of New York City Traffic Crash Data — Research Poster, National Student Data Corps; presented to USDOT FHWA, Jan 2025.

Professional Experience

Data Engineer — Vsion Technologies, Austin TX • Sep 2024 – Present

Design scalable data pipelines that integrate Kafka streaming with PostgreSQL analytical storage layers for large-scale structured financial datasets, and diagnose the computational bottlenecks that motivate my research interest in scalable data systems.
Develop optimized relational data models and layered analytical views enabling downstream statistical analysis and machine learning workflows, with emphasis on reproducibility and experimental consistency.
Implement feature engineering pipelines and apply query optimization and modular schema design to improve computational efficiency for large-scale ML deployment.

Data Analyst — Zetatek Technologies Pvt Ltd, Hyderabad, India • Jan 2022 – Dec 2022

Analyzed operational and financial datasets using SQL Server and SSIS to identify statistical trends and optimize resource allocation strategies across business units.
Developed automated analytical dashboards and integrated SQL-based data pipelines to support structured reporting and the visual communication of analytical results.

Academic Projects

TriSQL Framework — Text-to-SQL Research Implementation

Independently implemented a three-stage Text-to-SQL framework, inspired by the TriSQL architecture (Scientific Reports, 2026), that converts plain-English questions into executable SQL queries using open-source tools running entirely on local hardware.

Designed and built a semantic schema selector using sentence-transformers (all-MiniLM-L6-v2) to filter relevant database tables via cosine similarity, reducing prompt noise and improving generation quality.
Developed a two-step structured SQL generator that first identifies the required SQL clauses (e.g., JOIN, GROUP BY, WHERE) and then generates the complete query, improving syntactic correctness over single-prompt approaches.
Implemented a complexity-aware refinement stage that classifies generated SQL as Easy, Medium, or Hard and applies tiered error correction, including execution feedback loops for Hard queries.
Evaluated on the Spider benchmark (Yale University), achieving 70% Execution Accuracy and a 100% Executability Rate with SQLCoder served locally via Ollama, incurring no GPU or API costs.
Deployed a FastAPI web interface enabling non-technical users to query any SQLite database in plain English and view results directly in a browser.

GitHub: https://github.com/Gowthamch9/trisql-framework

IoT-Based Water Quality Monitoring System (E-Aqua)

Engineered an IoT-enabled prototype integrating pH, turbidity, TDS, temperature, and flow sensors with an Arduino-based controller for real-time environmental data acquisition.
Implemented wireless data transmission via an ESP8266 Wi-Fi module, enabling continuous remote monitoring through a cloud-connected dashboard and mobile interface.
Conducted a comparative analysis of monitoring technologies, demonstrating a low-cost, scalable system for municipal, aquaculture, and agricultural applications. Published in the proceedings of I-SMAC 2022 (IEEE).

Technical Skills

Programming and Data Science: Python (NumPy, Pandas, scikit-learn, PyTorch, TensorFlow), SQL, PySpark, R
Machine Learning and Statistics: Regression, Classification, Clustering, Neural Networks, Time Series Forecasting, PCA, Random Forests, Support Vector Machines, Cross-Validation, Model Evaluation, Uncertainty Quantification
Big Data and Cloud Systems: Apache Spark, Kafka, Google Cloud Platform (BigQuery, Dataflow, Pub/Sub, Dataproc, Cloud SQL, Vertex AI), Snowflake
Databases: PostgreSQL, MySQL, Microsoft SQL Server
Visualization and Tools: Matplotlib, Seaborn, Power BI, Tableau, Git, Jupyter Notebook, LaTeX

Teaching and Mentoring

Coding Tutor: Taught Python and Scratch to approximately 25 middle and high school students through project-based learning.
Volunteer Data Science Mentor: National Student Data Corps — guided teams on preprocessing, analysis, and poster development.

Certifications and Honors

Certifications:

Google Cloud Data Engineering and ML Specialization
Google Advanced Data Analytics Professional Certificate
Microsoft Power BI for Data Analysts
Advanced SQL for Data Engineering

Honors:

Selected Participant, National Student Data Corps
Graduate GPA 3.636 / 4.0, University of North Texas

Languages

English: Professional proficiency
Telugu: Native
Hindi: Conversational proficiency
Tamil: Conversational proficiency
