I'm an AI & Data Engineer based in India, with over 4 years of experience bridging large-scale data engineering with production AI systems. I work across Python (PySpark), SQL, Databricks, and cloud-native architectures on Azure / GCP / AWS, with a focus on data quality and real-time pipelines.
I've worked onsite with ministry stakeholders in Dubai and Abu Dhabi, translating government-level requirements into production data systems, and I collaborate across UAE / UK / India time zones.
name: Yashi Mishra
role: AI & Data Engineer
location: India → delivering globally (UAE • UK • IN)
focus: Data Quality • Real-Time Pipelines • LLM & Voice AI
education: B.Tech, Computer Science — GLA University (81.9%)
currently: Researching data · Creating architectures · Medallion pipelines
ask-me-about: Microsoft Fabric, Databricks, PySpark, RAG, Whisper/Deepgram/ElevenLabs

- 🏛️ Authored 300+ data-quality rules for UAE Official Development Assistance pipelines on Microsoft Fabric + Azure Databricks, delivering audit-grade ministry reporting with Unity Catalog lineage.
- ⚡ Cut SQL latency by 30%+, to under 2 seconds, for live-ops dashboards by rewriting queries and stored procedures across PostgreSQL + BigQuery.
- 🧠 Fine-tuned multilingual LLMs on the Hugging Face stack with parallel and batch training pipelines on distributed Azure + Ray infrastructure.
- 🎙️ Prototyped a real-time Voice AI interview assistant — Whisper + Deepgram + ElevenLabs — with full production architecture and sizing presented to the client.
- ❄️ Built a Bronze → Diamond medallion architecture on GCP for a cold-chain IoT logistics client, with n8n / Make.com alerting on reefer-temperature thresholds.
- 💼 Designed the backend data architecture for ZipApply's candidate-recruiter matching using a life-quality scoring signal, HPCC ECL ETL pipelines, and OpenAI-powered resume tooling.
AI / Voice / LLM Systems
Data Engineering
Backend & Languages
Cloud, DevOps & Viz
- 🎙️ Production architectures for low-latency Voice AI (sub-300ms end-to-end)
- 🧪 Distributed fine-tuning patterns for multilingual LLMs on Ray + Azure
- 🏗️ Lakehouse design with Unity Catalog and lineage-first data governance
- 🔄 Event-driven medallion pipelines that hold up under audit
"Data that ships, AI that talks, pipelines that hold under audit."
