Portfolio
Cost-Aware Automatic Prompt Optimization (APO)
Hybrid APE-OPRO prompt optimization with 18% lower API cost and 0.84 F1.
AI Evaluation Specialist — Content Quality at Viator
End-to-end LLM evaluation framework for product descriptions: layered metrics (ROUGE, BERTScore, NLI, G-Eval, SelfCheckGPT), model introspection tooling, and CI/CD integration.
Safety & Evaluation Framework for AI Agents
End-to-end safety and behavioural evaluation framework for AI agents: trajectory scoring, safety gates, adversarial red-teaming, and continuous production monitoring.
Media Optimisation — GPT-4V + Multi-Armed Bandit Hero Image Selection
GPT-4 Vision shortlisting combined with a Bayesian multi-armed bandit, replacing static hero image selection with adaptive, CTR-driven optimisation at product scale.
AI Customer Service Agent for Travel (System Design)
End-to-end agent architecture with GraphRAG knowledge, multi-tier memory, HITL escalation, and sub-1.5s P95 latency.
Review Summarization at Scale
Multilingual aspect-based sentiment analysis (ABSA) pipeline that cuts LLM token usage by 82% and the hallucination rate to 1.8%.
Automated FAQ Extraction (Knowledge Governance System)
GraphRAG + agentic validation system that cuts the hallucination rate to 2.1%.
Active Learning for Traveler Tips Extraction
Fine-tuned small language model (SLM) with 0.84 F1 and 99.3% lower inference cost.
Enterprise AI Governance Framework (Responsible AI)
Safety gate with 100% coverage and <10ms validation latency.
Cross-Portfolio Engineering Practices
Standardized experimentation, monitoring, and reproducibility across ML systems.
Attribution Modelling
SlideShare deck on multi-touch attribution, data quality, and modelling trade-offs.