Hargurjeet Singh Ganger

Principal GenAI Architect & Data Scientist

Enterprise RAG • Agentic Workflows • MLOps & Scale

I bridge the gap between proof-of-concept AI models and resilient, production-grade Generative AI architectures. I specialize in deploying high-fidelity enterprise RAG systems, orchestrating autonomous multi-agent workflows, and implementing robust MLOps guardrails that ensure safety, accuracy, and performance at scale.

LangChain · CrewAI · AWS Bedrock · RAG · LangGraph · FastAPI · Python


BT RAG

70%

Reduction in manual document extraction time via production LLM pipelines.

Shell ML

30%

Cut in oil refinery maintenance costs through predictive modeling.

BT Recs

30%

Increase in Value-Added Service sales using market-basket models.

Scale

90%+

Accuracy in multimodal document processing on 100K+ PDFs.

Value I Bring

What I Can Solve

Scalable RAG pipelines

Design and ship hybrid retrieval systems (BM25 + dense vectors + reranker) that process 100K+ multimodal documents with 90%+ accuracy in production.

Agentic AI workflows

Architect multi-agent systems with CrewAI and LangGraph — tool-augmented, guardrailed, and evaluated with Ragas before they touch production.

LLM evaluation frameworks

Build frameworks that measure faithfulness, relevancy, hallucination, toxicity, and bias — so model upgrades don't silently degrade your product.

Research → production AI

Take proof-of-concept LLM experiments and harden them: containerise, add CI/CD gates, instrument with latency and drift monitoring on AWS.

AWS Bedrock & GenAI infra

Deploy secure, cost-efficient GenAI systems using Bedrock, Textract, OpenSearch, and Step Functions — with CloudWatch observability built in.

ML-driven business outcomes

Translate messy enterprise data into XGBoost, Random Forest, or deep-learning models that move real metrics: 30% lower maintenance costs, 25% less unplanned downtime, 10% of budget saved.

About Me

From IT Analyst to AI Systems Builder

At British Telecom I'm leading the Generative AI charge — deploying RAG-powered chatbots on AWS Bedrock, building multi-agent workflows with CrewAI, and designing LLM evaluation frameworks that catch hallucinations before they reach production. My focus is always the same: AI that works in the real world, not just the notebook.

Before that, at Royal Dutch Shell, I moved into data science — building predictive maintenance models that cut equipment downtime by 25% across oil refineries, and forecasting dashboards that saved 10% of budget allocations across five geographies. That's where I fell in love with the gap between a working model and a working solution.

I started my career at TCS testing point-of-sale systems, spending a year in the UK guiding offshore teams through complex software rollouts. Those early years taught me how enterprise systems break under real conditions — a foundation that still shapes how I build today.

LLMs · RAG · Agentic AI · MLOps · AWS · Python · GenAI

Career History

Work Experience

Senior Data Scientist

May 2022 – Present

British Telecom (BT) · Bangalore, India

▸Architected and led delivery of an enterprise-grade conversational AI system (LLMs + RAG), reducing manual document extraction time by 70% while processing 100K+ files with 90%+ accuracy via AWS Bedrock and OpenSearch.
▸Designed and deployed multi-step agentic workflows using CrewAI and LangGraph, integrating JSON schema validation, retry loops, and custom hallucination guardrails in production.
▸Developed an LLM evaluation framework using Ragas and LLM-as-judge pipelines to assess faithfulness, toxicity, bias, and hallucination detection.
▸Built an automated email intelligence pipeline processing 6,000+ weekly escalation emails, fine-tuning a LLaMA-2 7B model locally via QLoRA for a 40% F1-score improvement.
▸Engineered recommendation systems (Random Forest + XGBoost) and market basket analysis (Apriori) increasing SD-WAN sales by 10% and Value-Added Services (VAS) sales by 30%.

Data Scientist

Sep 2016 – May 2022

Royal Dutch Shell · Bangalore, India

▸Developed and evaluated predictive maintenance models (XGBoost, Random Forest) using SHAP-based interpretability and ROC-AUC scoring, cutting equipment maintenance costs by 30% and unplanned downtime by 25%.
▸Engineered end-to-end data pipelines in Python (Pandas, NumPy) and developed Power BI dashboards to forecast materials on-time delivery across 5 geographies, saving 10% budget.
▸Acquired 5+ years of experience with data warehousing, ETL pipelines, big data analytics, and relational databases.

IT Analyst

Dec 2010 – Aug 2016

Tata Consultancy Services (TCS) · India / UK

▸Performed System Integration Testing (SIT) and User Acceptance Testing (UAT) to validate client Point-of-Sale (PoS) systems at enterprise scale.
▸Spent one year in the UK onsite guiding offshore teams through the implementation of new PoS software.
▸Acquired extensive experience working with card and payment systems, PCI standards, and ISO 8583 protocols.

Portfolio

Featured Projects

A carefully curated selection of deep-dive AI engineering projects, spanning autonomous agents, production RAG pipelines, and local SLM benchmarks.

AI Agents

Antigravity: Autonomous UI Designer

Live App

Closed-Loop Critic: Triggers autonomous redraw cycles by validating generated mockup images against guidelines.
Evaluator Scoring: Computes quantitative metrics for brand consistency, color alignment, and layout accuracy to score each generation attempt.
State & Memory: Manages agent context and state retention across iterations using a central memory store to refine subsequent image generation.
Robust Guardrails: Enforces strict boundaries via JSON schema validation, retry loops, and graceful-degradation fallbacks via pre-trial groups.
Streaming Timeline: Traces and displays the agent's step-by-step reasoning and evaluations alongside the intermediate drawing cycles.
Google Agentic Systems: Orchestrates multimodal Gemini 2.5 and Imagen 4 Ultra models to analyze briefs, map brand DNA, and generate high-fidelity images.

Outcome: A state-of-the-art design workspace demonstrating expert command over multimodal image data through self-correcting agentic loops.
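
The closed-loop critic pattern described above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the project's actual code: `generate`, `evaluate`, the 0.8 threshold, and the three-iteration budget are all hypothetical stand-ins for the image model, the evaluator, and their settings.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Attempt:
    iteration: int
    artifact: str
    score: float


def critic_loop(
    generate: Callable[[str, list[str]], str],
    evaluate: Callable[[str], tuple[float, str]],
    brief: str,
    threshold: float = 0.8,   # hypothetical acceptance score
    max_iters: int = 3,       # hypothetical redraw budget
) -> list[Attempt]:
    """Regenerate until the evaluator's score meets the threshold or the budget runs out."""
    feedback: list[str] = []      # critiques carried into the next generation
    history: list[Attempt] = []
    for i in range(1, max_iters + 1):
        artifact = generate(brief, feedback)
        score, critique = evaluate(artifact)
        history.append(Attempt(i, artifact, score))
        if score >= threshold:
            break
        feedback.append(critique)
    return history


# Stub generator and evaluator, standing in for the real models.
def fake_generate(brief: str, feedback: list[str]) -> str:
    return f"{brief} v{len(feedback) + 1}"


def fake_evaluate(artifact: str) -> tuple[float, str]:
    score = 0.5 if artifact.endswith("v1") else 0.9
    return score, "tighten colour palette"


history = critic_loop(fake_generate, fake_evaluate, "landing page")
```

With these stubs the first draft is rejected and the second is accepted, so `history` holds two attempts. Keeping the full history rather than only the final artifact is what makes a step-by-step timeline possible.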

AI Agents

Local Multi-Agent Folder Organizer

Open Source

Hierarchical Coordinator-Specialist Architecture: Configures a lead orchestrator agent that partitions folder listings into subtask categories, so no single agent exceeds its context window.
Multi-Agent Concurrency: Spawns category-specific specialist subagents in parallel using a Python `ThreadPoolExecutor` to slash local Ollama inference latency by 60%.
Pydantic Output Validation: Enforces strict JSON schemas on local SLMs using CrewAI's output parsing, guaranteeing zero formatting errors.
Human-in-the-Loop Safe Gate: Implements a CLI preview table and user confirmation prompt before mutating any folder structure, supporting a non-destructive dry-run mode.
Transactional Rollback Log: Records all file migrations atomically in a central `history.json` transaction log, facilitating instant programmatic recovery.

Outcome: A local-first system running fully on-device via Ollama that restructures cluttered downloads folders into semantic, context-aware nested subdirectories.
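
A minimal sketch of the parallel planning, dry-run gate, and rollback log described above. It is an assumption-laden illustration rather than the project's code: `classify` is a hypothetical stand-in for a specialist agent call, and the log is written in one step after the moves, which is simpler than a truly atomic transaction log.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def classify(category: str, filenames: list[str]) -> dict[str, str]:
    """Hypothetical specialist: maps each file to a destination inside its category folder."""
    return {name: f"{category}/{name}" for name in filenames}


def plan_moves(partitions: dict[str, list[str]], max_workers: int = 4) -> dict[str, str]:
    """Run one specialist per category in parallel and merge their plans."""
    plan: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(classify, cat, files) for cat, files in partitions.items()]
        for future in futures:
            plan.update(future.result())
    return plan


def apply_moves(root: Path, plan: dict[str, str], dry_run: bool = True) -> list[dict[str, str]]:
    """Return the move log; only touch the filesystem when dry_run is False."""
    log = [{"src": src, "dst": dst} for src, dst in plan.items()]
    if dry_run:
        return log
    for entry in log:
        target = root / entry["dst"]
        target.parent.mkdir(parents=True, exist_ok=True)
        (root / entry["src"]).rename(target)
    (root / "history.json").write_text(json.dumps(log, indent=2))
    return log


def rollback(root: Path) -> None:
    """Reverse every recorded move, newest first."""
    log = json.loads((root / "history.json").read_text())
    for entry in reversed(log):
        (root / entry["dst"]).rename(root / entry["src"])
```

Defaulting `dry_run` to `True` means a caller has to opt in explicitly before anything on disk changes.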

AI & Benchmarks

Local AI Assistant & SLM Benchmarking

Open Source

Local SLM evaluation: Developed a FastAPI testing harness benchmarking Llama 3.2 (3B), Phi-3 Mini (3.8B), and Mistral (7B) fully on-device via Ollama.
Inference speed profiling: Measured raw performance where Phi-3 Mini led at 22.70 tokens/sec (323.99ms TTFT), followed closely by Llama 3.2 at 22.24 tokens/sec (427.29ms TTFT).
Pydantic schema enforcement: Structured LLM outputs using validation schemas. Llama 3.2 achieved 100% compliance via retry reprompts, while Mistral 7B achieved 90% compliance zero-shot.
Resource allocation tracking: Measured memory-bound constraints on Apple Silicon Mac mini (16GB RAM) where CPU load remained low (13–15%) but loaded memory hit 88.8% to 94.4% of RAM.

Outcome: Rigorous local benchmark of 30 multi-domain prompts, published on Dev.to and GitHub. The results indicate that Llama 3.2 (3B) was the most reliable for structured JSON pipelines, while Phi-3 Mini led on speed and latency.
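
For readers unfamiliar with the two speed metrics quoted above, here is one common way to derive them from raw timestamps. The `RunTiming` record and the choice to measure throughput over the decode window (after the first token) are assumptions for illustration; the benchmark itself may define them differently.

```python
from dataclasses import dataclass


@dataclass
class RunTiming:
    """Hypothetical timing record for one prompt (timestamps in seconds)."""
    request_start: float
    first_token_at: float
    finished_at: float
    tokens_generated: int


def ttft_ms(t: RunTiming) -> float:
    """Time to first token, in milliseconds."""
    return (t.first_token_at - t.request_start) * 1000


def tokens_per_sec(t: RunTiming) -> float:
    """Decode throughput: tokens generated divided by time spent after the first token."""
    return t.tokens_generated / (t.finished_at - t.first_token_at)


run = RunTiming(request_start=0.0, first_token_at=0.25, finished_at=4.25, tokens_generated=100)
print(ttft_ms(run), tokens_per_sec(run))  # 250.0 25.0
```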

MCP & Developer Tools

Generic Database MCP Server

Open Source

Zero hardcoding — connects to any DuckDB file and auto-discovers every table and column at runtime
Type-aware quality checks: numeric columns get distribution stats + Z-score; VARCHAR gets cardinality; TIMESTAMP gets gap detection
Ollama ReAct loop (llama3.2) iteratively calls MCP tools, drills into anomalies, and writes a plain-English RCA report
FastAPI REST layer exposes drag-and-drop file upload, per-table quality checks, and RCA generation as HTTP endpoints
Next.js dashboard visualises schema, null rates, distribution cards, and cardinality in a 3-step upload → inspect → report flow

Outcome: Pass any DuckDB file and get a full data-quality report + LLM-written root cause analysis in seconds — no config, no hardcoded schema.
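
Two of the type-aware checks listed above, Z-score outliers for numeric columns and gap detection for timestamps, can be sketched in plain Python. This is a simplified illustration that works on in-memory lists rather than DuckDB tables; the function names and the default threshold of 3.0 are assumptions.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev


def zscore_outliers(values: list[float], threshold: float = 3.0) -> list[float]:
    """Return values whose distance from the mean exceeds `threshold` standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:          # constant column: nothing can be an outlier
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


def timestamp_gaps(stamps: list[datetime], max_gap: timedelta) -> list[tuple[datetime, datetime]]:
    """Return consecutive timestamp pairs that are further apart than `max_gap`."""
    ordered = sorted(stamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]


readings = [10] * 20 + [100]
base = datetime(2024, 1, 1)
stamps = [base + timedelta(hours=h) for h in (0, 1, 2, 6)]

outliers = zscore_outliers(readings)
gaps = timestamp_gaps(stamps, max_gap=timedelta(hours=2))
```

Here `outliers` contains only the value 100, and `gaps` contains the single four-hour jump between 02:00 and 06:00.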

AI Applications

AI-Powered Resume Parser

Live App

PDF → structured JSON pipeline: pdfplumber extracts text → Llama 3.3 70B parses via Fireworks AI
JSON schema enforcement: instructor library constrains LLM output to an exact Pydantic v2 model
Retry mechanism: catches invalid outputs, re-prompts the LLM once, then fails gracefully — no silent errors
Split-view UI: original PDF alongside experience timeline, color-coded skill tags, and education cards
One-click JSON export, dark/light mode, drag-and-drop upload with animated progress steps

Outcome: Live on Fly.io. Structured output guaranteed at the schema level — retry logic and graceful failure handle the edge cases that plain prompting misses.
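
The validate, re-prompt once, then fail loudly flow can be sketched as below. To stay dependency-free, this sketch swaps in the standard library's `json` module and a hand-written check in place of instructor and Pydantic; `call_llm`, `REQUIRED_FIELDS`, and `ParseFailure` are hypothetical names, not part of the app.

```python
import json
from typing import Callable

# Hypothetical, heavily simplified schema: field name -> expected Python type.
REQUIRED_FIELDS = {"name": str, "skills": list}


class ParseFailure(Exception):
    """Raised when the model never returns schema-valid JSON."""


def validate(raw: str) -> dict:
    """Parse the model output and check it against REQUIRED_FIELDS."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, kind in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), kind):
            raise ValueError(f"field {field!r} missing or not a {kind.__name__}")
    return data


def parse_with_retry(call_llm: Callable[[str], str], resume_text: str, max_retries: int = 1) -> dict:
    """Call the model, re-prompting with the validation error up to `max_retries` times."""
    prompt = resume_text
    last_error: Exception | None = None
    for _ in range(max_retries + 1):
        try:
            return validate(call_llm(prompt))
        except ValueError as err:
            last_error = err
            prompt = f"{resume_text}\n\nPrevious output was invalid ({err}). Return valid JSON only."
    raise ParseFailure(f"gave up after {max_retries + 1} attempts: {last_error}")


# Stub model: first reply is garbage, second is valid.
replies = iter(["not json", '{"name": "Ada", "skills": ["python"]}'])
result = parse_with_retry(lambda prompt: next(replies), "resume text")
```

Feeding the validation error back into the retry prompt gives the model something concrete to correct, and raising `ParseFailure` keeps a bad parse from passing silently.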

RAG & LLMOps

Production-Grade RAG Evaluation Pipeline

Live App

Hybrid retrieval — BM25 sparse + contextual dense search — fed through a Cohere reranker
Citation enforcement grounds every answer in source documents; no hallucinated references
Prompts version-controlled in a config file — every change is tracked and reproducible
Offline RAGAS script measures faithfulness, answer relevancy, and context precision
GitHub Actions gate runs eval on every PR; merge blocked if any metric drops below threshold

Outcome: Quality regressions caught at PR stage, not in production. Stack: LangChain/LangGraph, Chroma vector store, Cohere reranker — every retrieval step traceable, every prompt change auditable.
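
The merge gate amounts to comparing each evaluation metric against a floor and returning a non-zero exit code on any miss. A minimal sketch, assuming the offline evaluation script has already produced a dictionary of scores; the threshold values and function names below are hypothetical, not the pipeline's actual settings.

```python
# Hypothetical minimum acceptable scores per metric.
THRESHOLDS = {
    "faithfulness": 0.85,
    "answer_relevancy": 0.80,
    "context_precision": 0.75,
}


def failing_metrics(scores: dict[str, float], thresholds: dict[str, float] = THRESHOLDS) -> dict[str, tuple[float, float]]:
    """Map each metric below its floor to (observed, floor). A missing metric counts as 0.0."""
    return {
        name: (scores.get(name, 0.0), floor)
        for name, floor in thresholds.items()
        if scores.get(name, 0.0) < floor
    }


def gate(scores: dict[str, float]) -> int:
    """Print failures and return a process exit code: 0 to allow the merge, 1 to block it."""
    failures = failing_metrics(scores)
    for name, (got, floor) in failures.items():
        print(f"FAIL {name}: {got:.2f} < {floor:.2f}")
    return 1 if failures else 0


passing = gate({"faithfulness": 0.91, "answer_relevancy": 0.88, "context_precision": 0.80})
blocked = gate({"faithfulness": 0.70, "answer_relevancy": 0.88, "context_precision": 0.80})
# In a CI job the script would end with: sys.exit(gate(scores))
```

Treating a missing metric as a failure means an eval script that crashes part-way through still blocks the merge.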

Expertise

Technical Skills

Generative AI

RAG Patterns · Agents · Fine-tuning · Guardrails · PII Filtering · Vector DB · Observability · MCP

LLM Frameworks & APIs

LangChain · LangGraph · CrewAI · OpenAI API · Anthropic API · Gemini API · Langfuse · RAGAS

MLOps

Docker · FastAPI · MLflow · CI/CD · GitHub Actions · GitLab · AWS SageMaker · Feature Store

Cloud (AWS)

Bedrock · Textract · OpenSearch · Lambda · Step Functions · CloudWatch · SageMaker

Core ML & Data

XGBoost · Random Forests · PyTorch · TensorFlow · scikit-learn · BERT · PySpark · SQL

Agentic Coding Tools

Cursor · Claude Code · Kiro · Amazon Q Developer · VS Code · GitHub Copilot

15+

Years Experience

TCS → Shell → BT

$10M+

Business Value Generated

Across AI & ML initiatives

100K+

Documents Processed

Multimodal, production scale

AI Systems Shipped

In telecom, energy & IT

Academic Background

Education

2023 – 2025

Liverpool John Moores University

M.S. Machine Learning & Artificial Intelligence

Liverpool, UK

2022 – 2023

IIIT Bangalore

Executive PG in Data Science & AI

Bangalore, India

Statistics & Probability · ML · NLP · Neural Networks · MLOps

2006 – 2010

New Horizon College of Engineering

B.E. Electronics & Communication

Bangalore, India

Visvesvaraya Technological University

Get in Touch

Let's Connect

Open to senior data science and AI engineering roles. If you're building something ambitious with LLMs or agentic systems, I'd love to talk.

gurjeet333@gmail.com

LinkedIn GitHub
