Engineering + Strategy + Business

Kaushik Sarkar

AI Engineering Leadership
Two decades building AI systems that inform consequential decisions.

// Foundation Models / Multi-Agent Systems / Enterprise Data Infrastructure / Global Programmes
About

Two decades of shipping AI systems
that survive contact with reality.

Most AI and digital technologies never ship. They get presented in board decks, run in IDEs, make it through pilots, and then quietly disappear when the real constraints show up: the messy data, the interoperability challenges, the governance requirements, the stakeholders who all have to say yes, the infrastructure that was never built to run a model in production.

Over the past two decades, I have come to believe that the problem is rarely the data silos, the algorithms, or the infrastructure. It is almost always the absence of one person who can sit in the board conversation about AI strategy and then go and design the architecture the next morning, without losing anything in the translation.

Someone who has held the P&L, navigated the institutional politics, put smiles on partners' faces, and still knows exactly why their team's retrieval pipeline is underperforming or why the loss function they landed on is working against them.

That person is harder to find than any technology. And every single morning, I ask myself how much of that person I still have left to become.

Director of IMACS, an AI Centre of Excellence.
Co-Principal Investigator, AI for disease forecasting using satellite data (funded by NASA Earth Sciences).
PhD Researcher in AI | MBA | MDataSci | MS AI/ML
40 Under 40 Outstanding Leadership Award (Digital Transformation Leader)
Capabilities

What I Lead and Build

Six domains spanning model training through production deployment.

Large Language Models

End-to-end LLM engineering: domain-adaptive pre-training, SFT on 500K+ instruction pairs, DPO alignment with hard negative mining, Mixture of LoRA Experts across 6 scientific domains.

LoRA / QLoRA · DPO · RLHF · MoLE · vLLM

Multi-Agent Systems

Fan-out / fan-in agentic pipelines with 6+ specialised agents. MCP server architecture. Streaming SSE output. Real-time orchestration with graceful degradation.

MCP · LangChain · ReAct · SSE

RAG and Knowledge Systems

Hybrid retrieval (dense + BM25) with Reciprocal Rank Fusion. Cross-encoder re-ranking. 250M+ evidence spans in OpenSearch. GraphRAG for structured knowledge.

OpenSearch · FAISS · RRF · GraphRAG
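Reciprocal Rank Fusion merges the dense and BM25 result lists by rank alone. A minimal sketch; the function name, the toy document IDs, and the conventional k=60 constant are illustrative, not taken from the production system:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d3", "d1", "d7"]   # dense (vector) retrieval order
bm25  = ["d1", "d7", "d9"]   # sparse BM25 order
fused = reciprocal_rank_fusion([dense, bm25])
```

Documents appearing high in both lists dominate, which is why RRF needs no score normalisation across the two retrievers.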

Enterprise Data Platforms

50TB+ S3 data lakes with Apache Iceberg, 700M+ row analytical stores, Athena serverless SQL, 24 EventBridge ingestion pipelines. Query time from 8s to <400ms.

Iceberg · Athena · Glue · Airflow

Production ML Systems

Real-time ML inference at 30+ geography scale. P95 latency <150ms. Champion-challenger deployment. Canary rollouts with automated quality gates.

SageMaker · MLOps · ONNX · Drift
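A champion-challenger rollout ultimately reduces to a promotion gate over live metrics. A hedged sketch: the metric names, thresholds, and the 150 ms P95 default are illustrative stand-ins, not the deployed gate:

```python
def promote_challenger(champion, challenger, max_p95_ms=150.0):
    """Promote only if the challenger meets the latency SLA and does not
    regress quality versus the current champion."""
    meets_sla = challenger["p95_ms"] <= max_p95_ms
    no_regression = challenger["quality"] >= champion["quality"]
    return meets_sla and no_regression

champion   = {"p95_ms": 120.0, "quality": 0.81}
challenger = {"p95_ms": 140.0, "quality": 0.84}
decision = promote_challenger(champion, challenger)
```

In a canary setup the same gate runs repeatedly as traffic share increases, failing fast on the first violated threshold.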

AI Governance and Strategy

AI governance frameworks for regulated environments across 17 countries. Responsible AI embedded in pipelines. National expert committee leadership. $100M+ portfolio accountability.

Governance · Responsible AI · Strategy
Selected Work

Selected Work

Systems operating at scale across multiple countries under governance constraints.

Flagship Platform

SAGE: Scientific Advisory and Guidance Engine

A 397B-parameter domain-adapted foundation model trained through a 4-stage pipeline: domain-adaptive pre-training, supervised fine-tuning on 596K instruction pairs, DPO alignment on 192K preference pairs with 3-tier hard negative mining, Mixture of LoRA Experts across 6 scientific domains.

397B Parameters
6 Domain Adapters
<3s Fast Path SLA
12 Countries Live
Foundation 397B · MoLE · DPO · QLoRA · OpenSearch AOSS · SSE Streaming
// SAGE Multi-Agent Pipeline
Router → Claude Opus 4  ·  RAG → OpenSearch, 250M+ spans  ·  Foundation → SAGE-397B
Data → 50TB+ S3, 700M+ rows Iceberg, 250M+ evidence spans, 30M+ causal relations
SLAs → fast path <3s · full report 45-90s · dashboard P95 <150ms
Upcoming 2026

ARK Platform

AI intelligence platform spanning 190+ countries for health, climate, and development finance. Funded by the McGovern Foundation and Amazon.

190+ Countries
750M+ Records
Data Infrastructure

Global Evidence Data Platform

Enterprise-grade data lake unifying 100M+ records across 17 national health systems. Iceberg on S3. 24 EventBridge pipelines. P95 from 8s to <400ms.

50TB+ Data Lake
700M+ Rows
Systems

Architecture I Built

Production systems engineering: foundation model training, data platforms, and multi-agent orchestration.

Foundation Model / 397B MoE (512 experts, 17B active/token)
Six-Phase Domain Adaptation: CPT, ORPO, SPIN, RLAIF, Router Calibration
PHASE 0: Data Curation. Cross-domain linking, curriculum design.
PHASE 1: CPT. 500M+ passages, 3-stage curriculum (warm-up, domain, cross-join).
PHASE 2: ORPO. Merged SFT + preference, no reference model needed, 900K+ training pairs.
PHASE 3: SPIN. Self-play fine-tuning, 3 iterations; the model beats its own prior outputs.
PHASE 4: RLAIF. DAPO (group-relative), multi-dimensional reward: accuracy + citation + reasoning.
PHASE 5: Expert Router Calibration. MoE-Sieve profiling, 512 expert activation maps.

BASE: Qwen 3.5-397B-A17B. 512 experts, Gated DeltaNet hybrid attention, 262K context. FP8 LoRA + FSDP ZeRO-3 on 8x g7e (768 GB VRAM).

LOSS FUNCTIONS: CPT: L = -sum_t log P(x_t | x_<t) · ORPO: L = L_SFT + beta·L_OR · SPIN: self-play DPO · DAPO: group-relative advantage, clip-higher, token-level PG, dynamic sampling.

CORE INNOVATION: cross-domain quantitative reasoning. The model is trained on climate x health x finance cross-joins from 1B+ structured records. Not just text: grounded in actual time-series data. Cross-domain experts activate simultaneously for multi-hop causal chains (e.g., temperature anomaly -> cholera incidence -> WASH funding gaps).

12 VALIDATION GATES (fail-fast at every boundary):
G0: data counts · G1: loss decreasing · G2: perplexity -10% · G3: knowledge probe 60%+ · G4: reward margins · G5: eval loss <2.0 · G6: coherence · G7: loss floor [0.15, 1.5] · G8: SPIN iteration gain · G9: no regression · G10: reward positive · G11: cross-domain 70%+

RLAIF reward = 0.35 accuracy + 0.25 citation verifiability + 0.20 causal reasoning + 0.20 completeness (judged by a frontier model via Bedrock).
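The ORPO objective, L = L_SFT + beta·L_OR, can be illustrated in plain Python. A minimal per-example sketch, assuming length-normalised log-probabilities for the chosen and rejected responses; this is an illustration of the published loss, not the training code:

```python
import math

def orpo_loss(logp_chosen, logp_rejected, beta=0.1):
    """ORPO per-example loss sketch: L = L_SFT + beta * L_OR, where
    L_OR = -log sigmoid(log odds(chosen) - log odds(rejected))."""
    def log_odds(logp):
        p = math.exp(logp)           # sequence probability
        return math.log(p / (1.0 - p))

    l_sft = -logp_chosen             # NLL of the preferred response
    ratio = log_odds(logp_chosen) - log_odds(logp_rejected)
    l_or = -math.log(1.0 / (1.0 + math.exp(-ratio)))  # -log sigmoid(ratio)
    return l_sft + beta * l_or

loss = orpo_loss(logp_chosen=-0.5, logp_rejected=-1.5)
```

Because the odds-ratio term uses only the policy's own probabilities, no frozen reference model is needed, which is the practical appeal over DPO noted above.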
Data Platform / 50TB+ Lakehouse / 85+ Organizations / 9 Domains
Three-Layer Architecture: Raw, Silver, Gold
150+ SOURCES: REST APIs, bulk files, XML, NetCDF, CDC, event streams. 85+ orgs, 20+ EventBridge rules. Domains: Climate 1.3B+, Health 290M+, Economics 120M+, plus Population, Nutrition, WASH, Governance, Education, Environment (extensible).

CANONICAL PIPELINE: typed adapters -> schema validation -> dedup -> WAL checkpoint -> parallel runner -> partition by (source_org, year).

RAW LAYER: S3 landing zone. Original formats preserved (JSON, CSV, XML, NetCDF, PDF). Schema-on-read, full lineage, immutable audit trail, partitioned by source_org / ingestion_date.

SILVER LAYER: 180+ Glue tables. Parquet + Snappy, typed schemas, deduplication. Canonical columns: indicator, value, geo, period, source. 700M+ rows, Athena serverless SQL.

GOLD LAYER: Apache Iceberg. ZSTD compression, ACID transactions, time-travel. 40K+ indicators, 58K+ geo entities, 1800-2100 span. Athena v3, partition pruning, snapshot rollback.

VECTOR STORE: 250M+ embeddings, 1024d, HNSW. Hybrid BM25 + dense rerank, kNN retrieval.

KNOWLEDGE GRAPH: 30M+ causal relations, entity-linked. BM25 retrieval, relation triples, GraphRAG.

ANALYTICAL VIEWS: cross-domain materialized views. Dashboards, P95 <400ms, training data generation.

DATA ESTATE: 50TB+ S3 lakehouse across 9 domains. Temporal coverage 1800-2100. Automated weekly ingestion via EventBridge + canonical pipeline. Cross-domain joins power the foundation model: climate x health x economics x population for multi-hop causal reasoning.
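The schema-validation, dedup, and partition steps of the canonical pipeline can be sketched in a few lines. The record shape follows the canonical columns above; the dedup key and toy rows are illustrative assumptions, not the production adapter code:

```python
def canonicalise(records):
    """Sketch: validate canonical columns, drop duplicates, and group
    records into (source_org, year) partitions."""
    required = {"indicator", "value", "geo", "period", "source_org"}
    seen, partitions = set(), {}
    for rec in records:
        if not required.issubset(rec):
            continue  # schema validation: skip malformed rows
        key = (rec["source_org"], rec["indicator"], rec["geo"], rec["period"])
        if key in seen:
            continue  # dedup on canonical identity
        seen.add(key)
        year = str(rec["period"])[:4]
        partitions.setdefault((rec["source_org"], year), []).append(rec)
    return partitions

rows = [
    {"indicator": "cholera_cases", "value": 12, "geo": "NG", "period": "2023-07", "source_org": "who"},
    {"indicator": "cholera_cases", "value": 12, "geo": "NG", "period": "2023-07", "source_org": "who"},  # duplicate
    {"indicator": "rainfall_mm", "value": 80},  # fails schema validation
]
parts = canonicalise(rows)
```

Partitioning by (source_org, year) is what later lets Athena prune partitions instead of scanning the whole lake.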
AI Platform / 6-Agent System
Evidence Intelligence Orchestration
User Query -> INTENT ROUTER -> specialist agents:
Synthesis (evidence fusion) · KG Reasoning (causal chains) · Domain Expert (specialist routing) · RETRIEVAL (BM25 + kNN hybrid rerank, KG traverse: vector + structured + graph)
-> Evidence Grounding + Citation -> GROUNDED RESPONSE
Infrastructure / 8-GPU Cluster
Distributed Training on AWS
torchrun / DDP across GPUs 0-7: each GPU holds an NF4-quantised model replica plus its LoRA adapters and optimiser state, and sees 1/8 of the data shards.
NCCL All-Reduce over EFA.
effective_batch = per_device x grad_accum x 8
Checkpoint-resume on spot interruption; rank 0 saves to EBS + S3.
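The effective-batch arithmetic and the checkpoint-resume behaviour can be sketched without torch. The file path, step granularity, and the simulated interruption are illustrative stand-ins for the actual rank-0 EBS/S3 checkpointing:

```python
import json, os, tempfile

CKPT = os.path.join(tempfile.gettempdir(), "train_ckpt_demo.json")
WORLD_SIZE = 8

def effective_batch(per_device, grad_accum, world_size=WORLD_SIZE):
    """effective_batch = per_device x grad_accum x world_size."""
    return per_device * grad_accum * world_size

def train(total_steps, interrupt_at=None):
    """Resume from the last persisted step instead of starting over."""
    step = 0
    if os.path.exists(CKPT):
        step = json.load(open(CKPT))["step"]        # resume after interruption
    while step < total_steps:
        if interrupt_at is not None and step == interrupt_at:
            return step                             # simulate a spot interruption
        step += 1
        json.dump({"step": step}, open(CKPT, "w"))  # rank-0 checkpoint
    return step

if os.path.exists(CKPT):
    os.remove(CKPT)                                 # start from a clean state
train(total_steps=10, interrupt_at=5)               # job killed mid-run at step 5
final = train(total_steps=10)                       # restart resumes from step 5
```

The same pattern, with model and optimiser state instead of a step counter, is what makes spot-instance training economical.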
Published Research / Deep Generative Time Series
SPECTRA: Adversarial Climate-Disease Forecasting with Transfer Learning
TRAINING PIPELINE: VAE -> FEATURE ENGINEERING -> GAN

Climate Encoder: conditional beta-VAE. Meteorological variables -> latent z. The beta term penalises deviation from N(0,1), limiting noise sensitivity. L = recon + beta·KL(q(z|c) || p(z)).

Feature Engineering: latent variables from the VAE; 1-, 3-, and 6-month lagged features; rolling means + std (3, 6 month); seasonal (sin/cos) + country embedding; linear noise projection.

Generator (Forecaster): causal TCN, LSTM (sequence memory), multi-head self-attention, projection head -> P(d|c). Outputs a full predictive distribution.

Critic (adversarial): LSTM + FC layer, same configuration as the Generator. Distinguishes real disease sequences from generated ones; WGAN gradient penalty.

COMPOSITE LOSS (distribution-aware): L = L_NLL + lambda_q·L_quantile + lambda_f·L_feature_match + lambda_a·L_WGAN-GP. NLL for likelihood, quantile for calibrated uncertainty, feature matching for stability, WGAN-GP for distribution fidelity.

TRANSFER LEARNING (cross-geography): pre-trained generator weights transferred to country-specific models. Validated across 8 countries; outperforms supervised baselines on accuracy and probabilistic calibration.

KEY CONTRIBUTIONS:
1. Decouples representation learning (VAE) from sequence modeling (GAN) to handle ill-posed climate-disease inference.
2. Addresses non-Markovian, non-stationary disease signals via causal TCN + LSTM + multi-head self-attention.
3. Produces full predictive distributions (not point estimates) with calibrated uncertainty bounds.
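The quantile term in the composite loss is the standard pinball loss, which is what makes the uncertainty bounds calibrated. A minimal single-point sketch; the example values are illustrative:

```python
def quantile_loss(y_true, y_pred, q):
    """Pinball loss for quantile q: under-prediction is weighted by q,
    over-prediction by (1 - q)."""
    diff = y_true - y_pred
    return max(q * diff, (q - 1) * diff)

# An upper-quantile forecast (q = 0.9) is punished harder for being too low:
under = quantile_loss(y_true=100.0, y_pred=80.0, q=0.9)   # under-prediction
over  = quantile_loss(y_true=100.0, y_pred=120.0, q=0.9)  # over-prediction
```

Minimising this loss over many points pushes the q=0.9 prediction to sit above the truth about 90% of the time, i.e. a calibrated upper bound.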
Knowledge

Engineering Perspectives

Architecture selection, loss functions, model families, and design patterns from hands-on R&D.

Architecture Selection
Generative Model Taxonomy: When to Use What
Autoregressive: P(x) = prod_t P(x_t | x_<t). GPT, Llama, Qwen, Gemma. Best for text generation, code, reasoning, chat, agents.
Diffusion: x_t = sqrt(a_t)·x_0 + noise. DDPM, Stable Diffusion, DALL-E 3. Best for image synthesis, video, controllable generation.
GAN: min_G max_D V(D, G). StyleGAN, CycleGAN, Pix2Pix. Best for fast inference, style transfer, super-resolution.
VAE: ELBO = E[log p] - KL(q || p). VAE, VQ-VAE, Beta-VAE. Best for latent representations, disentangled features, compression.
Normalizing Flow: p(x) = p(z)·|det dz/dx|. RealNVP, Glow, Flow Matching. Best for exact likelihood, density estimation, invertible transforms.

SELECTION GUIDE
Need text/code/reasoning? Autoregressive (decoder-only transformer). Scale with compute; use RLHF/DPO for alignment.
Need high-fidelity images? Diffusion for quality, GAN for speed; diffusion dominates on quality, GANs remain useful for real-time.
Need a structured latent space? VAE. Need exact density? Flow.
Need both generation and understanding? Multimodal autoregressive.
State space models (Mamba, RWKV) are emerging for long-sequence efficiency; hybrid transformer-SSM architectures are gaining traction.
Optimization
Loss Function Selection
Loss | Formula | When
Cross-Entropy | -sum y·log(p) | Classification, LM
NLL | -log P(x|theta) | Sequence modeling
KL Divergence | sum p·log(p/q) | VAE, distillation
Wasserstein | inf E[|x - y|] | WGAN, Earth mover's distance
Spectral | ||sigma(W)|| | GAN stabilization
Contrastive | -log(sim+ / sim-) | Embeddings, CLIP
Triplet | max(d+ - d- + m, 0) | Metric learning
Focal | -a·(1 - p)^g · log p | Imbalanced data
DPO | -log sigma(beta·dr) | Preference tuning
Hinge | max(0, 1 - y·f) | SVM, GAN variants
Choose based on the signal: classification uses CE, generation uses NLL, alignment uses DPO, representation learning uses contrastive/triplet, distribution matching uses KL/Wasserstein.
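The focal loss row above is worth a concrete look, since its effect is easiest to see numerically. A minimal sketch for the positive class, using the common alpha=0.25, gamma=2 defaults (the example probabilities are illustrative):

```python
import math

def focal_loss(p, alpha=0.25, gamma=2.0):
    """Focal loss for a correct-class probability p:
    -alpha * (1 - p)^gamma * log(p). The (1 - p)^gamma factor
    down-weights easy examples so training focuses on hard ones."""
    return -alpha * (1.0 - p) ** gamma * math.log(p)

easy = focal_loss(p=0.95)  # confident and correct: near-zero loss
hard = focal_loss(p=0.10)  # badly wrong: large loss
```

With gamma=0 this reduces to alpha-weighted cross-entropy; raising gamma sharpens the focus on misclassified examples, which is why it suits imbalanced data.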
Model Selection
Open-Source FM Comparison
Family | Sizes | Strength | Use Case
Llama 3 | 8B-405B | General, code | Broad reasoning
Qwen 2.5 | 0.5B-72B | Multilingual | Domain adaptation
Mistral | 7B-8x22B | MoE efficiency | Cost-sensitive
DeepSeek | 7B-236B | Math, code | Technical tasks
Gemma 2 | 2B-27B | Compact, fast | Edge, mobile
Phi 3/4 | 3B-14B | Small but capable | On-device
Command R+ | 35B-104B | RAG, grounding | Enterprise RAG
Yi | 6B-34B | Long context | Document analysis
Selection Criteria
1. Task type (reasoning, code, chat, RAG)
2. Context length requirement
3. Deployment constraint (GPU budget)
4. Multilingual needs
5. Fine-tunability (license, LoRA support)
6. MoE vs dense (latency vs throughput)
Multimodal Systems
Vision-Language Model Architecture Patterns
PATTERN 1: LATE FUSION (CLIP-style). Vision encoder (ViT / SigLIP) and text encoder (transformer) trained with a contrastive objective.
PATTERN 2: CROSS-ATTENTION (Flamingo-style). Frozen ViT; vision features attend into LLM decoder layers via cross-attention.
PATTERN 3: EARLY FUSION (LLaVA-style). ViT patches -> projector -> visual tokens in the LLM decoder's input sequence.

TRADE-OFFS
Late fusion: separate encoders, contrastive alignment. Fast retrieval; no deep cross-modal reasoning.
Cross-attention: vision features attend into LLM layers. Strong reasoning; higher compute per layer. Flamingo, Idefics.
Early fusion: project visual patches into token space. Simple, scalable, state of the art. LLaVA, Qwen-VL, InternVL.
Audio/video: same patterns plus temporal encoders. Whisper for audio; frame sampling + 3D convolution for video.
Agent Engineering
Reasoning Agent Patterns
REACT LOOP: Thought (reason about the task) -> Action (call tool / API) -> Observe (parse result); repeat until done. Best for tool-use agents, API orchestration, web browsing.
MULTI-AGENT: Router (classify intent) -> Dispatch (specialist agent) -> Aggregate (merge results) -> Ground (add citations). Best for complex domains, evidence synthesis, QA.
CHAIN-OF-THOUGHT: decompose step by step; self-consistency samples N chains and takes a majority vote. Best for math, logic, planning.
RAG AGENT: query rewrite -> retrieve -> rerank -> filter for relevance -> generate with context. Best for grounded generation, enterprise knowledge.

DESIGN PRINCIPLES: agents should fail gracefully, with bounded retries and escalation paths. Separate intent routing from execution. Use typed tool schemas. Log every step.
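The ReAct loop above is small enough to sketch end to end. Everything here is a toy: the stub model, the one-entry tool registry, and the question are illustrative, but the loop structure (bounded steps, tool call, observation appended to the scratchpad) is the pattern itself:

```python
def react_agent(task, tools, llm, max_steps=5):
    """Minimal ReAct loop: alternate Thought/Action/Observation until the
    model emits a final answer; tool calls are bounded by max_steps."""
    scratchpad = []
    for _ in range(max_steps):
        thought, action, arg = llm(task, scratchpad)
        if action == "finish":
            return arg
        observation = tools[action](arg)   # call the tool, observe the result
        scratchpad.append((thought, action, observation))
    return None                            # bounded: give up gracefully

# Toy stand-ins for a real model and tool registry:
def toy_llm(task, scratchpad):
    if not scratchpad:
        return ("need the population", "lookup", "Chile")
    return ("have the answer", "finish", scratchpad[-1][2])

tools = {"lookup": {"Chile": "19.5M"}.get}
answer = react_agent("population of Chile?", tools, toy_llm)
```

Typed tool schemas and per-step logging slot naturally into the `tools[action](arg)` call site.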
Systems Engineering
Real-Time Data + Load Balancing
Stream Processing
Source -> Ingest (Kafka/SQS/Kinesis)
-> Transform (Flink/Lambda)
-> Serve (OpenSearch/Redis)
-> Alert (threshold + anomaly)
Back-pressure handling: producer slows when consumer queue depth exceeds threshold. Dead-letter queues for poison messages.
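The dead-letter behaviour can be sketched independently of any broker. The handler and messages are illustrative; the point is that a poison message is quarantined rather than blocking or crashing the stream:

```python
def consume(messages, handler):
    """DLQ sketch: messages the handler cannot process go to the
    dead-letter queue instead of poisoning the pipeline."""
    processed, dead_letter = [], []
    for msg in messages:
        try:
            processed.append(handler(msg))
        except Exception:
            dead_letter.append(msg)  # quarantine the poison message
    return processed, dead_letter

ok, dlq = consume(["7", "12", "not-a-number"], handler=int)
```

In Kafka/SQS terms, the `dead_letter` list stands in for a configured DLQ topic or queue that an operator inspects later.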
Load Balancing Strategies
Round-robin: equal distribution
Weighted: capacity-aware routing
Least-connections: latency-optimal
Consistent hash: session affinity
Rate limiting: token bucket / leaky
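The token-bucket strategy in the last item is compact enough to show whole. A minimal sketch with an injected clock so the refill logic is visible (capacity and rate are illustrative):

```python
class TokenBucket:
    """Token-bucket rate limiter: up to `capacity` burst tokens, refilled
    at a fixed rate; each request spends one token or is rejected."""
    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill_per_sec = refill_per_sec
        self.last = 0.0

    def allow(self, now):
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(capacity=2, refill_per_sec=1.0)
burst = [bucket.allow(now=0.0) for _ in range(3)]  # third request exceeds the burst
later = bucket.allow(now=1.0)                      # one token refilled after 1s
```

A leaky bucket differs only in draining at a constant rate rather than allowing bursts up to capacity.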
Resilience Patterns
Circuit breaker: fail-open after N errors
Bulkhead: isolate failure domains
Retry with jitter: exp backoff
Timeout cascade: strict per-hop SLAs
Graceful degradation: serve stale data
Design for failure. Every external call has a timeout. Every queue has a DLQ. Every service has health checks.
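Two of the resilience patterns above, the circuit breaker and retry with jitter, fit in a short sketch. The threshold, base delay, and cap are illustrative defaults:

```python
import random

class CircuitBreaker:
    """Trips open after `threshold` consecutive failures; while open,
    calls fail fast instead of hammering a failing dependency."""
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    @property
    def open(self):
        return self.failures >= self.threshold

    def record(self, success):
        self.failures = 0 if success else self.failures + 1

def backoff_with_jitter(attempt, base=0.1, cap=5.0, rng=random.random):
    """Exponential backoff with full jitter: sleep in [0, min(cap, base * 2^attempt))."""
    return rng() * min(cap, base * 2 ** attempt)

breaker = CircuitBreaker(threshold=3)
for _ in range(3):
    breaker.record(success=False)
tripped = breaker.open            # breaker now fails fast
delay = backoff_with_jitter(attempt=4)
```

Full jitter desynchronises retrying clients, which prevents the thundering-herd spikes that fixed exponential backoff produces.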
Open Source

All Repositories

56 public repositories across personal and organization accounts. Production systems, research platforms, and foundational tools.

Engineering Activity

Contribution History

17 years of continuous open source development across AI, epidemiology, and global health platforms.

17
Active Years

Since 2009

Annual contributions. Growth from academic research through clinical AI, pandemic response, and production AI platform engineering.

Last 30 Days

Daily engineering velocity across active repositories.
Institutional Partnerships

Who I Have Built With

Technical roadmaps that became funded, multi-year programs.

NASA
Strategic Technology Partnership
The Global Fund
Health Data Infrastructure
McGovern Foundation
ARK Platform Funder
Amazon
Environmental Equity Initiative
UCL
UKRI/MRC Research, GBP 2M
Ministry of Health India
$116M National Program, 8 States
Global Reach

17 Countries

Each highlighted country represents a program I designed, built, or delivered. Technical and strategic leadership on the ground.

India
Bangladesh
Indonesia
Zambia
Rwanda
Ethiopia
DR Congo
Central African Rep.
Niger
Sierra Leone
Egypt
Jordan
Lebanon
Oman
Iraq
Chile
United Kingdom
Technical Stack

Tools I Build With

Production-grade proficiency. Primary tools highlighted.

AI / LLM
Claude / Anthropic · Mistral · LoRA / QLoRA · DPO / RLHF · LangChain · LlamaIndex · vLLM · HuggingFace · ONNX
Agentic / Dev Tools
Claude Code · MCP Servers · Cursor · Lovable · v0 by Vercel · Windsurf · Replit · GitHub Copilot
Cloud / Infrastructure
AWS SageMaker · AWS Bedrock · Fargate · Glue · Athena · EventBridge · Azure · Kubernetes · Terraform · Docker
Data Engineering
Apache Iceberg · OpenSearch · Spark · Airflow · Flink · FAISS · Kafka · dbt
ML / Research
PyTorch · TensorFlow · XGBoost · scikit-learn · SHAP · MLflow · Python · Triton
Contact

Get in Touch

Open to conversations about AI engineering, foundation models, health systems, and climate intelligence.