Case Studies
Production AI systems — deep dives
Architecture notes and outcomes from real deployments. Engineered for recruiters and hiring managers who want depth, not just buzzwords.
AI
vLLM Multi-Model Serving on H100/H200
7B–72B models, AWQ 4-bit, continuous batching, P99 TTFT < 200 ms @ 1k+ concurrent. Production architecture + observability.
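As an illustration of how a latency target like the P99 TTFT figure above might be checked, here is a minimal sketch. It assumes time-to-first-token samples have already been collected in milliseconds; the function names and the nearest-rank percentile method are illustrative choices, not details taken from the deployment.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Rank is 1-based; clamp to 1 so very small pct values stay in range.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def meets_ttft_slo(ttft_ms, budget_ms=200.0):
    """Hypothetical SLO check: True when P99 TTFT is under the budget."""
    return percentile(ttft_ms, 99) < budget_ms
```

With 100 samples, one slow outlier still passes the check and two do not, which is the behaviour a P99 target implies.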
AI
LangGraph Multi-Agent SRE Automation
Supervisor topology, Qdrant RAG, kubectl/Ansible execution, HITL approval gates. 40% reduction in overnight on-call.
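The idea of a human-in-the-loop approval gate can be sketched in plain Python. This is a simplified illustration and does not use the LangGraph API; the `Action` type, the read-only verb allowlist, and the `approve` callback are all hypothetical. The sketch only returns a decision and never runs a command.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical allowlist: kubectl verbs assumed safe to run without review.
READ_ONLY_VERBS = {"get", "describe", "logs", "top"}

@dataclass
class Action:
    command: List[str]  # e.g. ["kubectl", "delete", "pod", "api-0"]
    reason: str         # why the agent proposes this action

def needs_approval(action: Action) -> bool:
    """Everything except an allowlisted read-only kubectl call needs a human."""
    cmd = action.command
    is_read_only = len(cmd) > 1 and cmd[0] == "kubectl" and cmd[1] in READ_ONLY_VERBS
    return not is_read_only

def gate(action: Action, approve: Callable[[Action], bool]) -> str:
    """Return "execute" or "rejected"; `approve` stands in for the human reviewer."""
    if not needs_approval(action):
        return "execute"
    return "execute" if approve(action) else "rejected"
```

Defaulting to "needs approval" for anything off the allowlist means an unrecognized tool, such as an Ansible playbook run, is held for review rather than executed silently.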
AI
Air-Gapped Sovereign LLM Ops
Offline HF mirror → AWQ quantize → data diode → Harbor + MinIO → isolated K8s. GOVINT-grade pipeline.