Case Studies

Production AI systems — deep dives

Architecture notes and outcomes from real deployments. Engineered for recruiters and hiring managers who want depth, not just buzzwords.

7B–72B models, AWQ 4-bit, continuous batching, P99 TTFT < 200 ms @ 1k+ concurrent. Production architecture + observability.

Supervisor topology, Qdrant RAG, kubectl/Ansible execution, HITL approval gates. −40% overnight on-call.

Offline HF mirror → AWQ quantize → data diode → Harbor + MinIO → isolated K8s. GOVINT-grade pipeline.