Air-Gapped Sovereign LLM Ops
End-to-end pipeline to deploy modern LLMs into fully isolated, sovereign environments — from internet-connected bastion through data diode transfer to an air-gapped Kubernetes cluster with its own model registry. Deployed for GOVINT clients.
- Air-Gapped
- Offline HF Mirror
- AWQ
- Data Diode
- Harbor
- MinIO Model Registry
- Kyverno
- Sovereign
The problem
Sovereign and GOVINT customers cannot reach the public internet — no direct HuggingFace pulls, no docker.io, no ghcr.io, no GitHub Releases. They still need the latest open-weight models, reproducible deploys, and the ability to audit what's running in production. The entire supply chain has to be transferred across an air gap with integrity guarantees.
Pipeline
┌──────────────────────────────────────────────┐
│ BASTION (internet-connected, staging) │
│ • HF snapshot_download (HF_HUB_OFFLINE=0) │
│ • AutoAWQ 4-bit quantization │
│ • MMLU / HumanEval validation gates │
│ • Package: weights + tokenizer + configs │
│ • Sign + checksum │
└────────────────────┬─────────────────────────┘
│ (one-way)
▼ data diode
┌──────────────────────────────────────────────┐
│ AIR-GAPPED SIDE │
│ • Harbor registry (nvcr.io/docker.io mirror)│
│ • MinIO model registry (versioned artefacts)│
│ • MLflow Registry (internal) │
│ • Kyverno admission policy: │
│ reject Deployments referencing │
│ unregistered model paths │
│ • vLLM / Triton on isolated K8s cluster │
└──────────────────────────────────────────────┘
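The bastion-side steps in the diagram can be sketched as follows. This is illustrative only: the repo id, staging path, and manifest filename are assumptions, and the quantization and signing steps are omitted.

```python
"""Bastion-side packaging sketch: snapshot the model from the Hub, then
write a SHA-256 manifest so the air-gapped side can re-verify every file
after the data-diode transfer. Paths and names are hypothetical."""
import hashlib
import json
from pathlib import Path


def checksum_tree(root: str) -> dict:
    """SHA-256 digest for every file under root, keyed by relative path."""
    root_path = Path(root)
    return {
        str(p.relative_to(root_path)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root_path.rglob("*"))
        if p.is_file()
    }


def package_model(repo_id: str, staging_dir: str) -> Path:
    # Imported here so the checksum helper runs without the hub package
    # (the air-gapped side never installs or calls it).
    from huggingface_hub import snapshot_download

    # Pull every artefact: weights, tokenizer, configs, chat template.
    local_dir = snapshot_download(repo_id=repo_id, local_dir=staging_dir)

    # Write the manifest so integrity can be re-checked after the diode hop.
    manifest = checksum_tree(local_dir)
    manifest_path = Path(local_dir) / "MANIFEST.sha256.json"
    manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest_path
```

The manifest travels with the package; the receiving side recomputes `checksum_tree` and diffs the two before registering anything.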
Implementation highlights
- Offline HuggingFace mode: on the production side, `HF_HUB_OFFLINE=1` and `TRANSFORMERS_OFFLINE=1` are enforced — every artefact (model, tokenizer, config, chat template) must be present locally or the pod refuses to start.
- AWQ before transfer: quantizing on the bastion reduces the bytes that have to cross the diode (144 GB → 36 GB for a 72B model) and removes the need for GPU quantization tooling on the air-gapped side.
- Quality gates: quantized checkpoints run through MMLU and HumanEval benchmarks on the bastion; a quality regression above the configured threshold blocks transfer.
- Registry mirrors: Harbor mirrors `nvcr.io`, `docker.io`, and `ghcr.io` — internal deployments pull only from Harbor.
- MinIO as model registry: versioned, checksummed artefact storage with MLflow Registry pointing at MinIO. Provenance is auditable.
- Kyverno enforcement: admission policy rejects any vLLM Deployment referencing a model path that isn't registered — closing the "someone sideloaded a model" audit gap.
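The intent of that admission rule can be illustrated in Python. The manifest shape (a vLLM-style `--model` argument) and the registry entries are hypothetical; the real control is a Kyverno policy evaluated at admission time, not application code.

```python
"""Illustrative version of the admission check: allow a Deployment only if
every model path it references is present in the registered set."""

# Hypothetical registry entries (in production: looked up in MinIO/MLflow).
REGISTERED_MODELS = {
    "s3://models/qwen2-72b-awq/v3",
    "s3://models/llama3-8b-awq/v1",
}


def admit(deployment: dict, registered: set = frozenset(REGISTERED_MODELS)) -> bool:
    """Reject any container whose --model argument is unregistered."""
    containers = (
        deployment.get("spec", {})
        .get("template", {})
        .get("spec", {})
        .get("containers", [])
    )
    for container in containers:
        args = container.get("args", [])
        for i, arg in enumerate(args):
            # vLLM-style invocation: ... --model <path> ...
            if arg == "--model" and i + 1 < len(args):
                if args[i + 1] not in registered:
                    return False  # sideloaded model path → reject
    return True
```

Expressing the check this way makes the audit property explicit: a Deployment can only ever reference a model that went through the validated transfer pipeline.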
Threat & compliance model
- Full provenance chain: model → quantization → validation → transfer → registry → deploy, all logged with immutable audit trail.
- Zero-trust NetworkPolicies inside the cluster; egress denied by default.
- Keycloak OIDC for human access; short-lived tokens; no shared kubeconfigs.
- Falco + Trivy + OPA Gatekeeper enforce runtime and build-time controls.
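The immutability property of the provenance chain can be sketched with hash chaining: each log entry commits to the hash of the previous entry, so any retroactive edit breaks verification. This is a minimal illustration of the idea, not the production audit system.

```python
"""Hash-chained audit log sketch: append-only entries where each entry's
hash covers both its event and the previous entry's hash."""
import hashlib
import json

GENESIS = "0" * 64  # sentinel "previous hash" for the first entry


def append_entry(log: list, event: dict) -> list:
    """Append an event, chaining it to the hash of the last entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = json.dumps({"event": event, "prev": prev_hash}, sort_keys=True)
    log.append({
        "event": event,
        "prev": prev_hash,
        "hash": hashlib.sha256(body.encode()).hexdigest(),
    })
    return log


def verify_chain(log: list) -> bool:
    """Recompute every hash; any edited or reordered entry fails."""
    prev_hash = GENESIS
    for entry in log:
        body = json.dumps({"event": entry["event"], "prev": prev_hash},
                          sort_keys=True)
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != hashlib.sha256(body.encode()).hexdigest():
            return False
        prev_hash = entry["hash"]
    return True
```

Each pipeline stage (quantization, validation, transfer, registration, deploy) appends one entry; verification walks the chain end to end.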
Outcomes
- Sovereign clients can run current open-weight LLMs without exposure to the public internet.
- Reproducible, auditable pipeline — what's in production matches what was validated, byte for byte.
- Same developer experience on the air-gapped side as in cloud environments.
- Pattern reused across GOVINT, OSINT, and Smart Nation verticals at G42.