LLM Serving & Inference
- vLLM
- PagedAttention
- Tensor Parallelism
- AWQ / GPTQ
- NVIDIA Triton
- TensorRT
- ONNX Runtime
- LiteLLM Gateway
- Ollama
- Ray Serve
- BentoML
Senior SRE · DevOps Lead · MLOps / LLMOps · AI Infrastructure
Senior Site Reliability Engineer leading a 6-person SRE team at Group 42 (G42). 11+ years across bare-metal GPU clusters, production Kubernetes (AKS, RKE2, air-gapped, MIG-partitioned H100/H200), vLLM/Triton serving, MLOps pipelines, and sovereign LLM Ops delivered across multiple countries for government, intelligence, and smart-city workloads.
I build and run the infrastructure that makes AI work at scale — from bare-metal GPU clusters to production Kubernetes platforms serving government, intelligence, and smart-city workloads. Based in Abu Dhabi, UAE.
As Senior Site Reliability Engineer and DevOps Lead at Group 42 (G42), I lead a 6-person SRE team responsible for cloud and on-premises platforms across Azure, AWS, and air-gapped environments — delivering to AI, GOVINT, OSINT, RND, and Smart Nation verticals, with cross-border hypercare across multiple countries.
Over the past year I've gone deep into MLOps and LLM infrastructure, designing production-grade LLM serving stacks on NVIDIA H100/H200 GPU clusters: full-stack deployments of 72B-parameter models with vLLM tensor parallelism, observability via Prometheus, Grafana, and DCGM Exporter, and enterprise API gateways built on LiteLLM.
Core focus areas: Kubernetes platform engineering (AKS, RKE2, air-gapped, MIG-partitioned GPU nodes) · LLM / MLOps (vLLM, Triton, Ray Serve, MLflow, Kubeflow) · Cloud platforms (Azure, AWS) with FinOps and cost optimisation · GitOps & CI/CD (ArgoCD, Flux, GitLab CI, Azure DevOps) · SRE practices (SLO/SLA, incident management, chaos engineering) · AI agent platforms (OpenClaw, n8n, LangGraph, Qdrant RAG).
Group 42 (G42)
Group 42 (G42)
First Abu Dhabi Bank (FAB)
HCL Infosystems Ltd.
HCL Infosystems Ltd.
Polaris
Selected production AI/agentic systems I've built & operated. All run on Kubernetes with GPU scheduling, OIDC, NetworkPolicy isolation, and full observability. Three have full case studies.
Production vLLM on H100/H200 clusters with AWQ 4-bit quantization, tensor parallelism, continuous batching, and prefix caching. HPA driven by DCGM GPU metrics; P99 TTFT < 200 ms at 1k+ concurrent requests.
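The pairing of AWQ 4-bit quantization with tensor parallelism comes down to simple memory arithmetic. A back-of-envelope sketch (function name and rounding are mine, not a vLLM API):

```python
def per_gpu_weight_gb(params_b: float, bits_per_weight: float, tp_degree: int) -> float:
    """Approximate per-GPU memory for model weights alone, with the
    weight matrices sharded across tp_degree GPUs by tensor parallelism."""
    total_gb = params_b * bits_per_weight / 8  # params (billions) -> GB, roughly
    return total_gb / tp_degree

# FP16 72B: 144 GB of weights -- does not fit a single 80 GB H100.
# AWQ 4-bit with TP=4: 9 GB of weights per GPU, leaving HBM for KV cache.
```

The headroom freed by quantization is what lets continuous batching keep a large KV cache resident, which is where the concurrency numbers come from.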
Stateful multi-agent workflow: a supervisor routes Prometheus alerts to retrieval (Qdrant RAG), execution (kubectl/Ansible under RBAC), and validation agents, with human-in-the-loop approval on destructive ops. Cut overnight on-call load by 40%.
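The supervisor's routing decision can be sketched in plain Python (a simplified illustration; the real system is a LangGraph state machine, and the action names here are hypothetical):

```python
# Hypothetical action names for illustration only.
DESTRUCTIVE_ACTIONS = {"drain_node", "delete_pod", "restart_deployment"}

def route_alert(alert: dict) -> list[str]:
    """Map a Prometheus alert to the sequence of agents that handle it."""
    steps = ["retrieval"]              # always consult the Qdrant RAG index first
    action = alert.get("suggested_action")
    if action:
        steps.append("execution")      # kubectl/Ansible runs under scoped RBAC
        if action in DESTRUCTIVE_ACTIONS:
            steps.append("human_approval")  # human-in-the-loop gate
    steps.append("validation")         # confirm the alert actually cleared
    return steps
```

The key property is that the destructive-ops gate is structural, not prompt-based: no execution path reaches a destructive action without the approval step in between.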
Model Context Protocol servers on Kubernetes exposing Kubernetes API, GitLab, Prometheus, and Jira as structured tool endpoints for LLM agents. Enables agent-driven triage & remediation.
Unified OpenAI-compatible API fronting multiple vLLM backends with load balancing, per-team rate limits, cost tracking, API keys, and fallback routing. Consumed by Open WebUI, n8n, LangChain, and Mattermost bots.
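The fallback-routing behaviour amounts to trying backends in priority order and skipping unhealthy ones. A toy sketch (this class is illustrative; the production gateway is LiteLLM, not this code):

```python
class FallbackRouter:
    """Toy sketch of gateway fallback routing: serve from the highest-priority
    vLLM backend that the health checker has not marked down."""

    def __init__(self, backends: list[str]):
        self.backends = backends   # ordered by priority
        self.down: set[str] = set()

    def pick(self) -> str:
        for backend in self.backends:
            if backend not in self.down:
                return backend
        raise RuntimeError("no healthy vLLM backend available")
```

Because the gateway speaks the OpenAI API, clients like Open WebUI, n8n, and LangChain never see which backend actually served the request.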
Complete offline pipeline: HF snapshot download + AWQ quantize on bastion → data diode transfer → Harbor + internal MinIO registry → isolated K8s deploy. Delivered for sovereign/GOVINT clients.
Kubeflow + Argo Workflows DAGs: ingestion → KubeRay fine-tuning → MMLU/HumanEval gates → MLflow registration → ArgoCD vLLM rollout. Canary via Argo Rollouts with Prometheus analysis gates. Weeks → hours.
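The canary analysis gate boils down to a promote/abort decision on metrics deltas. A minimal sketch, assuming an error-rate comparison (the real gate runs Prometheus queries inside Argo Rollouts; the threshold here is illustrative):

```python
def canary_verdict(baseline_err: float, canary_err: float,
                   max_ratio: float = 1.2) -> str:
    """Promote the canary only if its error rate stays within
    max_ratio of the stable baseline's error rate."""
    if baseline_err == 0:
        return "promote" if canary_err == 0 else "abort"
    return "promote" if canary_err / baseline_err <= max_ratio else "abort"
```

Automating this verdict, rather than eyeballing dashboards, is what collapses the rollout cycle from weeks to hours.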
NVIDIA Triton with ONNX Runtime + TensorRT (2–3× speedup), dynamic batching, ensembles for pre/post-processing. Sub-50 ms gRPC retrieval powering the SRE RAG knowledge base.
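Dynamic batching is the piece doing most of the latency/throughput work: requests queue briefly so the GPU sees full batches. A simplified model of the batching policy (pure illustration, not Triton's implementation):

```python
def dynamic_batches(arrivals_ms: list[float], max_batch: int = 8,
                    window_ms: float = 5.0) -> list[list[float]]:
    """Group request arrival times into batches the way a dynamic batcher
    would: close a batch once it is full or its queueing window expires."""
    batches, current, opened = [], [], None
    for t in arrivals_ms:
        if current and (len(current) == max_batch or t - opened > window_ms):
            batches.append(current)        # flush the finished batch
            current, opened = [], None
        if not current:
            opened = t                     # new batch starts its window
        current.append(t)
    if current:
        batches.append(current)
    return batches
```

Bounding the window keeps tail latency predictable, which is how the sub-50 ms gRPC retrieval target stays compatible with batched throughput.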
vLLM Prometheus metrics (TTFT, num_requests_running, KV-cache) + DCGM GPU metrics → 30+ Grafana dashboards. Alertmanager rules for TTFT degradation, KV-cache exhaustion, GPU hang. Distributed tracing across agents.
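The TTFT-degradation alert condition is a percentile check over recent samples. Sketched in Python for clarity (the real rule is PromQL over vLLM's histogram metrics; the threshold matches the SLO quoted above):

```python
from statistics import quantiles

def ttft_p99_breached(samples_ms: list[float], threshold_ms: float = 200.0) -> bool:
    """Fire the alert when the P99 of recent time-to-first-token
    samples exceeds the SLO threshold."""
    p99 = quantiles(samples_ms, n=100)[98]  # estimate of the 99th percentile
    return p99 > threshold_ms
```

Alerting on the P99 rather than the mean is deliberate: KV-cache exhaustion and GPU hangs show up in the tail long before they move the average.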
English (Full Professional) · Hindi (Full Professional) · Telugu (Native / Bilingual)
Pick the variant that best matches the role you're hiring for. Each PDF is a complete, one-page-style CV optimized for that role family.
vLLM, Triton, LangGraph, MCP, Kubeflow, MLflow, air-gapped LLM Ops.
SLI/SLO, incident response, Prometheus, Grafana, automation, on-call.
Team leadership, capacity planning, reliability roadmaps, mentoring.
CI/CD, GitOps (ArgoCD), Terraform, Ansible, Helm, Kubernetes platform.
Falco, Trivy, OPA/Kyverno, Keycloak, zero-trust NetworkPolicies, supply chain.
Azure, AWS, G42 Cloud, AKS/EKS, Terraform, cost & governance.
RKE2, Rancher, GPU Operator, MIG, Longhorn, cluster lifecycle, upgrades.
NVIDIA H100/H200, InfiniBand, NCCL, KubeRay, distributed training.
Open to Staff / Principal AI Platform & LLMOps roles, Senior SRE Engineer, Senior DevOps Engineer, and Senior Infrastructure Engineer — globally.