LLM Serving & Inference
- vLLM
- PagedAttention
- Tensor Parallelism
- AWQ / GPTQ
- NVIDIA Triton
- TensorRT
- ONNX Runtime
- LiteLLM Gateway
- Ollama
- Ray Serve
- BentoML
Senior SRE · DevOps Lead · MLOps / LLMOps · AI Infrastructure
Senior Site Reliability Engineer leading a 6-person SRE team at Group 42 (G42). 11+ years across bare-metal GPU clusters, production Kubernetes (AKS, RKE2, air-gapped, MIG-partitioned H100/H200), vLLM/Triton serving, MLOps pipelines, and sovereign LLM Ops delivered across multiple countries for government, intelligence, and smart-city workloads.
I build and run the infrastructure that makes AI work at scale — from bare-metal GPU clusters to production Kubernetes platforms serving government, intelligence, and smart-city workloads. Based in Abu Dhabi, UAE.
As Senior Site Reliability Engineer and DevOps Lead at Group 42 (G42), I lead a 6-person SRE team responsible for cloud and on-premises platforms across Azure, AWS, and air-gapped environments — delivering to AI, GOVINT, OSINT, RND, and Smart Nation verticals, with cross-border hypercare across multiple countries.
Over the past year I've gone deep into MLOps and LLM infrastructure, designing production-grade LLM serving stacks on NVIDIA H100/H200 GPU clusters: full-stack deployments of 72B-parameter models with vLLM tensor parallelism, observability via Prometheus, Grafana, and DCGM Exporter, and enterprise API gateways built on LiteLLM.
Core focus areas: Kubernetes platform engineering (AKS, RKE2, air-gapped, MIG-partitioned GPU nodes) · LLM / MLOps (vLLM, Triton, Ray Serve, MLflow, Kubeflow) · Cloud platforms (Azure, AWS) with FinOps and cost optimisation · GitOps & CI/CD (ArgoCD, Flux, GitLab CI, Azure DevOps) · SRE practices (SLO/SLA, incident management, chaos engineering) · AI agent platforms (OpenClaw, n8n, LangGraph, Qdrant RAG).
Group 42 (G42)
Group 42 (G42)
First Abu Dhabi Bank (FAB)
HCL Infosystems Ltd.
HCL Infosystems Ltd.
Polaris
Selected production AI/agentic systems I've built & operated. All run on Kubernetes with GPU scheduling, OIDC, NetworkPolicy isolation, and full observability. Three have full case studies.
Production vLLM on H100/H200 clusters with AWQ 4-bit quantization, tensor parallelism, continuous batching, and prefix caching. HPA driven by DCGM GPU metrics; P99 TTFT < 200 ms at 1k+ concurrent requests.
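The pairing of AWQ 4-bit quantization with tensor parallelism comes down to simple memory arithmetic. A back-of-envelope sketch (function name and rounding are mine, not a vLLM API):

```python
def per_gpu_weight_gb(params_b: float, bits_per_weight: float, tp_degree: int) -> float:
    """Approximate per-GPU memory for model weights alone, with the
    weight matrices sharded across tp_degree GPUs by tensor parallelism."""
    total_gb = params_b * bits_per_weight / 8  # params (billions) -> GB, roughly
    return total_gb / tp_degree

# FP16 72B: 144 GB of weights -- does not fit a single 80 GB H100.
# AWQ 4-bit with TP=4: 9 GB of weights per GPU, leaving HBM for KV cache.
```

The headroom freed by quantization is what lets continuous batching keep a large KV cache resident, which is where the concurrency numbers come from.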
Stateful multi-agent workflow: a supervisor routes Prometheus alerts to retrieval (Qdrant RAG), execution (kubectl/Ansible under RBAC), and validation agents, with human-in-the-loop approval on destructive ops. Cut overnight on-call load by 40%.
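The supervisor's routing decision can be sketched in plain Python (a simplified illustration; the real system is a LangGraph state machine, and the action names here are hypothetical):

```python
# Hypothetical action names for illustration only.
DESTRUCTIVE_ACTIONS = {"drain_node", "delete_pod", "restart_deployment"}

def route_alert(alert: dict) -> list[str]:
    """Map a Prometheus alert to the sequence of agents that handle it."""
    steps = ["retrieval"]              # always consult the Qdrant RAG index first
    action = alert.get("suggested_action")
    if action:
        steps.append("execution")      # kubectl/Ansible runs under scoped RBAC
        if action in DESTRUCTIVE_ACTIONS:
            steps.append("human_approval")  # human-in-the-loop gate
    steps.append("validation")         # confirm the alert actually cleared
    return steps
```

The key property is that the destructive-ops gate is structural, not prompt-based: no execution path reaches a destructive action without the approval step in between.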
Model Context Protocol servers on Kubernetes exposing Kubernetes API, GitLab, Prometheus, and Jira as structured tool endpoints for LLM agents. Enables agent-driven triage & remediation.
Unified OpenAI-compatible API fronting multiple vLLM backends with load balancing, per-team rate limits, cost tracking, API keys, and fallback routing. Consumed by Open WebUI, n8n, LangChain, and Mattermost bots.
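The fallback-routing behaviour amounts to trying backends in priority order and skipping unhealthy ones. A toy sketch (this class is illustrative; the production gateway is LiteLLM, not this code):

```python
class FallbackRouter:
    """Toy sketch of gateway fallback routing: serve from the highest-priority
    vLLM backend that the health checker has not marked down."""

    def __init__(self, backends: list[str]):
        self.backends = backends   # ordered by priority
        self.down: set[str] = set()

    def pick(self) -> str:
        for backend in self.backends:
            if backend not in self.down:
                return backend
        raise RuntimeError("no healthy vLLM backend available")
```

Because the gateway speaks the OpenAI API, clients like Open WebUI, n8n, and LangChain never see which backend actually served the request.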
Complete offline pipeline: HF snapshot download + AWQ quantize on bastion → data diode transfer → Harbor + internal MinIO registry → isolated K8s deploy. Delivered for sovereign/GOVINT clients.
Kubeflow + Argo Workflows DAGs: ingestion → KubeRay fine-tuning → MMLU/HumanEval gates → MLflow registration → ArgoCD vLLM rollout. Canary via Argo Rollouts with Prometheus analysis gates. Weeks → hours.
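The canary analysis gate boils down to a promote/abort decision on metrics deltas. A minimal sketch, assuming an error-rate comparison (the real gate runs Prometheus queries inside Argo Rollouts; the threshold here is illustrative):

```python
def canary_verdict(baseline_err: float, canary_err: float,
                   max_ratio: float = 1.2) -> str:
    """Promote the canary only if its error rate stays within
    max_ratio of the stable baseline's error rate."""
    if baseline_err == 0:
        return "promote" if canary_err == 0 else "abort"
    return "promote" if canary_err / baseline_err <= max_ratio else "abort"
```

Automating this verdict, rather than eyeballing dashboards, is what collapses the rollout cycle from weeks to hours.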
NVIDIA Triton with ONNX Runtime + TensorRT (2–3× speedup), dynamic batching, ensembles for pre/post-processing. Sub-50 ms gRPC retrieval powering the SRE RAG knowledge base.
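Dynamic batching is the piece doing most of the latency/throughput work: requests queue briefly so the GPU sees full batches. A simplified model of the batching policy (pure illustration, not Triton's implementation):

```python
def dynamic_batches(arrivals_ms: list[float], max_batch: int = 8,
                    window_ms: float = 5.0) -> list[list[float]]:
    """Group request arrival times into batches the way a dynamic batcher
    would: close a batch once it is full or its queueing window expires."""
    batches, current, opened = [], [], None
    for t in arrivals_ms:
        if current and (len(current) == max_batch or t - opened > window_ms):
            batches.append(current)        # flush the finished batch
            current, opened = [], None
        if not current:
            opened = t                     # new batch starts its window
        current.append(t)
    if current:
        batches.append(current)
    return batches
```

Bounding the window keeps tail latency predictable, which is how the sub-50 ms gRPC retrieval target stays compatible with batched throughput.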
vLLM Prometheus metrics (TTFT, num_requests_running, KV-cache) + DCGM GPU metrics → 30+ Grafana dashboards. Alertmanager rules for TTFT degradation, KV-cache exhaustion, GPU hang. Distributed tracing across agents.
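The TTFT-degradation alert condition is a percentile check over recent samples. Sketched in Python for clarity (the real rule is PromQL over vLLM's histogram metrics; the threshold matches the SLO quoted above):

```python
from statistics import quantiles

def ttft_p99_breached(samples_ms: list[float], threshold_ms: float = 200.0) -> bool:
    """Fire the alert when the P99 of recent time-to-first-token
    samples exceeds the SLO threshold."""
    p99 = quantiles(samples_ms, n=100)[98]  # estimate of the 99th percentile
    return p99 > threshold_ms
```

Alerting on the P99 rather than the mean is deliberate: KV-cache exhaustion and GPU hangs show up in the tail long before they move the average.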
English (Full Professional) · Hindi (Full Professional) · Telugu (Native / Bilingual)
Pick the variant that best matches the role you're hiring for. Each PDF is a complete, one-page-style CV optimized for that role family.
vLLM, Triton, LangGraph, MCP, Kubeflow, MLflow, air-gapped LLM Ops.
SLI/SLO, incident response, Prometheus, Grafana, automation, on-call.
Team leadership, capacity planning, reliability roadmaps, mentoring.
CI/CD, GitOps (ArgoCD), Terraform, Ansible, Helm, Kubernetes platform.
Falco, Trivy, OPA/Kyverno, Keycloak, zero-trust NetworkPolicies, supply chain.
Azure, AWS, G42 Cloud, AKS/EKS, Terraform, cost & governance.
RKE2, Rancher, GPU Operator, MIG, Longhorn, cluster lifecycle, upgrades.
NVIDIA H100/H200, InfiniBand, NCCL, KubeRay, distributed training.
Open to Staff / Principal AI Platform & LLMOps roles, Senior SRE Engineer, Senior DevOps Engineer, and Senior Infrastructure Engineer — globally.