Deploy Applications to Azure Kubernetes Service
Introduction
5 min
Azure Kubernetes Service (AKS) is a managed Kubernetes service: Azure runs the control plane, and you manage the workloads. Deploy AI inference APIs, vector search services, and background processors as containers that can scale, restart after crashes, and expose public endpoints through Azure Load Balancer. Four concepts to master: Pods, Deployments, Services, and kubectl.
Create Kubernetes Deployment Manifests
10 min
AKS Architecture: User → LoadBalancer Service → Pods in Deployment → ACR
1. Deployment Manifest Structure
A Deployment tells Kubernetes: "run N copies of this container image." Kubernetes maintains that state by restarting crashed pods and scheduling replacements on healthy nodes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-inference-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: inference-api # Must match template.labels below
  template:
    metadata:
      labels:
        app: inference-api # Must match selector above
    spec:
      containers:
        - name: api
          image: myregistry.azurecr.io/inference-api:v1.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "2Gi"
              cpu: "1000m" # 1 CPU = 1000 millicores
            limits:
              memory: "4Gi"
              cpu: "2000m"
2. Replicas for High Availability
- 1 replica: single point of failure. The app is down while Kubernetes restarts it after a crash.
- 2 replicas: basic resilience. One can crash while the other keeps serving. Minimum for production.
- 3+ replicas: more headroom for zero-downtime rolling updates. One pod can be replaced while at least two keep serving traffic.
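To make the replica counts above concrete, here is a minimal sketch of the pod-count bounds during a rolling update. It assumes the Deployment uses the default RollingUpdate strategy with maxSurge and maxUnavailable both at 25%, where Kubernetes rounds the surge up and the unavailable count down. The function name is hypothetical, and real availability also depends on readiness checks that this sketch ignores.

```python
def rolling_update_bounds(replicas: int, max_surge_pct: int = 25,
                          max_unavailable_pct: int = 25) -> dict:
    """Pod-count bounds during a RollingUpdate with percentage settings."""
    # Assumption: defaults of 25% each; maxSurge rounds up, maxUnavailable rounds down.
    surge = -(-replicas * max_surge_pct // 100)          # ceiling division
    unavailable = replicas * max_unavailable_pct // 100  # floor division
    return {
        "min_available": replicas - unavailable,  # pods that must stay up
        "max_total": replicas + surge,            # old + new pods at the peak
    }


for n in (1, 2, 3, 4):
    print(n, rolling_update_bounds(n))
```

Under these defaults, small Deployments surge one extra pod before removing an old one, so the larger risk with a single replica is an unplanned crash rather than the rollout itself.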
3. Resource Requests and Limits
- Requests = minimum guaranteed resources (the scheduler uses them to pick a node). Set too low, the pod can land on an overloaded node and perform poorly.
- Limits = maximum allowed. A container that exceeds its memory limit is killed (OOMKilled); one that hits its CPU limit is throttled. A memory limit set too low for model inference causes constant restarts.
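The quantity strings in the manifest can be checked before applying it. The sketch below converts CPU and memory quantities to plain numbers and confirms that requests do not exceed limits. The helper names are hypothetical, and only the "m" CPU suffix and the Ki/Mi/Gi memory suffixes are handled, which is a deliberate simplification of the full Kubernetes quantity format.

```python
BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "1000m" or "0.5" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)  # whole or fractional cores


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as "2Gi" to bytes (Ki/Mi/Gi only)."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: plain bytes


def requests_within_limits(requests: dict, limits: dict) -> bool:
    """True when every request is less than or equal to its limit."""
    return (
        cpu_to_millicores(requests["cpu"]) <= cpu_to_millicores(limits["cpu"])
        and memory_to_bytes(requests["memory"]) <= memory_to_bytes(limits["memory"])
    )


# Values taken from the Deployment manifest in this lesson.
requests = {"cpu": "1000m", "memory": "2Gi"}
limits = {"cpu": "2000m", "memory": "4Gi"}
print(requests_within_limits(requests, limits))
```

A check like this could run in CI before kubectl apply, since a request larger than its limit is rejected by the API server anyway.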
4. Injecting Secrets via Kubernetes Secrets
kubectl create secret generic api-secrets --from-literal=api-key=your-secret-key
# Reference in Deployment manifest:
env:
  - name: API_KEY
    valueFrom:
      secretKeyRef:
        name: api-secrets # Secret object name
        key: api-key # Key within the secret
Expose Applications with Kubernetes Services
10 min
1. Service Types: When to Use Each
| # | Type | Accessible From | Use For |
|---|---|---|---|
| 1 | ClusterIP (default) | Inside cluster only | Backend microservices: vector DB, embeddings service, worker |
| 2 | NodePort | Node IP + high port | Dev/test access without load balancer |
| 3 | LoadBalancer | Internet (public IP) | Production AI APIs accessible externally |
| 4 | ExternalName | Cluster → external DNS | Represent external services as K8s services |
2. LoadBalancer Service Manifest
apiVersion: v1
kind: Service
metadata:
  name: inference-api-service
spec:
  type: LoadBalancer
  selector:
    app: inference-api # Routes to pods with this label
  ports:
    - protocol: TCP
      port: 80 # External port (clients connect here)
      targetPort: 8080 # Container port (app listens here)
3. The Selector-Label Contract (Most Common Failure)
The Service selector must exactly match the pod labels. One typo leaves the Service with no endpoints: nothing fails at apply time, but no traffic reaches the pods.
kubectl get pods --show-labels
kubectl describe svc inference-api-service | grep Selector
4. ClusterIP Internal DNS
ClusterIP services are reachable via stable DNS: servicename.namespace.svc.cluster.local. Your API pod calls http://vector-search-svc.default.svc.cluster.local:8080 even as pod IPs change.
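The DNS pattern above can be sketched as a small helper that builds the in-cluster URL from a Service name, namespace, and port. The function names are hypothetical, and the sketch assumes the default cluster domain cluster.local.

```python
def service_dns_name(service: str, namespace: str = "default",
                     cluster_domain: str = "cluster.local") -> str:
    """Fully qualified in-cluster DNS name for a Service."""
    # Assumption: the cluster uses the default domain "cluster.local".
    return f"{service}.{namespace}.svc.{cluster_domain}"


def service_url(service: str, port: int, namespace: str = "default",
                scheme: str = "http") -> str:
    """URL that a pod inside the cluster can use to reach the Service."""
    return f"{scheme}://{service_dns_name(service, namespace)}:{port}"


print(service_url("vector-search-svc", 8080))
```

Pods in the same namespace can usually use the short name (vector-search-svc) as well; the full name is what crosses namespaces.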
Deploy, Verify, and Troubleshoot
10 min
1. Deploy with kubectl apply (Idempotent)
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
kubectl apply -f ./k8s/ # Apply entire directory
kubectl apply creates a resource if it does not exist and updates it if it does, so it is safe to run multiple times.
2. Verify Status
kubectl get deployments # READY 2/2 = both replicas healthy
kubectl get pods # STATUS Running/Pending/CrashLoopBackOff
kubectl get svc # EXTERNAL-IP for LoadBalancer (may show pending)
3. Common Failure Modes and Fixes
- ImagePullBackOff: wrong image tag, wrong registry URL, or missing pull credentials. Fix: kubectl describe pod to see the exact error.
- CrashLoopBackOff: the app starts, then crashes. Fix: kubectl logs pod-name for the stack trace. Do NOT blindly restart.
- Pending: the pod can't be scheduled, usually because of insufficient node resources. Fix: kubectl describe pod; the Events section shows the reason.
- No Endpoints: the Service selector doesn't match the pod labels. Fix: kubectl describe svc and kubectl get pods --show-labels.
kubectl describe pod my-pod # Events, conditions, image details
kubectl logs my-pod # App stdout/stderr
kubectl logs my-pod --previous # Logs from PREVIOUS crashed container
AKS Master Cheatsheet
kubectl apply -f deployment.yaml
kubectl get pods
kubectl logs pod-name
kubectl logs pod-name --previous
kubectl describe pod pod-name
kubectl get pods --show-labels
kubectl describe svc svc-name
kubectl get svc # EXTERNAL-IP column
az aks create --attach-acr myregistry
Exercise: Deploy AI Inference API to AKS
30 min
- Create an AKS cluster with ACR integration: az aks create --attach-acr myregistry
- Write a Deployment manifest with 2 replicas, resource requests/limits, and secret injection
- Write a LoadBalancer Service manifest with the correct selector
- Apply with kubectl apply -f
- Verify EXTERNAL-IP and test the API endpoint
- Intentionally mismatch labels and observe the "no endpoints" failure
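The label-mismatch step can be previewed without a cluster. The sketch below models equality-based selector matching: a pod is selected only when every key in the Service selector appears in the pod's labels with the same value. The pod names and the pod-template-hash values are made up for illustration, and the sketch ignores pod readiness, which a real cluster also considers.

```python
def selector_matches(selector: dict, pod_labels: dict) -> bool:
    """True when every selector key/value pair is present in the pod labels."""
    # Assumption: a non-empty, equality-based selector as in the Service manifest.
    return all(pod_labels.get(key) == value for key, value in selector.items())


def endpoints_for(selector: dict, pods: dict) -> list:
    """Names of the pods a Service with this selector would route to."""
    return [name for name, labels in pods.items()
            if selector_matches(selector, labels)]


# Hypothetical pods created by the ai-inference-api Deployment.
pods = {
    "ai-inference-api-aaa": {"app": "inference-api", "pod-template-hash": "7d9f"},
    "ai-inference-api-bbb": {"app": "inference-api", "pod-template-hash": "7d9f"},
}

print(endpoints_for({"app": "inference-api"}, pods))   # both pods selected
print(endpoints_for({"app": "inference-apii"}, pods))  # typo: empty list
```

Extra labels on the pod do not matter; a single wrong character in the selector value is enough to select nothing.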
Knowledge Check
5 min
- Q: A vector search service should only be reachable by pods inside the cluster. Which Service type? A: ClusterIP
- Q: A pod shows ImagePullBackOff. Most likely cause? A: Wrong image tag, registry URL, or missing pull credentials
- Q: A Service has no endpoints. Where to look? A: Compare pod labels (kubectl get pods --show-labels) with the Service selector (kubectl describe svc)
- Q: A pod keeps restarting with OOMKill. Fix? A: Increase the memory limit in the Deployment manifest
- Q: You need 3 replicas for zero-downtime rolling updates. Which field? A: spec.replicas: 3 in the Deployment manifest
Summary
2 min
AKS manifests define desired state, and Kubernetes makes it real. Set replicas: 2+ for production. Match the Service selector to pod labels exactly. Use ClusterIP for internal services and LoadBalancer for external APIs. Diagnose in order: kubectl describe pod → kubectl logs → check selector/labels.
Memory Tricks
Diagnostic flow: "Describe then Logs". describe shows WHAT (events), logs shows WHY (app error)
Service type quick rule: Backend microservice = ClusterIP. Public AI API = LoadBalancer.
Failure cheatsheet: ImagePullBackOff=bad image/creds | CrashLoopBackOff=app crash | Pending=no resources | No Endpoints=selector mismatch
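The failure cheatsheet can also be written as a lookup table, which is one way to turn it into a first-response script. The sketch below is illustrative only: the table name and function are hypothetical, and the commands are the ones this lesson already uses.

```python
# Hypothetical lookup built from the failure cheatsheet in this lesson:
# symptom -> (likely cause, first command to run).
TRIAGE = {
    "ImagePullBackOff": ("bad image tag, registry URL, or pull credentials",
                         "kubectl describe pod {name}"),
    "CrashLoopBackOff": ("app crashes after starting",
                         "kubectl logs {name} --previous"),
    "Pending": ("no node has enough free resources",
                "kubectl describe pod {name}"),
    "No Endpoints": ("Service selector does not match pod labels",
                     "kubectl describe svc {name}"),
}


def triage(symptom: str, name: str) -> tuple:
    """Return (likely cause, first command) for a symptom and resource name."""
    # Unknown symptoms fall back to describe, following "Describe then Logs".
    cause, command = TRIAGE.get(symptom, ("unknown", "kubectl describe pod {name}"))
    return cause, command.format(name=name)


cause, command = triage("CrashLoopBackOff", "my-pod")
print(f"{cause}: {command}")
```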
Azure Kubernetes Service (AKS)
Key Facts
- ClusterIP: internal only (backend services, vector DBs)
- LoadBalancer: public IP (production AI API endpoints)
- Selector = Labels: a Service routes to pods by exact label match
- 1 CPU = 1000m: 500m = half a core. requests = scheduler minimum, limits = hard cap
- --previous flag: logs from the last CRASHED container instance
- ImagePullBackOff: wrong tag, registry URL, or missing pull credentials
- CrashLoopBackOff: app crashes on start; check kubectl logs pod --previous
- OOMKill: memory limit too low; increase resources.limits.memory
Commands & Patterns
az aks create -n myaks -g rg --node-count 3 --attach-acr myacr
az aks get-credentials -n myaks -g rg
kubectl apply -f deployment.yaml
kubectl get pods # Running/Pending/CrashLoop?
kubectl describe pod my-pod # WHY did it fail?
kubectl logs my-pod # App stdout
kubectl logs my-pod --previous # Last crash logs
kubectl get pods --show-labels # Check selector match
kubectl scale deployment myapp --replicas=3
Monitor and Troubleshoot AKS Workloads
AKS Monitoring Overview
3 min
Container Insights (part of Azure Monitor) provides cluster-level metrics, pod logs, and node utilization for AKS. Combined with kubectl for live debugging, it gives you full visibility into AI workloads running on Kubernetes.
Container Insights and Log Analytics
7 min
Enable Container Insights
# Enable Container Insights on existing cluster
az aks enable-addons \
  --addons monitoring \
  --name my-aks-cluster \
  --resource-group rg \
  --workspace-resource-id "$LOG_ANALYTICS_ID"
// KQL: pod restart count (CrashLoopBackOff signal)
// PodRestartCount is cumulative per record, so take the max per pod, not the sum
KubePodInventory
| where TimeGenerated > ago(1h)
| where Namespace == "ai-apps"
| summarize restarts=max(PodRestartCount) by PodUid, Name
| where restarts > 3
| order by restarts desc
// KQL: container CPU usage
Perf
| where ObjectName == "K8SContainer"
| where CounterName == "cpuUsageNanoCores"
| summarize avg_cpu=avg(CounterValue)
    by bin(TimeGenerated, 5m), InstanceName
kubectl Debugging Commands
8 min
Diagnose Common Failures
# CrashLoopBackOff: app crashes on start
kubectl logs my-pod --previous # last crash logs
kubectl describe pod my-pod # events section = root cause
# Pending: stuck scheduling
kubectl describe pod my-pod # look for "Insufficient cpu" or "Insufficient memory"
kubectl get nodes # check node capacity
kubectl top nodes # live resource usage
# ImagePullBackOff: can't pull image
kubectl describe pod my-pod # shows registry/auth error
# Fix: ensure AcrPull role on managed identity
# OOMKilled: out of memory
kubectl describe pod my-pod | grep -A5 "Last State"
# Fix: increase resources.limits.memory in deployment YAML
# Exec into running pod for debugging
kubectl exec -it my-pod -- /bin/bash # use /bin/sh if the image has no bash
kubectl port-forward my-pod 8080:8080 # local testing
Check the kubectl describe pod Events section first: it tells you exactly why scheduling failed. The --previous flag shows the last crashed container's logs, not the current one's.
Workload Identity for Azure Services
7 min
Pods Access Azure Services Securely
# Enable OIDC + Workload Identity on cluster
az aks update --enable-oidc-issuer \
  --enable-workload-identity \
  --name my-aks --resource-group rg
# Create managed identity for the workload
az identity create --name ai-workload-id --resource-group rg
# Federate: allow pod SA to use the managed identity
# ($OIDC_ISSUER comes from: az aks show --name my-aks --resource-group rg --query oidcIssuerProfile.issuerUrl -o tsv)
az identity federated-credential create \
  --name aks-federated \
  --identity-name ai-workload-id \
  --resource-group rg \
  --issuer $OIDC_ISSUER \
  --subject system:serviceaccount:ai-apps:ai-sa
# Grant managed identity access to Azure OpenAI
az role assignment create \
  --role "Cognitive Services OpenAI User" \
  --assignee $MI_CLIENT_ID \
  --scope $OPENAI_RESOURCE_ID
Pod label required: azure.workload.identity/use: "true".
Summary
2 min
AKS monitoring: Container Insights → Log Analytics KQL for cluster metrics. kubectl: describe (events/scheduling), logs --previous (crashes), top (live usage), exec (interactive debug). Failure map: CrashLoopBackOff=app crash, Pending=no resources, ImagePullBackOff=registry auth, OOMKill=raise memory limits. Workload Identity: pods access Azure services via managed identity, with no secrets in YAML.