What percentage of the AI-200 exam covers Develop Containerized Solutions on Azure?

Domain 1 (Develop Containerized Solutions on Azure) accounts for 20–25% of the AI-200 exam. Within it, the "Deploy and Monitor Applications on Azure Kubernetes Service" area tests topics such as Azure Kubernetes Service and kubectl.

Is Azure Kubernetes Service on the AI-200 exam?

Yes. Deploy and Monitor Applications on Azure Kubernetes Service is part of Domain 1 in the official AI-200 skill outline, and that domain is weighted at 20–25%. The key tools tested are Azure Kubernetes Service, kubectl, and Helm.

How do I practice Azure Kubernetes Service hands-on?

Create a free Azure account and follow the code examples in this module step-by-step. The official Microsoft Learn sandbox for Course AI-200T00-A also provides free lab environments for Azure Kubernetes Service and related services.

Module 3: Azure Kubernetes Service — AI-200 Study Notes

Module

Deploy Applications to Azure Kubernetes Service

🎬 Unit 1

Introduction

5 min

Azure Kubernetes Service (AKS) is a managed Kubernetes service: Azure runs the control plane, and you manage the workloads. Deploy AI inference APIs, vector search services, and background processors as containers that scale, recover from crashes, and expose public endpoints via Azure Load Balancer. Four concepts to master: Pods, Deployments, Services, kubectl.

💡 Exam Tip

Exam focus: 1) Deployment manifest fields (replicas, resources, secretKeyRef) 2) Service types (ClusterIP vs LoadBalancer) 3) selector/label matching 4) kubectl troubleshooting commands and failure modes.

📘 Unit 2

Create Kubernetes Deployment Manifests

10 min

AKS Architecture: User → LoadBalancer Service → Pods in Deployment → ACR

1. Deployment Manifest Structure

A Deployment tells Kubernetes: "run N copies of this container image." Kubernetes maintains that exact state — restarting crashed pods, scheduling on healthy nodes.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-inference-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: inference-api       # Must match template.labels below
  template:
    metadata:
      labels:
        app: inference-api     # Must match selector above
    spec:
      containers:
      - name: api
        image: myregistry.azurecr.io/inference-api:v1.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            memory: "2Gi"
            cpu: "1000m"       # 1 CPU = 1000 millicores
          limits:
            memory: "4Gi"
            cpu: "2000m"

2. Replicas for High Availability

1 replica — single point of failure. App is down while Kubernetes restarts it after a crash.
2 replicas — basic resilience. One can crash, the other keeps serving. Minimum for production.
3+ replicas — enables rolling updates with zero downtime. One updates while two keep serving traffic.

3. Resource Requests and Limits

Requests = minimum guaranteed resources (the scheduler uses them to pick a node). Set too low = pod lands on an overloaded node = poor performance.
Limits = maximum allowed. A container that exceeds its memory limit is killed (OOMKilled); one that exceeds its CPU limit is throttled, not killed. Memory limit set too low for model inference = constant restarts.
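
To make the units concrete, here is a minimal Python sketch that converts the quantity strings from the manifest above into plain numbers and checks that each request fits under its limit. The helper names (cpu_to_millicores, memory_to_bytes, request_fits_limit) are illustrative and not part of any Kubernetes client library, and only the "m" CPU suffix and the Ki/Mi/Gi memory suffixes are handled.

```python
# Illustrative helpers (hypothetical names), not a Kubernetes API.
BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "500m" or "2" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)  # 1 CPU = 1000 millicores


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as "2Gi" to bytes (binary suffixes only)."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # a bare number is already in bytes


def request_fits_limit(resources: dict) -> bool:
    """Return True when every request is no larger than its limit."""
    req, lim = resources["requests"], resources["limits"]
    return (
        cpu_to_millicores(req["cpu"]) <= cpu_to_millicores(lim["cpu"])
        and memory_to_bytes(req["memory"]) <= memory_to_bytes(lim["memory"])
    )


# Values copied from the Deployment manifest in this unit.
resources = {
    "requests": {"memory": "2Gi", "cpu": "1000m"},
    "limits": {"memory": "4Gi", "cpu": "2000m"},
}
print(cpu_to_millicores("1000m"), memory_to_bytes("2Gi"), request_fits_limit(resources))
```

A check like this is only a study aid; the API server and scheduler do their own validation when the manifest is applied.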

⚠️ Common Gotcha

OOMKill = Out-of-Memory Kill (kubectl reports it as OOMKilled). It happens when a container exceeds its memory limit. Restarting the pod does not help; the fix is to raise the memory limit. The exam presents this as "pod keeps restarting", and the expected answer is to increase the memory limit.

4. Injecting Secrets via Kubernetes Secrets

kubectl create secret generic api-secrets --from-literal=api-key=your-secret-key

# Reference in Deployment manifest:
env:
- name: API_KEY
  valueFrom:
    secretKeyRef:
      name: api-secrets   # Secret object name
      key: api-key        # Key within the secret

💡 Exam Tip

Kubernetes Secrets are base64-encoded (NOT encrypted by default). For production on AKS, use Azure Key Vault with the Secrets Store CSI driver. The exam tests this distinction.
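
The difference between encoding and encryption can be seen with Python's standard base64 module. This is a minimal sketch; the function names are illustrative, and the value is the placeholder from the kubectl command above.

```python
import base64


def encode_secret_value(plaintext: str) -> str:
    """Encode a value the way it appears in a Secret's data field."""
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Reverse the encoding. No key or password is required."""
    return base64.b64decode(encoded).decode("utf-8")


encoded = encode_secret_value("your-secret-key")
print(encoded)                       # what is stored in the Secret
print(decode_secret_value(encoded))  # the original value, recovered without any key
```

Anyone who can read the Secret object can recover the value this way, which is why the tip above recommends Azure Key Vault for production.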

📘 Unit 3

Expose Applications with Kubernetes Services

10 min

1. Service Types — When to Use Each

#	Type	Accessible From	Use For
1	ClusterIP (default)	Inside cluster only	Backend microservices: vector DB, embeddings service, worker
2	NodePort	Node IP + high port	Dev/test access without load balancer
3	LoadBalancer	Internet (public IP)	Production AI APIs accessible externally
4	ExternalName	Cluster → external DNS	Represent external services as K8s services

💡 Exam Tip

AI scenario: internal services such as a vector DB or embeddings service = ClusterIP. Public inference API = LoadBalancer. This exact pattern appears on the exam.

2. LoadBalancer Service Manifest

apiVersion: v1
kind: Service
metadata:
  name: inference-api-service
spec:
  type: LoadBalancer
  selector:
    app: inference-api    # Routes to pods with this label
  ports:
  - protocol: TCP
    port: 80              # External port (clients connect here)
    targetPort: 8080      # Container port (app listens here)

3. The Selector-Label Contract (Most Common Failure)

Every key/value pair in the Service selector must match the pod's labels exactly (pods may carry extra labels). One typo = Service has no endpoints = silent traffic drop.

kubectl get pods --show-labels
kubectl describe svc inference-api-service | grep Selector

⚠️ Common Gotcha

Selector mismatch is the #1 "why is my service not routing traffic" scenario. Always check selector vs pod labels first.
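
The matching rule can be modeled in a few lines of Python. This is a simplified sketch of equality-based selectors only, with hypothetical pod names; it is not how the Kubernetes endpoints controller is actually implemented.

```python
def selector_matches(selector: dict, pod_labels: dict) -> bool:
    """True when every selector key/value pair is present in the pod's labels."""
    if not selector:
        return False  # a Service without a selector does not pick up pods
    return all(pod_labels.get(key) == value for key, value in selector.items())


def endpoints_for(selector: dict, pods: dict) -> list:
    """Return the names of pods the Service would route to."""
    return [name for name, labels in pods.items() if selector_matches(selector, labels)]


# Hypothetical pods: the second one has a typo in its label value.
pods = {
    "api-1": {"app": "inference-api", "tier": "web"},
    "api-2": {"app": "inference-ap1"},
}

print(endpoints_for({"app": "inference-api"}, pods))  # only the correctly labeled pod
print(endpoints_for({"app": "inference_api"}, pods))  # typo in the selector: no endpoints
```

Note that the extra "tier" label on the first pod does not prevent a match; only the keys named in the selector are compared.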

4. ClusterIP Internal DNS

ClusterIP services are reachable via stable DNS: servicename.namespace.svc.cluster.local. Your API pod calls http://vector-search-svc.default.svc.cluster.local:8080 even as pod IPs change.
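
As a small illustration of the naming pattern, the sketch below builds the DNS name and URL from its parts. The function names are hypothetical, and cluster.local is assumed as the cluster domain, as in the example above.

```python
def service_dns_name(service: str, namespace: str = "default",
                     cluster_domain: str = "cluster.local") -> str:
    """Build the fully qualified in-cluster DNS name of a Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def service_url(service: str, port: int, namespace: str = "default") -> str:
    """Build an HTTP URL that other pods can use to call the Service."""
    return f"http://{service_dns_name(service, namespace)}:{port}"


print(service_url("vector-search-svc", 8080))
print(service_dns_name("vector-search-svc", namespace="ai-apps"))
```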

📘 Unit 4

Deploy, Verify, and Troubleshoot

10 min

1. Deploy with kubectl apply (Idempotent)

kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
kubectl apply -f ./k8s/   # Apply entire directory

kubectl apply creates the resource if it does not exist and updates it if it does, so it is safe to run multiple times.
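
The create-or-update behavior can be pictured with a toy in-memory model. This is only a sketch of the idea of idempotence: the apply function and the dictionary standing in for the cluster are hypothetical, and the real kubectl apply performs a more involved merge against the live object.

```python
import copy


def apply(cluster: dict, manifest: dict) -> str:
    """Toy model of create-or-update keyed by (kind, name)."""
    key = (manifest["kind"], manifest["metadata"]["name"])
    if key not in cluster:
        cluster[key] = copy.deepcopy(manifest)
        return "created"
    if cluster[key] == manifest:
        return "unchanged"  # re-applying the same manifest changes nothing
    cluster[key] = copy.deepcopy(manifest)
    return "configured"


cluster = {}
v1 = {"kind": "Deployment", "metadata": {"name": "ai-inference-api"}, "spec": {"replicas": 2}}
v2 = copy.deepcopy(v1)
v2["spec"]["replicas"] = 3

results = [apply(cluster, v1), apply(cluster, v1), apply(cluster, v2)]
print(results)
```

The three results correspond to a first apply, a repeated apply of the same file, and an apply after editing the replica count.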

2. Verify Status

kubectl get deployments                    # READY 2/2 = both replicas healthy
kubectl get pods                           # STATUS Running/Pending/CrashLoopBackOff
kubectl get svc                            # EXTERNAL-IP for LoadBalancer (may show pending)

3. Common Failure Modes and Fixes

ImagePullBackOff — wrong image tag, registry URL, or missing pull credentials. Fix: kubectl describe pod to see exact error.
CrashLoopBackOff — app starts then crashes. Fix: kubectl logs pod-name for stack trace. Do NOT blindly restart.
Pending — can't schedule. Insufficient node resources. Fix: kubectl describe pod → Events section shows reason.
No Endpoints — Service selector doesn't match pod labels. Fix: kubectl describe svc and kubectl get pods --show-labels.

kubectl describe pod my-pod     # Events, conditions, image details
kubectl logs my-pod             # App stdout/stderr
kubectl logs my-pod --previous  # Logs from PREVIOUS crashed container
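
For revision purposes, the failure modes above can be condensed into a lookup table that returns the first command to run for a given pod status. This is a study aid written as a minimal sketch; the names FIRST_COMMAND and first_command are illustrative, and the fallback to kubectl describe pod for unlisted statuses is an assumption based on the "describe first" advice in this module.

```python
# Status -> first diagnostic command, taken from the failure modes listed above.
FIRST_COMMAND = {
    "ImagePullBackOff": "kubectl describe pod {pod}",
    "CrashLoopBackOff": "kubectl logs {pod} --previous",
    "Pending": "kubectl describe pod {pod}",
}


def first_command(status: str, pod: str) -> str:
    """Return the first command to run for a pod in the given status."""
    template = FIRST_COMMAND.get(status, "kubectl describe pod {pod}")
    return template.format(pod=pod)


for status in ("ImagePullBackOff", "CrashLoopBackOff", "Pending"):
    print(f"{status}: {first_command(status, 'my-pod')}")
```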

⚡ AKS Master Cheatsheet

Deploy manifest: kubectl apply -f deployment.yaml

Check pod status: kubectl get pods

View logs: kubectl logs pod-name

Previous crash logs: kubectl logs pod-name --previous

Describe pod (events): kubectl describe pod pod-name

Check labels: kubectl get pods --show-labels

Check service endpoints: kubectl describe svc svc-name

External IP: kubectl get svc → EXTERNAL-IP column

Attach AKS to ACR: az aks create --attach-acr myregistry

Internal service (backend): ClusterIP (default)

External service (prod API): LoadBalancer

1 CPU core: 1000m (millicores)

🧪 Unit 5

Exercise — Deploy AI Inference API to AKS

30 min

Create AKS cluster with ACR integration: az aks create --attach-acr myregistry
Write Deployment manifest with 2 replicas, resource requests/limits, and secret injection
Write LoadBalancer Service manifest with correct selector
Apply with kubectl apply -f
Verify EXTERNAL-IP and test API endpoint
Intentionally mismatch labels and observe the "no endpoints" failure

✅ Unit 6

Knowledge Check

5 min

Q: Vector search service should only be reachable by pods inside the cluster. Which Service type? A: ClusterIP
Q: Pod shows ImagePullBackOff. Most likely cause? A: Wrong image tag, registry URL, or missing pull credentials
Q: Service has no endpoints. Where to look? A: Compare pod labels (kubectl get pods --show-labels) with Service selector (kubectl describe svc)
Q: A pod keeps restarting with OOMKill. Fix? A: Increase the memory limit in the Deployment manifest
Q: You need 3 replicas for zero-downtime rolling updates. Which field? A: spec.replicas: 3 in the Deployment manifest

🏁 Unit 7

Summary

2 min

AKS manifests define desired state — Kubernetes makes it real. Set replicas: 2+ for production. Match Service selector to pod labels exactly. ClusterIP for internal services, LoadBalancer for external APIs. Diagnose in order: kubectl describe pod → kubectl logs → check selector/labels.

🧠 Memory Tricks

Diagnostic flow: "Describe then Logs" — describe shows WHAT (events), logs shows WHY (app error)

Service type quick rule: Backend microservice = ClusterIP. Public AI API = LoadBalancer.

Failure cheatsheet: ImagePullBackOff=bad image/creds | CrashLoopBackOff=app crash | Pending=no resources | No Endpoints=selector mismatch

☸️

Module Cheatsheet

Azure Kubernetes Service (AKS)

Exam weight: 20–25%

🔑 Key Facts

ClusterIP — Internal only — backend services, vector DBs
LoadBalancer — Public IP — production AI API endpoints
Selector = Labels — Service routes to pods by exact label match
1 CPU = 1000m — 500m = half core. requests = scheduler min, limits = hard cap
--previous flag — Logs from last CRASHED container instance
ImagePullBackOff — Wrong tag, registry URL, or missing pull credentials
CrashLoopBackOff — App crashes on start — check kubectl logs pod --previous
OOMKill — Memory limit too low — increase resources.limits.memory

💻 Commands & Patterns

az aks create -n myaks -g rg --node-count 3 --attach-acr myacr
az aks get-credentials -n myaks -g rg
kubectl apply -f deployment.yaml
kubectl get pods          # Running/Pending/CrashLoop?
kubectl describe pod my-pod     # WHY did it fail?
kubectl logs my-pod             # App stdout
kubectl logs my-pod --previous  # Last crash logs
kubectl get pods --show-labels  # Check selector match
kubectl scale deployment myapp --replicas=3

Module

Monitor and Troubleshoot AKS Workloads

Monitor AKS — Microsoft Learn

🎬 Unit 1

AKS Monitoring Overview

3 min

Container Insights (part of Azure Monitor) provides cluster-level metrics, pod logs, and node utilization for AKS. Combined with kubectl for live debugging, you get full visibility into AI workloads running on Kubernetes.

💡 Exam Tip

AKS monitoring exam pillars: 1) Container Insights for cluster metrics + Log Analytics 2) kubectl describe/logs/exec for live debugging 3) CrashLoopBackOff → check logs --previous 4) Pending → describe pod for resource/scheduling issues 5) Workload Identity for Azure service access from pods.

📘 Unit 2

Container Insights and Log Analytics

7 min

Enable Container Insights

# Enable Container Insights on existing cluster
az aks enable-addons \
  --addons monitoring \
  --name my-aks-cluster \
  --resource-group rg \
  --workspace-resource-id $LOG_ANALYTICS_ID

// KQL: pod restart count (CrashLoopBackOff signal)
// PodRestartCount is cumulative, so take the max per pod instead of summing every sample
KubePodInventory
| where TimeGenerated > ago(1h)
| where Namespace == "ai-apps"
| summarize restarts=max(PodRestartCount) by PodUid, Name
| where restarts > 3
| order by restarts desc

// KQL: container CPU usage
Perf
| where ObjectName == "K8SContainer"
| where CounterName == "cpuUsageNanoCores"
| summarize avg_cpu=avg(CounterValue)
    by bin(TimeGenerated, 5m), InstanceName
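
The cpuUsageNanoCores counter is reported in nanocores, while manifests express CPU in millicores. A minimal Python sketch of the conversion follows, so query results can be compared with the limits set in a Deployment. The function names and sample values are illustrative.

```python
NANOCORES_PER_MILLICORE = 1_000_000  # 1 core = 1000 millicores = 1e9 nanocores


def nanocores_to_millicores(nanocores: float) -> float:
    """Convert a cpuUsageNanoCores value to millicores."""
    return nanocores / NANOCORES_PER_MILLICORE


def cpu_utilization(usage_nanocores: float, limit_millicores: float) -> float:
    """Fraction of the CPU limit in use (1.0 means the container is at its limit)."""
    return nanocores_to_millicores(usage_nanocores) / limit_millicores


# Hypothetical sample: 1.5e9 nanocores of usage against the 2000m limit used earlier.
print(nanocores_to_millicores(1_500_000_000))
print(cpu_utilization(1_500_000_000, 2000))
```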

📘 Unit 3

kubectl Debugging Commands

8 min

Diagnose Common Failures

# CrashLoopBackOff — app crashes on start
kubectl logs my-pod --previous        # last crash logs
kubectl describe pod my-pod           # events section = root cause

# Pending — stuck scheduling
kubectl describe pod my-pod           # look for "Insufficient cpu" or "Insufficient memory"
kubectl get nodes                     # check node capacity
kubectl top nodes                     # live resource usage

# ImagePullBackOff — can't pull image
kubectl describe pod my-pod           # shows registry/auth error
# Fix: ensure the cluster's kubelet managed identity has the AcrPull role on the registry

# OOMKilled — out of memory
kubectl describe pod my-pod | grep -A5 "Last State"
# Fix: increase resources.limits.memory in deployment YAML

# Exec into running pod for debugging
kubectl exec -it my-pod -- /bin/bash
kubectl port-forward my-pod 8080:8080  # local testing

⚠️ Common Gotcha

Always check kubectl describe pod Events section first — it tells you exactly why scheduling failed. --previous flag shows the last crashed container's logs, not the current one.

📘 Unit 4

Workload Identity for Azure Services

7 min

Pods Access Azure Services Securely

# Enable OIDC + Workload Identity on cluster
az aks update --enable-oidc-issuer \
  --enable-workload-identity \
  --name my-aks --resource-group rg

# Create managed identity for the workload
az identity create --name ai-workload-id --resource-group rg

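# Look up the cluster's OIDC issuer URL
OIDC_ISSUER=$(az aks show --name my-aks --resource-group rg \
  --query "oidcIssuerProfile.issuerUrl" -o tsv)

# Federate: allow pod SA to use the managed identity
az identity federated-credential create \
  --name aks-federated \
  --identity-name ai-workload-id \
  --resource-group rg \
  --issuer "$OIDC_ISSUER" \
  --subject system:serviceaccount:ai-apps:ai-sa

# Look up the managed identity's client ID
MI_CLIENT_ID=$(az identity show --name ai-workload-id --resource-group rg \
  --query clientId -o tsv)

# Grant managed identity access to Azure OpenAI
# (OPENAI_RESOURCE_ID is assumed to hold the resource ID of your Azure OpenAI account)
az role assignment create \
  --role "Cognitive Services OpenAI User" \
  --assignee "$MI_CLIENT_ID" \
  --scope "$OPENAI_RESOURCE_ID"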

💡 Exam Tip

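Workload Identity = pods authenticate to Azure (Cosmos, OpenAI, KV) using managed identity, with no secrets in YAML or environment variables. Label the pod template with azure.workload.identity/use: "true" and annotate the ServiceAccount with azure.workload.identity/client-id set to the managed identity's client ID.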
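
The pieces that must line up can be sketched in Python: the federated credential's subject string, the pod label, and the ServiceAccount annotation. The function names and the all-zero client ID are placeholders for illustration; the namespace and ServiceAccount name are the ones used in the commands above.

```python
def federated_subject(namespace: str, service_account: str) -> str:
    """Build the --subject value that ties a ServiceAccount to the managed identity."""
    return f"system:serviceaccount:{namespace}:{service_account}"


def workload_identity_metadata(client_id: str) -> dict:
    """Return the pod label and ServiceAccount annotation used by Workload Identity."""
    return {
        "pod_labels": {"azure.workload.identity/use": "true"},
        "service_account_annotations": {"azure.workload.identity/client-id": client_id},
    }


# Placeholder client ID; in practice this is the managed identity's clientId.
metadata = workload_identity_metadata("00000000-0000-0000-0000-000000000000")
print(federated_subject("ai-apps", "ai-sa"))
print(metadata["pod_labels"])
print(metadata["service_account_annotations"])
```

If the namespace or ServiceAccount name in the subject differs from what the pod actually uses, the token exchange fails, so it helps to generate all three values from the same inputs.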

🏁 Unit 5

Summary

2 min

AKS monitoring: Container Insights → Log Analytics KQL for cluster metrics. kubectl: describe (events/scheduling), logs --previous (crashes), top (live usage), exec (interactive debug). Failure map: CrashLoopBackOff=app crash, Pending=no resources, ImagePullBackOff=registry auth, OOMKill=raise memory limits. Workload Identity: pods access Azure services via managed identity — no secrets in YAML.
