Module 9 of 9 · 20–25% · 2 sub-modules · 18 units · Domain 4: Secure, Monitor, and Troubleshoot Azure Solutions

Observe and Troubleshoot Apps on Azure

Trace distributed systems by using OpenTelemetry SDKs. Write KQL queries to analyze logs and metrics. Configure Application Insights, set up alerts, and build observability pipelines.

OpenTelemetry · KQL · Azure Monitor · Application Insights

Last updated: · Aligned with Course AI-200T00-A

Module

Instrument AI Applications with OpenTelemetry

units
🎬 Unit 1

Introduction

3 min

Distributed AI pipelines span multiple services: API gateway, embedding service, vector search, LLM orchestrator. When a user reports slow responses, which service is the bottleneck? OpenTelemetry answers this with distributed traces showing the full request journey across all services, linked by a single trace ID. Azure Monitor Application Insights stores, queries, and visualizes this telemetry.

💡 Exam Tip
Exam pillars: 1) Trace, span, context propagation (W3C traceparent) 2) Azure Monitor Distro setup (one function call) 3) Cloud role names for Application Map 4) OpenTelemetry → App Insights term mapping 5) KQL queries (requests vs dependencies tables).
📘 Unit 2

Observability Concepts

7 min

Distributed Trace: one trace ID links all spans across services, so the slowest span stands out in the waterfall.

TraceId: abc123
  HTTP GET /api/chat: 420 ms
  Cosmos DB query: 180 ms
  Redis cache set: 90 ms
  OpenAI embedding: 120 ms
(horizontal axis: time in ms)

1. The Three Pillars of Observability

  1. Distributed Traces: the full path of a request through all services, with timing. Answers: "Where is the latency?" Primary AI-200 focus.
  2. Metrics: aggregate numbers over time (request rate, error rate, p95 latency). Answers: "Is something trending wrong?"
  3. Logs: timestamped discrete events. Answers: "Why did this specific operation fail?"

Together: Metrics tell you something changed, Traces tell you where, Logs tell you why.

2. Traces and Spans Anatomy

A trace = the complete record of one request across all services. A span = one named, timed unit of work within that trace.

  1. Trace ID: 128-bit value (32 hex characters), shared by ALL spans in one request journey
  2. Span ID: 64-bit value (16 hex characters), unique to this specific operation
  3. Parent Span ID: links this span to its caller (forms a tree/waterfall)
  4. Name: operation name, such as HTTP GET /api/chat, vector_search, llm_inference
  5. Start/End timestamps: together they give the span's exact duration
  6. Attributes: key-value metadata such as HTTP method, model name, token count
  7. Status: UNSET (the default), OK, or ERROR
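The fields listed above can be modeled in a few lines to see how a waterfall is assembled. This is a minimal sketch using only the standard library, not the OpenTelemetry SDK; the `Span` dataclass and the helpers `new_trace_id`, `new_span_id`, and `slowest_child` are hypothetical names, and the durations reuse the example figures from the trace diagram in this unit.

```python
import secrets
from dataclasses import dataclass, field
from typing import Optional


def new_trace_id() -> str:
    """128-bit trace ID rendered as 32 lowercase hex characters."""
    return secrets.token_hex(16)


def new_span_id() -> str:
    """64-bit span ID rendered as 16 lowercase hex characters."""
    return secrets.token_hex(8)


@dataclass
class Span:
    name: str
    trace_id: str
    duration_ms: float
    parent_span_id: Optional[str] = None
    span_id: str = field(default_factory=new_span_id)
    attributes: dict = field(default_factory=dict)
    status: str = "UNSET"


def slowest_child(spans: list, parent: Span) -> Span:
    """Return the direct child of `parent` with the longest duration."""
    children = [s for s in spans if s.parent_span_id == parent.span_id]
    return max(children, key=lambda s: s.duration_ms)


trace_id = new_trace_id()
root = Span("HTTP GET /api/chat", trace_id, 420)
spans = [
    root,
    Span("Cosmos DB query", trace_id, 180, parent_span_id=root.span_id),
    Span("Redis cache set", trace_id, 90, parent_span_id=root.span_id),
    Span("OpenAI embedding", trace_id, 120, parent_span_id=root.span_id),
]
bottleneck = slowest_child(spans, root)
print(bottleneck.name, bottleneck.duration_ms)
```

Every span carries the same trace ID, while the parent span ID is what turns a flat list into a tree. With the example figures, the Cosmos DB query is the slowest child of the root span.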

3. W3C traceparent: How Spans Connect Across Services

When Service A calls Service B via HTTP, it passes the trace context in a header so B's spans link to A's span in the same trace:

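traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
  00 = version
  4bf92f3577b34da6a3ce929d0e0e4736 = 32-char trace ID (shared by every span in the trace)
  00f067aa0ba902b7 = 16-char span ID of the caller (becomes the parent span ID in Service B)
  01 = trace flags (01 = sampled)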

Without this header: each service creates a disconnected trace. With it: all services' spans form one unified waterfall.
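To make the header layout concrete, the following sketch parses a traceparent value and derives the header a service would send on its own outgoing call. It uses only the standard library; `parse_traceparent` and `child_traceparent` are hypothetical helper names, and in a real application the OpenTelemetry instrumentation performs this propagation automatically. The sketch checks only the field lengths and hex format, not every rule in the W3C specification.

```python
import re
import secrets

TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<parent_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})$"
)


def parse_traceparent(header: str) -> dict:
    """Split a traceparent value into its four fields, or raise ValueError."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    return match.groupdict()


def child_traceparent(incoming: str) -> str:
    """Build the header for an outgoing call: same trace ID, fresh span ID."""
    ctx = parse_traceparent(incoming)
    new_span_id = secrets.token_hex(8)  # 16 hex chars for the calling span
    return f"{ctx['version']}-{ctx['trace_id']}-{new_span_id}-{ctx['flags']}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
ctx = parse_traceparent(incoming)
outgoing = child_traceparent(incoming)
print(ctx["trace_id"])
print(outgoing)
```

The trace ID is copied unchanged into the outgoing header, which is what keeps both services' spans in the same waterfall. Only the span ID field changes from hop to hop.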

💡 Exam Tip
W3C traceparent = the thread that stitches spans across services. Question: "What links spans from different services into a single trace?" Answer: W3C traceparent header propagation.

4. OpenTelemetry → Application Insights Term Mapping

| # | OpenTelemetry Concept | App Insights Term | KQL Table |
|---|---|---|---|
| 1 | Trace ID | Operation ID (operation_Id) | All tables |
| 2 | Server span (SpanKind.SERVER) | Request | requests table |
| 3 | Client/Internal span | Dependency | dependencies table |
| 4 | span.set_attribute("key", val) | customDimensions | customDimensions column |
| 5 | Span events | Traces | traces table |
⚠️ Common Gotcha
Incoming HTTP requests = requests table. Outgoing calls (OpenAI API, Cosmos DB, Key Vault) = dependencies table. KQL query "find slow OpenAI calls" → query the dependencies table where name or target contains "openai".
📘 Unit 3

Configure the Azure Monitor OpenTelemetry Distro

8 min

1. Install and Initialize

pip install azure-monitor-opentelemetry

from azure.monitor.opentelemetry import configure_azure_monitor

# One call sets up traces, metrics, and logs.
# Reads APPLICATIONINSIGHTS_CONNECTION_STRING from the environment.
configure_azure_monitor()

This single call initializes the trace, metric, and log providers, all exporting to Application Insights. No separate configuration is needed for each.

2. Connection String

# Recommended: environment variable
export APPLICATIONINSIGHTS_CONNECTION_STRING="InstrumentationKey=...;IngestionEndpoint=..."

# Code-based (avoid in production)
configure_azure_monitor(connection_string="InstrumentationKey=...")
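
A startup check can catch a missing or malformed connection string before any telemetry is lost. The sketch below is one possible way to do that with the standard library only; `parse_connection_string` and `load_connection_settings` are hypothetical helpers, not part of the Azure Monitor Distro, and the key and endpoint values are placeholders.

```python
import os


def parse_connection_string(value: str) -> dict:
    """Split 'Key1=a;Key2=b' into a dict, ignoring empty segments."""
    parts = {}
    for segment in value.split(";"):
        if not segment.strip():
            continue
        key, sep, val = segment.partition("=")
        if not sep:
            raise ValueError(f"segment without '=': {segment!r}")
        parts[key.strip()] = val.strip()
    return parts


def load_connection_settings(env=os.environ) -> dict:
    """Read the connection string from the environment and validate it."""
    raw = env.get("APPLICATIONINSIGHTS_CONNECTION_STRING")
    if not raw:
        raise RuntimeError("APPLICATIONINSIGHTS_CONNECTION_STRING is not set")
    settings = parse_connection_string(raw)
    if "InstrumentationKey" not in settings:
        raise RuntimeError("connection string has no InstrumentationKey")
    return settings


# Placeholder values for illustration only, not a real resource
fake_env = {
    "APPLICATIONINSIGHTS_CONNECTION_STRING":
        "InstrumentationKey=00000000-0000-0000-0000-000000000000;"
        "IngestionEndpoint=https://example.invalid/"
}
settings = load_connection_settings(fake_env)
print(sorted(settings))
```

Passing the environment as a parameter keeps the check testable without touching real process variables. In production the default `os.environ` would be used.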

3. What the Distro Auto-Instruments (No Code Changes)

  1. requests / urllib3: outgoing HTTP calls to OpenAI, vector DBs → appear as dependency spans
  2. Flask / Django / FastAPI: incoming HTTP requests → appear as server spans (requests table)
  3. psycopg2: PostgreSQL queries → dependencies table
  4. Azure SDK: Key Vault, Storage, Service Bus calls → dependencies table
  5. Python logging module: standard log calls flow to the traces table automatically
💡 Exam Tip
For common AI calls (HTTP to OpenAI, psycopg2 to PostgreSQL, Azure SDK to Key Vault), the Distro captures telemetry automatically. Custom spans are only needed for business-logic operations (chunking, ranking, re-ranking).

4. Cloud Role Name: Multi-Service Identification

Without unique cloud role names, all services appear as a single node on the Application Map. You lose visibility into inter-service latency.

from azure.monitor.opentelemetry import configure_azure_monitor
from opentelemetry.sdk.resources import Resource

configure_azure_monitor(
    resource=Resource.create({
        "service.name": "embedding-service",
        "service.namespace": "rag-pipeline"
    })
)
# App Map shows node: "rag-pipeline.embedding-service"

# Or via environment variables in the shell (no code change):
export OTEL_SERVICE_NAME=embedding-service
export OTEL_RESOURCE_ATTRIBUTES=service.namespace=rag-pipeline
⚠️ Common Gotcha
Cloud role name = service.name (+ service.namespace). Without unique names per service, the Application Map shows one node for all services, so you cannot see which service is slow.
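
As a rough illustration of how the two resource attributes combine into the node name, the sketch below parses the environment variable format and builds the role name. `parse_resource_attributes` and `cloud_role_name` are hypothetical helpers written for this example. The fallback to service.name alone when no namespace is set, and the `unknown_service` default, are assumptions about the behavior rather than something this module states.

```python
def parse_resource_attributes(value: str) -> dict:
    """Parse the 'key1=val1,key2=val2' format used by OTEL_RESOURCE_ATTRIBUTES."""
    attrs = {}
    for pair in value.split(","):
        key, sep, val = pair.partition("=")
        if sep and key.strip():
            attrs[key.strip()] = val.strip()
    return attrs


def cloud_role_name(attrs: dict) -> str:
    """Combine namespace and name as '<namespace>.<name>'."""
    name = attrs.get("service.name", "unknown_service")
    namespace = attrs.get("service.namespace")
    # Assumption: with no namespace set, the role name is service.name alone
    return f"{namespace}.{name}" if namespace else name


env = {
    "OTEL_SERVICE_NAME": "embedding-service",
    "OTEL_RESOURCE_ATTRIBUTES": "service.namespace=rag-pipeline",
}
attrs = parse_resource_attributes(env["OTEL_RESOURCE_ATTRIBUTES"])
attrs["service.name"] = env["OTEL_SERVICE_NAME"]
print(cloud_role_name(attrs))
```

Running the same function over every service's configuration in a deployment is a quick way to confirm that no two services would collapse into the same Application Map node.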
📘 Unit 4

Create Custom Spans for AI Operations

10 min

1. Custom Span for Model Inference

from opentelemetry import trace

tracer = trace.get_tracer("rag-pipeline")

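def run_inference(prompt: str):
    with tracer.start_as_current_span("llm_inference") as span:
        span.set_attribute("model", "gpt-4o")
        # Rough estimate: whitespace word count, not a true tokenizer count
        span.set_attribute("prompt_tokens", len(prompt.split()))

        # call_openai_api is the application's own wrapper around the OpenAI client
        result = call_openai_api(prompt)

        # Record the actual usage reported in the API response
        span.set_attribute("completion_tokens", result.usage.completion_tokens)
        span.set_attribute("total_tokens", result.usage.total_tokens)
        return result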

2. Record Errors in Spans

from opentelemetry.trace import StatusCode

with tracer.start_as_current_span("vector_search") as span:
    try:
        results = vector_db.search(embedding, top_k=5)
        span.set_attribute("results_count", len(results))
    except Exception as e:
        span.set_status(StatusCode.ERROR, str(e))
        span.record_exception(e)   # Adds stack trace to span
        raise

3. Span Kinds and Their Mappings

  1. SpanKind.SERVER: incoming request handler → requests table in App Insights
  2. SpanKind.CLIENT: outgoing call to external service → dependencies table
  3. SpanKind.INTERNAL: internal processing (chunking, ranking) → dependencies table
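
The mapping above is small enough to express as a lookup, which can help when deciding which table to query for a given span. This is a standalone sketch: the `SpanKind` enum here is a local stand-in that covers only the three kinds discussed in this unit, and `table_for` and the span names are hypothetical.

```python
from enum import Enum


class SpanKind(Enum):
    """Stand-in for the three span kinds discussed above."""
    SERVER = "server"
    CLIENT = "client"
    INTERNAL = "internal"


KIND_TO_TABLE = {
    SpanKind.SERVER: "requests",
    SpanKind.CLIENT: "dependencies",
    SpanKind.INTERNAL: "dependencies",
}


def table_for(kind: SpanKind) -> str:
    """Return the Application Insights table where a span of this kind lands."""
    return KIND_TO_TABLE[kind]


pipeline_spans = [
    ("HTTP GET /api/chat", SpanKind.SERVER),
    ("POST openai embeddings", SpanKind.CLIENT),
    ("chunking", SpanKind.INTERNAL),
]
for name, kind in pipeline_spans:
    print(f"{name} -> {table_for(kind)}")
```

Custom spans created with start_as_current_span and no explicit kind default to INTERNAL in the OpenTelemetry API, so spans such as llm_inference and vector_search are found in the dependencies table.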
📘 Unit 5

Analyze Telemetry in Application Insights

7 min

1. Application Map

Visualizes the topology of your AI pipeline. Shows each service as a node, with call volume and failure rates on edges. Requires each service to have a unique cloud role name. Immediately reveals which service is the bottleneck or source of errors.

2. KQL Queries for AI Pipeline Analysis

// 1. Find slowest operations in last hour
requests
| where timestamp > ago(1h)
| summarize avg(duration), percentile(duration, 95) by name
| order by percentile_duration_95 desc

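// 2. Find slow OpenAI calls (outgoing calls land in dependencies)
dependencies
| where target contains "openai" or name contains "openai"
| where duration > 5000    // over 5 seconds
| project timestamp, name, target, duration, customDimensions

// 3. Query by custom span attribute (model name)
// Custom spans such as llm_inference are internal spans, so they land in dependencies
dependencies
| where tostring(customDimensions.model) == "gpt-4o"
| summarize count(), avg(duration) by bin(timestamp, 5m)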

// 4. Error rate by service (cloud_RoleName)
requests
| where timestamp > ago(1h)
| summarize total=count(), errors=countif(success==false) by cloud_RoleName
| extend error_rate = round(100.0 * errors / total, 2)
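
To see exactly what query 4 computes, the same aggregation can be reproduced over a handful of rows. The sketch below is only an illustration in plain Python; the sample rows and the `error_rate_by_role` function are hypothetical and include just the two columns the query uses.

```python
from collections import defaultdict

# Hypothetical rows shaped like the requests table (only the columns used here)
rows = [
    {"cloud_RoleName": "api", "success": True},
    {"cloud_RoleName": "api", "success": False},
    {"cloud_RoleName": "api", "success": True},
    {"cloud_RoleName": "embedding-service", "success": True},
    {"cloud_RoleName": "embedding-service", "success": True},
    {"cloud_RoleName": "vector-search", "success": False},
]


def error_rate_by_role(rows):
    """Mirror: summarize total=count(), errors=countif(success==false) by cloud_RoleName."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for row in rows:
        role = row["cloud_RoleName"]
        totals[role] += 1
        if not row["success"]:
            errors[role] += 1
    # Mirror: extend error_rate = round(100.0 * errors / total, 2)
    return {
        role: {
            "total": totals[role],
            "errors": errors[role],
            "error_rate": round(100.0 * errors[role] / totals[role], 2),
        }
        for role in totals
    }


for role, stats in error_rate_by_role(rows).items():
    print(role, stats)
```

The `100.0` in the KQL expression matters for the same reason it would in many languages: it forces floating-point division so the rate is not truncated to an integer.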

⚡ OpenTelemetry + App Insights Master Cheatsheet

Setup (1 line): configure_azure_monitor()
Connection string env var: APPLICATIONINSIGHTS_CONNECTION_STRING
Trace → App Insights: Trace ID = operation_Id column
Server span → KQL table: requests table
Client/Internal span → KQL: dependencies table
Custom attribute → KQL: customDimensions column
Service identity on App Map: service.name + service.namespace resource attrs
Propagation header: W3C traceparent
Auto-instrumented: requests, FastAPI, psycopg2, Azure SDK, logging
Record error in span: span.set_status(StatusCode.ERROR) + span.record_exception(e)
🧪 Unit 6

Exercise: Instrument a RAG Pipeline

30 min
  1. Install azure-monitor-opentelemetry and call configure_azure_monitor()
  2. Set unique cloud role names for each service (API, embeddings, vector search)
  3. Create custom spans for embedding generation and LLM inference with token attributes
  4. Add error recording with record_exception()
  5. Send requests and observe the Application Map in Application Insights
  6. Write KQL to find the slowest operation across the pipeline
🏁 Unit 7

Summary

2 min

OpenTelemetry provides vendor-neutral observability: Traces (where), Metrics (what), Logs (why). The Azure Monitor Distro sets everything up in one call. W3C traceparent links spans across services. Set a unique service.name per service for the Application Map. Create custom spans for AI-specific operations. Query Application Insights with KQL: incoming requests → requests table, outgoing calls → dependencies table.

🧠 Memory Tricks

Three pillars mnemonic, TML: Traces (where), Metrics (what trend), Logs (why it failed)

Table mapping: "My server receives Requests. My client creates Dependencies." SERVER = requests. CLIENT = dependencies.

traceparent: "The GPS coordinate that finds your span in the global trace timeline."

🔭
Module Cheatsheet

OpenTelemetry + Application Insights


🔑 Key Facts

  • Three pillars (TML): Traces (where) | Metrics (what trend) | Logs (why)
  • W3C traceparent: HTTP header linking spans across service boundaries into one trace
  • Server span → KQL: requests table (incoming HTTP to your service)
  • Client span → KQL: dependencies table (outgoing: OpenAI, Cosmos, Key Vault)
  • Trace ID → App Insights: operation_Id column, which joins all spans in a request
  • Custom attr → KQL: span.set_attribute() → customDimensions column
  • Cloud role name: service.name + service.namespace → Application Map node
  • configure_azure_monitor(): one call sets up ALL three pillars (traces + metrics + logs)

💻 Commands & Patterns

pip install azure-monitor-opentelemetry

from azure.monitor.opentelemetry import configure_azure_monitor
from opentelemetry.sdk.resources import Resource

# One call; reads the APPLICATIONINSIGHTS_CONNECTION_STRING env var
configure_azure_monitor(resource=Resource.create({
    "service.name": "embedding-service",
    "service.namespace": "rag-pipeline"
}))

# Custom span for AI operation
from opentelemetry import trace
from opentelemetry.trace import StatusCode

tracer = trace.get_tracer("rag-pipeline")

def run_inference(prompt):
    with tracer.start_as_current_span("llm_inference") as span:
        span.set_attribute("model", "gpt-4o")
        span.set_attribute("prompt_tokens", len(prompt.split()))
        try:
            return call_openai(prompt)
        except Exception as e:
            span.set_status(StatusCode.ERROR, str(e))
            span.record_exception(e)
            raise
Module

Analyze Logs and Set Up Alerts with KQL and Azure Monitor

units
🎬 Unit 1

Introduction to KQL

3 min

Kusto Query Language (KQL) is the query language for Azure Monitor Logs and Application Insights. Use it to query telemetry tables, find errors, calculate latency, and define log search alerts, all of which are critical for AI app observability.

💡 Exam Tip
KQL exam pillars: 1) Core operators: where, project, summarize, extend, join, render 2) Key tables: requests, dependencies, exceptions, traces, customMetrics 3) Alert rule: signal → condition → action group → notification 4) ago() for time ranges 5) bin() for time-series aggregation.
📘 Unit 2

Core KQL Operators

10 min

Essential Query Patterns

// Failed requests in last hour
requests
| where timestamp > ago(1h)
| where success == false
| project timestamp, name, resultCode, duration

// P95 latency by operation (bin = time bucket)
requests
| where timestamp > ago(24h)
| summarize p95=percentile(duration, 95) by
    bin(timestamp, 1h), name
| render timechart

// Join requests with exceptions
requests
| where timestamp > ago(1h) and success == false
| join kind=leftouter exceptions
    on operation_Id
| project timestamp, name, type, outerMessage

// Track OpenAI call latency via dependencies
dependencies
| where timestamp > ago(6h)
| where target contains "openai"
| summarize avg_ms=avg(duration), calls=count()
    by bin(timestamp, 30m)
| render timechart
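
The bin() and summarize combination in the last query can be hard to picture, so here is a small plain-Python sketch of the same idea. The helper names `bin_timestamp` and `avg_duration_by_bin` and the sample rows are hypothetical. The sketch aligns buckets to the Unix epoch, which is an assumption; for the hourly and 30-minute sizes used in this unit the bucket boundaries fall on the same clock times either way.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def bin_timestamp(ts: datetime, size: timedelta) -> datetime:
    """Round a timestamp down to the start of its bucket, like KQL bin()."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    buckets = (ts - epoch) // size  # whole buckets since the epoch
    return epoch + buckets * size


def avg_duration_by_bin(rows, size):
    """Mirror: summarize avg_ms=avg(duration), calls=count() by bin(timestamp, size)."""
    grouped = defaultdict(list)
    for ts, duration_ms in rows:
        grouped[bin_timestamp(ts, size)].append(duration_ms)
    return {
        start: {"avg_ms": sum(vals) / len(vals), "calls": len(vals)}
        for start, vals in sorted(grouped.items())
    }


# Hypothetical dependency rows: (timestamp, duration in ms)
base = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    (base + timedelta(minutes=5), 800),
    (base + timedelta(minutes=25), 1200),
    (base + timedelta(minutes=40), 3000),
]
summary = avg_duration_by_bin(rows, timedelta(minutes=30))
for start, stats in summary.items():
    print(start.isoformat(), stats)
```

The first two rows fall into the 12:00 bucket and the third into the 12:30 bucket, so the chart would show one point per bucket rather than one per call.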
📘 Unit 3

Azure Monitor Alerts

8 min

Alert Rule → Action Group → Notification

# Create action group (email notification)
az monitor action-group create \
  --name ai-alerts-ag \
  --resource-group rg \
  --short-name aialerts \
  --action email oncall team@example.com

# Create log alert: fires when more than 10 failed requests occur in a 5-minute window
az monitor scheduled-query create \
  --name high-error-rate \
  --resource-group rg \
  --scopes $APP_INSIGHTS_ID \
  --condition "count 'FailedRequests' > 10" \
  --condition-query FailedRequests="requests | where success == false" \
  --evaluation-frequency 5m \
  --window-size 5m \
  --action-groups $ACTION_GROUP_ID
💡 Exam Tip
Alert flow: Signal (KQL query) → Condition (threshold) → Action Group (who to notify) → Notification (email/webhook/SMS). An action group is reusable across multiple alert rules.
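
The four-stage flow can be mimicked in a few lines to show why the action group is a separate, reusable object. This is a conceptual sketch only, not how Azure Monitor is implemented; the `ActionGroup` and `AlertRule` classes, the rule names, and the signal values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ActionGroup:
    """A reusable list of notification callbacks (email, webhook, SMS)."""
    name: str
    receivers: List[Callable[[str], None]] = field(default_factory=list)

    def notify(self, message: str) -> None:
        for send in self.receivers:
            send(message)


@dataclass
class AlertRule:
    name: str
    signal: Callable[[], int]      # stands in for the KQL query result
    threshold: int                 # condition: fire when signal > threshold
    action_group: ActionGroup

    def evaluate(self) -> bool:
        value = self.signal()
        fired = value > self.threshold
        if fired:
            self.action_group.notify(f"{self.name}: {value} > {self.threshold}")
        return fired


sent = []                                   # captured notifications
oncall = ActionGroup("ai-alerts-ag", [sent.append])

# Two rules sharing one action group
errors = AlertRule("high-error-rate", lambda: 14, 10, oncall)
latency = AlertRule("slow-openai-calls", lambda: 2, 5, oncall)

print(errors.evaluate(), latency.evaluate())
print(sent)
```

Only the first rule crosses its threshold, so the shared action group delivers a single notification. Changing who is on call means editing one action group instead of every rule.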
📘 Unit 4

Custom Metrics and Dashboards

6 min

Track AI-Specific Metrics

# Uses the legacy applicationinsights package. total_tokens, model, and
# cache_hit are assumed to be set by the surrounding application code.
from applicationinsights import TelemetryClient

tc = TelemetryClient("YOUR_INSTRUMENTATION_KEY")

# Track token usage as custom metric
tc.track_metric("openai_tokens_used", total_tokens,
    properties={"model": model, "operation": "embed"})

# Track cache hit/miss ratio (1 = hit, 0 = miss)
tc.track_metric("semantic_cache_hit", 1 if cache_hit else 0)
tc.flush()

# Query in KQL:
# customMetrics
# | where name == "openai_tokens_used"
# | summarize sum(value) by bin(timestamp, 1h)
💡 Exam Tip
Custom metrics appear in the customMetrics KQL table. Use them to track AI-specific KPIs: token cost, cache hit rate, embedding latency, RAG relevance scores.
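
Because the cache metric above is emitted as 1 for a hit and 0 for a miss, its average over a window is the hit rate. The sketch below shows that arithmetic in plain Python; `cache_hit_rate` and the sample values are hypothetical, and in KQL the equivalent would be an avg(value) over the matching customMetrics rows.

```python
def cache_hit_rate(values):
    """Average of 1/0 samples, as a percentage; None when there are no samples."""
    if not values:
        return None
    return round(100.0 * sum(values) / len(values), 1)


# Hypothetical semantic_cache_hit samples: 1 = hit, 0 = miss
samples = [1, 0, 1, 1, 0, 1, 1, 1]
print(cache_hit_rate(samples))
```

Returning None for an empty window avoids reporting a misleading 0% hit rate when the cache simply received no traffic.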
🏁 Unit 5

Summary

2 min

KQL: pipe-based queries on telemetry tables (requests, dependencies, exceptions, traces, customMetrics). Core operators: where → summarize → project → render. Alerts: KQL signal + threshold → action group → notification. Custom metrics for AI KPIs (tokens, cache hits). bin() for time-series, ago() for relative time windows, percentile() for latency percentiles.


Frequently Asked Questions

What percentage of the AI-200 exam covers Secure, Monitor, and Troubleshoot Azure Solutions?

Domain 4 (Secure, Monitor, and Troubleshoot Azure Solutions) accounts for 20–25% of the AI-200 exam. Observe and Troubleshoot Apps on Azure topics like OpenTelemetry and KQL are actively tested. Study all official skill objectives listed in the module header above.

Is Azure Monitor & KQL on the AI-200 exam?

Yes. Observe and Troubleshoot Apps on Azure is part of Domain 4 in the official AI-200 skill outline, weighted at 20–25%. The key services tested are OpenTelemetry, KQL, Azure Monitor, and Application Insights. Review the code examples and exam tips in this module for targeted prep.

How do I practice Azure Monitor & KQL hands-on?

The best approach is to create a free Azure account and follow the code examples in this module step-by-step. The official Microsoft Learn sandbox for Course AI-200T00-A also provides free lab environments for OpenTelemetry and related services.