
AgentCore Observability: Monitoring AI Agents in Production

9 min read
1,800 words

Muhammad Mudassir

Founder & CEO, Cognilium AI


Your agent gave a wrong answer. A user complained. How do you debug it? Without observability, you're guessing. With proper tracing, you can see exactly which tool failed, which context was missing, and where the reasoning broke down—in under 5 minutes. Here's how to set it up.

What is Agent Observability?

Agent observability is the ability to understand what your AI agent did, why it did it, and how long it took—for every request. It includes logging (what happened), metrics (how often and how fast), and tracing (the path through your system). For agents, this extends to tool calls, memory retrieval, and model reasoning.

Why Agent Observability is Different

Traditional API monitoring tracks request/response. Agent monitoring must track:

Traditional API          | AI Agent
Request received         | Request received
Business logic executed  | Prompt constructed
Response returned        | Model called (latency varies)
                         | Tool calls (0 to N)
                         | Memory retrieved
                         | Guardrails checked
                         | Response returned

An agent request might involve 5+ internal operations, each with its own latency and failure modes.

The Three Pillars: Logs, Metrics, Traces

Logs: What Happened

{
    "timestamp": "2025-01-15T10:23:45Z",
    "level": "INFO",
    "agent": "FinancialAdvisor",
    "session_id": "user-123-session-456",
    "event": "TOOL_CALL",
    "tool": "calculate_budget",
    "input": {"income": 8000, "expenses": 5000},
    "output": {"savings": 3000},
    "latency_ms": 45
}
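
If you need to emit custom events in the same shape, a thin wrapper over Python's standard logging module is enough. This is a minimal sketch; the log_tool_call helper and its field layout are illustrative, not an AgentCore API.

import json
import logging
import time

logger = logging.getLogger("agent")
logging.basicConfig(level=logging.INFO, format="%(message)s")

def log_tool_call(agent, session_id, tool, tool_input, tool_output, latency_ms):
    # One JSON object per line so CloudWatch Logs Insights can parse the fields
    logger.info(json.dumps({
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "level": "INFO",
        "agent": agent,
        "session_id": session_id,
        "event": "TOOL_CALL",
        "tool": tool,
        "input": tool_input,
        "output": tool_output,
        "latency_ms": latency_ms,
    }))

log_tool_call("FinancialAdvisor", "user-123-session-456",
              "calculate_budget", {"income": 8000, "expenses": 5000},
              {"savings": 3000}, 45)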

Metrics: How Often and How Fast

agent_requests_total{agent="FinancialAdvisor"} 1542
agent_latency_p99{agent="FinancialAdvisor"} 3.2s
agent_errors_total{agent="FinancialAdvisor", error_type="guardrail"} 34
token_usage_total{model="claude-3-sonnet"} 2450000
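
The lines above are shorthand for what lands in CloudWatch. To pull the same numbers back programmatically, such as the p99 latency, boto3's get_metric_statistics works. A sketch that assumes the AgentCore/FinancialAdvisor namespace and agent_name dimension used elsewhere in this post:

import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch")
now = datetime.now(timezone.utc)

# p99 latency over the last hour, in 5-minute buckets
# (Dimensions must match how the metric was published)
response = cloudwatch.get_metric_statistics(
    Namespace="AgentCore/FinancialAdvisor",
    MetricName="Latency",
    Dimensions=[{"Name": "agent_name", "Value": "FinancialAdvisor"}],
    StartTime=now - timedelta(hours=1),
    EndTime=now,
    Period=300,
    ExtendedStatistics=["p99"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["ExtendedStatistics"]["p99"])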

Traces: The Path Through Your System

[Trace ID: abc-123]
├── [Span] Request Received (0ms)
├── [Span] Memory Retrieved (45ms)
│   └── DynamoDB Query
├── [Span] Prompt Constructed (12ms)
├── [Span] Model Invocation (2,340ms)
│   └── Claude 3 Sonnet
├── [Span] Tool: calculate_budget (48ms)
├── [Span] Guardrail Check (23ms)
└── [Span] Response Sent (2,468ms total)
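
With tracing enabled (see the config in the next section), spans like these are recorded for you. If you want to see how such a tree is assembled, or wrap custom spans around your own code, here is a hand-rolled sketch using the OpenTelemetry Python SDK (pip install opentelemetry-sdk); it is illustrative and not AgentCore-specific.

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Print spans to the console for illustration; in production you would
# export them to X-Ray or CloudWatch instead
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("financial-advisor")

with tracer.start_as_current_span("handle_request"):           # root span
    with tracer.start_as_current_span("memory_retrieved"):     # child span
        pass  # DynamoDB query goes here
    with tracer.start_as_current_span("model_invocation"):
        pass  # model call goes here
    with tracer.start_as_current_span("tool.calculate_budget"):
        pass  # tool execution goes here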

Setting Up CloudWatch Integration

Enable Observability in AgentCore

# config.yaml
observability:
  enabled: true
  log_level: INFO  # DEBUG for development
  
  logs:
    destination: cloudwatch
    log_group: /agentcore/financial-advisor
    retention_days: 30
  
  metrics:
    enabled: true
    namespace: AgentCore/FinancialAdvisor
    dimensions:
      - agent_name
      - session_id
      - tool_name
  
  tracing:
    enabled: true
    service: financial-advisor
    sample_rate: 1.0  # 100% in dev, reduce in prod

Deploy with Observability

agentcore deploy --observability-enabled

Verify Logs Are Flowing

aws logs tail /agentcore/financial-advisor --follow

Distributed Tracing for Multi-Agent Systems

When agents call other agents, you need distributed tracing to see the full picture.

The Problem Without Tracing

User: "Research AI trends and write a blog post"

Log Entry 1: ResearchAgent received request
Log Entry 2: WriterAgent received request
Log Entry 3: Response sent

# ❓ Which request? How are they connected? What was the timing?

The Solution: Trace Context Propagation

from bedrock_agentcore import Agent, trace_context

# Assumes `agent` (this orchestrator), `research_agent`, and `process()`
# are defined elsewhere in your application.
@agent.handler
def handle_request(request, context):
    # Trace ID propagates automatically
    trace_id = context.trace_id  # abc-123
    
    # When calling another agent, the trace continues
    research_result = research_agent.invoke(
        message=request.message,
        trace_context=context  # Propagate trace
    )
    
    # All spans are linked under the same trace
    return process(research_result)

Viewing Traces in CloudWatch

# Query for a specific trace
aws logs filter-log-events \
    --log-group-name /agentcore/financial-advisor \
    --filter-pattern '{ $.trace_id = "abc-123" }' \
    --query 'events[*].message'


Essential Metrics to Track

Request Metrics

# Custom metrics to emit
metrics = [
    # Volume
    {"name": "RequestCount", "unit": "Count"},
    {"name": "ActiveSessions", "unit": "Count"},
    
    # Latency
    {"name": "Latency", "unit": "Milliseconds"},
    {"name": "ModelLatency", "unit": "Milliseconds"},
    {"name": "ToolLatency", "unit": "Milliseconds"},
    
    # Errors
    {"name": "ErrorCount", "unit": "Count"},
    {"name": "GuardrailTriggerCount", "unit": "Count"},
    {"name": "TimeoutCount", "unit": "Count"},
    
    # Tokens
    {"name": "InputTokens", "unit": "Count"},
    {"name": "OutputTokens", "unit": "Count"},
    {"name": "TotalTokens", "unit": "Count"},
]

Publishing Custom Metrics

import boto3
from datetime import datetime

cloudwatch = boto3.client('cloudwatch')

def publish_metric(name, value, unit, dimensions):
    cloudwatch.put_metric_data(
        Namespace='AgentCore/FinancialAdvisor',
        MetricData=[{
            'MetricName': name,
            'Value': value,
            'Unit': unit,
            'Timestamp': datetime.utcnow(),
            'Dimensions': [
                {'Name': k, 'Value': v} for k, v in dimensions.items()
            ]
        }]
    )

# Example: track tool latency
publish_metric(
    name='ToolLatency',
    value=48,
    unit='Milliseconds',
    dimensions={'tool_name': 'calculate_budget', 'agent': 'FinancialAdvisor'}
)

Building a Production Dashboard

Essential Dashboard Widgets

CloudWatch dashboard bodies expect each widget's settings under a properties key; the example below assumes the us-east-1 region.

{
    "widgets": [
        {
            "type": "metric",
            "properties": {
                "title": "Request Volume",
                "region": "us-east-1",
                "stat": "Sum",
                "metrics": [["AgentCore/FinancialAdvisor", "RequestCount"]]
            }
        },
        {
            "type": "metric",
            "properties": {
                "title": "P99 Latency",
                "region": "us-east-1",
                "metrics": [["AgentCore/FinancialAdvisor", "Latency", {"stat": "p99"}]]
            }
        },
        {
            "type": "metric",
            "properties": {
                "title": "Error Rate",
                "region": "us-east-1",
                "metrics": [
                    [{"expression": "errors/requests*100", "label": "Error %"}],
                    ["AgentCore/FinancialAdvisor", "ErrorCount", {"id": "errors", "stat": "Sum", "visible": false}],
                    ["AgentCore/FinancialAdvisor", "RequestCount", {"id": "requests", "stat": "Sum", "visible": false}]
                ]
            }
        },
        {
            "type": "metric",
            "properties": {
                "title": "Token Usage",
                "region": "us-east-1",
                "stat": "Sum",
                "metrics": [["AgentCore/FinancialAdvisor", "TotalTokens"]]
            }
        },
        {
            "type": "metric",
            "properties": {
                "title": "Guardrail Triggers",
                "region": "us-east-1",
                "stat": "Sum",
                "metrics": [["AgentCore/FinancialAdvisor", "GuardrailTriggerCount"]]
            }
        },
        {
            "type": "metric",
            "properties": {
                "title": "Active Sessions",
                "region": "us-east-1",
                "stat": "Maximum",
                "metrics": [["AgentCore/FinancialAdvisor", "ActiveSessions"]]
            }
        }
    ]
}

Create Dashboard via CLI

aws cloudwatch put-dashboard \
    --dashboard-name AgentCore-Production \
    --dashboard-body file://dashboard.json

Alerting Strategy

Critical Alerts (Page On-Call)

# Error count spike (a proxy for error rate > 5%; tune the threshold to your traffic)
aws cloudwatch put-metric-alarm \
    --alarm-name "AgentCore-HighErrorRate" \
    --metric-name ErrorCount \
    --namespace AgentCore/FinancialAdvisor \
    --statistic Sum \
    --period 300 \
    --threshold 10 \
    --comparison-operator GreaterThanThreshold \
    --evaluation-periods 2 \
    --alarm-actions arn:aws:sns:us-east-1:123456789:pagerduty

# P99 latency > 10 seconds
aws cloudwatch put-metric-alarm \
    --alarm-name "AgentCore-HighLatency" \
    --metric-name Latency \
    --namespace AgentCore/FinancialAdvisor \
    --extended-statistic p99 \
    --period 300 \
    --threshold 10000 \
    --comparison-operator GreaterThanThreshold \
    --evaluation-periods 2 \
    --alarm-actions arn:aws:sns:us-east-1:123456789:pagerduty

Warning Alerts (Slack Notification)

# Guardrail trigger rate increasing
aws cloudwatch put-metric-alarm \
    --alarm-name "AgentCore-GuardrailSpike" \
    --metric-name GuardrailTriggerCount \
    --namespace AgentCore/FinancialAdvisor \
    --statistic Sum \
    --period 3600 \
    --threshold 50 \
    --comparison-operator GreaterThanThreshold \
    --evaluation-periods 1 \
    --alarm-actions arn:aws:sns:us-east-1:123456789:slack-alerts

Alert Escalation Matrix

Metric              | Warning      | Critical
Error Rate          | > 2%         | > 5%
P99 Latency         | > 5s         | > 10s
Guardrail Triggers  | > 50/hour    | > 100/hour
Token Burn Rate     | > 2x normal  | > 5x normal
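
Note that the error-rate rows are percentages, while the critical alarm above fires on a raw ErrorCount. If you would rather alarm on the percentage itself, CloudWatch metric math can compute it. A sketch with boto3, reusing this post's metric names and the same placeholder SNS topic:

import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when errors / requests exceeds 5% for two consecutive 5-minute periods
cloudwatch.put_metric_alarm(
    AlarmName="AgentCore-HighErrorRatePercent",
    ComparisonOperator="GreaterThanThreshold",
    Threshold=5.0,
    EvaluationPeriods=2,
    AlarmActions=["arn:aws:sns:us-east-1:123456789:pagerduty"],
    Metrics=[
        {"Id": "error_rate", "Expression": "100 * errors / requests",
         "Label": "Error %", "ReturnData": True},
        {"Id": "errors", "ReturnData": False,
         "MetricStat": {"Metric": {"Namespace": "AgentCore/FinancialAdvisor",
                                   "MetricName": "ErrorCount"},
                        "Period": 300, "Stat": "Sum"}},
        {"Id": "requests", "ReturnData": False,
         "MetricStat": {"Metric": {"Namespace": "AgentCore/FinancialAdvisor",
                                   "MetricName": "RequestCount"},
                        "Period": 300, "Stat": "Sum"}},
    ],
)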

Cost Tracking and Attribution

Track Cost Per Request

# Pricing (approximate)
COST_PER_1K_INPUT_TOKENS = 0.003  # Claude 3 Sonnet
COST_PER_1K_OUTPUT_TOKENS = 0.015

def calculate_cost(input_tokens, output_tokens):
    input_cost = (input_tokens / 1000) * COST_PER_1K_INPUT_TOKENS
    output_cost = (output_tokens / 1000) * COST_PER_1K_OUTPUT_TOKENS
    return input_cost + output_cost

# Log cost per request
cost = calculate_cost(input_tokens=450, output_tokens=380)
logger.info(f"Request cost: ${cost:.4f}")
publish_metric('RequestCost', cost, 'None', {'agent': 'FinancialAdvisor'})

Cost Attribution by User/Tenant

# Track costs by tenant for billing
publish_metric(
    name='TenantTokens',
    value=total_tokens,
    unit='Count',
    dimensions={
        'tenant_id': request.tenant_id,
        'agent': 'FinancialAdvisor'
    }
)

Monthly Cost Report Query

-- CloudWatch Logs Insights
fields @timestamp, tenant_id, tokens, cost
| filter agent = "FinancialAdvisor"
| stats sum(cost) as total_cost by tenant_id
| sort total_cost desc
| limit 20

Common Debugging Patterns

Pattern 1: "Why did the agent give a wrong answer?"

-- Find the trace for a specific session
fields @timestamp, @message
| filter session_id = "user-123-session-456"
| sort @timestamp asc
| limit 100

Look for:

  • Memory retrieved (was context correct?)
  • Prompt constructed (was information included?)
  • Tool calls (did they return expected results?)

Pattern 2: "Why is latency high?"

-- Find slowest components
fields @timestamp, span_name, duration_ms
| filter trace_id = "abc-123"
| sort duration_ms desc
| limit 10

Common culprits:

  • Model cold start (first request after idle)
  • Large context windows (too many tokens)
  • External tool calls (API latency)

Pattern 3: "Why did guardrail trigger?"

-- Find guardrail events
fields @timestamp, input_text, guardrail_result, reason
| filter event = "GUARDRAIL_TRIGGERED"
| sort @timestamp desc
| limit 50

Production Observability Checklist

Logging

  • All agent events logged (request, response, tools, errors)
  • Session ID included in every log
  • Trace ID propagated across agents
  • Log retention configured (30+ days)
  • Sensitive data redacted (see the sketch below)
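
Redaction is the item teams skip most often. Below is a minimal sketch that scrubs a few common PII patterns from log payloads before they are written; the regexes are illustrative only, and managed options such as CloudWatch Logs data protection policies can back them up.

import json
import re

# Illustrative patterns only; extend for your own data (names, account IDs, etc.)
REDACTION_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b(?:\d[ -]?){13,16}\b"), "[CARD]"),
]

def redact(value):
    # Recursively scrub strings inside nested dicts and lists
    if isinstance(value, str):
        for pattern, replacement in REDACTION_PATTERNS:
            value = pattern.sub(replacement, value)
        return value
    if isinstance(value, dict):
        return {k: redact(v) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v) for v in value]
    return value

# Apply before logging
entry = {"event": "TOOL_CALL", "input": {"note": "email me at jane@example.com"}}
print(json.dumps(redact(entry)))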

Metrics

  • Request volume tracked
  • Latency percentiles (p50, p95, p99)
  • Error counts by type
  • Token usage tracked
  • Cost per request calculated

Tracing

  • Distributed tracing enabled
  • All spans named meaningfully
  • Tool calls traced
  • Memory operations traced

Alerting

  • Critical alerts page on-call
  • Warning alerts to Slack
  • Escalation matrix documented
  • Alert fatigue reviewed monthly

Dashboards

  • Production dashboard exists
  • Key metrics visible at a glance
  • Team has access

Next Steps

  1. Getting Started with AgentCore → Set up observability from the start

  2. Multi-Agent Orchestration → Trace across multiple agents

  3. AgentCore Memory Layer → Debug memory issues

  4. AgentCore vs ADK → Compare observability capabilities


Need help with production agent monitoring?

At Cognilium, we run agents with 99.9% uptime and 5-minute MTTR. Let's discuss your observability needs →
