← Back to Blog
February 24, 2026 · 14 min read

Tracing Multi-Agent Systems: A Practical Guide

When your research agent hands off to your analysis agent, which spawns a fact-checking agent, and somewhere the answer goes wrong — where do you start looking?

Why Multi-Agent Tracing Is Different

In a single-agent system, you have one execution thread: prompt → reasoning → tool calls → response. The trace is a linear sequence or a shallow tree.

Multi-agent systems are directed graphs. Agent A calls Agent B and Agent C in parallel. Agent B calls Agent D. Agent D's result feeds back into Agent A's next decision. The execution path looks less like a call stack and more like a distributed system — because it is one.

The same problems that made distributed systems tracing hard (correlation, causality, fan-out, async boundaries) apply to multi-agent systems, plus a new one: semantic dependencies. Agent B's output isn't just data flowing to Agent A — it's information that changes Agent A's reasoning in unpredictable ways.

The Trace Structure

A multi-agent trace needs three things: sessions, spans, and parent-child relationships.

Session: "Analyze quarterly report"
├── Span: Coordinator Agent
│   ├── LLM Call: Plan decomposition (GPT-4o, 3.2K tokens)
│   ├── Span: Data Extraction Agent
│   │   ├── LLM Call: Parse document (GPT-4o-mini, 8.1K tokens)
│   │   ├── Tool Call: pdf_extract(q3_report.pdf)
│   │   └── LLM Call: Structure data (GPT-4o-mini, 2.4K tokens)
│   ├── Span: Analysis Agent
│   │   ├── LLM Call: Identify trends (GPT-4o, 6.7K tokens)
│   │   ├── Tool Call: query_database("revenue by quarter")
│   │   ├── LLM Call: Compare to benchmarks (GPT-4o, 4.1K tokens)
│   │   └── Span: Fact-Check Agent
│   │       ├── LLM Call: Verify claims (GPT-4o-mini, 3.3K tokens)
│   │       └── Tool Call: web_search("Q3 2025 industry benchmarks")
│   └── LLM Call: Synthesize final report (GPT-4o, 9.8K tokens)

Total: 4 agents, 7 LLM calls, 3 tool calls, 37.6K tokens, $0.34. Without tracing, you'd see none of this structure.
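The structure above maps to a simple recursive data model: a session is just the root of a span tree, and totals roll up from the leaves. Here's a minimal sketch — the field names are illustrative, not Canary's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    """One unit of work: an agent, an LLM call, or a tool call."""
    name: str
    kind: str                      # "agent" | "llm" | "tool"
    tokens: int = 0
    cost_usd: float = 0.0
    children: list["Span"] = field(default_factory=list)

    def total_tokens(self) -> int:
        # Roll token counts up the tree; the session total
        # is just the root span's total.
        return self.tokens + sum(c.total_tokens() for c in self.children)

    def total_cost(self) -> float:
        return self.cost_usd + sum(c.total_cost() for c in self.children)

# A fragment of the quarterly-report trace above
extract = Span("Data Extraction Agent", "agent", children=[
    Span("Parse document", "llm", tokens=8100),
    Span("pdf_extract", "tool"),
    Span("Structure data", "llm", tokens=2400),
])
```

Parent-child relationships are the whole trick: every number you care about (cost, tokens, latency) aggregates along the same tree you debug along.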

Implementation: Trace Context Propagation

The critical piece is propagating trace context across agent boundaries. When the coordinator spawns a sub-agent, the sub-agent's trace needs to be linked as a child of the coordinator's span.

Pattern 1: Framework-Level Propagation

If you're using a framework like CrewAI or AutoGen, the framework manages agent handoffs. You instrument at the framework level:

import agentops
from crewai import Agent, Task, Crew

agentops.init()

# Tools (search_tool, database_tool, ...) and tasks
# (research_task, analysis_task) are defined elsewhere.
researcher = Agent(
    role="Research Analyst",
    goal="Find relevant market data",
    backstory="Experienced market researcher",
    tools=[search_tool, database_tool],
)

analyst = Agent(
    role="Financial Analyst",
    goal="Analyze trends and generate insights",
    backstory="Seasoned financial analyst",
    tools=[calculator_tool, chart_tool],
)

crew = Crew(
    agents=[researcher, analyst],
    tasks=[research_task, analysis_task],
    verbose=True,
)

# Canary automatically captures the full multi-agent trace
result = crew.kickoff()

Pattern 2: Manual Context Passing

If you're building your own orchestration, propagate trace context explicitly:

import asyncio

from agentops import Session

session = Session(tags=["quarterly-analysis"])

async def coordinator(query: str):
    with session.span("coordinator") as coord_span:
        plan = await planner.decompose(query)
        
        tasks = []
        for subtask in plan.subtasks:
            agent = select_agent(subtask.type)
            tasks.append(
                run_sub_agent(agent, subtask, parent_span=coord_span)
            )
        
        results = await asyncio.gather(*tasks)
        return await synthesizer.compile(results)

async def run_sub_agent(agent, task, parent_span):
    with parent_span.child(f"agent:{agent.name}") as agent_span:
        return await agent.execute(task, trace_span=agent_span)

Pattern 3: Cross-Process Tracing

For agents running as separate services (microservice architecture), propagate trace context via headers:

# Coordinator service
import httpx

async def call_research_agent(query: str, trace_context: dict):
    # httpx.post() is synchronous; use AsyncClient in async code
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://research-agent/analyze",
            json={"query": query},
            headers={
                "X-Trace-ID": trace_context["trace_id"],
                "X-Parent-Span-ID": trace_context["span_id"],
            },
        )
    return response.json()

# Research agent service
from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/analyze")
async def analyze(request: Request):
    trace_id = request.headers.get("X-Trace-ID")
    parent_span = request.headers.get("X-Parent-Span-ID")
    body = await request.json()  # Starlette's Request.json() is async

    with agentops.continue_trace(trace_id, parent_span) as span:
        result = await research_agent.run(body["query"])
        return {"result": result}

Debugging Multi-Agent Failures

With tracing in place, debugging becomes systematic:

Failure Pattern 1: Wrong Agent Selection

The coordinator picked the wrong sub-agent for a task. In the trace, you see:

Coordinator → LLM Call: "Route task to appropriate agent"
  → Decision: sent "calculate revenue growth" to TextSummarizer
  → TextSummarizer produced narrative instead of calculations
  → Coordinator used incorrect data in final synthesis

Fix: Inspect the coordinator's routing prompt. Add examples of correct routing. Test with eval set.
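What "add examples of correct routing" can look like in practice: fold a few labeled task-to-agent pairs into the routing prompt. This is a sketch — the agent names and prompt wording are illustrative:

```python
ROUTING_EXAMPLES = [
    ("calculate revenue growth", "AnalysisAgent"),
    ("summarize the executive letter", "TextSummarizer"),
    ("pull raw tables from the PDF", "DataExtractionAgent"),
]

def build_routing_prompt(task: str) -> str:
    # Few-shot examples anchor the router on the exact boundary
    # it got wrong ("calculate ..." is analysis, not summarization).
    shots = "\n".join(f'Task: "{t}" -> Agent: {a}' for t, a in ROUTING_EXAMPLES)
    return (
        "Route the task to exactly one agent. Examples:\n"
        f"{shots}\n"
        f'Task: "{task}" -> Agent:'
    )
```

Then run the routing prompt against an eval set of tasks with known correct agents, and treat any regression as a blocker.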

Failure Pattern 2: Context Loss at Handoff

Sub-agent didn't receive enough context from the coordinator:

Coordinator → passes "analyze the data" to AnalysisAgent
  → AnalysisAgent has no idea what "the data" refers to
  → Hallucinates analysis of generic data
  → Coordinator incorporates hallucinated analysis into report

Fix: Trace shows the exact prompt sent to the sub-agent. Add the missing context to the handoff.
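One way to make that fix structural rather than prompt-by-prompt: pass a handoff object instead of a bare instruction, and reject handoffs with empty context at the boundary. A sketch, with illustrative field names:

```python
from dataclasses import dataclass

@dataclass
class Handoff:
    instruction: str    # what the sub-agent should do
    context: str        # the data that "the data" refers to
    source_agent: str   # who produced the context

def validate_handoff(h: Handoff) -> Handoff:
    # Fail loudly at the handoff instead of letting the
    # sub-agent hallucinate around missing context.
    if not h.context.strip():
        raise ValueError(
            f"Handoff from {h.source_agent} has no context: "
            f"{h.instruction!r}"
        )
    return h
```

The trace then captures the full handoff payload, so "what did the sub-agent actually see?" has a one-click answer.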

Failure Pattern 3: Cascading Retries

One slow or failing tool causes a cascade:

ResearchAgent → web_search("Q3 benchmarks") → timeout (30s)
  → retry → timeout (30s)
  → retry → partial results
  → AnalysisAgent waiting on ResearchAgent → stalled
  → Coordinator timeout → retries entire flow
  → Total: 4 minutes, $2.80, still incomplete

Fix: Trace reveals the bottleneck immediately. Add tool timeouts, fallback strategies, and circuit breakers.
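A minimal sketch of the timeout-plus-fallback part of that fix, assuming the tool is an async callable — bound each attempt, cap retries, and degrade gracefully instead of letting the coordinator retry the whole flow:

```python
import asyncio

async def search_with_budget(query: str, search, fallback,
                             timeout_s: float = 10.0, max_retries: int = 1):
    # Bound each attempt with a timeout and cap retries so one
    # slow tool can't stall the agents waiting downstream.
    for attempt in range(max_retries + 1):
        try:
            return await asyncio.wait_for(search(query), timeout=timeout_s)
        except asyncio.TimeoutError:
            if attempt == max_retries:
                break
    # Degrade to a cheaper source (cache, stale data, smaller scope)
    return await fallback(query)
```

A circuit breaker is the same idea lifted one level up: after N consecutive failures, skip the tool entirely for a cooldown window.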

Key Metrics for Multi-Agent Systems

Per-agent metrics:

  • Cost contribution (what % of session cost does each agent consume?)
  • Success rate (how often does this agent produce usable output?)
  • Latency distribution (is one agent the bottleneck?)
  • Token efficiency (tokens consumed vs. useful output produced)

System-level metrics:

  • Agent fan-out depth (how many levels of sub-agents?)
  • Inter-agent retry rate (are agents redoing each other's work?)
  • End-to-end latency breakdown (where does time actually go?)
  • Cost per completed task (across all agents involved)
A per-agent breakdown falls directly out of the trace:

trace = session.get_trace()

for agent_span in trace.agent_spans:
    print(f"{agent_span.agent_name}:")
    print(f"  Cost: ${agent_span.total_cost:.4f}")
    print(f"  Tokens: {agent_span.total_tokens}")
    print(f"  LLM calls: {agent_span.llm_call_count}")
    print(f"  Duration: {agent_span.duration_ms}ms")
    print(f"  Children: {len(agent_span.child_spans)}")

Practical Tips

  • Start with the coordinator. If you can only trace one agent, trace the orchestrator. It shows you the full task decomposition.
  • Log agent-to-agent messages. The data passed between agents is where most bugs hide. Capture it in full.
  • Trace in development too. Multi-agent bugs surface during development. If you only add tracing in production, you'll debug in production.
  • Set up trace-based alerts. "Any session with >5 agent handoffs" or "any session where a sub-agent retried >3 times."
  • Visualize the trace tree. A good trace viewer shows the full agent graph — worth 100x more than grepping logs.
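Trace-based alerts like the ones above reduce to plain predicates over per-session counts. A sketch (the thresholds and rule names are the illustrative ones from the tip, not a Canary API):

```python
def should_alert(handoff_count: int, max_sub_agent_retries: int) -> list[str]:
    # Evaluate each rule per completed session; any non-empty
    # result triggers a notification with the trace attached.
    alerts = []
    if handoff_count > 5:
        alerts.append("excessive agent handoffs")
    if max_sub_agent_retries > 3:
        alerts.append("sub-agent retry storm")
    return alerts
```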

Getting Started

Multi-agent tracing doesn't have to be a research project. Canary supports multi-agent trace capture out of the box — automatic span propagation for CrewAI, AutoGen, and LangGraph, with manual context passing for custom orchestration.

import agentops
agentops.init()

# Multi-agent traces are captured automatically
crew.kickoff()  # Full trace visible in Canary dashboard

Start tracing your multi-agent systems for free →

Your agents are collaborating. Make sure you can see the conversation.