How to Monitor LLM Agent Costs Before They Bankrupt Your Startup
LLM cost monitoring isn't optional anymore — it's survival. Here's how to track agent costs in real time and avoid bill shock.
Why AI Agent Costs Are Uniquely Hard to Predict
Traditional API costs are roughly linear — more requests, more cost, predictable curve. Agents break this model completely. Here's why:
Recursive tool use. An agent deciding to call a tool, which triggers another LLM call, which calls another tool — you're looking at 5-20x the tokens you'd expect from a single prompt. A ReAct loop that should take 3 steps might take 15 if the agent gets confused.
Context window bloat. Every step in an agent's reasoning appends to the context. By step 10, you're sending 50K tokens per call instead of 2K. And you're paying for every one of those input tokens.
Retry storms. Agent hits a parsing error? It retries. Gets a rate limit? Backs off and retries. Gets a bad tool response? Retries with a bigger prompt. Each retry costs money.
Model selection drift. Your agent framework might silently upgrade from GPT-4o-mini to GPT-4o for "complex" queries. That's a 10x cost multiplier you didn't ask for.
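To see how these multiply, here is a back-of-envelope sketch of the context-bloat effect alone. The 2K base prompt and 4K-per-step growth are illustrative assumptions, not measurements:

```python
# Illustrative only: a 2K-token base prompt, with each agent step
# appending roughly 4K tokens of tool output and reasoning to the context.

def cumulative_input_tokens(steps: int, base: int = 2_000, growth: int = 4_000) -> int:
    """Total input tokens billed across a loop that re-sends its
    full (growing) history on every call."""
    return sum(base + i * growth for i in range(steps))

cumulative_input_tokens(3)   # 2K + 6K + 10K = 18_000 tokens
cumulative_input_tokens(15)  # 450_000 tokens, with the last call alone at 58K
```

The confused 15-step run bills 25x the tokens of the 3-step run, not 5x. That super-linear compounding is where the surprises come from.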
The Naive Approach (And Why It Fails)
Most teams start here:
# "We'll just check the dashboard"
# - Every startup, 2 weeks before the incident
import openai
response = openai.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": prompt}],
)
# Maybe log this somewhere?
print(f"Tokens used: {response.usage.total_tokens}")This tells you nothing useful. You know tokens were used. You don't know which agent session consumed them, what task the agent was performing, whether this was a normal run or a runaway loop, or how this compares to yesterday's costs for the same task.
Building Real Cost Visibility
Step 1: Instrument Every LLM Call
You need per-call cost attribution, tagged to the agent session, task type, and user:
```python
from agentops import track_cost

@track_cost(session_tags=["support-agent", "tier-1"])
def handle_support_ticket(ticket_id: str):
    # Your agent logic here
    agent.run(f"Resolve support ticket {ticket_id}")
```

Every LLM call inside `handle_support_ticket` now gets tagged with the session and the task type, and its cost is calculated automatically from the model's pricing.
Step 2: Track Cost Per Session, Not Per Call
Individual API calls are noise. What matters is: how much did it cost to complete this task?
```python
from agentops import Session

session = Session(tags=["document-analysis", "customer-acme"])

# Agent does its thing — maybe 12 LLM calls, 3 tool uses
result = agent.analyze_document(doc, session=session)
session.end()

print(f"Session cost: ${session.total_cost:.4f}")
print(f"LLM calls: {session.llm_call_count}")
print(f"Avg cost per call: ${session.avg_cost_per_call:.4f}")
```

Now you can answer: "Our document analysis agent costs $0.12 per document on average, but the P99 is $0.89." That's actionable.
Step 3: Set Budget Guardrails
Once you know your baselines, set limits:
```python
from agentops import Session, BudgetExceeded

session = Session(
    budget_limit=0.50,  # $0.50 max per session
    on_budget_warning=lambda s: alert_slack(
        f"Agent approaching budget: ${s.total_cost:.2f}"
    ),
    warning_threshold=0.8,  # Alert at 80%
)

try:
    agent.run(task, session=session)
except BudgetExceeded:
    log.warning(f"Session {session.id} killed at ${session.total_cost:.2f}")
    # Graceful fallback — return partial result, escalate to human
```

This is the difference between "we lost $2,000 on Saturday" and "the agent hit its $0.50 limit and gracefully degraded."
Step 4: Monitor Aggregate Trends
Per-session tracking feeds into dashboards. You want to see:
- Daily cost by agent type — is your support agent costing more than last week?
- Cost per task completion — are you paying more per resolved ticket?
- Cost anomaly detection — a 3x spike at 2 AM means something is looping
- Model cost breakdown — how much goes to GPT-4o vs. embeddings vs. tool calls?
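The anomaly check in that list can start life as a single trailing-mean ratio. A sketch of the 3x-spike rule; `is_cost_anomaly` and the hourly figures are illustrative:

```python
from statistics import mean

def is_cost_anomaly(trailing_hourly_costs, current_hour_cost, spike_factor=3.0):
    """Flag the current hour when it exceeds spike_factor times the
    trailing mean. Deliberately crude: a ratio test catches a looping
    agent at 2 AM without any ML."""
    return current_hour_cost > spike_factor * mean(trailing_hourly_costs)

history = [4.10, 3.80, 4.40, 3.95, 4.20]  # hypothetical $/hour baseline
is_cost_anomaly(history, 4.60)   # False: normal variance
is_cost_anomaly(history, 13.50)  # True: the 2 AM loop
```

Wire the `True` branch into your paging system before reaching for anything fancier.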
The Runaway Agent Problem
The scariest cost scenario isn't high usage — it's a stuck loop. An agent that keeps retrying, or that enters a recursive chain where each step generates more work.
```
Step 1:  Agent calls search tool → 2K tokens
Step 2:  Agent processes results → 8K tokens (growing context)
Step 3:  Agent calls search again ("not good enough") → 14K tokens
Step 4:  Agent processes results → 22K tokens
...
Step 15: Agent sends 120K tokens per call, still not satisfied
```

This is a $15-$30 session that should have cost $0.10. Without monitoring, you don't catch it until the invoice.
Detection heuristics:
- Session duration > 5 minutes for a typically-30-second task
- Token count growing super-linearly across steps
- Same tool called > 5 times in one session
- Total session tokens > 3x the P95 for that task type
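These heuristics translate directly into a post-session check. A sketch over a plain dict; the field names and thresholds are illustrative, not a real library's API:

```python
from collections import Counter

def runaway_flags(session, p95_task_tokens, typical_duration_s=30):
    """Return which runaway heuristics fired for a finished session.
    `session` is a plain dict with duration_s, step_tokens (per-step
    token counts), and tool_calls (tool names, one per invocation)."""
    flags = []
    if session["duration_s"] > 10 * typical_duration_s:  # ~5 min vs. 30 s
        flags.append("duration")
    steps = session["step_tokens"]
    # Super-linear growth: each step keeps exceeding the last by a wide margin.
    if len(steps) >= 3 and all(b > 1.5 * a for a, b in zip(steps, steps[1:])):
        flags.append("super_linear_tokens")
    counts = Counter(session["tool_calls"])
    if counts and counts.most_common(1)[0][1] > 5:
        flags.append("tool_repetition")
    if sum(steps) > 3 * p95_task_tokens:
        flags.append("total_tokens")
    return flags

stuck = {
    "duration_s": 400,
    "step_tokens": [2_000, 8_000, 14_000, 22_000],
    "tool_calls": ["search"] * 6,
}
runaway_flags(stuck, p95_task_tokens=12_000)  # all four flags fire
```

Run it when a session ends (or on a timer for long-lived sessions) and kill or escalate anything that raises more than one flag.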
Cost Optimization Levers
Once you have visibility, you can actually optimize:
- Model routing. Use GPT-4o-mini for classification, GPT-4o for synthesis. Save 80% on the easy calls.
- Context pruning. Don't send the full conversation history every step. Summarize older steps.
- Early stopping. If the agent hasn't made progress in 3 steps, stop and escalate.
- Caching. Same tool query with same inputs? Cache the result.
- Prompt compression. Shorter system prompts, fewer examples. Every token in the system prompt multiplies across every call.
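The caching lever needs nothing more than hashing tool inputs. A sketch; `cached_tool` and `search` are hypothetical stand-ins for your real tool layer:

```python
import functools
import hashlib
import json

def cached_tool(fn):
    """Memoize a tool by a hash of its keyword arguments, so an agent
    that repeats an identical tool call pays for it only once."""
    cache = {}
    @functools.wraps(fn)
    def wrapper(**kwargs):
        key = hashlib.sha256(
            json.dumps(kwargs, sort_keys=True).encode()
        ).hexdigest()
        if key not in cache:
            cache[key] = fn(**kwargs)  # the only paid call
        return cache[key]
    return wrapper

paid_calls = []

@cached_tool
def search(query: str):
    paid_calls.append(query)  # stands in for the real metered API call
    return f"results for {query}"

search(query="llm pricing")
search(query="llm pricing")  # cache hit: no second paid call
len(paid_calls)  # 1
```

This matters most for runaway sessions: the "same tool called 6 times" pattern becomes one paid call plus five free cache hits.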
Getting Started
If you're running agents without cost monitoring, start today. Not next sprint, not after the refactor — today. The difference between a sustainable AI product and one that bleeds money is visibility.
```python
import agentops

agentops.init(api_key="your-key")
# That's it. Every LLM call is now tracked.
```

Start monitoring your agent costs for free →
The agents are going to keep running. The question is whether you know what they're spending.