How to Monitor LLM Agent Costs Before They Bankrupt Your Startup
LLM cost monitoring isn't optional anymore — it's survival. Here's how to track agent costs in real time and avoid bill shock.
Why AI Agent Costs Are Uniquely Hard to Predict
Traditional API costs are roughly linear — more requests, more cost, predictable curve. Agents break this model completely. Here's why:
Recursive tool use. An agent deciding to call a tool, which triggers another LLM call, which calls another tool — you're looking at 5-20x the tokens you'd expect from a single prompt. A ReAct loop that should take 3 steps might take 15 if the agent gets confused.
Context window bloat. Every step in an agent's reasoning appends to the context. By step 10, you're sending 50K tokens per call instead of 2K. And you're paying for every one of those input tokens.
Retry storms. Agent hits a parsing error? It retries. Gets a rate limit? Backs off and retries. Gets a bad tool response? Retries with a bigger prompt. Each retry costs money.
Model selection drift. Your agent framework might silently upgrade from GPT-4o-mini to GPT-4o for "complex" queries. That's a 10x cost multiplier you didn't ask for.
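To see how these multiply, here is a back-of-envelope sketch of the context-bloat effect alone. The 2K base prompt and 4K-per-step growth are illustrative assumptions, not measurements:

```python
# Illustrative only: a 2K-token base prompt, with each agent step
# appending roughly 4K tokens of tool output and reasoning to the context.

def cumulative_input_tokens(steps: int, base: int = 2_000, growth: int = 4_000) -> int:
    """Total input tokens billed across a loop that re-sends its
    full (growing) history on every call."""
    return sum(base + i * growth for i in range(steps))

cumulative_input_tokens(3)   # 2K + 6K + 10K = 18_000 tokens
cumulative_input_tokens(15)  # 450_000 tokens, with the last call alone at 58K
```

The confused 15-step run bills 25x the tokens of the 3-step run, not 5x. That super-linear compounding is where the surprises come from.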
The Naive Approach (And Why It Fails)
Most teams start here:
# "We'll just check the dashboard"
# - Every startup, 2 weeks before the incident
import openai
response = openai.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": prompt}],
)
# Maybe log this somewhere?
print(f"Tokens used: {response.usage.total_tokens}")This tells you nothing useful. You know tokens were used. You don't know which agent session consumed them, what task the agent was performing, whether this was a normal run or a runaway loop, or how this compares to yesterday's costs for the same task.
Building Real Cost Visibility
Step 1: Instrument Every LLM Call
You need per-call cost attribution, tagged to the agent session, task type, and user:
```python
from agentops import track_cost

@track_cost(session_tags=["support-agent", "tier-1"])
def handle_support_ticket(ticket_id: str):
    # Your agent logic here
    agent.run(f"Resolve support ticket {ticket_id}")
```

Every LLM call inside `handle_support_ticket` now gets tagged with the session and the task type, and its cost is calculated automatically from the model's pricing.
Step 2: Track Cost Per Session, Not Per Call
Individual API calls are noise. What matters is: how much did it cost to complete this task?
```python
from agentops import Session

session = Session(tags=["document-analysis", "customer-acme"])

# Agent does its thing — maybe 12 LLM calls, 3 tool uses
result = agent.analyze_document(doc, session=session)
session.end()

print(f"Session cost: ${session.total_cost:.4f}")
print(f"LLM calls: {session.llm_call_count}")
print(f"Avg cost per call: ${session.avg_cost_per_call:.4f}")
```

Now you can answer: "Our document analysis agent costs $0.12 per document on average, but the P99 is $0.89." That's actionable.
Step 3: Set Budget Guardrails
Once you know your baselines, set limits:
```python
from agentops import Session, BudgetExceeded

session = Session(
    budget_limit=0.50,  # $0.50 max per session
    on_budget_warning=lambda s: alert_slack(
        f"Agent approaching budget: ${s.total_cost:.2f}"
    ),
    warning_threshold=0.8,  # Alert at 80%
)

try:
    agent.run(task, session=session)
except BudgetExceeded:
    log.warning(f"Session {session.id} killed at ${session.total_cost:.2f}")
    # Graceful fallback — return partial result, escalate to human
```

This is the difference between "we lost $2,000 on Saturday" and "the agent hit its $0.50 limit and gracefully degraded."
Step 4: Monitor Aggregate Trends
Per-session tracking feeds into dashboards. You want to see:
- Daily cost by agent type — is your support agent costing more than last week?
- Cost per task completion — are you paying more per resolved ticket?
- Cost anomaly detection — a 3x spike at 2 AM means something is looping
- Model cost breakdown — how much goes to GPT-4o vs. embeddings vs. tool calls?
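The anomaly check in that list can start life as a single trailing-mean ratio. A sketch of the 3x-spike rule; `is_cost_anomaly` and the hourly figures are illustrative:

```python
from statistics import mean

def is_cost_anomaly(trailing_hourly_costs, current_hour_cost, spike_factor=3.0):
    """Flag the current hour when it exceeds spike_factor times the
    trailing mean. Deliberately crude: a ratio test catches a looping
    agent at 2 AM without any ML."""
    return current_hour_cost > spike_factor * mean(trailing_hourly_costs)

history = [4.10, 3.80, 4.40, 3.95, 4.20]  # hypothetical $/hour baseline
is_cost_anomaly(history, 4.60)   # False: normal variance
is_cost_anomaly(history, 13.50)  # True: the 2 AM loop
```

Wire the `True` branch into your paging system before reaching for anything fancier.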
The Runaway Agent Problem
The scariest cost scenario isn't high usage — it's a stuck loop. An agent that keeps retrying, or that enters a recursive chain where each step generates more work.
```
Step 1:  Agent calls search tool → 2K tokens
Step 2:  Agent processes results → 8K tokens (growing context)
Step 3:  Agent calls search again ("not good enough") → 14K tokens
Step 4:  Agent processes results → 22K tokens
...
Step 15: Agent sends 120K tokens per call, still not satisfied
```

This is a $15-$30 session that should have cost $0.10. Without monitoring, you don't catch it until the invoice.
Detection heuristics:
- Session duration > 5 minutes for a typically-30-second task
- Token count growing super-linearly across steps
- Same tool called > 5 times in one session
- Total session tokens > 3x the P95 for that task type
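These heuristics translate directly into a post-session check. A sketch over a plain dict; the field names and thresholds are illustrative, not a real library's API:

```python
from collections import Counter

def runaway_flags(session, p95_task_tokens, typical_duration_s=30):
    """Return which runaway heuristics fired for a finished session.
    `session` is a plain dict with duration_s, step_tokens (per-step
    token counts), and tool_calls (tool names, one per invocation)."""
    flags = []
    if session["duration_s"] > 10 * typical_duration_s:  # ~5 min vs. 30 s
        flags.append("duration")
    steps = session["step_tokens"]
    # Super-linear growth: each step keeps exceeding the last by a wide margin.
    if len(steps) >= 3 and all(b > 1.5 * a for a, b in zip(steps, steps[1:])):
        flags.append("super_linear_tokens")
    counts = Counter(session["tool_calls"])
    if counts and counts.most_common(1)[0][1] > 5:
        flags.append("tool_repetition")
    if sum(steps) > 3 * p95_task_tokens:
        flags.append("total_tokens")
    return flags

stuck = {
    "duration_s": 400,
    "step_tokens": [2_000, 8_000, 14_000, 22_000],
    "tool_calls": ["search"] * 6,
}
runaway_flags(stuck, p95_task_tokens=12_000)  # all four flags fire
```

Run it when a session ends (or on a timer for long-lived sessions) and kill or escalate anything that raises more than one flag.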
Cost Optimization Levers
Once you have visibility, you can actually optimize:
- Model routing. Use GPT-4o-mini for classification, GPT-4o for synthesis. Save 80% on the easy calls.
- Context pruning. Don't send the full conversation history every step. Summarize older steps.
- Early stopping. If the agent hasn't made progress in 3 steps, stop and escalate.
- Caching. Same tool query with same inputs? Cache the result.
- Prompt compression. Shorter system prompts, fewer examples. Every token in the system prompt multiplies across every call.
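The caching lever needs nothing more than hashing tool inputs. A sketch; `cached_tool` and `search` are hypothetical stand-ins for your real tool layer:

```python
import functools
import hashlib
import json

def cached_tool(fn):
    """Memoize a tool by a hash of its keyword arguments, so an agent
    that repeats an identical tool call pays for it only once."""
    cache = {}
    @functools.wraps(fn)
    def wrapper(**kwargs):
        key = hashlib.sha256(
            json.dumps(kwargs, sort_keys=True).encode()
        ).hexdigest()
        if key not in cache:
            cache[key] = fn(**kwargs)  # the only paid call
        return cache[key]
    return wrapper

paid_calls = []

@cached_tool
def search(query: str):
    paid_calls.append(query)  # stands in for the real metered API call
    return f"results for {query}"

search(query="llm pricing")
search(query="llm pricing")  # cache hit: no second paid call
len(paid_calls)  # 1
```

This matters most for runaway sessions: the "same tool called 6 times" pattern becomes one paid call plus five free cache hits.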
Getting Started
If you're running agents without cost monitoring, start today. Not next sprint, not after the refactor — today. The difference between a sustainable AI product and one that bleeds money is visibility.
```python
import agentops

agentops.init(api_key="your-key")
# That's it. Every LLM call is now tracked.
```

Start monitoring your agent costs for free →
The agents are going to keep running. The question is whether you know what they're spending.