OpenAI Bill Audit in 45 Minutes: Token Spend Decomposition (Retries, Tool Loops, Context Bloat)
Source: Dev.to
Key Idea
Stop thinking in terms of cost per request. Instead, measure cost per successful task, and break total spend into four buckets:
- Base generation
- Context bloat
- Retries & timeouts
- Tool/agent loops
By identifying which bucket dominates your spend, you know what to fix first.
How to Run the Audit
Gather whichever of these you have:
- Option A (best): per‑request logs with model name, tokens, status, timestamp
- Option B: OpenAI usage export + partial app logs
- Option C: Total cost per model/day (estimate)
Even with limited data, you can still discover the biggest cost drivers.
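With Option A, a per-request log entry can be sketched as a simple record. The field names below are assumptions; adapt them to whatever your logging pipeline actually emits:

```python
from dataclasses import dataclass

@dataclass
class RequestLog:
    """One per-request log entry for the audit (hypothetical schema)."""
    model: str          # e.g. "gpt-4o" or "gpt-4o-mini"
    input_tokens: int   # prompt-side tokens
    output_tokens: int  # completion-side tokens
    status: str         # "ok", "retry", or "timeout"
    timestamp: float    # unix epoch seconds

log = RequestLog(model="gpt-4o-mini", input_tokens=1200,
                 output_tokens=300, status="ok", timestamp=1700000000.0)
```

Every step of the audit below can be computed from a list of records like this.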
Define a Successful Task
Examples of a successful task:
- Grounded answer with no fallback
- No retries/timeouts
- Tool workflow completes without looping
Compute Cost per Successful Task
cost per successful task = total spend / successful tasks
This gives an actionable baseline for the rest of the audit.
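A minimal sketch of the calculation, assuming each logged request carries a dollar cost and a success flag (both field names are illustrative):

```python
# Hypothetical audit records: dollar cost and whether the task succeeded.
requests = [
    {"cost": 0.012, "success": True},
    {"cost": 0.030, "success": False},  # a failed attempt that still burned tokens
    {"cost": 0.018, "success": True},
]

total_cost = sum(r["cost"] for r in requests)
successes = sum(1 for r in requests if r["success"])

# Failures still cost money, so total spend is divided by successful
# tasks only -- waste inflates the per-success figure, as it should.
cost_per_success = total_cost / successes if successes else float("inf")
print(round(cost_per_success, 3))  # 0.03
```

Note that this number is higher than the naive average cost per request (0.02), because the failed attempt is charged against the two successes.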
Break Total Spending into Buckets
| Bucket | Description |
|---|---|
| Base generation tokens | Prompt + normal output |
| Context bloat tokens | System prompt, history, RAG context |
| Retries & timeouts waste | Tokens burned on failed attempts |
| Tool/agent loop waste | Unnecessary repeated calls |
Rank these buckets to see which drives most spend.
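Ranking the buckets is a one-line sort once the totals are tallied. The token counts below are made up for illustration:

```python
# Illustrative token totals per bucket from a sample of requests.
buckets = {
    "base_generation": 120_000,
    "context_bloat": 310_000,
    "retry_waste": 45_000,
    "tool_loop_waste": 80_000,
}

total = sum(buckets.values())
# Sort descending so the dominant bucket prints first.
for name, tokens in sorted(buckets.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:16s} {tokens:>8,d} tokens  {tokens / total:5.1%}")
```

In this made-up sample, context bloat dominates, so trimming history and RAG context would be the first fix.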
Sample Analysis (200–500 Requests)
- Compute input token breakdown: system + history + RAG + tool tokens
- Tally output token totals
- Measure retries/timeouts waste
Even rough estimates reveal outsized drivers.
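The input-side breakdown can be tallied per component, assuming your logs or prompt templates let you attribute tokens to each source; the field names here are hypothetical:

```python
# Hypothetical per-request input token attribution for a small sample.
sample = [
    {"system": 800, "history": 2400, "rag": 1500, "tool": 300},
    {"system": 800, "history": 3100, "rag": 2200, "tool": 0},
]

components = ["system", "history", "rag", "tool"]
totals = {c: sum(r[c] for r in sample) for c in components}
grand = sum(totals.values())
for c in components:
    print(f"{c:8s} {totals[c]:>6d} tokens  {totals[c] / grand:5.1%}")
```

If exact attribution is not logged, counting tokens in your prompt templates once and multiplying by request volume gives a serviceable estimate.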
Sort Requests by
- Cost per request
- Highest input tokens
- Retry rates
- Tool loop counts
Typical patterns include:
- Context bloat
- Retry storms
- Agent/tool loops
- Model misrouting
- Over‑generation
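Surfacing the worst offenders is a few sorts over the same request records (field names are illustrative):

```python
requests = [
    {"id": "a", "cost": 0.04, "input_tokens": 9000,  "retries": 0, "tool_calls": 2},
    {"id": "b", "cost": 0.11, "input_tokens": 2500,  "retries": 3, "tool_calls": 1},
    {"id": "c", "cost": 0.07, "input_tokens": 14000, "retries": 0, "tool_calls": 9},
]

# One sort per signal; the top of each list hints at a pattern.
by_cost  = sorted(requests, key=lambda r: r["cost"], reverse=True)
by_input = sorted(requests, key=lambda r: r["input_tokens"], reverse=True)  # context bloat
by_retry = sorted(requests, key=lambda r: r["retries"], reverse=True)       # retry storms
by_loops = sorted(requests, key=lambda r: r["tool_calls"], reverse=True)    # agent/tool loops
print(by_cost[0]["id"], by_input[0]["id"], by_retry[0]["id"], by_loops[0]["id"])  # b c b c
```

Here the most expensive request ("b") is a retry storm, not a large prompt, which is exactly the kind of distinction the per-signal sorts expose.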
Break Costs Down by Cohort
- Intent category
- Customer tier
- Product surface (chat vs. agent)
- Language
This uncovers specific areas leaking spend.
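Cohort cost per success is a group-by over the same records; the cohort labels below are examples:

```python
from collections import defaultdict

requests = [
    {"cohort": "chat",  "cost": 0.02, "success": True},
    {"cohort": "agent", "cost": 0.09, "success": False},
    {"cohort": "agent", "cost": 0.12, "success": True},
    {"cohort": "chat",  "cost": 0.03, "success": True},
]

cost = defaultdict(float)
wins = defaultdict(int)
for r in requests:
    cost[r["cohort"]] += r["cost"]
    wins[r["cohort"]] += r["success"]  # bool counts as 0 or 1

for cohort in cost:
    cps = cost[cohort] / wins[cohort] if wins[cohort] else float("inf")
    print(cohort, round(cps, 3))
```

Swap `"cohort"` for intent category, customer tier, or language to slice the same way along each axis.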
Prioritized Fix Order
- Stop waste – cap retries, add circuit breakers
- Cap context – limit history + RAG context
- Route smart – use a cheaper model for low‑risk intents
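The "stop waste" step can be sketched as a retry cap plus a crude circuit breaker. The thresholds are illustrative, and `call_model` stands in for your actual API call:

```python
MAX_RETRIES = 2          # hard cap: never burn more than 3 attempts total
FAILURE_THRESHOLD = 5    # consecutive failures before the breaker opens
_consecutive_failures = 0

def call_with_budget(call_model, prompt):
    """Call the model with a retry cap and a simple circuit breaker."""
    global _consecutive_failures
    if _consecutive_failures >= FAILURE_THRESHOLD:
        # Stop spending tokens on a dependency that keeps failing.
        raise RuntimeError("circuit open: refusing to call a failing dependency")
    for attempt in range(MAX_RETRIES + 1):
        try:
            result = call_model(prompt)
            _consecutive_failures = 0  # success resets the breaker
            return result
        except TimeoutError:
            _consecutive_failures += 1
    raise RuntimeError(f"gave up after {MAX_RETRIES + 1} attempts")
```

The cap bounds worst-case spend per task, and the breaker prevents a downstream outage from turning into a retry storm across all traffic.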
Even these simple changes can cut cost without reducing quality.
Expected Outcomes (After 45 Minutes)
- A spend pie showing the four buckets
- Top cohorts by cost per success
- Top 5 “silent spender” patterns
- A ranked list of 3 practical fixes
- Validation checks & alerts for future regressions
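A regression alert can be as simple as comparing cost per successful task against a recorded baseline. The 20% threshold here is an assumption; tune it to your traffic:

```python
def spend_regressed(cost_per_success, baseline, tolerance=0.20):
    """Return True when cost per successful task drifts above the
    baseline by more than `tolerance` (a fraction of the baseline)."""
    return cost_per_success > baseline * (1 + tolerance)

print(spend_regressed(0.040, 0.030))  # True: more than 20% over baseline
print(spend_regressed(0.032, 0.030))  # False: within tolerance
```

Run this on a daily roll-up of the audit metric so a new retry storm or context regression pages you instead of appearing on next month's invoice.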
Common Pitfalls
- Don’t shorten system prompts blindly – evaluate first
- Don’t cap tokens globally – cap by risk or intent tier
- Don’t switch models without evaluation guards – cost cuts shouldn’t break accuracy
Resources
- AI Audit (full pipeline) – measure quality, latency, cost, and safety across your AI system
- LLM & RAG Audit Hub – framework, baselines, and troubleshooting for LLM production reliability
- OptyxStack – services for production AI reliability and optimization
Audit your spend before you optimize – waste often hides where you least expect it.