I Tracked Every LLM API Call For a Week — 65% Were Unnecessary
Source: Dev.to
Tracking LLM API Calls
I’ve been using GPT‑5 and Claude via API for coding tasks—refactoring, code review, architecture questions, debugging. The bill was creeping past $150 / month, and I had no idea which calls were actually worth the money.
Provider dashboards show totals (tokens used, dollars spent) but they don’t tell you which specific calls were unnecessary. Was that $2.80 request for “where is the auth middleware” really worth sending to GPT‑4o?
So I built a tracker to find out.
llm-costlog Library
A small Python library that wraps any LLM API call and records:
- Prompt + completion tokens
- Cost in USD (built‑in pricing for 40+ models)
- Route – did this go to the API, or was it handled locally?
- Intent – what kind of request was this? (code lookup, architecture question, debugging, etc.)
How to integrate
```python
# llm_cost_tracker.py
from llm_cost_tracker import CostTracker

tracker = CostTracker("./costs.db")
tracker.record(
    prompt_tokens=847,
    completion_tokens=234,
    model="gpt-4o-mini",
    provider="openai",
    intent="code_lookup",
)
```
A few lines are enough to start logging every request.
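Under the hood, the cost figure is just token counts multiplied by per-token prices. A minimal sketch of that computation, with illustrative prices (not llm-costlog's actual built-in table):

```python
# Illustrative per-million-token prices in USD; NOT llm-costlog's real table.
PRICING = {
    "gpt-4o-mini": {"prompt": 0.15, "completion": 0.60},
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Return the request cost in USD from a per-million-token price table."""
    p = PRICING[model]
    return (prompt_tokens * p["prompt"] + completion_tokens * p["completion"]) / 1_000_000

cost = estimate_cost("gpt-4o-mini", 847, 234)
```

With real provider prices in the table, the same two-line formula reproduces the per-call figures the ledger stores.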
Waste analysis results
After a week of tracking everything:
- Total cost: $0.2604
- 65% of external API calls were for things that didn't need an LLM at all (symbol lookups, config checks, "where is this function defined", file searches).
- In a real-world scenario with larger contexts (2K–8K tokens per request), that 65% avoidable rate translates to serious money.
- If you spend $150/month on LLM APIs and 65% of calls are avoidable, that's roughly $100/month in waste.
Knowing the waste exists is step 1; fixing it automatically is step 2.
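For concreteness, the back-of-envelope estimate above is just:

```python
monthly_spend = 150.00   # USD/month on LLM APIs
avoidable_rate = 0.65    # share of calls that didn't need an LLM
waste = monthly_spend * avoidable_rate
print(f"${waste:.2f}/month avoidable")  # $97.50/month, i.e. roughly $100
```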
promptrouter – Automatic Waste Reduction
A gateway that sits between your code and the LLM API. For every prompt it decides:
| Decision | Action |
|---|---|
| Can be answered locally? (symbol lookups, config checks, file searches) | Handled instantly, $0 cost |
| Needs an LLM? (architecture questions, code review, complex debugging) | Sent to the API with a compacted context (only the 3‑5 most relevant files) |
Routing logic
- Keyword classification + phrase detection (non-ML, 100% accurate on my test suite of 22 prompt types).
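A keyword/phrase classifier of this kind takes only a few lines. A sketch under my own phrase list (hypothetical, not promptrouter's actual rules):

```python
# Hypothetical phrase rules; promptrouter's real classifier is more extensive.
LOCAL_PHRASES = ("where is", "find file", "show config", "defined in", "grep")

def route(prompt: str) -> str:
    """Return 'local' for lookups answerable without an LLM, else 'llm'."""
    p = prompt.lower()
    if any(phrase in p for phrase in LOCAL_PHRASES):
        return "local"
    return "llm"
```

For example, `route("Where is the auth middleware?")` comes back `"local"` and never touches the API, while an architecture question falls through to `"llm"`.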
Code search
- BM25 text matching + optional semantic search (sentence-transformers, all-MiniLM-L6-v2).
- Blended scoring: 60% BM25 + 40% semantic similarity.
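The blend is a plain weighted sum. A sketch, assuming both scores are already normalized to [0, 1] (function names are mine, not promptrouter's API):

```python
def blended_score(bm25: float, semantic: float) -> float:
    """Blend normalized scores: 60% BM25 + 40% semantic similarity."""
    return 0.6 * bm25 + 0.4 * semantic

def top_files(scores: dict[str, tuple[float, float]], k: int = 5) -> list[str]:
    """Rank files by blended score; scores maps file -> (bm25, semantic)."""
    return sorted(scores, key=lambda f: blended_score(*scores[f]), reverse=True)[:k]
```

Weighting BM25 higher keeps exact identifier matches ahead of merely topically similar files.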
AST analysis
- Full call‑graph and import‑dependency tracing for Python and TypeScript/JavaScript.
- Regex-based for TS/JS, the `ast` module for Python.
- Zero external dependencies for either language.
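On the Python side, the stdlib `ast` module alone is enough to pull import dependencies out of a file. A minimal sketch of that idea (not promptrouter's actual implementation):

```python
import ast

def imported_modules(source: str) -> set[str]:
    """Collect top-level module names imported by a piece of Python source."""
    tree = ast.parse(source)
    mods: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            mods.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            mods.add(node.module.split(".")[0])
    return mods

imported_modules("import os\nfrom json import dumps")  # {'os', 'json'}
```

Walking these edges file-by-file is what builds the import-dependency graph.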
Git integration
- Recent commits, blame, diffs as context—so “who changed this and when” doesn’t burn tokens.
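Answering "who changed this and when" locally can be as simple as shelling out to git. A sketch with `subprocess` (the function name and signature are mine, not promptrouter's API):

```python
import subprocess

def recent_commits(path: str, n: int = 5, repo: str = ".") -> list[str]:
    """Return the last n one-line commit summaries touching a path."""
    out = subprocess.run(
        ["git", "log", f"-{n}", "--oneline", "--", path],
        capture_output=True, text=True, check=True, cwd=repo,
    )
    return out.stdout.splitlines()
```

The same pattern works for `git blame` and `git diff`; the output goes straight into the prompt context instead of asking the model to guess history.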
Cost tracking
- SQLite‑backed ledger with real token counts from the provider’s usage block.
- Prices derived from a built‑in table covering 40+ models.
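A ledger like that needs nothing beyond the stdlib `sqlite3` module. A minimal sketch with an illustrative schema (not llm-costlog's actual one):

```python
import sqlite3

def open_ledger(path: str) -> sqlite3.Connection:
    """Open (or create) a cost ledger with one row per API call."""
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS calls (
            ts             TEXT DEFAULT CURRENT_TIMESTAMP,
            model          TEXT NOT NULL,
            intent         TEXT,
            prompt_tok     INTEGER,
            completion_tok INTEGER,
            cost_usd       REAL
        )""")
    return conn

conn = open_ledger(":memory:")
conn.execute(
    "INSERT INTO calls (model, intent, prompt_tok, completion_tok, cost_usd) "
    "VALUES (?, ?, ?, ?, ?)",
    ("gpt-4o-mini", "code_lookup", 847, 234, 0.000267),
)
total = conn.execute("SELECT SUM(cost_usd) FROM calls").fetchone()[0]
```

Because every row carries an `intent`, one `GROUP BY intent` query is all it takes to see which request types are burning the budget.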
LLM client
- Supports OpenAI, Anthropic, Ollama, and any OpenAI‑compatible endpoint over plain HTTP.
- No SDK dependency.
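Talking to an OpenAI-compatible endpoint without an SDK is just an HTTP POST. A sketch using only `urllib`, following the public Chat Completions request shape (helper names are mine):

```python
import json
import urllib.request

def build_request(base_url: str, api_key: str, model: str,
                  prompt: str) -> urllib.request.Request:
    """Build a Chat Completions POST for any OpenAI-compatible endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )

def chat(base_url: str, api_key: str, model: str, prompt: str) -> dict:
    """Send the request and return the decoded JSON response."""
    req = build_request(base_url, api_key, model, prompt)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Pointing `base_url` at `http://localhost:11434/v1` reaches Ollama; swapping it for a hosted provider's base URL changes nothing else.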
Both tools are zero‑dependency (stdlib only) for core functionality; embeddings and precise tokenization are optional extras.
Installation
llm-costlog
pip install llm-costlog
GitHub:
promptrouter
pip install promptrouter
GitHub:
Both projects are released under the MIT license. Feedback, issues, and stars are welcome—these are my first open‑source releases, and I’m iterating fast based on user input. A Reddit commenter asked for TypeScript support and a waste‑score trend feature; both shipped within 24 hours.