I Tracked Every LLM API Call For a Week — 65% Were Unnecessary

Published: April 20, 2026 at 08:23 AM EDT
3 min read
Source: Dev.to

Tracking LLM API Calls

I’ve been using GPT‑5 and Claude via API for coding tasks—refactoring, code review, architecture questions, debugging. The bill was creeping past $150 / month, and I had no idea which calls were actually worth the money.

Provider dashboards show totals (tokens used, dollars spent) but they don’t tell you which specific calls were unnecessary. Was that $2.80 request for “where is the auth middleware” really worth sending to GPT‑4o?

So I built a tracker to find out.

llm-costlog Library

A small Python library that wraps any LLM API call and records:

  • Prompt + completion tokens
  • Cost in USD (built‑in pricing for 40+ models)
  • Route – did this go to the API, or was it handled locally?
  • Intent – what kind of request was this? (code lookup, architecture question, debugging, etc.)

How to integrate

# app.py (don't name this file llm_cost_tracker.py, or it will shadow the import)
from llm_cost_tracker import CostTracker

tracker = CostTracker("./costs.db")

tracker.record(
    prompt_tokens=847,
    completion_tokens=234,
    model="gpt-4o-mini",
    provider="openai",
    intent="code_lookup",
)

A handful of lines is enough to start logging every request.
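Once calls are in the SQLite ledger, a single GROUP BY answers "which intents cost me the most?". This is a sketch against a hypothetical schema mirroring the fields above; llm-costlog's actual table and column names may differ.

```python
import sqlite3

# In-memory stand-in for ./costs.db, with one row per recorded call.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE calls (
        model TEXT, provider TEXT, intent TEXT,
        prompt_tokens INTEGER, completion_tokens INTEGER, cost_usd REAL
    )
""")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("gpt-4o-mini", "openai", "code_lookup", 847, 234, 0.0003),
        ("gpt-4o-mini", "openai", "debugging", 1200, 400, 0.0008),
        ("gpt-4o-mini", "openai", "code_lookup", 500, 100, 0.0002),
    ],
)

# Per-intent spend: the view that makes "which calls were unnecessary" visible.
rows = conn.execute(
    "SELECT intent, COUNT(*), SUM(cost_usd) "
    "FROM calls GROUP BY intent ORDER BY SUM(cost_usd) DESC"
).fetchall()
for intent, n, usd in rows:
    print(f"{intent}: {n} calls, ${usd:.4f}")
```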

Waste analysis results

After a week of tracking everything:

  • Total cost: $0.2604
  • 65% of external API calls were for things that didn’t need an LLM at all (symbol lookups, config checks, “where is this function defined”, file searches).
  • In a real‑world scenario with larger contexts (2K–8K tokens per request), that 65% avoidable rate translates to serious money.
    • If you spend $150/month on LLM APIs and 65% of calls are avoidable, that’s roughly $100/month in waste.
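That back‑of‑envelope estimate is just one multiplication:

```python
monthly_spend = 150.00   # current API bill (USD/month)
avoidable = 0.65         # share of calls that didn't need an LLM

waste = monthly_spend * avoidable
print(f"~${waste:.0f}/month avoidable")  # ~$98/month avoidable
```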

Knowing the waste exists is step 1; fixing it automatically is step 2.

promptrouter – Automatic Waste Reduction

A gateway that sits between your code and the LLM API. For every prompt it decides:

  • Can be answered locally? (symbol lookups, config checks, file searches) → handled instantly, $0 cost
  • Needs an LLM? (architecture questions, code review, complex debugging) → sent to the API with a compacted context (only the 3–5 most relevant files)
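The local‑vs‑LLM decision can be sketched as a keyword classifier feeding a router. This is a minimal illustration, not promptrouter's actual code; the intent names and phrase lists here are hypothetical.

```python
# Intents that never need an API call.
LOCAL_INTENTS = {"symbol_lookup", "config_check", "file_search"}

# Hypothetical phrase table; the real classifier covers 22 prompt types.
KEYWORDS = {
    "symbol_lookup": ("where is", "defined", "definition of"),
    "config_check": ("config", "setting", ".env"),
    "file_search": ("which file", "find file"),
    "architecture": ("architecture", "design", "should we"),
    "code_review": ("review", "look over"),
}

def classify(prompt: str) -> str:
    p = prompt.lower()
    for intent, phrases in KEYWORDS.items():
        if any(phrase in p for phrase in phrases):
            return intent
    return "debugging"  # fallback: unknown/complex requests go to the LLM

def route(prompt: str) -> str:
    return "local" if classify(prompt) in LOCAL_INTENTS else "llm"

print(route("where is the auth middleware defined?"))  # local
print(route("review this PR for concurrency bugs"))    # llm
```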

Routing logic

  • Keyword classification + phrase detection (non‑ML, 100% accurate on my test suite of 22 prompt types).
  • BM25 text matching + optional semantic search (sentence‑transformers, all‑MiniLM‑L6‑v2).
  • Blended scoring: 60 % BM25 + 40 % semantic similarity.
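The 60/40 blend only makes sense once both score sets live on the same scale, since raw BM25 scores are unbounded while cosine similarity is roughly [0, 1]. A sketch, assuming min‑max normalization (the actual normalization promptrouter uses isn't stated):

```python
def minmax(scores):
    """Normalize raw scores to [0, 1] so BM25 and cosine similarity are comparable."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]

def blend(bm25_scores, semantic_scores, w_bm25=0.6, w_sem=0.4):
    """60% BM25 + 40% semantic similarity, per the post's weights."""
    b = minmax(bm25_scores)
    s = minmax(semantic_scores)
    return [w_bm25 * bi + w_sem * si for bi, si in zip(b, s)]

# Hypothetical scores for three candidate files:
scores = blend([12.1, 3.4, 0.0], [0.82, 0.91, 0.10])
best = max(range(len(scores)), key=scores.__getitem__)
```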

AST analysis

  • Full call‑graph and import‑dependency tracing for Python and TypeScript/JavaScript.
  • Regex‑based for TS/JS, ast module for Python.
  • Zero external dependencies for either language.
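Import‑dependency extraction with the stdlib `ast` module looks roughly like this; it's a minimal sketch of the idea, not promptrouter's actual traversal.

```python
import ast

# Sample module to analyze (would normally be read from a file).
source = """
import os
from collections import defaultdict

def handler(req):
    return defaultdict(list)
"""

tree = ast.parse(source)
imports = []
for node in ast.walk(tree):
    if isinstance(node, ast.Import):
        imports.extend(alias.name for alias in node.names)
    elif isinstance(node, ast.ImportFrom):
        imports.append(node.module)

print(imports)  # ['os', 'collections']
```

A full call graph extends the same walk to `ast.Call` nodes, mapping callers to callees.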

Git integration

  • Recent commits, blame, diffs as context—so “who changed this and when” doesn’t burn tokens.
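Pulling that git context is a plain subprocess call. The helper names below are hypothetical, and `recent_commits` assumes it runs inside a git checkout:

```python
import subprocess

def git_log_cmd(path, n=3):
    """Command for the last n commit subjects touching `path`."""
    return ["git", "log", f"-{n}", "--format=%h %s", "--", path]

def recent_commits(path, n=3):
    # Requires a git checkout; returns one "hash subject" line per commit.
    out = subprocess.run(git_log_cmd(path, n), capture_output=True, text=True)
    return out.stdout.splitlines()
```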

Cost tracking

  • SQLite‑backed ledger with real token counts from the provider’s usage block.
  • Prices derived from a built‑in table covering 40+ models.
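Cost from a pricing table is token counts times per‑million‑token rates. The prices below are placeholders; the library's built‑in table carries the real, current values for 40+ models.

```python
# Hypothetical per-million-token prices (USD).
PRICING = {"gpt-4o-mini": {"prompt": 0.15, "completion": 0.60}}

def cost_usd(model, prompt_tokens, completion_tokens):
    p = PRICING[model]
    return (prompt_tokens * p["prompt"]
            + completion_tokens * p["completion"]) / 1_000_000
```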

LLM client

  • Supports OpenAI, Anthropic, Ollama, and any OpenAI‑compatible endpoint over plain HTTP.
  • No SDK dependency.
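An SDK‑free client boils down to building a JSON POST against `/chat/completions` with the stdlib. A sketch (error handling and retries omitted; the Ollama endpoint shown is just an example):

```python
import json
import urllib.request

def chat_request(base_url, api_key, model, messages):
    """Build a POST for any OpenAI-compatible /chat/completions endpoint
    using only the stdlib — no SDK."""
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps({"model": model, "messages": messages}).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
        method="POST",
    )

# e.g. a local Ollama server exposing the OpenAI-compatible API:
req = chat_request("http://localhost:11434/v1", "ollama", "llama3",
                   [{"role": "user", "content": "where is the auth middleware?"}])
# urllib.request.urlopen(req) would send it.
```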

Both tools are zero‑dependency (stdlib only) for core functionality; embeddings and precise tokenization are optional extras.

Installation

llm-costlog

pip install llm-costlog

GitHub:

promptrouter

pip install promptrouter

GitHub:

Both projects are released under the MIT license. Feedback, issues, and stars are welcome—these are my first open‑source releases, and I’m iterating fast based on user input. A Reddit commenter asked for TypeScript support and a waste‑score trend feature; both shipped within 24 hours.
