Your prompt is getting longer without you knowing it (and it's killing your margins)

Published: May 12, 2026 at 05:39 PM EDT
Source: Dev.to

Problem Overview

I’ve been looking at LLM billing patterns lately, and there’s a silent killer that creeps up on almost every team: prompt inflation.

When you first build an AI feature, your prompt is tight—maybe 500 tokens for the system instructions and 100 for the user query. The math looks great: “This will cost us fractions of a cent per call,” you tell the team.

Causes

  • Conversation history added to make the bot “smarter.”
  • Massive RAG context block added after a single hallucination.
  • Formatting instructions requested by product, expanding the system prompt into a 2,000‑word essay.

Together, these changes can push a baseline request from roughly 600 tokens to 8k tokens.
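Here's a rough sketch of what that growth does to the input cost of a single call. The price constant is a made-up placeholder, not any provider's actual rate, and the function name is mine; plug in your own numbers.

```python
# Hypothetical input price in USD per 1M tokens (an assumption for illustration);
# substitute the current rate for your provider and model.
PRICE_PER_MILLION_INPUT_TOKENS = 2.50


def input_cost_usd(
    prompt_tokens: int,
    price_per_million: float = PRICE_PER_MILLION_INPUT_TOKENS,
) -> float:
    """Input-side cost of one request, ignoring output tokens."""
    return prompt_tokens / 1_000_000 * price_per_million


day_one_tokens = 500 + 100  # system instructions + user query on day one
inflated_tokens = 8_000     # after history, RAG context, and formatting rules

growth = inflated_tokens / day_one_tokens
day_one_cost = input_cost_usd(day_one_tokens)
inflated_cost = input_cost_usd(inflated_tokens)

print(f"prompt grew {growth:.1f}x")
print(f"input cost per request: ${day_one_cost:.4f} -> ${inflated_cost:.4f}")
```

Whatever the real price is, the input cost grows by the same factor as the prompt, because the price cancels out of the ratio.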

Impact on Costs

  • User value doesn’t scale linearly with prompt size, but the OpenAI bill does.
  • At scale, you can go from $0.005 per request to $0.05+ per request.
  • Monthly dashboards may only show increased usage, masking the fact that margins are being eroded.
  • Without tracking, you might think “growth is good” until the Stripe payout reveals vanished margins.

Recommendations

  1. Track cost per user and cost per feature, not just total spend.
  2. Identify specific users driving high costs; they are likely accumulating massive context windows that need truncation.
  3. Regularly monitor prompt size; assume it changes over time.
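
The three points above can be sketched in a few lines of Python. This is a minimal illustration, not LLMeter's API: `CallRecord`, `summarize`, and the 8,000-token alert threshold are all hypothetical names and values, and it assumes you already log the prompt token count and cost of each call alongside a user ID and feature name.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CallRecord:
    """One logged LLM call (hypothetical schema for illustration)."""
    user_id: str
    feature: str
    prompt_tokens: int
    cost_usd: float


def summarize(records, token_alert_threshold=8_000):
    """Aggregate cost per user and per feature, and flag oversized prompts.

    Returns (cost_by_user, cost_by_feature, heavy_users), where heavy_users
    is a sorted list of user IDs whose largest prompt reached the threshold.
    """
    cost_by_user = defaultdict(float)
    cost_by_feature = defaultdict(float)
    max_prompt_by_user = defaultdict(int)

    for r in records:
        cost_by_user[r.user_id] += r.cost_usd
        cost_by_feature[r.feature] += r.cost_usd
        max_prompt_by_user[r.user_id] = max(
            max_prompt_by_user[r.user_id], r.prompt_tokens
        )

    heavy_users = sorted(
        user
        for user, tokens in max_prompt_by_user.items()
        if tokens >= token_alert_threshold
    )
    return dict(cost_by_user), dict(cost_by_feature), heavy_users


# Example usage with made-up records.
sample = [
    CallRecord("alice", "chat", 600, 0.002),
    CallRecord("bob", "chat", 9_500, 0.030),
    CallRecord("bob", "search", 1_200, 0.004),
]
by_user, by_feature, heavy = summarize(sample)
print(by_user, by_feature, heavy)
```

Tracking the largest prompt per user, rather than the average, makes a single runaway conversation history visible even when most of that user's calls are small.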

Tool: LLMeter

I ran into this exact issue, which is why I built LLMeter. It’s an open‑source, proxy‑free way to track costs down to the user‑ID level, so you can see who is dragging around a 10k‑token history.

Stop assuming your prompt is the same size it was on day one. Track it.
