Your prompt is getting longer without you knowing it (and it's killing your margins)
Source: Dev.to
Problem Overview
I’ve been looking at LLM billing patterns lately, and there’s a silent killer that creeps up on almost every team: prompt inflation.
When you first build an AI feature, your prompt is tight—maybe 500 tokens for the system instructions and 100 for the user query. The math looks great: “This will cost us fractions of a cent per call,” you tell the team.
Causes
- Conversation history added to make the bot “smarter.”
- Massive RAG context block added after a single hallucination.
- Formatting instructions requested by product, expanding the system prompt into a 2,000‑word essay.
Together, these changes can push a baseline request from roughly 600 tokens to 8k tokens.
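One way to see where the growth comes from is to measure each piece of the prompt separately. The sketch below is a minimal illustration, not a real tokenizer: `estimate_tokens` relies on the rough rule of thumb of about 4 characters per token for English text, and the function and field names are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate only: about 4 characters per token for English text.

    This is a heuristic assumption, not a real tokenizer. Swap in your
    provider's tokenizer if you need exact counts.
    """
    return len(text) // 4


def prompt_breakdown(
    system: str,
    history: list[str],
    rag_context: str,
    user_query: str,
) -> dict[str, int]:
    """Return estimated token counts per prompt component, plus a total."""
    parts = {
        "system": estimate_tokens(system),
        "history": sum(estimate_tokens(message) for message in history),
        "rag_context": estimate_tokens(rag_context),
        "user_query": estimate_tokens(user_query),
    }
    # The total is computed before the "total" key is added to the dict.
    parts["total"] = sum(parts.values())
    return parts


if __name__ == "__main__":
    breakdown = prompt_breakdown(
        system="You are a helpful assistant. " * 50,
        history=["Earlier message in the conversation. " * 5] * 10,
        rag_context="Retrieved document text. " * 200,
        user_query="How do I reset my password?",
    )
    for name, tokens in breakdown.items():
        print(f"{name:12s} {tokens:6d}")
```

Logging this breakdown on every request makes it obvious whether history, retrieved context, or the system prompt is the part that grew.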
Impact on Costs
- User value doesn’t scale linearly with prompt size, but the OpenAI bill does.
- Per-request cost can go from $0.005 to $0.05 or more, a 10x increase that adds up quickly at scale.
- Monthly dashboards may only show increased usage, masking the fact that margins are being eroded.
- Without tracking, you might think “growth is good” until the Stripe payout reveals vanished margins.
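The arithmetic behind that margin erosion can be sketched in a few lines. Everything numeric here is an assumption for illustration: the per-token prices, the 150-token output, and the $0.06 of revenue per request are hypothetical placeholders, not real provider pricing. Only the 600-token and 8k-token prompt sizes come from the scenario above.

```python
def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Cost of one request in USD, given per-million-token prices."""
    return (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    ) / 1_000_000


def gross_margin(revenue_per_request: float, cost_per_request: float) -> float:
    """Gross margin as a fraction of revenue (0.9 means 90%)."""
    return (revenue_per_request - cost_per_request) / revenue_per_request


if __name__ == "__main__":
    # Hypothetical prices in USD per 1M tokens; check your provider's pricing.
    INPUT_PRICE = 5.0
    OUTPUT_PRICE = 15.0
    # Hypothetical revenue attributed to a single request.
    REVENUE = 0.06

    day_one = request_cost(600, 150, INPUT_PRICE, OUTPUT_PRICE)
    inflated = request_cost(8_000, 150, INPUT_PRICE, OUTPUT_PRICE)

    print(f"day one:  ${day_one:.5f}  margin {gross_margin(REVENUE, day_one):.1%}")
    print(f"inflated: ${inflated:.5f}  margin {gross_margin(REVENUE, inflated):.1%}")
```

Under these assumed numbers, the request goes from about $0.005 to about $0.042, and the margin drops from about 91% to about 30%, even though the user sees the same feature.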
Recommended Actions
- Track cost per user and cost per feature, not just total spend.
- Identify specific users driving high costs; they are likely accumulating massive context windows that need truncation.
- Regularly monitor prompt size; assume it changes over time.
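As a rough sketch of what those actions could look like in code, the example below aggregates cost per user and per feature, flags users whose average prompt size is above a limit, and trims conversation history to a token budget. The `UsageRecord` fields, function names, and thresholds are all hypothetical; this is not how LLMeter or any particular library does it.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass
class UsageRecord:
    """One logged LLM call. Field names are hypothetical."""
    user_id: str
    feature: str
    input_tokens: int
    cost_usd: float


def cost_by(records: list[UsageRecord], key: str) -> dict[str, float]:
    """Sum cost grouped by a record attribute such as 'user_id' or 'feature'."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[getattr(record, key)] += record.cost_usd
    return dict(totals)


def heavy_users(records: list[UsageRecord], avg_token_limit: int) -> list[str]:
    """Return user IDs whose average input size exceeds the limit, sorted."""
    sizes: dict[str, list[int]] = defaultdict(list)
    for record in records:
        sizes[record.user_id].append(record.input_tokens)
    return sorted(
        user for user, tokens in sizes.items()
        if sum(tokens) / len(tokens) > avg_token_limit
    )


def trim_history(
    messages: list[str],
    max_tokens: int,
    count_tokens: Callable[[str], int],
) -> list[str]:
    """Keep the most recent messages that fit within max_tokens."""
    kept: list[str] = []
    used = 0
    for message in reversed(messages):
        size = count_tokens(message)
        if used + size > max_tokens:
            break
        kept.append(message)
        used += size
    kept.reverse()  # restore chronological order
    return kept


if __name__ == "__main__":
    records = [
        UsageRecord("alice", "chat", 600, 0.005),
        UsageRecord("bob", "chat", 10_000, 0.05),
        UsageRecord("bob", "summarize", 9_000, 0.045),
    ]
    print(cost_by(records, "user_id"))
    print(cost_by(records, "feature"))
    print(heavy_users(records, avg_token_limit=4_000))
```

Dropping the oldest messages is the simplest truncation policy. Summarizing older turns is another option, but it costs extra calls, so it is worth measuring before adopting.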
Tool: LLMeter
I ran into this exact issue, which is why I built LLMeter. It’s an open‑source, proxy‑free way to track costs down to the user‑ID level, allowing you to see who is dragging around a 10k-token history.
Stop assuming your prompt is the same size it was on day one. Track it.