Your prompt is getting longer without you knowing it (and it's killing your margins)

Published: May 12, 2026 at 05:39 PM EDT

Source: Dev.to

Problem Overview

I’ve been looking at LLM billing patterns lately, and there’s a silent killer that creeps up on almost every team: prompt inflation.

When you first build an AI feature, your prompt is tight—maybe 500 tokens for the system instructions and 100 for the user query. The math looks great: “This will cost us fractions of a cent per call,” you tell the team.

Causes

Then the prompt grows in small, reasonable-looking steps:

Conversation history is added to make the bot “smarter.”
A massive RAG context block is added after a single hallucination.
Product requests formatting instructions, and the system prompt expands into a 2,000‑word essay.

Together, these changes can push a baseline request from roughly 600 tokens to 8k tokens.
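To see which of these additions is responsible for the growth, it can help to break a request down by component. The following is a minimal sketch, not a real tokenizer: `estimate_tokens` and `prompt_breakdown` are hypothetical helpers, and the 4-characters-per-token ratio is only a crude assumption for English text. For real numbers, use your provider's tokenizer or the token counts returned with each API response.

```python
# Rough sketch: estimate how many tokens each part of a request contributes.
# ASSUMPTION: ~4 characters per token is a crude heuristic, not a tokenizer.


def estimate_tokens(text: str) -> int:
    """Crude token estimate: about 4 characters per token, rounded up."""
    return (len(text) + 3) // 4


def prompt_breakdown(
    system: str, history: list[str], rag_context: str, query: str
) -> dict[str, int]:
    """Return estimated token counts per prompt component, plus the total."""
    parts = {
        "system": estimate_tokens(system),
        "history": sum(estimate_tokens(turn) for turn in history),
        "rag_context": estimate_tokens(rag_context),
        "query": estimate_tokens(query),
    }
    parts["total"] = sum(parts.values())
    return parts


if __name__ == "__main__":
    # Placeholder strings standing in for real prompt components.
    breakdown = prompt_breakdown(
        system="x" * 2000,
        history=["y" * 400] * 10,
        rag_context="z" * 8000,
        query="q" * 400,
    )
    for name, tokens in breakdown.items():
        print(f"{name:12s} {tokens:6d}")
```

Logging a breakdown like this per request makes it obvious whether history, retrieval context, or the system prompt is the part that keeps growing.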

Impact on Costs

User value doesn’t scale linearly with prompt size, but the OpenAI bill does.
A request that cost $0.005 on day one can end up costing $0.05 or more, a 10x increase that is multiplied across every call you serve.
Monthly dashboards may only show increased usage, masking the fact that margins are being eroded.
Without tracking, you might think “growth is good” until the Stripe payout reveals vanished margins.
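The margin effect is easier to see with the arithmetic written out. In the sketch below, the two per-request costs are the figures quoted above; the monthly request volume and the revenue per request are hypothetical inputs chosen only for illustration, and `monthly_margin` is a made-up helper name.

```python
# Per-request costs are the figures from the text; everything else is hypothetical.
COST_DAY_ONE = 0.005   # USD per request
COST_INFLATED = 0.05   # USD per request (lower bound of "$0.05+")

# HYPOTHETICAL volume and pricing, for illustration only.
REQUESTS_PER_MONTH = 1_000_000
REVENUE_PER_REQUEST = 0.10  # USD


def monthly_margin(
    requests: int, revenue_per_request: float, cost_per_request: float
) -> tuple[float, float]:
    """Return (LLM spend in USD, gross margin as a fraction) for one month."""
    spend = requests * cost_per_request
    revenue = requests * revenue_per_request
    return spend, (revenue - spend) / revenue


if __name__ == "__main__":
    for label, cost in [("day one", COST_DAY_ONE), ("inflated", COST_INFLATED)]:
        spend, margin = monthly_margin(REQUESTS_PER_MONTH, REVENUE_PER_REQUEST, cost)
        print(f"{label:9s} spend=${spend:,.0f}  margin={margin:.0%}")
```

Under these assumed numbers, monthly LLM spend rises from $5,000 to $50,000 and gross margin falls from 95% to 50%, even though request volume and revenue are unchanged.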

Recommended Actions

Track cost per user and cost per feature, not just total spend.
Identify specific users driving high costs; they are likely accumulating massive context windows that need truncation.
Regularly monitor prompt size; assume it changes over time.
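As a rough illustration of these actions, the sketch below aggregates usage by user or by feature and flags users whose average prompt size is above a threshold. It assumes you already log one record per request; `UsageRecord`, `summarize`, and `heavy_users` are hypothetical names invented for this example and are not the API of any particular tool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UsageRecord:
    """One logged LLM request (hypothetical schema)."""
    user_id: str
    feature: str
    prompt_tokens: int
    cost_usd: float


def summarize(records: list[UsageRecord], key: str) -> dict[str, dict[str, float]]:
    """Group records by `key` ("user_id" or "feature") and total them up."""
    totals = defaultdict(lambda: {"requests": 0, "cost_usd": 0.0, "prompt_tokens": 0})
    for record in records:
        bucket = totals[getattr(record, key)]
        bucket["requests"] += 1
        bucket["cost_usd"] += record.cost_usd
        bucket["prompt_tokens"] += record.prompt_tokens
    return {
        name: {
            "requests": t["requests"],
            "cost_usd": t["cost_usd"],
            "avg_prompt_tokens": t["prompt_tokens"] / t["requests"],
        }
        for name, t in totals.items()
    }


def heavy_users(records: list[UsageRecord], max_avg_prompt_tokens: float) -> list[str]:
    """Return users whose average prompt size exceeds the threshold."""
    by_user = summarize(records, "user_id")
    return sorted(
        user for user, stats in by_user.items()
        if stats["avg_prompt_tokens"] > max_avg_prompt_tokens
    )


if __name__ == "__main__":
    # Made-up sample data.
    log = [
        UsageRecord("alice", "chat", 600, 0.005),
        UsageRecord("alice", "chat", 800, 0.006),
        UsageRecord("bob", "chat", 10_000, 0.05),
        UsageRecord("bob", "search", 9_000, 0.045),
    ]
    print(summarize(log, "feature"))
    print(heavy_users(log, max_avg_prompt_tokens=8_000))
```

Running the same summary on a schedule and comparing `avg_prompt_tokens` against an earlier baseline is one simple way to notice prompt growth before it shows up in the bill.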

Tool: LLMeter

I ran into this exact issue, which is why I built LLMeter. It’s an open‑source, proxy‑free way to track costs down to the user‑ID level, allowing you to see who is dragging around a 10k-token history.

Stop assuming your prompt is the same size it was on day one. Track it.
