Stop Wasting Tokens: How to Cut Your LLM Costs by 97%
Source: Dev.to
The hidden tax in your AI pipeline
If you’re building with GPT or Claude, you’ve probably done this:
- Call an API
- Get a big JSON response
- Send the whole thing to your LLM
Seems harmless, right? It’s not. You’re quietly burning money on data you don’t even use.
Example payload
{
  "order": {
    "id": 123,
    "user": {
      "name": "Midhun",
      "email": "midhun@email.com"
    },
    "items": [ /* 100 objects */ ],
    "metadata": { /* tons of fields */ }
  }
}
What the LLM actually needs
{
  "name": "Midhun",
  "email": "midhun@email.com"
}
LLMs charge you for everything you send:
| Payload | Tokens | Approx. Cost (per 1 k calls) |
|---|---|---|
| Full JSON | ~1500 | ~$45 |
| Useful data only | ~60 | ~$1.80 |
You’re paying for ~25× more tokens than necessary, and this happens on every request.
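The arithmetic behind these figures can be checked with a short sketch. The price of $0.03 per 1,000 input tokens is an assumption inferred from the table (~$45 for 1,000 calls of ~1,500 tokens); the article does not name a model or a price, so substitute your provider's current rate.

```python
# Assumed price, inferred from the table above; not stated in the article.
PRICE_PER_1K_TOKENS = 0.03  # dollars per 1,000 input tokens


def cost_for_calls(tokens_per_call: int, calls: int = 1000,
                   price_per_1k: float = PRICE_PER_1K_TOKENS) -> float:
    """Return the input-token cost in dollars for `calls` requests."""
    return tokens_per_call * calls / 1000 * price_per_1k


full_cost = cost_for_calls(1500)     # full JSON payload
trimmed_cost = cost_for_calls(60)    # useful fields only
token_ratio = 1500 / 60              # how many times more tokens the full payload uses
token_reduction = 1 - 60 / 1500      # fraction of tokens saved by trimming

print(f"full: ${full_cost:.2f}  trimmed: ${trimmed_cost:.2f}")
print(f"ratio: {token_ratio:.0f}x  reduction: {token_reduction:.0%}")
```

Because input cost scales linearly with tokens, the cost ratio equals the token ratio at any fixed price.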
Common workarounds (and their drawbacks)
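The usual approach is to pick out fields by hand:
user = data.get("order", {}).get("user", {})
email = user.get("email")
That is fine for one or two fields, but it gets unwieldy with:
- 10+ fields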
- Deeply nested structures
- Multiple APIs
Result: defensive null checks, brittle parsing logic, repeated boilerplate everywhere. It’s not hard, just annoying and error‑prone.
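One concrete way the chained `.get()` pattern turns brittle: if a key is present but null, the `{}` default is never used and the next `.get()` raises. The sketch below shows that failure and one possible helper; `safe_get` is a hypothetical name introduced here for illustration, not something from the article.

```python
from typing import Any


def safe_get(data: Any, *keys: str, default: Any = None) -> Any:
    """Walk nested dicts key by key; return `default` if a step is missing or not a dict."""
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


payload = {"order": {"user": None}}  # "user" exists, but its value is null

# The chained form fails: .get("user", {}) returns None (the key exists), not {}.
try:
    payload.get("order", {}).get("user", {}).get("email")
    chained_ok = True
except AttributeError:
    chained_ok = False

# The helper treats the null step as "not found" and returns the default.
email = safe_get(payload, "order", "user", "email")
```

A helper like this removes the repeated null checks, but you still have to list every field in code, which is what the query-based approach below avoids.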
Clean the payload before sending it to the LLM
Extraction step
Define the fields you need using a simple query format:
{
  "data": { /* raw payload */ },
  "queries": {
    "email": ".order.user.email",
    "name": ".order.user.name"
  }
}
Output
{
  "email": "midhun@email.com",
  "name": "Midhun"
}
Cost impact
| Payload | Tokens | Approx. Cost (per 1 k calls) |
|---|---|---|
| Raw JSON | 1500 | ~$45 |
| Cleaned JSON | 60 | ~$1.80 |
Result: a roughly 96 % reduction in token usage (1,500 → 60 tokens per call). Multiply that by daily requests at production scale, and you move from optimization to real cost control.
Implementation options
- Use JSONPath libraries – integrate a query engine in your code.
- Build a preprocessing layer – a small service that accepts raw JSON and a query, then returns a minimal payload.
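As a rough illustration of the preprocessing option, here is a minimal in-process sketch that accepts the `data` + `queries` request shape shown earlier. It assumes the dotted paths address object keys only (no array indexing, filters, or wildcards); the function names `resolve` and `extract` are hypothetical, and a real JSONPath library or hosted service may behave differently.

```python
from typing import Any


def resolve(data: Any, path: str) -> Any:
    """Resolve a dotted path such as ".order.user.email"; return None if a step is missing."""
    current = data
    for key in path.strip(".").split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        else:
            return None
    return current


def extract(request: dict) -> dict:
    """Apply each named query in request["queries"] to request["data"]."""
    data = request["data"]
    return {name: resolve(data, path) for name, path in request["queries"].items()}


request = {
    "data": {
        "order": {
            "id": 123,
            "user": {"name": "Midhun", "email": "midhun@email.com"},
            "items": [{"sku": i} for i in range(100)],  # stand-in for the 100 objects
            "metadata": {"source": "web"},              # stand-in for the extra fields
        }
    },
    "queries": {
        "email": ".order.user.email",
        "name": ".order.user.name",
    },
}

minimal = extract(request)  # only this dict is sent on to the LLM
```

Keeping the queries as data rather than code means adding a field is a one-line config change instead of another chain of `.get()` calls.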
I built a lightweight “JSON query engine as a service” that works like this:
- Input: raw JSON + query
- Output: clean, minimal payload
No setup, no heavy dependencies.
Use cases
- Reduce token usage before sending data to LLMs
- Clean payloads from services like Stripe, Shopify, GitHub
- Extract only relevant fields from large log or analytics datasets
Most developers focus on optimizing prompts and model selection, but the data they send often remains a hidden source of waste. In the AI era, efficiency = profit. Before tweaking prompts, try optimizing your input.
Try it yourself
JSON PowerExtract (available on RapidAPI) – a simple API that extracts the fields you need. A free tier (500 requests/month) lets you test the token savings in your own pipeline today.
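To get a feel for the savings on your own payloads before changing anything, a rough local estimate is often enough. The sketch below assumes about 4 characters per token, a common rule of thumb that varies by model and content; for exact counts, use your provider's tokenizer. The helper names are hypothetical.

```python
import json

CHARS_PER_TOKEN = 4  # rough heuristic only; real tokenizers differ


def estimate_tokens(payload: object) -> int:
    """Estimate the token count from the length of the compact JSON serialization."""
    text = json.dumps(payload, separators=(",", ":"))
    return max(1, len(text) // CHARS_PER_TOKEN)


def estimated_savings(raw: object, cleaned: object) -> float:
    """Return the estimated fraction of tokens saved by sending `cleaned` instead of `raw`."""
    return 1 - estimate_tokens(cleaned) / estimate_tokens(raw)


raw_payload = {
    "order": {
        "id": 123,
        "user": {"name": "Midhun", "email": "midhun@email.com"},
        "items": [{"sku": i, "qty": 1} for i in range(100)],  # illustrative filler
    }
}
cleaned_payload = {"name": "Midhun", "email": "midhun@email.com"}

saved = estimated_savings(raw_payload, cleaned_payload)
print(f"estimated token savings: {saved:.0%}")
```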