How I cut my multi-turn LLM API costs by 90% (O(N²) → O(N))

Published: May 4, 2026 at 03:20 PM EDT
3 min read
Source: Dev.to

If you build multi‑turn AI agents, you know the pain: API costs don’t grow linearly; they grow quadratically.
Every turn in a standard agent loop replays the full conversation history, so the token cost of turn N is proportional to N and the total cost across N turns is Θ(N²). I hit a wall where a single heavy day of coding consumed 97% of my weekly Anthropic quota.
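To make the scaling concrete, here is a back-of-envelope sketch (not Burnless code; the per-turn and capsule token counts are made-up constants) comparing a naive replay-everything loop with a capsule-style history:

# Illustrative constants, not measured values.
TOKENS_PER_TURN = 1_000   # raw tokens added by one turn (assumption)
CAPSULE_TOKENS  = 20      # ~80-character compressed capsule (assumption)

def naive_input_tokens(n_turns: int) -> int:
    # Turn k re-sends all k-1 previous turns plus the new one: sum of k is N(N+1)/2
    return sum(k * TOKENS_PER_TURN for k in range(1, n_turns + 1))

def capsule_input_tokens(n_turns: int) -> int:
    # Turn k sends k-1 short capsules plus one full new turn
    return sum((k - 1) * CAPSULE_TOKENS + TOKENS_PER_TURN for k in range(1, n_turns + 1))

for n in (10, 50, 100):
    print(f"{n:>3} turns  naive={naive_input_tokens(n):>9,}  capsules={capsule_input_tokens(n):>9,}")

At 100 turns the naive loop has already sent about 5 million input tokens; the capsule version stays under 200 thousand.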

Burnless – an open protocol and orchestration layer

Burnless flips the cost curve from O(N²) to O(N). In my real‑world usage that meant roughly a 16× reduction in API consumption, and the benchmark below shows a 90.3% cost saving against a naive Claude Opus call.

Burnless – Intent‑compressed intelligence orchestration.
A maestro that orchestrates any LLM from any vendor. Multi‑turn agent loops normally cost O(N²); Burnless makes them O(N).

  • Vendor‑agnostic orchestration for multi‑agent workflows.
  • Choose the model that conducts the orchestra (the Maestro/Brain) – Claude, GPT, Gemini, Mistral, a local Llama, etc. – and the models that execute each task (the Workers).
  • Tiers are quality/cost bands (gold/silver/bronze), not tied to a specific provider.
  • Run encoder and decoder on a local Ollama model for zero marginal cost on cheap stages.

View on GitHub

How the cost curve is flattened

The math relies on two mechanisms working together:

  1. Shared Prefix Cache – The persistent system prompt (often 20 k+ tokens) is cached using Anthropic’s prompt‑caching feature (TTL = 1 h). Switching models from the same provider mid‑session does not invalidate this cache if the prefix is byte‑identical.

  2. Capsule History – Instead of storing raw transcripts, the Maestro model keeps ~80‑character compressed “capsules” of prior turns.

Together, the quadratic history term collapses into a tiny linear one, while the massive system prompt is billed at cache‑read prices (≈10× cheaper than fresh input on Anthropic).
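As a rough illustration of how the two mechanisms combine, here is a minimal sketch using the Anthropic Python SDK’s prompt caching. This is not Burnless itself: the capsule encoder/decoder is elided (capsules are just passed in as strings), and the 1‑hour TTL mentioned above is a provider‑side option not configured here.

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM_PROMPT = "..."  # the large, persistent system prompt (20k+ tokens in practice)

def send_turn(capsules: list[str], user_message: str):
    # Mechanism 1: mark the system prompt cacheable. As long as the prefix stays
    # byte-identical, later turns bill it at cache-read prices.
    # Mechanism 2: prior turns are represented by short capsules, not raw transcripts.
    history = "\n".join(capsules)
    return client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=1024,
        system=[{
            "type": "text",
            "text": SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},
        }],
        messages=[{
            "role": "user",
            "content": f"Conversation so far (compressed):\n{history}\n\nNew request:\n{user_message}",
        }],
    )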

If you want the formal derivation, I published a reproducible benchmark that uses the Anthropic SDK directly and reads response.usage.
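I won’t reproduce the whole benchmark here, but the measurement itself is just reading the usage block the SDK returns. With the current Anthropic Python SDK (field names as of the prompt‑caching API) it looks roughly like this, reusing send_turn from the sketch above:

response = send_turn(capsules=["User asked for X; agent did Y"], user_message="Next step?")
usage = response.usage
print("fresh input tokens:", usage.input_tokens)
print("cache write tokens:", usage.cache_creation_input_tokens)
print("cache read tokens: ", usage.cache_read_input_tokens)
print("output tokens:     ", usage.output_tokens)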

Benchmark (Claude 3 Opus, 10‑turn session)

Configuration           Cost
Standalone (no cache)   $4.66
Standalone (+ cache)    $0.65
Burnless Maestro        $0.45 (−90.3%)

The same math applies to any provider that exposes prompt caching and charges per input token.
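If you want to check the headline figure yourself, the arithmetic only needs the three numbers from the table:

standalone_no_cache = 4.66   # $ per 10-turn session, from the table above
standalone_cached   = 0.65
burnless_maestro    = 0.45

print(f"vs. no cache: {1 - burnless_maestro / standalone_no_cache:.1%} saved")  # ~90.3%
print(f"vs. cached:   {1 - burnless_maestro / standalone_cached:.1%} saved")    # ~30.8%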

Configuration example

agents:
  gold:   { command: "claude --model claude-sonnet-4-6 -p" }    # The Brain
  silver: { command: "codex exec --sandbox workspace-write" }   # Execution
  bronze: { command: "ollama run qwen2.5-coder" }               # Local, zero marginal cost

Installation & quick start

pip install burnless
burnless setup

If you’re building agents and burning through tokens, give Burnless a try. I’d love to hear your thoughts on the architecture, especially if you’re working on local encoding/decoding for privacy!
