How I built a Go proxy that keeps your LLM conversation alive when cloud quota runs out
Introduction
What is Trooper
The real problem: context loss on fallback
The solution: three‑layer context compaction
Anchor
The first two turns of the conversation are always preserved. These establish the original intent and set the tone.
SITREP
The middle turns get compressed into a structured summary called a SITREP. It extracts intent, entities, open loops, recent actions, and resolved items. The local model gets situational awareness, not raw history.
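Concretely, a SITREP could be modeled as a small struct. This is a sketch based on the fields described above and in the log below; the names are my assumptions, not Trooper's actual types:

```go
// SITREP is a hypothetical model of the structured summary; field names
// are guesses based on the post's description, not Trooper's real types.
type SITREP struct {
	Intent     string   // what the user is trying to accomplish
	Stage      string   // where the task currently stands
	Confidence float64  // how sure the summarizer is about the intent
	Entities   []string // files, services, errors mentioned so far
	OpenLoops  []string // questions or tasks still unresolved
	Actions    []string // recent actions taken in the conversation
	Resolved   []string // items already settled, so they aren't redone
}
```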
Tail
The most recent turns are preserved within a configurable token budget.
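Putting the three layers together, the compaction pass might look roughly like this. It's a minimal sketch, not Trooper's actual code: `Turn`, `countTokens`, `summarize`, and `sitrepReserve` are all stand-ins invented for illustration:

```go
// Turn is a hypothetical stand-in for one conversation message.
type Turn struct {
	Role   string
	Text   string
	Tokens int // token count for this turn, computed once when stored
}

// sitrepReserve is an assumed allowance for the SITREP turn itself.
const sitrepReserve = 100

func countTokens(turns []Turn) int {
	total := 0
	for _, t := range turns {
		total += t.Tokens
	}
	return total
}

// summarize stands in for the real SITREP extraction over the middle turns.
func summarize(middle []Turn) Turn {
	return Turn{Role: "system", Text: "SITREP: ...", Tokens: sitrepReserve}
}

// compact keeps the two anchor turns, folds the middle into one SITREP turn,
// and preserves as many recent turns as the remaining budget allows.
func compact(history []Turn, budget int) []Turn {
	if len(history) <= 2 || countTokens(history) <= budget {
		return history // already fits, nothing to compress
	}

	anchor := history[:2]
	spent := countTokens(anchor) + sitrepReserve

	// Walk backward from the newest turn, claiming tail turns while the
	// budget holds; everything older than tailStart becomes the middle.
	tailStart := len(history)
	for tailStart > 2 && spent+history[tailStart-1].Tokens <= budget {
		spent += history[tailStart-1].Tokens
		tailStart--
	}

	out := append([]Turn{}, anchor...)
	if middle := history[2:tailStart]; len(middle) > 0 {
		out = append(out, summarize(middle))
	}
	return append(out, history[tailStart:]...)
}
```

Walking backward from the newest turn means the freshest exchanges survive verbatim, and the SITREP only absorbs whatever the budget can't cover.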
Example SITREP log
```
📦 Context compaction triggered — 538 tokens exceeds 500 budget
📦 Context compaction complete
   Total turns  : 7
   Anchor turns : 2 (~43 tokens)
   Middle turns : 2 → SITREP (~71 tokens)
   Recent turns : 3 (~323 tokens)
   Tokens used  : 437 / 500
   SITREP       : intent="trooper" stage=unclear confidence=0.60 open=1 actions=0 resolved=0
```
The local model knows what you were working on, what’s broken, what’s been resolved, and what the last few exchanges were. That’s enough to keep the conversation coherent.
Why Go
Provider support
What’s next
Try it
Would love feedback on the context compaction approach — especially from anyone running larger local models. What’s your cold‑start latency on fallback?