How I built a Go proxy that keeps your LLM conversation alive when cloud quota runs out
Introduction
What is Trooper
The real problem: context loss on fallback
The solution: three‑layer context compaction
Anchor
The first two turns of the conversation are always preserved. These establish the original intent and set the tone.
SITREP
The middle turns get compressed into a structured summary called a SITREP. It extracts intent, entities, open loops, recent actions, and resolved items. The local model gets situational awareness, not raw history.
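Concretely, a SITREP could be modeled as a small struct. This is a sketch based on the fields described above and in the log below; the names are my assumptions, not Trooper's actual types:

```go
// SITREP is a hypothetical model of the structured summary; field names
// are guesses based on the post's description, not Trooper's real types.
type SITREP struct {
	Intent     string   // what the user is trying to accomplish
	Stage      string   // where the task currently stands
	Confidence float64  // how sure the summarizer is about the intent
	Entities   []string // files, services, errors mentioned so far
	OpenLoops  []string // questions or tasks still unresolved
	Actions    []string // recent actions taken in the conversation
	Resolved   []string // items already settled, so they aren't redone
}
```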
Tail
The most recent turns are preserved within a configurable token budget.
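Putting the three layers together, the compaction pass might look roughly like this. It's a minimal sketch, not Trooper's actual code: `Turn`, `countTokens`, `summarize`, and `sitrepReserve` are all stand-ins invented for illustration:

```go
// Turn is a hypothetical stand-in for one conversation message.
type Turn struct {
	Role   string
	Text   string
	Tokens int // token count for this turn, computed once when stored
}

// sitrepReserve is an assumed allowance for the SITREP turn itself.
const sitrepReserve = 100

func countTokens(turns []Turn) int {
	total := 0
	for _, t := range turns {
		total += t.Tokens
	}
	return total
}

// summarize stands in for the real SITREP extraction over the middle turns.
func summarize(middle []Turn) Turn {
	return Turn{Role: "system", Text: "SITREP: ...", Tokens: sitrepReserve}
}

// compact keeps the two anchor turns, folds the middle into one SITREP turn,
// and preserves as many recent turns as the remaining budget allows.
func compact(history []Turn, budget int) []Turn {
	if len(history) <= 2 || countTokens(history) <= budget {
		return history // already fits, nothing to compress
	}

	anchor := history[:2]
	spent := countTokens(anchor) + sitrepReserve

	// Walk backward from the newest turn, claiming tail turns while the
	// budget holds; everything older than tailStart becomes the middle.
	tailStart := len(history)
	for tailStart > 2 && spent+history[tailStart-1].Tokens <= budget {
		spent += history[tailStart-1].Tokens
		tailStart--
	}

	out := append([]Turn{}, anchor...)
	if middle := history[2:tailStart]; len(middle) > 0 {
		out = append(out, summarize(middle))
	}
	return append(out, history[tailStart:]...)
}
```

Walking backward from the newest turn means the freshest exchanges survive verbatim, and the SITREP only absorbs whatever the budget can't cover.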
Example SITREP log
```
📦 Context compaction triggered — 538 tokens exceeds 500 budget
📦 Context compaction complete
   Total turns  : 7
   Anchor turns : 2 (~43 tokens)
   Middle turns : 2 → SITREP (~71 tokens)
   Recent turns : 3 (~323 tokens)
   Tokens used  : 437 / 500
   SITREP       : intent="trooper" stage=unclear confidence=0.60 open=1 actions=0 resolved=0
```
The local model knows what you were working on, what’s broken, what’s been resolved, and what the last few exchanges were. That’s enough to keep the conversation coherent.
Why Go
Provider support
What’s next
Try it
Would love feedback on the context compaction approach — especially from anyone running larger local models. What’s your cold‑start latency on fallback?