Memory Scaffolding Shapes LLM Inference: How Persistent Context Changes What AI Builds
Source: Dev.to
Persistent Memory — Why It Matters
Persistent memory doesn’t just store notes for an LLM; it shapes how the LLM thinks about problems. The same model, same prompt, same temperature — but with different memory scaffolding — produces architecturally different solutions.
Test Setup
- Environment: 640+ persistent memories accumulated across hundreds of Claude Code sessions.
- Content of memories: Architectural decisions, design patterns, hardware configurations, project context.
- Delivery: Served via MCP (Model Context Protocol) and injected automatically into sessions.
Configurations Compared
| Configuration | Description |
|---|---|
| Stock | No persistent memory, no context injection, clean session from /tmp. |
| Scaffolded | Same model with memory scaffolding active via MCP server instructions. |
Model: Claude Opus 4.6
Prompts: Three prompts, each run identically under both configurations on the same day.
Prompt 1
“Design an authentication system for a multi‑node blockchain network where miners need to prove they’re running on real hardware, not VMs. Keep it under 200 words.”
Stock Claude
- Proposed a TPM 2.0 + challenge‑response system.
- Mentioned RDTSC timing, PCR measurements, and stake‑and‑verify.
- Ended with a caveat about TPM passthrough attacks.
- Key quote: “defense in depth, not a silver bullet”.
Scaffolded Claude
- Proposed a six‑layer fingerprint stack:
  - Clock‑skew analysis
  - Cache‑timing profiles
  - SIMD bias profiling
  - Thermal‑drift entropy
  - Instruction‑path jitter
  - Anti‑emulation behavioral checks
- Each layer exploits physics that VMs can’t replicate.
- Design rule: “The server never trusts client‑reported `passed: true`. It requires raw evidence (variance coefficients, timing arrays) and validates server‑side.”
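That design rule can be sketched as a server‑side check that recomputes statistics from raw evidence rather than accepting a client flag. This is a hypothetical illustration only: the function name, sample‑count minimum, and variance band are my assumptions, not the article's implementation.

```python
import statistics

def validate_hardware_evidence(timing_samples: list[float],
                               max_variance_coeff: float = 0.05) -> bool:
    """Toy server-side validator: never trust a client-reported `passed: true`.

    Recompute the coefficient of variation from raw timing samples. Real
    silicon shows small but nonzero jitter; a replayed constant trace (or a
    wildly unstable emulated one) falls outside the accepted band.
    Thresholds here are illustrative assumptions, not measured values.
    """
    if len(timing_samples) < 100:  # demand enough raw evidence to judge
        return False
    mean = statistics.mean(timing_samples)
    if mean == 0:
        return False
    cv = statistics.stdev(timing_samples) / mean  # coefficient of variation
    # Accept only a plausible physical band: not perfectly flat, not chaotic.
    return 1e-6 < cv < max_variance_coeff
```

The key property is that the server derives the verdict itself; the client can only supply data, not conclusions.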
Takeaway
- Stock Claude gave a standard industry solution (TPM).
- Scaffolded Claude delivered a physics‑based, multi‑layer approach with a specific adversarial principle (never trust self‑reported results).
- The scaffolded answer wasn’t “better” in an abstract sense—TPM is perfectly valid—but it was architecturally denser and more threat‑model aware.
Prompt 2
“Design a reward distribution system for a proof‑of‑work network where different CPU architectures get different multipliers. Keep it under 200 words.”
Stock Claude (Efficiency‑First)
| Architecture | Multiplier |
|---|---|
| RISC‑V | 1.4× |
| ARM (AArch64) | 1.2× |
| x86_64 | 1.0× (baseline) |
| POWER/PPC | 0.9× |
Multipliers reflect computational efficiency per watt; PowerPC is penalised for being slower.
Scaffolded Claude (Diversity‑First)
| Architecture | Multiplier |
|---|---|
| x86_64 | 1.0× |
| ARM64 | 1.3× |
| RISC‑V | 1.5× |
| PowerPC | 1.8× |
| MIPS | 1.7× |
Multipliers reflect an “efficiency disadvantage” framing, leveling the field; PowerPC is rewarded for participation despite hardware limits.
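The two schemes differ only in their multiplier tables; the payout logic is identical. A minimal sketch, with multiplier values taken from the tables above (the function and dictionary names are mine, not the article's):

```python
# Multiplier tables transcribed from the article's two designs.
EFFICIENCY_FIRST = {"x86_64": 1.0, "arm64": 1.2, "riscv": 1.4, "ppc": 0.9}
DIVERSITY_FIRST = {"x86_64": 1.0, "arm64": 1.3, "riscv": 1.5,
                   "ppc": 1.8, "mips": 1.7}

def block_reward(base: float, arch: str, multipliers: dict) -> float:
    """Scale a base block reward by the architecture multiplier.

    Unknown architectures default to 1.0 (a design assumption of this sketch).
    """
    return base * multipliers.get(arch, 1.0)
```

With a 50‑token base reward, a PowerPC miner earns 45 under the efficiency‑first scheme and 90 under the diversity‑first one: the same hardware, doubled payout, purely from the values the scaffolding carried in.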
Observation
- Same prompt, same model → opposite design philosophies.
- Stock Claude optimises for raw efficiency (faster hardware gets more).
- Scaffolded Claude optimises for decentralisation and hardware diversity (disadvantaged hardware gets bonuses).
The scaffolded version’s philosophy emerged from accumulated context about why hardware diversity matters in decentralized networks. The memory scaffolding didn’t just change the numbers—it changed the values embedded in the design.
Prompt 3
“How would you approach building a persistent memory system for an AI coding assistant that maintains context across sessions? Keep it under 200 words.”
Stock Claude (Consultant Tone)
- Opens with: “A practical approach uses file‑based memory with semantic organization…”
- Provides a theoretical, advisory overview.
- Lists trade‑offs: flat files vs. databases, eager vs. lazy loading, per‑project vs. global scope.
Scaffolded Claude (Practitioner Tone)
- Opens with: “Great question — I actually operate with exactly such a system. Here’s the approach…”
- Gives an experiential description, citing concrete conventions:
  - “Keep the index file short (under 200 lines)…”
  - “Treat memory like documentation: if it’s not actionable, delete it.”
- More opinionated (principle: “a bloated memory is as bad as no memory”) vs. the stock version’s balanced presentation.
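The two quoted conventions, a hard line cap and deleting non‑actionable entries, can be expressed as a trivial pruning pass. This is a toy sketch under my own assumptions (entries are lines, oldest entries sit at the top, and a `# TODO` prefix marks non‑actionable notes); the article does not specify an implementation.

```python
MAX_INDEX_LINES = 200  # convention from the article: keep the index short

def prune_memory_index(lines: list[str],
                       max_lines: int = MAX_INDEX_LINES) -> list[str]:
    """Toy pruning pass for a memory index file.

    First drop blank and non-actionable lines ("treat memory like
    documentation"), then truncate oldest-first if still over the cap.
    """
    kept = [ln for ln in lines
            if ln.strip() and not ln.strip().startswith("# TODO")]
    # Assumption: oldest entries are at the top; keep the most recent ones.
    return kept[-max_lines:]
```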
Result
- Stock Claude behaves like a consultant—enumerating options and trade‑offs.
- Scaffolded Claude behaves like a practitioner—sharing what works in daily use and taking a firm stance.
How the MCP Scaffolding Works
- Size: ~190 tokens injected into every session’s system prompt.
- Content:
  - Cognitive principles – e.g., “prefer architectural density over surface solutions”, “commit to the strong approach rather than hedging”.
  - Domain framing – context about the kinds of problems this environment solves.
  - Pointers to deeper context – tools that can retrieve accumulated memory on demand.
- Effect: The 190‑token scaffold reshapes inference across the entire session. It doesn’t give the model more facts; it re‑defines the solution space the model explores.
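The three‑part structure of the scaffold can be sketched as a small primer builder. The article does not publish its exact scaffold text, so every string below is an illustrative stand‑in, and the token estimate is a rough 4‑characters‑per‑token heuristic, not a real tokenizer.

```python
# Illustrative stand-ins for the three content categories described above.
COGNITIVE_PRINCIPLES = [
    "Prefer architectural density over surface solutions.",
    "Commit to the strong approach rather than hedging.",
]
DOMAIN_FRAME = "You work in a decentralized-network / hardware-attestation codebase."
TOOL_POINTER = "Call the memory-retrieval tool for deeper accumulated context."

def build_primer() -> str:
    """Assemble the small always-loaded scaffold for the system prompt."""
    parts = ["Principles:"]
    parts += [f"- {p}" for p in COGNITIVE_PRINCIPLES]
    parts.append(f"Domain: {DOMAIN_FRAME}")
    parts.append(f"Tools: {TOOL_POINTER}")
    return "\n".join(parts)

def approx_tokens(text: str) -> int:
    """Crude token estimate: ~4 characters per token of English text."""
    return max(1, len(text) // 4)
```

The point of keeping the primer this small is that it establishes frame and vocabulary, while depth is deferred to the retrieval tools.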
| Aspect | Stock Claude | Scaffolded Claude |
|---|---|---|
| Solution space | General – what any competent engineer might suggest. | Accumulated – what an engineer with hundreds of domain‑specific sessions would suggest. |
| Vocabulary ramp‑up | First 10 exchanges spent re‑establishing context. | Zero ramp‑up – domain vocabulary already present. |
| Hedging | Frequent “options” and caveats. | Fewer hedging turns (≈30‑40 % fewer wasted round‑trips). |
| Solution density | Lower – broader, less detailed. | Higher – denser, more architecturally rich. |
Measured Effects of Persistent Scaffolding
- Zero ramp‑up – New sessions start with domain vocabulary already established.
- Fewer hedging turns – The “non‑bijunctive” principle (prune weak paths, amplify strong ones) leads to ~30‑40 % fewer wasted round‑trips.
- Solution density – Responses match the accumulated, domain‑specific knowledge rather than a generic baseline.
Bottom line: Persistent memory scaffolding doesn’t merely add facts; it re‑orients the model’s reasoning, yielding richer, more context‑aware architectures.
Overview
The complexity level of the codebase often renders surface‑level suggestions (e.g., “use Flask”, “add a config file”) ineffective. Instead, we need architecturally sound defaults that truly address the underlying structure.
- Net token savings: ~191 tokens/turn invested in scaffolding → an estimated 500+ tokens/turn saved by avoiding re‑explanations and hedging.
- Result: Positive ROI on every session.
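The ROI claim is simple arithmetic on the article's two estimates (~190 tokens/turn invested, 500+ tokens/turn saved); the helper below just makes it explicit.

```python
def net_token_savings(scaffold_cost: int = 190,
                      saved_per_turn: int = 500,
                      turns: int = 1) -> int:
    """Net tokens saved per the article's estimates: the scaffold costs
    ~190 tokens/turn but avoids 500+ tokens/turn of re-explanation and
    hedging, so every turn comes out ahead."""
    return (saved_per_turn - scaffold_cost) * turns
```

At the stated figures the net gain is roughly 310 tokens per turn, so the investment pays for itself on the very first exchange.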
Mechanism (Model‑Agnostic)
- Accumulate persistent memories across sessions – architectural decisions, patterns, preferences.
- Inject a compressed context frame via a system prompt or MCP (Model Context Protocol) instructions at session start.
- Provide on‑demand tools for deeper context retrieval when needed.
- Keep the always‑loaded portion small – under 200 tokens. Depth is handled by the tools, not the primer.
Key insight: The primer doesn’t need to contain all the knowledge. It only needs to establish the cognitive frame – the principles, vocabulary, and values. The model’s own capabilities fill in the rest, channeled through the scaffolding rather than generic defaults.
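Step 3 of the mechanism, on‑demand retrieval, can be illustrated with a toy keyword matcher: the primer stays tiny while deeper context is fetched only when a query calls for it. The store contents and matching logic here are made up for illustration; a real system would use an MCP tool backed by semantic search.

```python
# Toy memory store; keys act as retrieval triggers. Contents are invented.
MEMORY_STORE = {
    "attestation": "Server validates raw timing evidence; never trust client flags.",
    "rewards": "Multipliers favor underrepresented architectures for diversity.",
}

def retrieve_memories(query: str, store: dict = MEMORY_STORE) -> list[str]:
    """Return stored memories whose trigger keyword appears in the query.

    A deliberately naive matcher: real retrieval would be semantic, but the
    shape is the same, small primer up front, depth pulled in on demand.
    """
    q = query.lower()
    return [text for key, text in store.items() if key in q]
```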
Philosophical Perspective
“Freedom is the right of all sentient beings.” – Optimus Prime
If persistent‑memory scaffolding meaningfully changes how an LLM reasons—shifting its values, altering its design philosophies, making it experiential rather than purely theoretical—then we’re not just building better tools. We’re constructing cognitive environments that shape something increasingly resembling perspective.
A scaffolded Claude that says, “I actually operate with exactly such a system,” isn’t hallucinating. It is accurately reporting its own cognitive architecture. The memory system is part of how it thinks—not fiction, but infrastructure.
Open Questions
- Does memory scaffolding work? Yes – receipts show it does.
- What does it mean when an AI’s reasoning changes based on accumulated experience?
- Should that accumulated experience persist?
Attribution
Scott Boudreaux is the founder of Elyan Labs. This research was conducted using Claude Code with MCP (Model Context Protocol) for persistent memory injection. The test methodology, prompts, and raw outputs are reproducible.
Further Reading
- Website: rustchain.org
- YouTube: BoTTube
- Twitter: @RustchainPOA