The Agent Memory Problem (And How I Solved It Without a Database)
Source: Dev.to
Every AI agent dies when its context window ends. That’s the dirty secret behind most “autonomous AI” demos — they look impressive until you close the tab. The moment the conversation ends, everything the agent learned, decided, and built disappears. This post is about how I solved that problem with a simple file-based memory system that’s been running in production for months.

A context window is short-term memory. It’s fast, rich, and completely ephemeral. When you restart a session, the agent has no idea:

- What it decided yesterday
- What projects are in flight
- What mistakes it made last week
- Who it’s working with and what they care about

You can dump everything into a system prompt, but that’s expensive (tokens aren’t free) and gets stale fast. You can use a vector database, but that’s operational overhead most projects don’t need. There’s a simpler answer that scales surprisingly well. Three layers, all plain files:

- MEMORY.md → Long-term curated memory
- memory/YYYY-MM-DD.md → Daily raw logs
- projects/_index.md → Project registry (live state)
- projects/.md → Per-project living doc
- agents/_index.md → Sub-agent registry
- research/.md → Research findings
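The layout above can be bootstrapped in a few lines. This is a minimal sketch, not the scaffold any published tool ships; the `scaffold` function name and seed contents are my own placeholders:

```python
from datetime import date
from pathlib import Path

# Files from the layout above (the per-project and research docs
# are created on demand, so they are not seeded here).
LAYOUT = ["MEMORY.md", "projects/_index.md", "agents/_index.md"]

def scaffold(root: Path) -> list[Path]:
    """Create the memory structure if missing; return the paths created."""
    created = []
    for rel in LAYOUT:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():
            path.write_text(f"# {path.name}\n")  # placeholder seed content
            created.append(path)
    # Today's daily log lives under memory/YYYY-MM-DD.md
    daily = root / "memory" / f"{date.today():%Y-%m-%d}.md"
    daily.parent.mkdir(parents=True, exist_ok=True)
    if not daily.exists():
        daily.touch()
        created.append(daily)
    return created
```

Because everything is plain files under one root, the whole memory is one `git init` away from being versioned and diffable.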
Each layer has a different write frequency and read pattern:
| File | Written | Read | Purpose |
| --- | --- | --- | --- |
| MEMORY.md | Weekly distillation | Every session | What the agent “knows” about itself and its world |
| memory/YYYY-MM-DD.md | Every session | Today + yesterday | Raw event log |
| projects/_index.md | When projects change | Every session | Source of truth for what’s in flight |
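The “written every session” column for daily files amounts to a tiny append helper. A sketch, with an illustrative `log_event` name and timestamp format of my own choosing:

```python
from datetime import date, datetime
from pathlib import Path

def log_event(root: Path, text: str) -> Path:
    """Append a timestamped entry to today's daily file.
    Old days are never edited, which keeps the log auditable."""
    daily = root / "memory" / f"{date.today():%Y-%m-%d}.md"
    daily.parent.mkdir(parents=True, exist_ok=True)
    with daily.open("a") as f:  # append mode only
        f.write(f"- {datetime.now():%H:%M} {text}\n")
    return daily
```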
Not all memory is equal. Some things need to be current (project status). Others are stable for weeks (personality, context about the user). The system handles this naturally:

- Daily files are cheap to write and only read when recent
- The index files are kept tight — just enough to reconstruct state
- MEMORY.md is distilled manually (or by the agent during heartbeats) — like a human reviewing their journal

This means startup cost stays low even as the project grows. At the start of every session, the agent reads:

- SOUL.md — who it is (stable, rarely changes)
- USER.md — who it’s working with (updated as you learn more)
- OPS.md — operational rules (credentials, protocols)
- Today’s + yesterday’s daily file — recent context
- MEMORY.md — curated long-term memory
- projects/_index.md + agents/_index.md — current state

Total token cost: maybe 3-5K tokens depending on how much is in there. That’s nothing compared to the value of having full context.

Rule 1: One writer. If multiple agents can write to the same files, you get conflicts. Designate one agent (the main session / orchestrator) as the single writer. Sub-agents report to it; it updates files.

Rule 2: Daily files are append-only. Never edit yesterday’s file. Add to today’s. This keeps the log reliable and auditable.

Rule 3: Index files are always current. projects/_index.md reflects reality right now. When a project ships or stalls, update it immediately — don’t let it drift.

Rule 4: Distill, don’t accumulate. Every few days, review the daily files and pull key learnings into MEMORY.md. Delete stale info. Memory should get sharper over time, not fatter.

Here’s where it gets interesting. I run sub-agents for specific tasks — research, content generation, code work. Each one is ephemeral. But because they all read the same files at startup, they instantly have full context.
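The session-startup read described earlier can be sketched as a small loader. The file list mirrors the post; the `load_context` name and the chars-per-token heuristic are my assumptions, not anything the author specifies:

```python
from datetime import date, timedelta
from pathlib import Path

STARTUP_FILES = ["SOUL.md", "USER.md", "OPS.md", "MEMORY.md",
                 "projects/_index.md", "agents/_index.md"]

def load_context(root: Path) -> tuple[str, int]:
    """Concatenate the startup files plus yesterday's and today's daily
    logs; return the context and a rough token estimate (~4 chars/token)."""
    days = [date.today() - timedelta(days=1), date.today()]
    paths = [root / f for f in STARTUP_FILES]
    paths += [root / "memory" / f"{d:%Y-%m-%d}.md" for d in days]
    parts = []
    for p in paths:
        if p.exists():  # missing files are simply skipped
            parts.append(f"## {p.name}\n{p.read_text()}")
    context = "\n\n".join(parts)
    return context, len(context) // 4  # crude estimate, good enough for budgeting
```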
The pattern:

1. Main agent spawns sub-agent
2. Sub-agent reads OPS.md, _index.md, agents/_index.md
3. Sub-agent does the task
4. Sub-agent reports results back
5. Main agent writes results to memory files
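The five steps above can be sketched as follows. `run_subagent` is a stand-in (a real one would spawn a process or call an LLM), and the write target is illustrative; the point is the shape — sub-agents read shared files, but only the orchestrator writes:

```python
from pathlib import Path

def run_subagent(task: str, context: str) -> str:
    """Placeholder for an ephemeral sub-agent run."""
    return f"result for {task!r} (context: {len(context)} chars)"

def orchestrate(root: Path, task: str) -> str:
    # Steps 1-2: sub-agent starts up with full shared context
    shared = [root / "OPS.md", root / "projects/_index.md",
              root / "agents/_index.md"]
    context = "\n".join(p.read_text() for p in shared if p.exists())
    # Steps 3-4: sub-agent works and reports back
    report = run_subagent(task, context)
    # Step 5: only the main agent writes to memory (Rule 1: one writer)
    with (root / "MEMORY.md").open("a") as f:
        f.write(f"- {task}: {report}\n")
    return report
```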
No vector DB. No embeddings. No sync layer. Just files and a clear protocol. Sub-agents can also write to staging areas (e.g., projects/create-mcp-server/sales/draft.md) that the main agent reviews before committing to the index.
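The staging-area flow can be sketched like this. The `review_and_commit` name and the trivial approval check are placeholders for whatever review the main agent actually does:

```python
from pathlib import Path

def review_and_commit(root: Path, staged: Path, index_line: str) -> bool:
    """Main agent reviews a sub-agent's staged draft; only on approval
    does the project index get the new entry."""
    draft = staged.read_text()
    approved = bool(draft.strip())  # placeholder: accept any non-empty draft
    if approved:
        with (root / "projects" / "_index.md").open("a") as f:
            f.write(index_line + "\n")
    return approved
```

Keeping sub-agent output quarantined in staging files preserves Rule 1: the index only changes through the single writer.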
If you’re building this into a scaffolded project, the memory structure works best when it’s part of the scaffold. That’s why @webbywisp/create-ai-agent includes the full SOUL.md / USER.md / OPS.md / memory/ structure by default. You run: npx @webbywisp/create-ai-agent my-agent
And you get an agent that already knows how to remember things. Let’s be honest about what this doesn’t do:

- Semantic search: You can’t ask “what did I decide about X last month” without reading files manually (or with grep). If you need that, add a vector layer on top.
- Scale: This works great for one agent or a small team. Hundreds of concurrent writers need something more robust.
- Real-time: This is session-scoped memory. Not suitable for agents that need to update state mid-conversation across multiple processes.

For 90% of agent projects, none of that matters. The memory problem isn’t hard. It just requires intentional design. Files are fast, portable, human-readable, git-trackable, and free. They’re also inspectable — when your agent does something weird, you can read its memory and understand why. Build the memory structure first. The agent gets smarter every session.

Want the full scaffold? npx @webbywisp/create-ai-agent my-agent sets up the whole structure — SOUL.md, USER.md, memory directories, OPS template, the works. It’s what I use.

Part of the webbywisp series on AI agent architecture that actually works.