[Update] VAC: A Memory Layer That Makes LLMs Remember You
Source: Dev.to
Introduction
What if your LLM could actually remember who you are across sessions, projects, and time? Existing approaches either rely entirely on the input context window, which is limited in length, or bolt retrieval on top and then suffer from hallucinations and poor relevance.
The VAC Memory System is a Retrieval‑Augmented Generation (RAG) architecture that gives LLMs persistent memory. An LLM's parameters encode only a static, statistical “memory” of its training data; VAC adds dynamic memory retrieval on top, so accurate facts can be pulled in without modifying the model.
Key VAC Advantages
- MCA (Candidate Filtering) – The Multi‑Candidate Assessment addresses the false‑positive problem of plain vector search (e.g., FAISS) by filtering candidates at the entity level before any expensive computation; a minimal sketch follows this list.
- Physics‑Inspired Ranking – Text documents are treated as “planets” with “mass” and “gravity,” so memories are ranked by an attraction force rather than by raw similarity alone (see the ranking code further below).
- Modular Orchestration – VAC minimizes reliance on LLMs beyond the answer‑generation phase.
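The article does not include the MCA code itself, so the following is only a minimal sketch of entity‑coverage pre‑filtering. It assumes each memory is a dict with a `text` field; `extract_entities`, the stop‑word list, and the coverage threshold are illustrative choices of mine, not the repository's actual implementation.

```python
# Minimal sketch of an entity-coverage pre-filter (hypothetical helpers, not VAC's real code).
import re

STOPWORDS = {"the", "a", "an", "of", "in", "on", "at", "did", "i", "where", "who", "when"}

def extract_entities(text: str) -> set:
    """Rough stand-in for entity extraction: keep lowercase non-stop-word tokens."""
    tokens = re.findall(r"[A-Za-z']+", text)
    return {t.lower() for t in tokens if t.lower() not in STOPWORDS}

def mca_prefilter(query: str, memories: list, min_coverage: float = 0.5) -> list:
    """Keep only memories whose text covers enough of the query's entities."""
    query_entities = extract_entities(query)
    if not query_entities:
        return memories  # nothing to filter on, pass everything through
    kept = []
    for mem in memories:
        coverage = len(query_entities & extract_entities(mem["text"])) / len(query_entities)
        if coverage >= min_coverage:
            kept.append(mem)
    return kept

# Example: only the memory that actually mentions "Alice" survives the filter.
memories = [{"text": "I met Alice at the Berlin conference."}, {"text": "Bob likes hiking."}]
print(mca_prefilter("Where did I meet Alice?", memories, min_coverage=0.4))
```

Only the candidates that survive this cheap filter move on to the more expensive FAISS and cross‑encoder stages, which is where the claimed cost savings come from.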
Limitations of Current LLMs
- Cannot retain long‑term context.
- Cannot remember past conversations.
- Cannot update their “understanding.”
- Cannot store evolving user profiles.
- Operate in a stateless bubble.
Problems with Conventional Retrieval
- Vector search retrieves semantically similar docs, not logically correct ones.
- Important memories get buried.
- Retrieval is non‑deterministic.
- Noise increases with dataset growth.
- No notion of priority or recency.
Architecture
The VAC Memory System pipeline consists of eight steps; the core retrieval stages are:
- MCA‑PreFilter – Filter candidates by entity coverage to reduce computational costs.
- Vector Processing with FAISS – Embedding and semantic search through 1024‑dim vectors (BGE‑Large).
- BM25 Search – Traditional lexical (exact keyword) matching that complements the semantic search.
- Cross‑Encoder Reranking – Precision optimization for the top N candidates.
- … (remaining orchestration steps omitted for brevity; a combined sketch of the three retrieval stages above follows this list)
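To make those retrieval stages concrete, here is a toy sketch of dense search, BM25, and cross‑encoder reranking. The BGE‑Large embedder comes from the article; the `faiss`, `rank_bm25`, and `sentence-transformers` wiring and the specific cross‑encoder checkpoint are my own assumptions and are not taken from the VAC repository.

```python
# Toy sketch of steps 2-4 (dense search, BM25, cross-encoder rerank); not VAC's actual code.
import faiss
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, CrossEncoder

docs = [
    "I met Alice at the Berlin conference in 2023.",
    "Bob went hiking last weekend.",
    "Alice now works at Acme on search infrastructure.",
]
query = "Where did I meet Alice?"

# Step 2: 1024-dim BGE-Large embeddings in a FAISS inner-product index (cosine on normalized vectors).
embedder = SentenceTransformer("BAAI/bge-large-en-v1.5")
doc_emb = np.asarray(embedder.encode(docs, normalize_embeddings=True), dtype="float32")
index = faiss.IndexFlatIP(doc_emb.shape[1])
index.add(doc_emb)
q_emb = np.asarray(embedder.encode([query], normalize_embeddings=True), dtype="float32")
_, faiss_ids = index.search(q_emb, 2)

# Step 3: BM25 lexical search over the same corpus.
bm25 = BM25Okapi([d.lower().split() for d in docs])
bm25_ids = np.argsort(bm25.get_scores(query.lower().split()))[::-1][:2]

# Step 4: cross-encoder rerank of the union of candidates from both retrievers.
candidates = sorted(set(faiss_ids[0].tolist()) | set(bm25_ids.tolist()))
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # assumed checkpoint
scores = reranker.predict([(query, docs[i]) for i in candidates])
print(docs[candidates[int(np.argmax(scores))]])
```

The point of running BM25 alongside FAISS shows up in the recall numbers reported further down: each retriever misses candidates the other catches.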
Example Ranking Code (Python)
```python
def calculate_query_coverage(query_keywords: set, memory_keywords: set) -> float:
    """Fraction of the query's keywords that appear in the memory."""
    if not query_keywords:
        return 0.0  # guard against division by zero on empty queries
    intersection = len(query_keywords & memory_keywords)
    return intersection / len(query_keywords)


def calculate_force(query_mass: float, memory_mass: float, distance: float) -> float:
    """Gravity-style attraction between the query and a memory."""
    G = 6.67430e-11  # gravitational constant (placeholder scale factor)
    DELTA = 1e-6     # stability term to avoid division by zero at distance 0
    return G * (query_mass * memory_mass) / (distance ** 2 + DELTA)


def rank_memories(query: str, memories: list) -> list:
    """Score every memory and return them sorted by descending 'force'."""
    query_keywords = extract_keywords_simple(query)  # helper assumed from the repo
    scored_mem = [
        calculate_mass(mem, query_keywords)  # assumed to return a dict with 'force'
        for mem in memories
    ]
    return sorted(scored_mem, key=lambda x: x["force"], reverse=True)
```
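`extract_keywords_simple` and `calculate_mass` are not shown in the article. The following is one hypothetical way they could fit together, assuming “mass” is derived from keyword coverage and “distance” shrinks as coverage grows; the actual helpers in the repository may look quite different.

```python
# Hypothetical glue for the helpers referenced above; not the repo's real implementation.
def extract_keywords_simple(text: str) -> set:
    """Naive keyword extraction: lowercase words longer than three characters."""
    return {w.lower().strip(".,!?") for w in text.split() if len(w) > 3}

def calculate_mass(memory: dict, query_keywords: set) -> dict:
    """Attach a gravity-style 'force' score to a memory dict."""
    memory_keywords = extract_keywords_simple(memory["text"])
    coverage = calculate_query_coverage(query_keywords, memory_keywords)
    query_mass = float(len(query_keywords)) or 1.0
    memory_mass = coverage * len(memory_keywords)
    distance = (1.0 - coverage) + 1e-3  # well-covered memories sit "closer" to the query
    return {**memory, "force": calculate_force(query_mass, memory_mass, distance)}
```

With these in place, `rank_memories` returns the memories sorted by descending force, so the strongest “attractors” for the query are retrieved first.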
Full Architecture Overview (8 Steps)
Query example: “Where did I meet Alice?”
(An architecture diagram accompanies this walkthrough in the original article.)
Evaluation
Benchmark Results
| Aspect | VAC Memory | Mem0 | Letta/MemGPT | Zep |
|---|---|---|---|---|
| LoCoMo Accuracy | 80.1 % | 66.9 % | 74.0 % | 75.1 % |
| Architecture | MCA + FAISS + BM25 + Cross‑Encoder | LLM extraction + Graph | OS‑like paging + Archive search | Summarize + Vector |
| Entity Protection | ✅ MCA pre‑filter | ❌ Semantic only | ❌ Semantic only | ❌ Semantic only |
| Latency | 2.5 s/query | ~3‑5 s | ~2‑4 s | ~2‑3 s |
| Cost / 1M tokens | <$0.10 | ~$0.50+ | ~$0.30+ | ~$0.20+ |
| Reproducibility | 100 % (seed‑locked) | Variable | Variable | Variable |
| Conversation Isolation | 100 % | Partial | Partial | Partial |
Validation Details
- Runs: 10 conversations × 10 seeds = 100 runs (1,540 total questions).
- Accuracy by question type: Single‑hop 87 %, Multi‑hop 78 %, Temporal 72 %, Commonsense 87 %.
- Component Recall (ground‑truth coverage):
  - MCA alone: 40‑50 %
  - FAISS alone: 65‑70 %
  - BM25 alone: 50 %
- Union Recall (MCA + FAISS + BM25): 85‑95 %
Key insight: No single retrieval method is sufficient; the union catches what each individual method misses.
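The union calculation is easy to reproduce. The sketch below shows how per‑method recall and union recall can be measured, assuming each retriever returns a set of memory IDs and `gold` holds the ground‑truth IDs for a question; the example sets are illustrative, not benchmark data.

```python
# Sketch of per-method vs. union recall; the IDs below are illustrative, not benchmark data.
def recall(retrieved: set, gold: set) -> float:
    """Fraction of ground-truth memory IDs that were retrieved."""
    return len(retrieved & gold) / len(gold) if gold else 0.0

def union_recall(per_method_hits: dict, gold: set) -> dict:
    """Recall of each retriever plus the recall of their combined (union) candidate set."""
    scores = {name: recall(hits, gold) for name, hits in per_method_hits.items()}
    scores["union"] = recall(set().union(*per_method_hits.values()), gold)
    return scores

hits = {"mca": {1, 4}, "faiss": {1, 2, 3}, "bm25": {3, 5}}
print(union_recall(hits, gold={1, 2, 3, 5}))
# Each method alone scores 0.25-0.75 here, while the union reaches 1.0.
```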
Getting Started
- GitHub repository: VAC Memory System (replace with actual URL)
- Simple CLI‑based integration.
Example Run
```bash
# Run with a fixed seed for reproducibility
SEED=2001 LOCOMO_CONV_INDEX=0 python orchestrator.py
```
- Using the same seed yields identical results across runs.
- 100 runs have been validated.
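For reference, this is roughly how the two environment variables could be consumed to lock down randomness; a sketch only, not the actual contents of `orchestrator.py`.

```python
# Sketch of seed-locking from environment variables; not the real orchestrator.py.
import os
import random
import numpy as np

seed = int(os.environ.get("SEED", "42"))
conv_index = int(os.environ.get("LOCOMO_CONV_INDEX", "0"))

random.seed(seed)     # Python's RNG (sampling, shuffling)
np.random.seed(seed)  # NumPy's global RNG

print(f"Evaluating conversation {conv_index} with seed {seed}")
```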
Feedback & Discussion
I’d love feedback from anyone building memory systems for LLMs or experimenting with LoCoMo benchmarks.
- What do you think about combining MCA + BM25 + FAISS?
- Any ideas for further improvements?
Feel free to open issues or pull requests on the GitHub repo.