[Update] VAC: A Memory Layer That Makes LLMs Remember You
Source: Dev.to
Introduction
What if your LLM could actually remember who you are across sessions, projects, and time? Existing approaches either rely entirely on the input context window, which is limited in length, or bolt retrieval on top and then suffer from hallucinations and poor relevance.
The VAC Memory System is a Retrieval‑Augmented Generation (RAG) architecture that gives LLMs persistent memory. An LLM's parameters encode only a static, statistical “memory” of its training data; VAC adds dynamic memory retrieval on top, so accurate facts can be pulled in without modifying the model.
Key VAC Advantages
- MCA (Candidate Filtering) – The Multi‑Candidate Assessment addresses the false‑positive problem of plain vector search (e.g., FAISS) by filtering candidates at the entity level before any expensive computation; a minimal sketch follows this list.
- Physics‑Inspired Ranking – Text documents are treated as “planets” with “mass” and “gravity,” so memories are ranked by an attraction force rather than by raw similarity alone (see the ranking code further below).
- Modular Orchestration – VAC minimizes reliance on LLMs beyond the answer‑generation phase.
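The article does not include the MCA code itself, so the following is only a minimal sketch of entity‑coverage pre‑filtering. It assumes each memory is a dict with a `text` field; `extract_entities`, the stop‑word list, and the coverage threshold are illustrative choices of mine, not the repository's actual implementation.

```python
# Minimal sketch of an entity-coverage pre-filter (hypothetical helpers, not VAC's real code).
import re

STOPWORDS = {"the", "a", "an", "of", "in", "on", "at", "did", "i", "where", "who", "when"}

def extract_entities(text: str) -> set:
    """Rough stand-in for entity extraction: keep lowercase non-stop-word tokens."""
    tokens = re.findall(r"[A-Za-z']+", text)
    return {t.lower() for t in tokens if t.lower() not in STOPWORDS}

def mca_prefilter(query: str, memories: list, min_coverage: float = 0.5) -> list:
    """Keep only memories whose text covers enough of the query's entities."""
    query_entities = extract_entities(query)
    if not query_entities:
        return memories  # nothing to filter on, pass everything through
    kept = []
    for mem in memories:
        coverage = len(query_entities & extract_entities(mem["text"])) / len(query_entities)
        if coverage >= min_coverage:
            kept.append(mem)
    return kept

# Example: only the memory that actually mentions "Alice" survives the filter.
memories = [{"text": "I met Alice at the Berlin conference."}, {"text": "Bob likes hiking."}]
print(mca_prefilter("Where did I meet Alice?", memories, min_coverage=0.4))
```

Only the candidates that survive this cheap filter move on to the more expensive FAISS and cross‑encoder stages, which is where the claimed cost savings come from.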
Limitations of Current LLMs
- Cannot retain long‑term context.
- Cannot remember past conversations.
- Cannot update their “understanding.”
- Cannot store evolving user profiles.
- Operate in a stateless bubble.
Problems with Conventional Retrieval
- Vector search retrieves semantically similar docs, not logically correct ones.
- Important memories get buried.
- Retrieval is non‑deterministic.
- Noise increases with dataset growth.
- No notion of priority or recency.
Architecture
The VAC Memory System pipeline consists of eight steps; the core retrieval stages are:
- MCA‑PreFilter – Filter candidates by entity coverage to reduce computational costs.
- Vector Processing with FAISS – Embedding and semantic search through 1024‑dim vectors (BGE‑Large).
- BM25 Search – Traditional lexical (exact keyword) matching that complements the semantic search.
- Cross‑Encoder Reranking – Precision optimization for the top N candidates.
- … (remaining orchestration steps omitted for brevity; a combined sketch of the three retrieval stages above follows this list)
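To make those retrieval stages concrete, here is a toy sketch of dense search, BM25, and cross‑encoder reranking. The BGE‑Large embedder comes from the article; the `faiss`, `rank_bm25`, and `sentence-transformers` wiring and the specific cross‑encoder checkpoint are my own assumptions and are not taken from the VAC repository.

```python
# Toy sketch of steps 2-4 (dense search, BM25, cross-encoder rerank); not VAC's actual code.
import faiss
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, CrossEncoder

docs = [
    "I met Alice at the Berlin conference in 2023.",
    "Bob went hiking last weekend.",
    "Alice now works at Acme on search infrastructure.",
]
query = "Where did I meet Alice?"

# Step 2: 1024-dim BGE-Large embeddings in a FAISS inner-product index (cosine on normalized vectors).
embedder = SentenceTransformer("BAAI/bge-large-en-v1.5")
doc_emb = np.asarray(embedder.encode(docs, normalize_embeddings=True), dtype="float32")
index = faiss.IndexFlatIP(doc_emb.shape[1])
index.add(doc_emb)
q_emb = np.asarray(embedder.encode([query], normalize_embeddings=True), dtype="float32")
_, faiss_ids = index.search(q_emb, 2)

# Step 3: BM25 lexical search over the same corpus.
bm25 = BM25Okapi([d.lower().split() for d in docs])
bm25_ids = np.argsort(bm25.get_scores(query.lower().split()))[::-1][:2]

# Step 4: cross-encoder rerank of the union of candidates from both retrievers.
candidates = sorted(set(faiss_ids[0].tolist()) | set(bm25_ids.tolist()))
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # assumed checkpoint
scores = reranker.predict([(query, docs[i]) for i in candidates])
print(docs[candidates[int(np.argmax(scores))]])
```

The point of running BM25 alongside FAISS shows up in the recall numbers reported further down: each retriever misses candidates the other catches.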
Example Ranking Code (Python)
```python
def calculate_query_coverage(query_keywords: set, memory_keywords: set) -> float:
    """Fraction of the query's keywords that appear in the memory."""
    if not query_keywords:
        return 0.0  # guard against division by zero on empty queries
    intersection = len(query_keywords & memory_keywords)
    return intersection / len(query_keywords)


def calculate_force(query_mass: float, memory_mass: float, distance: float) -> float:
    """Gravity-style attraction between the query and a memory."""
    G = 6.67430e-11  # gravitational constant (placeholder scale factor)
    DELTA = 1e-6     # stability term to avoid division by zero at distance 0
    return G * (query_mass * memory_mass) / (distance ** 2 + DELTA)


def rank_memories(query: str, memories: list) -> list:
    """Score every memory and return them sorted by descending 'force'."""
    query_keywords = extract_keywords_simple(query)  # helper assumed from the repo
    scored_mem = [
        calculate_mass(mem, query_keywords)  # assumed to return a dict with 'force'
        for mem in memories
    ]
    return sorted(scored_mem, key=lambda x: x["force"], reverse=True)
```
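`extract_keywords_simple` and `calculate_mass` are not shown in the article. The following is one hypothetical way they could fit together, assuming “mass” is derived from keyword coverage and “distance” shrinks as coverage grows; the actual helpers in the repository may look quite different.

```python
# Hypothetical glue for the helpers referenced above; not the repo's real implementation.
def extract_keywords_simple(text: str) -> set:
    """Naive keyword extraction: lowercase words longer than three characters."""
    return {w.lower().strip(".,!?") for w in text.split() if len(w) > 3}

def calculate_mass(memory: dict, query_keywords: set) -> dict:
    """Attach a gravity-style 'force' score to a memory dict."""
    memory_keywords = extract_keywords_simple(memory["text"])
    coverage = calculate_query_coverage(query_keywords, memory_keywords)
    query_mass = float(len(query_keywords)) or 1.0
    memory_mass = coverage * len(memory_keywords)
    distance = (1.0 - coverage) + 1e-3  # well-covered memories sit "closer" to the query
    return {**memory, "force": calculate_force(query_mass, memory_mass, distance)}
```

With these in place, `rank_memories` returns the memories sorted by descending force, so the strongest “attractors” for the query are retrieved first.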
Full Architecture Overview (8 Steps)
Query example: “Where did I meet Alice?”
(An architecture diagram accompanies this walkthrough in the original article.)
Evaluation
Benchmark Results
| Aspect | VAC Memory | Mem0 | Letta/MemGPT | Zep |
|---|---|---|---|---|
| LoCoMo Accuracy | 80.1 % | 66.9 % | 74.0 % | 75.1 % |
| Architecture | MCA + FAISS + BM25 + Cross‑Encoder | LLM extraction + Graph | OS‑like paging + Archive search | Summarize + Vector |
| Entity Protection | ✅ MCA pre‑filter | ❌ Semantic only | ❌ Semantic only | ❌ Semantic only |
| Latency | 2.5 s/query | ~3‑5 s | ~2‑4 s | ~2‑3 s |
| Cost / 1M tokens | <$0.10 | ~$0.50+ | ~$0.30+ | ~$0.20+ |
| Reproducibility | 100 % (seed‑locked) | Variable | Variable | Variable |
| Conversation Isolation | 100 % | Partial | Partial | Partial |
Validation Details
- Runs: 10 conversations × 10 seeds = 100 runs (1,540 total questions).
- Accuracy by question type: Single‑hop 87 %, Multi‑hop 78 %, Temporal 72 %, Commonsense 87 %.
- Component Recall (ground‑truth coverage):
  - MCA alone: 40‑50 %
  - FAISS alone: 65‑70 %
  - BM25 alone: 50 %
- Union Recall (MCA + FAISS + BM25): 85‑95 %
Key insight: No single retrieval method is sufficient; the union catches what each individual method misses.
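The union calculation is easy to reproduce. The sketch below shows how per‑method recall and union recall can be measured, assuming each retriever returns a set of memory IDs and `gold` holds the ground‑truth IDs for a question; the example sets are illustrative, not benchmark data.

```python
# Sketch of per-method vs. union recall; the IDs below are illustrative, not benchmark data.
def recall(retrieved: set, gold: set) -> float:
    """Fraction of ground-truth memory IDs that were retrieved."""
    return len(retrieved & gold) / len(gold) if gold else 0.0

def union_recall(per_method_hits: dict, gold: set) -> dict:
    """Recall of each retriever plus the recall of their combined (union) candidate set."""
    scores = {name: recall(hits, gold) for name, hits in per_method_hits.items()}
    scores["union"] = recall(set().union(*per_method_hits.values()), gold)
    return scores

hits = {"mca": {1, 4}, "faiss": {1, 2, 3}, "bm25": {3, 5}}
print(union_recall(hits, gold={1, 2, 3, 5}))
# Each method alone scores 0.25-0.75 here, while the union reaches 1.0.
```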
Getting Started
- GitHub repository: VAC Memory System (replace with actual URL)
- Simple CLI‑based integration.
Example Run
```bash
# Run with a fixed seed for reproducibility
SEED=2001 LOCOMO_CONV_INDEX=0 python orchestrator.py
```
- Using the same seed yields identical results across runs.
- 100 runs have been validated.
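For reference, this is roughly how the two environment variables could be consumed to lock down randomness; a sketch only, not the actual contents of `orchestrator.py`.

```python
# Sketch of seed-locking from environment variables; not the real orchestrator.py.
import os
import random
import numpy as np

seed = int(os.environ.get("SEED", "42"))
conv_index = int(os.environ.get("LOCOMO_CONV_INDEX", "0"))

random.seed(seed)     # Python's RNG (sampling, shuffling)
np.random.seed(seed)  # NumPy's global RNG

print(f"Evaluating conversation {conv_index} with seed {seed}")
```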
Feedback & Discussion
I’d love feedback from anyone building memory systems for LLMs or experimenting with LoCoMo benchmarks.
- What do you think about combining MCA + BM25 + FAISS?
- Any ideas for further improvements?
Feel free to open issues or pull requests on the GitHub repo.