[Update] VAC: A Memory Layer That Makes LLMs Remember You

Published: December 6, 2025 at 11:38 PM EST
3 min read
Source: Dev.to

Introduction

What if your LLM could actually remember who you are — across sessions, projects, and time? Existing systems either rely entirely on input context (limited length) or suffer from issues like hallucinations and relevance loss.

The VAC Memory System is a unique Retrieval‑Augmented Generation (RAG) architecture that provides persistent memory for LLMs. LLMs inherently maintain a static statistical “memory” embedded in their parameters. VAC’s objective is to enable dynamic memory retrieval, extracting accurate data without modifying the model.

Key VAC Advantages

  • MCA (Candidate Filtering) – The Multi‑Candidate Assessment addresses the false‑positive problem found in traditional vector databases (e.g., FAISS), achieving entity‑level precision filtering before expensive computations.
  • Physics‑Inspired Ranking – Text documents are conceptualized as “planets” with “mass” and “gravity,” enabling novel retrieval mechanisms.
  • Modular Orchestration – VAC minimizes reliance on LLMs beyond the answer‑generation phase.
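The MCA pre-filter can be pictured as a cheap entity-coverage gate placed in front of the expensive retrieval stages. Here is a minimal sketch of the idea (my own illustration, not VAC's actual code; the `mca_prefilter` name, the `entities` field, and the 0.5 threshold are all assumptions):

```python
# Illustrative sketch of MCA-style entity pre-filtering: discard memories
# that cover too few of the query's entities, so the expensive vector-search
# and reranking stages never see obvious false positives.
def mca_prefilter(query_entities: set, memories: list, min_coverage: float = 0.5) -> list:
    """Keep memories whose entity set covers at least min_coverage of the query's."""
    if not query_entities:
        return memories
    return [
        mem for mem in memories
        if len(query_entities & mem["entities"]) / len(query_entities) >= min_coverage
    ]

memories = [
    {"text": "Met Alice at PyCon 2024", "entities": {"alice", "pycon"}},
    {"text": "Bought a new laptop", "entities": {"laptop"}},
]
print(mca_prefilter({"alice"}, memories)[0]["text"])  # → Met Alice at PyCon 2024
```

Because the filter is a set intersection, it runs in roughly linear time over the candidate pool, which is what makes it viable as a stage *before* embedding search.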

Limitations of Current LLMs

  • Cannot retain long‑term context.
  • Cannot remember past conversations.
  • Cannot update their “understanding.”
  • Cannot store evolving user profiles.
  • Operate in a stateless bubble.

Problems with Conventional Retrieval

  • Vector search retrieves semantically similar docs, not logically correct ones.
  • Important memories get buried.
  • Retrieval is non‑deterministic.
  • Noise increases with dataset growth.
  • No notion of priority or recency.

Architecture

The VAC Memory System pipeline consists of eight steps:

  1. MCA‑PreFilter – Filter candidates by entity coverage to reduce computational costs.
  2. Vector Processing with FAISS – Embedding and semantic search through 1024‑dim vectors (BGE‑Large).
  3. BM25 Search – Traditional exact‑matching method.
  4. Cross‑Encoder Reranking – Precision optimization for the top N candidates.
  5. (remaining orchestration steps omitted for brevity)
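Steps 2–4 amount to a hybrid retrieval pattern: run a semantic retriever and a lexical retriever, union their top candidates, and apply the expensive reranker only to that small union. The following toy sketch shows the orchestration only (the stand-in scorers below are mine; the article uses FAISS over BGE-Large embeddings, BM25, and a cross-encoder):

```python
# Toy orchestration of hybrid retrieval: semantic + lexical candidate union,
# followed by reranking of just the union. The scorers are plain word-overlap
# stand-ins, not real FAISS / BM25 / cross-encoder calls.
def top_k(scores: dict, k: int) -> set:
    """Keys of the k highest-scoring entries."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

def hybrid_retrieve(query, docs, semantic_score, lexical_score, rerank_score, k=5, n=3):
    semantic = {d: semantic_score(query, d) for d in docs}   # FAISS stand-in
    lexical = {d: lexical_score(query, d) for d in docs}     # BM25 stand-in
    candidates = top_k(semantic, k) | top_k(lexical, k)      # union of candidates
    # Cross-encoder-style reranking is expensive, so score only the union.
    return sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)[:n]

overlap = lambda q, d: len(set(q.split()) & set(d.split()))  # toy scorer
docs = ["met alice at the cafe", "alice likes tea", "bought a new laptop"]
print(hybrid_retrieve("where did I meet alice", docs, overlap, overlap, overlap))
```

The key design point is cost shaping: cheap retrievers cast a wide net, and the precise-but-slow reranker only ever sees the union of their top candidates.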

Example Ranking Code (Python)

def calculate_query_coverage(query_keywords: set, memory_keywords: set) -> float:
    """Fraction of the query's keywords that appear in the memory's keywords."""
    if not query_keywords:
        return 0.0  # guard against division by zero on an empty query
    intersection = len(query_keywords & memory_keywords)
    return intersection / len(query_keywords)

def calculate_force(query_mass, memory_mass, distance):
    """Gravity-style attraction between a query and a memory."""
    G = 6.67430e-11   # gravitational constant (placeholder)
    DELTA = 1e-6      # stability term: keeps the force finite at distance 0
    return G * (query_mass * memory_mass) / (distance ** 2 + DELTA)

def rank_memories(query, memories):
    """Rank memories by descending gravitational 'force' toward the query."""
    query_keywords = extract_keywords_simple(query)
    scored_mem = [
        calculate_mass(mem, query_keywords)  # assumes calculate_mass returns a dict with 'force'
        for mem in memories
    ]
    return sorted(scored_mem, key=lambda x: x['force'], reverse=True)
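The snippet above leaves `extract_keywords_simple` and `calculate_mass` undefined. A self-contained toy version of both (these stand-ins are my assumptions, not the article's implementations; I reuse keyword coverage directly as the ranking "force"):

```python
# Toy stand-ins for the helpers the ranking code assumes.
def extract_keywords_simple(text: str) -> set:
    """Lowercase, split on whitespace, strip punctuation, drop a tiny stopword list."""
    stop = {"where", "did", "i", "the", "a", "at", "my"}
    return {w.strip("?.,!").lower() for w in text.split()} - stop

def calculate_mass(memory: dict, query_keywords: set) -> dict:
    """Score one memory; here 'force' is simply its keyword coverage of the query."""
    mem_keywords = extract_keywords_simple(memory["text"])
    coverage = (len(query_keywords & mem_keywords) / len(query_keywords)
                if query_keywords else 0.0)
    return {**memory, "force": coverage}

memories = [{"text": "I met Alice at PyCon"}, {"text": "Bought a laptop"}]
qk = extract_keywords_simple("Where did I meet Alice?")
scored = sorted((calculate_mass(m, qk) for m in memories),
                key=lambda x: x["force"], reverse=True)
print(scored[0]["text"])  # → I met Alice at PyCon
```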

Full Architecture Overview (8 Steps)

Query example: “Where did I meet Alice?”

(A diagram is referenced in the original article; replace with your own visual if needed.)

Evaluation

Benchmark Results

| Aspect | VAC Memory | Mem0 | Letta/MemGPT | Zep |
|---|---|---|---|---|
| LoCoMo Accuracy | 80.1 % | 66.9 % | 74.0 % | 75.1 % |
| Architecture | MCA + FAISS + BM25 + Cross‑Encoder | LLM extraction + Graph | OS‑like paging + Archive search | Summarize + Vector |
| Entity Protection | ✅ MCA pre‑filter | ❌ Semantic only | ❌ Semantic only | ❌ Semantic only |
| Latency | 2.5 s/query | ~3‑5 s | ~2‑4 s | ~2‑3 s |
| Cost / 1M tokens | <$0.10 | ~$0.50+ | ~$0.30+ | ~$0.20+ |
| Reproducibility | 100 % (seed‑locked) | Variable | Variable | Variable |
| Conversation Isolation | 100 % | Partial | Partial | Partial |

Validation Details

  • Runs: 10 conversations × 10 seeds = 100 runs (1,540 total questions).
  • Accuracy by Question Type: Single‑hop 87 %, Multi‑hop 78 %, Temporal 72 %, Commonsense 87 %.
  • Component Recall (ground‑truth coverage):
    • MCA alone: 40‑50 %
    • FAISS alone: 65‑70 %
    • BM25 alone: 50 %
    • Union Recall (MCA + FAISS + BM25): 85‑95 %

Key insight: No single retrieval method is sufficient; the union catches what each individual method misses.
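The union effect is easy to demonstrate with synthetic hit sets (the numbers below are made up to roughly match the per-component recalls reported above, not real benchmark data):

```python
# Synthetic illustration of union recall: each retriever alone misses
# ground-truth items, but together they cover nearly all of them.
def recall(retrieved: set, relevant: set) -> float:
    return len(retrieved & relevant) / len(relevant)

relevant = set(range(20))               # 20 ground-truth memories
mca_hits = set(range(1, 10))            # 45 % recall alone
faiss_hits = set(range(5, 19))          # 70 % recall alone
bm25_hits = set(range(10, 20))          # 50 % recall alone

union = mca_hits | faiss_hits | bm25_hits
print(recall(union, relevant))          # → 0.95: each method covers the others' misses
```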

Getting Started

  • GitHub repository: VAC Memory System (replace with actual URL)
  • Simple CLI‑based integration.

Example Run

# Run with a fixed seed for reproducibility
SEED=2001 LOCOMO_CONV_INDEX=0 python orchestrator.py
  • Using the same seed yields identical results across runs.
  • 100 runs have been validated.
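Seed-locked determinism of this kind usually comes down to seeding every source of randomness up front from the environment. A minimal sketch (the `SEED` variable matches the command above; how `orchestrator.py` actually consumes it is my assumption):

```python
import os
import random

# Read the seed from the environment, as in `SEED=2001 python orchestrator.py`.
seed = int(os.environ.get("SEED", "2001"))
random.seed(seed)

# Any randomized choice made after seeding is now identical across runs.
shuffled = random.sample(range(10), k=10)
print(shuffled)
```

A real pipeline would also seed NumPy, any ML framework in use, and sort candidate lists with deterministic tie-breaking; seeding `random` alone is only the simplest case.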

Feedback & Discussion

I’d love feedback from anyone building memory systems for LLMs or experimenting with LoCoMo benchmarks.

  • What do you think about combining MCA + BM25 + FAISS?
  • Any ideas for further improvements?

Feel free to open issues or pull requests on the GitHub repo.
