How to Build Long-Term Memory for LLMs (RAG + FAISS Tutorial)
Source: Dev.to

Build a memory system that lets LLMs remember user preferences across conversations using Python, LangChain, FAISS, and SQLite.
tags: ai, python, langchain, machinelearning
The Problem: Context Window vs. Long‑Term Memory
LLMs have a context window — a limited amount of text they can process at once. You can stuff user history into this window, but it quickly becomes expensive and eventually runs out of space. Moreover, it’s inefficient to re‑read the entire conversation history just to recall a user’s name or favorite programming language.
We need a system that acts like a human brain:
- Short‑term memory – the current conversation.
- Long‑term memory – important facts stored away and retrieved only when relevant.
The Solution: RAG + Semantic Search
We’ll build a specialized Retrieval‑Augmented Generation (RAG) pipeline. Instead of retrieving generic documents, we retrieve personal memories about the user.
Key Components
| Component | Purpose |
|---|---|
| Memory Extractor | An LLM agent that “listens” to the chat and identifies facts worth saving. |
| Vector Store (FAISS) | Stores the meaning (embedding) of each memory for fuzzy search. |
| SQL Database | Stores structured data (content, timestamp, category) for reliability. |
| Retrieval System | Fetches relevant memories based on the current user query. |
Step 1: Defining a Memory
A memory isn’t just raw text; it also carries metadata.
from dataclasses import dataclass
from typing import List, Dict, Optional

@dataclass
class Memory:
    id: str
    content: str
    category: str      # e.g., 'tools', 'personal', 'work'
    importance: float  # 0.0 to 1.0
    timestamp: str
    embedding: Optional[List[float]] = None
    metadata: Optional[Dict] = None
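For example, the fact "I use VS Code for Python development" might be stored like this (illustrative values only):

import uuid
from datetime import datetime, timezone

memory = Memory(
    id=str(uuid.uuid4()),
    content="Uses VS Code for Python development",
    category="tools",
    importance=0.8,
    timestamp=datetime.now(timezone.utc).isoformat(),
    metadata={"source": "chat"},
)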
Step 2: Extracting Memories with LangChain
We only keep information that is useful later (e.g., “I use VS Code for Python development”). A carefully crafted system prompt guides the LLM to output structured JSON.
# memory_system.py (simplified)
from langchain.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", """You are an expert at extracting factual information.
Focus on preferences, tools, personal info, habits.
Return a list of memories with an importance score (0-1)."""),
    ("human", "Message: {message}")
])

# Example input:
#   User: "I mostly code in Python but use Rust for side projects."
# Expected output:
#   [
#     {"content": "Codes primarily in Python", "category": "skills", "importance": 0.9},
#     {"content": "Uses Rust for side projects", "category": "skills", "importance": 0.7}
#   ]
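To turn that prompt into actual Memory data, you can pipe it into a chat model and parse the JSON it returns. A minimal sketch, assuming ChatOpenAI and plain json parsing (the model name and helper name are assumptions, not the repository's exact API):

import json
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # model choice is an assumption
chain = prompt | llm  # LCEL: prompt -> chat model

def extract_memories(message: str) -> list:
    """Run the extraction prompt and parse the JSON list the model returns."""
    response = chain.invoke({"message": message})
    try:
        return json.loads(response.content)
    except json.JSONDecodeError:
        return []  # the model didn't return valid JSON; skip this turn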
Step 3: The “Brain” (Vector Store + Database)
Why FAISS?
Vector search can answer queries like “What tools do I use?” even when the memory is phrased as “I work with NeoVim.” Keyword search would miss this, but embeddings capture the semantic similarity.
Why SQLite?
Vectors are great for similarity search but not for reliable reads/updates. SQLite stores the actual text, timestamps, IDs, and other structured fields.
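On the SQLite side, a minimal schema only needs the structured fields from the Memory dataclass. A rough sketch (the repository's actual schema may differ):

import sqlite3

def init_db(path: str = "memories.db") -> sqlite3.Connection:
    """Create the memories table if it doesn't exist yet."""
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS memories (
            id TEXT PRIMARY KEY,
            content TEXT NOT NULL,
            category TEXT,
            importance REAL,
            timestamp TEXT
        )
    """)
    conn.commit()
    return conn

def save_memory(conn: sqlite3.Connection, memory: Memory) -> None:
    """Insert or update a memory row keyed by its id."""
    conn.execute(
        "INSERT OR REPLACE INTO memories (id, content, category, importance, timestamp) "
        "VALUES (?, ?, ?, ?, ?)",
        (memory.id, memory.content, memory.category, memory.importance, memory.timestamp),
    )
    conn.commit()

The vector store itself is a thin wrapper around FAISS: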
import faiss
import numpy as np
from langchain.embeddings import OpenAIEmbeddings

class VectorStore:
    def __init__(self, openai_api_key: str):
        self.embeddings = OpenAIEmbeddings(
            model="text-embedding-3-small",
            openai_api_key=openai_api_key
        )
        # OpenAI embeddings are unit-normalised, so inner product == cosine similarity
        self.index = faiss.IndexFlatIP(1536)

    def add_memory(self, text: str):
        vector = self.embeddings.embed_query(text)
        # FAISS expects float32 arrays of shape (n, dim)
        self.index.add(np.array([vector], dtype="float32"))
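The retrieval code in Step 4 calls a search_similar method, which the snippet above omits. One way to implement it is to keep each stored Memory in a parallel self.memories list (appended whenever add_memory is called) so FAISS row numbers map back to memories. A sketch of such a method, not the repository's exact implementation:

    def search_similar(self, query: str, k: int = 5) -> List[Memory]:
        """Embed the query and return the k most similar stored memories."""
        vector = self.embeddings.embed_query(query)
        scores, indices = self.index.search(np.array([vector], dtype="float32"), k)
        # indices[0] holds FAISS row numbers; -1 marks slots with no match
        return [self.memories[i] for i in indices[0] if i != -1]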
Step 4: Connecting the Dots
The main loop orchestrates the workflow:
- User sends a message.
- Extractor identifies any new facts and stores them.
- Updater checks for corrections (e.g., “Actually, I switched to Java”).
- Retriever fetches relevant memories based on the current query.
- LLM generates a response using the retrieved memories as context.
def answer_with_memory(self, question: str) -> str:
    # 1. Search the vector store for memories similar to the question
    relevant_memories = self.vector_store.search_similar(question)

    # 2. Build context from those memories
    context = "\n".join([m.content for m in relevant_memories])

    # 3. Prompt the LLM with the memories as context
    prompt = f"Based on these memories:\n{context}\n\nAnswer: {question}"
    return self.llm.invoke(prompt).content  # .content assumes a chat model such as ChatOpenAI
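Putting it together, one chat turn can run extraction, persistence, and answering in sequence. The sketch below reuses the hypothetical extract_memories and save_memory helpers from earlier and assumes the class holds a SQLite connection in self.db; the repository's orchestration is more elaborate:

import uuid
from datetime import datetime, timezone

def handle_turn(self, user_message: str) -> str:
    """Process one chat turn: extract new facts, persist them, then answer."""
    # 1. Extract any new facts from the incoming message
    for fact in extract_memories(user_message):
        memory = Memory(
            id=str(uuid.uuid4()),
            content=fact["content"],
            category=fact.get("category", "general"),
            importance=fact.get("importance", 0.5),
            timestamp=datetime.now(timezone.utc).isoformat(),
        )
        save_memory(self.db, memory)                  # structured copy in SQLite
        self.vector_store.add_memory(memory.content)  # embedding in FAISS

    # 2. Answer using whatever stored memories are relevant to this message
    return self.answer_with_memory(user_message)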
Live Demo
I built a Streamlit app to visualise this “brain”. You can watch memories form in real‑time, search through them, and see how the system categorises your life.
Why This Matters
This isn’t just about remembering names. It’s about personalization.
- A coding assistant that remembers your preferred libraries.
- A tutor that remembers what you struggled with last week.
- A therapist bot that remembers your long‑term goals.
Future Improvements
- Graph Database – Linking memories (e.g., “Paris” is related to “France”).
- Local LLMs – Running Llama 3 for privacy.
- Time Decay – Slowly “forgetting” unimportant memories over time.
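The time-decay idea could be as simple as halving a memory's effective importance every N days. An illustrative sketch, not part of the current repository:

import math
from datetime import datetime, timezone

def decayed_importance(memory: Memory, half_life_days: float = 30.0) -> float:
    """Halve a memory's effective importance every half_life_days."""
    stored = datetime.fromisoformat(memory.timestamp)  # assumes a timezone-aware ISO timestamp
    age_days = (datetime.now(timezone.utc) - stored).total_seconds() / 86400
    return memory.importance * math.exp(-math.log(2) * age_days / half_life_days)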
Check out the Code
The full code is available on GitHub. It includes the complete implementation of the memory extractor, vector‑store management, and the Streamlit UI.
Repository: Devparihar5/llm-long-term-memory
🧠 Advanced Long-Term Memory System for LLM Agents
A sophisticated memory storage and retrieval system that provides LLMs with persistent, searchable long‑term memory capabilities. This system can:
- Extract memories from conversations.
- Store them securely for future reference.
- Update existing memories as new information becomes available.
- Retrieve relevant memories to maintain context across multiple sessions.
By integrating this system, AI agents can maintain continuity, improve personalization, and enhance decision‑making over time.
✨ Features
- Intelligent Memory Extraction – Automatically extracts factual information from conversations using OpenAI GPT.
- Semantic Search – Vector‑based similarity search with OpenAI embeddings and FAISS.
- Memory Management – Add, update, and delete memories with conflict resolution.
- Persistent Storage – SQLite database for reliable memory persistence.
- Category Organization – Automatic categorization of memories (tools, preferences, personal, habits, etc.).
- Importance Scoring – Weighted importance system for memory prioritization.
- Real‑time Updates – Detect and process memory updates and deletions from natural language.
- Web Interface – Comprehensive Streamlit‑based testing and management interface.
- LangChain Integration – Built with LangChain for robust LLM interactions.
- Modular Architecture – Clean separation of concerns.
How do you handle state in your LLM apps? Drop a comment below! 👇
Originally published on Medium.
