Retrieval-Augmented Generation: Connecting LLMs to Your Data
Source: Dev.to
Tech Acronyms Reference
| Acronym | Meaning |
|---|---|
| API | Application Programming Interface |
| BERT | Bidirectional Encoder Representations from Transformers |
| FAISS | Facebook AI Similarity Search |
| GPU | Graphics Processing Unit |
| JSON | JavaScript Object Notation |
| LLM | Large Language Model |
| RAG | Retrieval‑Augmented Generation |
| ROI | Return on Investment |
| SQL | Structured Query Language |
| VRAM | Video Random Access Memory |
Why LLMs Need External Data
Large Language Models (LLMs) have a fundamental limitation: their knowledge is frozen at training time.
Ask GPT‑4 about:
- “What did our Q3 sales look like?” → ❌ Doesn’t know your data
- “What’s in our employee handbook?” → ❌ Doesn’t have your docs
- “Show me tickets from yesterday” → ❌ No real‑time access
- “What did the customer say in ticket #45632?” → ❌ Can’t see your database
The LLM has no knowledge of your specific data.
Solutions Overview
| Approach | Pros | Cons |
|---|---|---|
| Fine‑tuning | Tailors model to your data | Expensive, slow, static |
| Long context | Simple prompt‑only solution | Limited by context window, costly |
| Retrieval‑Augmented Generation (RAG) | Flexible, scalable, cost‑effective; retrieves only the relevant data at query time | Requires extra retrieval infrastructure (embeddings, vector store) |
This article focuses on RAG, the most practical approach for production systems.
What Is Retrieval‑Augmented Generation (RAG)?
RAG connects LLMs to proprietary data at scale. It consists of three stages (sketched in code right after the list):
- Indexing (offline) – Process documents into vector embeddings and store them in a vector database.
- Retrieval (query time) – Embed the user query, search the vector store, and return the top‑k most relevant chunks.
- Generation – Feed the retrieved chunks plus the original query to the LLM to produce a final answer.
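To make the flow concrete before introducing real tooling, here is a deliberately tiny, self-contained sketch of the three stages. The letter-count "embedding" and the in-memory list are stand-ins invented purely for illustration; the rest of the article replaces them with a real embedding model and a vector database.

```python
from typing import List, Tuple

# Toy "embedding": letter-frequency vector. A stand-in for a real model,
# used only to illustrate the indexing -> retrieval -> generation flow.
def embed(text: str) -> List[float]:
    return [text.lower().count(c) for c in "abcdefghijklmnopqrstuvwxyz"]

def similarity(a: List[float], b: List[float]) -> float:
    # Dot product as a crude similarity score.
    return sum(x * y for x, y in zip(a, b))

# 1. Indexing (offline): embed document chunks and keep the vectors.
chunks = [
    "Returns are accepted within 30 days of purchase.",
    "Standard shipping takes 3-5 business days.",
]
index: List[Tuple[str, List[float]]] = [(c, embed(c)) for c in chunks]

# 2. Retrieval (query time): embed the query and rank chunks by similarity.
query = "What is the return policy?"
q_vec = embed(query)
best_chunk = max(index, key=lambda item: similarity(q_vec, item[1]))[0]

# 3. Generation: in a real system, the retrieved context plus the query
#    would be sent to an LLM; here we just print the assembled prompt.
print(f"Context: {best_chunk}\n\nQuestion: {query}")
```

The point is only the shape of the pipeline: build an index once, search it for every query, then hand the best matches to the generator.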
Real‑Life Analogy: The Research Assistant
| Stage | What the assistant does |
|---|---|
| Indexing | Reads all company documents, creates organized notes, files them for quick retrieval. |
| Retrieval | When you ask a question, searches the notes and pulls out the most relevant documents. |
| Generation | Reads the retrieved documents, formulates an answer, and responds. |
RAG Workflow Diagram
```
┌─────────────────────────────────────────────────────────┐
│ INDEXING (Offline) │
├─────────────────────────────────────────────────────────┤
│ Documents → Chunking → Embeddings → Vector Database │
│ "handbook.pdf" → paragraphs → vector representations │
│ "policies.docx" → paragraphs → vector representations │
│ "faqs.md" → paragraphs → vector representations │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ RETRIEVAL (Query Time) │
├─────────────────────────────────────────────────────────┤
│ User Query → Embed Query → Search Vector DB → Top‑K │
│ "What's the return policy?" → vector → find similar chunks │
│ → return 5 most relevant chunks │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ GENERATION (Response) │
├─────────────────────────────────────────────────────────┤
│ Retrieved Docs + Query → LLM → Final Answer │
│ Context: [5 relevant chunks about returns] │
│ Question: "What is the return policy?" │
│ LLM Output: "Our return policy allows returns within 30 │
│ days of purchase. Items must be in original condition..." │
└─────────────────────────────────────────────────────────┘
```
Installation
```bash
pip install langchain
pip install chromadb               # Vector database
pip install sentence-transformers  # Embeddings
pip install litellm                # LLM interface
pip install pypdf                  # PDF processing
```
Python Example: Loading and Chunking Documents
```python
from typing import List
import re


def load_documents(file_paths: List[str]) -> List[str]:
    """Load plain‑text documents from a list of file paths."""
    documents = []
    for path in file_paths:
        with open(path, "r", encoding="utf-8") as f:
            documents.append(f.read())
    return documents

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """
    Split `text` into overlapping chunks.

    Parameters
    ----------
    text : str
        Input text to chunk.
    chunk_size : int, default 500
        Target size of each chunk (characters).
    overlap : int, default 50
        Number of characters to overlap between consecutive chunks.
    """
    # Simple sentence‑aware chunking
    sentences = re.split(r"(?<=[.!?])\s+", text)

    chunks = []
    current_chunk = []
    current_len = 0

    for sentence in sentences:
        if current_len + len(sentence) > chunk_size and current_chunk:
            chunks.append(" ".join(current_chunk))
            # Preserve overlap for the next chunk
            overlap_sentences = []
            overlap_len = 0
            for s in reversed(current_chunk):
                if overlap_len + len(s) > overlap:
                    break
                overlap_sentences.insert(0, s)
                overlap_len += len(s)
            current_chunk = overlap_sentences
            current_len = overlap_len
        current_chunk.append(sentence)
        current_len += len(sentence)

    if current_chunk:
        chunks.append(" ".join(current_chunk))
    return chunks
```
Python Example: Storing and Querying Chunks in a Vector Database
The chunks are embedded with a sentence-transformers model and stored in ChromaDB. The wrapper below assumes an in-memory ChromaDB client and the `all-MiniLM-L6-v2` embedding model; both are common defaults and can be swapped for your own choices.
```python
import uuid
from typing import List

import chromadb
from sentence_transformers import SentenceTransformer


class VectorStore:
    """Thin wrapper around a ChromaDB collection plus an embedding model."""

    def __init__(self, collection_name: str = "documents",
                 model_name: str = "all-MiniLM-L6-v2"):
        # In-memory client; use chromadb.PersistentClient(...) to keep data on disk.
        self.client = chromadb.Client()
        self.collection = self.client.get_or_create_collection(name=collection_name)
        self.embedding_model = SentenceTransformer(model_name)

    def add_documents(self, chunks: List[str]) -> None:
        """Embed `chunks` and store them under freshly generated ids."""
        # No metadata is attached here; pass `metadatas=` to `collection.add` if needed.
        embeddings = self.embedding_model.encode(chunks).tolist()
        ids = [str(uuid.uuid4()) for _ in chunks]
        self.collection.add(ids=ids, documents=chunks, embeddings=embeddings)

    def query(self, query_text: str, top_k: int = 5) -> List[dict]:
"""
Retrieve the `top_k` most similar chunks for `query_text`.
Returns
-------
List[dict] with keys `id`, `document`, `metadata`, `distance`.
"""
query_emb = self.embedding_model.encode([query_text]).tolist()
results = self.collection.query(
query_embeddings=query_emb,
n_results=top_k,
include=["documents", "metadatas", "distances", "ids"]
)
# Re‑format results for easier consumption
hits = []
for i in range(len(results["ids"][0])):
hits.append({
"id": results["ids"][0][i],
"document": results["documents"][0][i],
"metadata": results["metadatas"][0][i],
"distance": results["distances"][0][i],
})
return hits
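For illustration, a quick round trip through the store could look like the following; the chunk texts and the query are made up, and the exact distances you see will depend on the embedding model.

```python
store = VectorStore()
store.add_documents([
    "Returns are accepted within 30 days of purchase.",
    "Items must be unused and in their original packaging.",
    "Refunds are issued to the original payment method.",
])

# Lower distance = more similar under ChromaDB's default metric.
for hit in store.query("What is the return policy?", top_k=2):
    print(f"{hit['distance']:.3f}  {hit['document']}")
```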
You can now combine the chunking logic with `VectorStore` to build a full RAG pipeline (a sketch of these steps follows the list):
- Load raw documents.
- Chunk them with `chunk_text`.
- Insert the chunks into `VectorStore`.
- At query time, embed the user question, retrieve the top‑k chunks, and pass the concatenated context plus the original question to your LLM (e.g., via `litellm` or `langchain`).